gaborgsomogyi commented on issue #25853: [SPARK-21869][SS] Apply Apache Commons 
Pool to Kafka producer
URL: https://github.com/apache/spark/pull/25853#issuecomment-554939486
 
 
   @zsxwing Since this PR is a tradeoff, it's debatable, because not everybody 
has the same threshold in such cases.
   
   The recommendation you copied was considered when I filed the PR. Since the 
recommendation is not specific enough, we made measurements to reduce the 
uncertainty. The performance test results are given; the resource allocation 
part is not specifically written down. One producer allocates a little more 
than its `buffer.memory`, which is ~33 MB by default.
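
   To make the resource side concrete, here is a rough sketch of the arithmetic. 
The helper name is made up for illustration, and the estimate is only a lower 
bound: it counts the record buffer alone and ignores whatever a producer 
allocates on top of `buffer.memory`.

```python
# Kafka's default for buffer.memory is 33554432 bytes (32 MiB).
DEFAULT_BUFFER_MEMORY_BYTES = 32 * 1024 * 1024


def estimated_producer_memory_bytes(num_producers,
                                    buffer_memory_bytes=DEFAULT_BUFFER_MEMORY_BYTES):
    """Lower-bound memory estimate for `num_producers` live producers.

    Hypothetical helper: it assumes each producer holds exactly its
    buffer.memory and nothing else, which understates the real footprint.
    """
    if num_producers < 0:
        raise ValueError("num_producers must be non-negative")
    return num_producers * buffer_memory_bytes


if __name__ == "__main__":
    for n in (1, 4, 16):
        mib = estimated_producer_memory_bytes(n) / (1024 * 1024)
        print(f"{n} producer(s): at least {mib:.0f} MiB")
```

   The number of producers to plug in is the number of tasks writing 
concurrently, not the number of cores on the executor.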
   
   > it's pretty common that an executor has 10-100 cores today.
   
   This is true, but not all the cores are involved. The additional resource 
consumption mentioned applies only when cores are writing to the exact same 
TopicPartition at the same time. That count can be anywhere between 0 and the 
number of cores. Use cases can be abused, of course, but I can hardly believe 
that it is efficient to write a single partition concurrently with 4+ cores. 
The additional allocation is not static: if a producer is not used, it times 
out and is freed from the cache, allowing other computations to take over that 
memory.
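
   The behaviour described above can be sketched as a tiny keyed pool with idle 
eviction. This is a simplified illustration, not the code in this PR and not 
the Apache Commons Pool API: the class and method names are hypothetical, it 
is not thread-safe, and a real pool would also close the evicted producer.

```python
import time


class IdleEvictingPool:
    """Hypothetical keyed pool: extra objects exist only under concurrency
    and are dropped once they have been idle longer than the timeout."""

    def __init__(self, factory, idle_timeout_s, clock=time.monotonic):
        self._factory = factory
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._idle = {}   # key -> list of (object, time it was returned)
        self.created = 0  # how many objects the factory has produced

    def borrow(self, key):
        # Reuse an idle object for this key if there is one, else create.
        idle = self._idle.get(key)
        if idle:
            obj, _ = idle.pop()
            return obj
        self.created += 1
        return self._factory(key)

    def give_back(self, key, obj):
        # Record when the object became idle so it can time out later.
        self._idle.setdefault(key, []).append((obj, self._clock()))

    def evict_idle(self):
        # Drop every object that has been idle for at least the timeout.
        now = self._clock()
        evicted = 0
        for key in list(self._idle):
            entries = self._idle[key]
            keep = [(o, t) for (o, t) in entries
                    if now - t < self._idle_timeout_s]
            evicted += len(entries) - len(keep)
            if keep:
                self._idle[key] = keep
            else:
                del self._idle[key]
        return evicted

    def idle_count(self):
        return sum(len(entries) for entries in self._idle.values())
```

   Two tasks borrowing for the same key at the same time create two objects; a 
single task borrowing repeatedly keeps reusing one, and anything left unused 
disappears on the next eviction run after the timeout.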
   
   As I said at the beginning, this is a tradeoff and may not be worth it. If 
that is the conclusion, and considering the following:
   * https://github.com/apache/spark/pull/19096 was considered risky because of 
manual thread handling and reference counting
   * your assumption about producers given in 
https://github.com/apache/spark/pull/26470
   > Hence I would assume it can self-heal.
   
   I would question whether we need to invalidate Kafka producers at all. Maybe 
removing the timeout on the producer side is the simplest solution?
   

