zsxwing commented on issue #25853: [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer
URL: https://github.com/apache/spark/pull/25853#issuecomment-555190489

> One producer allocates a little bit more than its `buffer.memory`, which is ~33 MB by default. Each producer can also potentially connect to all brokers of a Kafka cluster. That will be a lot of connections.

Summarizing the options we have:

- Option 1: Keep this patch.
  - Pros:
    - Fixes SPARK-21869.
  - Cons:
    - Changes the producer sharing strategy and may cause a stability regression because of increased resource usage.
- Option 2: Revert this and fix SPARK-21869 in #19096.
  - Pros:
    - Resource usage should stay the same.
  - Cons:
    - The fix may be error-prone.
- Option 3: Revert this and don't fix SPARK-21869.
  - Pros:
    - No regression.
  - Cons:
    - Users may still hit SPARK-21869, but the chance is pretty low. They can increase the cache timeout as a workaround.

Let me also add more context about SPARK-21869, since I created it. I found this bug in a real workload, but its root cause there was actually not SPARK-21869. That workload had a memory issue that triggered long GC pauses, and the 10 minutes with zero messages written were caused by GC pressure. After the memory issue was fixed, the workload no longer hit SPARK-21869. IMO, writing zero messages in 10 minutes is usually not what the user expects and indicates some other issue. This makes me feel that fixing SPARK-21869 is not worth it if the fix requires a potential increase in resource usage. Hence, I prefer option 3 right now.
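The quoted resource concern can be made concrete with a rough back-of-the-envelope estimate. The sketch below assumes that each producer reserves its full `buffer.memory` (Kafka's default is 33554432 bytes, i.e. 32 MiB). It also assumes that each producer may open one connection to every broker. The producer and broker counts are hypothetical illustration values, and `estimate_producer_footprint` is a made-up helper, not a Spark or Kafka API.

```python
# Kafka producer's default `buffer.memory`, in bytes (32 MiB).
DEFAULT_BUFFER_MEMORY = 33_554_432


def estimate_producer_footprint(num_producers: int, num_brokers: int,
                                buffer_memory: int = DEFAULT_BUFFER_MEMORY):
    """Hypothetical helper: rough buffer memory and worst-case connection count
    for `num_producers` independent producers in one executor JVM.

    Assumes every producer reserves its full `buffer.memory` and may connect
    to every broker; actual usage depends on the workload.
    """
    memory_bytes = num_producers * buffer_memory
    max_connections = num_producers * num_brokers
    return memory_bytes, max_connections


# Illustrative comparison: one shared producer vs. 8 producers (e.g. one per
# concurrently running task), against a hypothetical 20-broker cluster.
for producers in (1, 8):
    mem, conns = estimate_producer_footprint(producers, num_brokers=20)
    print(f"{producers} producer(s): {mem / 1_000_000:.1f} MB, up to {conns} connections")
```

Both numbers grow linearly with the number of producers that exist at the same time. This is why a change in the sharing strategy can matter more than the per-producer cost alone.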
