zsxwing commented on issue #25853: [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer
URL: https://github.com/apache/spark/pull/25853#issuecomment-554493096

@gaborgsomogyi Thanks a lot for your contribution. Thanks also to @HeartSaVioR for the benchmark results. However, I don't think this addresses my concerns.

Sharing a producer within one JVM is actually recommended by the [Kafka API doc](https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html). Copying its statement here for people who have not yet read it:

> The producer is thread safe and sharing a single producer instance across threads will generally be faster than having multiple instances.

IMO, a Kafka producer is a heavy object: it includes threads, connections, buffers, etc. If we change the cache to an object pool and do not reuse producers across tasks, we potentially put a lot more pressure on the executors and the Kafka cluster when an executor has many cores. For example, it's pretty common today for an executor to have 10-100 cores.

Before this patch, our strategy was to share one producer among tasks that use the same Kafka parameters. I don't think we should change this strategy.

@gaborgsomogyi what do you think about my concerns?
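
To make the sharing strategy concrete, here is a minimal sketch of "one producer per distinct parameter set, shared by all tasks in the JVM". It is not Spark's actual cache implementation and does not use the Kafka client: `SharedProducerCache`, `StubProducer`, and `getOrCreate` are hypothetical names, and `StubProducer` is only a stand-in for a heavy object such as `KafkaProducer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedProducerCache {

    /** Hypothetical stand-in for a heavy client (threads, connections, buffers). */
    static final class StubProducer {
        // Counts how many instances were ever constructed.
        static final AtomicInteger CREATED = new AtomicInteger();
        final Map<String, Object> params;

        StubProducer(Map<String, Object> params) {
            this.params = params;
            CREATED.incrementAndGet();
        }
    }

    // One entry per distinct parameter map; Map equality is content-based.
    private static final ConcurrentHashMap<Map<String, Object>, StubProducer> CACHE =
            new ConcurrentHashMap<>();

    /** Returns the shared instance for these parameters, creating it at most once. */
    static StubProducer getOrCreate(Map<String, Object> params) {
        // Map.copyOf takes an immutable snapshot so the key cannot change later.
        return CACHE.computeIfAbsent(Map.copyOf(params), StubProducer::new);
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> params = Map.of("bootstrap.servers", "localhost:9092");
        int tasks = 16; // assumed number of concurrent tasks on one executor

        ExecutorService executor = Executors.newFixedThreadPool(tasks);
        List<Callable<StubProducer>> work = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            work.add(() -> getOrCreate(params));
        }
        executor.invokeAll(work); // blocks until every task has finished
        executor.shutdown();

        // Prints: tasks=16, producers created=1
        System.out.println("tasks=" + tasks
                + ", producers created=" + StubProducer.CREATED.get());
    }
}
```

With a pool that hands each task its own instance, the same 16 concurrent tasks could hold up to 16 producers at once, which is the per-core growth the comment is worried about.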
