gaborgsomogyi commented on issue #25853: [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer URL: https://github.com/apache/spark/pull/25853#issuecomment-534519947

> What happens to the old instance?

If none of the tasks are using the old instance, then after the eviction time has elapsed it will be closed and removed from the cache.

> Is there a case, where we would use more than one kafka producer at a time ?

Not sure I understand your question exactly, so let me answer my interpretation. Kafka connection pooling solves mainly one problem: it spares the construction time of consumer/producer instances. This creation time can be significant when Kerberos and SSL encryption are enabled, and without pooling it would be paid in every micro-batch. As you've noted, producers are thread safe, so as a further optimization instances could be shared between threads without any harm. Since Apache Commons Pool doesn't support this, it can't be done here. If multiple threads are using a producer with the same Kafka params, then multiple instances will be created (the same happens on the consumer side). This is a trade-off which I think makes sense compared with the main advantages listed in the PR description. If you mean something else, please clarify and we can discuss it...
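To make the pooling behaviour concrete, here is a rough sketch (not the code in this PR) of how producers keyed by their Kafka params could be cached with Commons Pool; the object and method names (`ProducerPoolSketch`, `send`, the eviction intervals) are illustrative only:

```scala
import java.{util => ju}

import org.apache.commons.pool2.{BaseKeyedPooledObjectFactory, PooledObject}
import org.apache.commons.pool2.impl.{DefaultPooledObject, GenericKeyedObjectPool}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerPoolSketch {
  type Producer = KafkaProducer[Array[Byte], Array[Byte]]

  // One producer is created (and then reused) per distinct set of Kafka params.
  private class ProducerFactory
      extends BaseKeyedPooledObjectFactory[ju.Map[String, Object], Producer] {

    override def create(kafkaParams: ju.Map[String, Object]): Producer =
      new KafkaProducer[Array[Byte], Array[Byte]](kafkaParams) // expensive with Kerberos/SSL

    override def wrap(producer: Producer): PooledObject[Producer] =
      new DefaultPooledObject(producer)

    override def destroyObject(kafkaParams: ju.Map[String, Object],
        p: PooledObject[Producer]): Unit =
      p.getObject.close() // invoked when an idle instance is evicted from the pool
  }

  private val pool = new GenericKeyedObjectPool(new ProducerFactory)
  // Idle producers are closed and removed once the eviction time has elapsed.
  pool.setMinEvictableIdleTimeMillis(5 * 60 * 1000L)
  pool.setTimeBetweenEvictionRunsMillis(60 * 1000L)

  def send(kafkaParams: ju.Map[String, Object],
      record: ProducerRecord[Array[Byte], Array[Byte]]): Unit = {
    // Commons Pool hands an object to at most one borrower at a time, so two threads
    // borrowing with the same key concurrently get two separate producer instances.
    val producer = pool.borrowObject(kafkaParams)
    try {
      producer.send(record)
    } finally {
      pool.returnObject(kafkaParams, producer)
    }
  }
}
```

The `borrowObject`/`returnObject` pair is what makes the sharing limitation visible: the pool never lends the same instance to two threads at once, which is why a second thread with identical params gets a second producer instead of sharing the thread-safe one.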
