ScrapCodes commented on issue #25853: [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer URL: https://github.com/apache/spark/pull/25853#issuecomment-534488064

Thanks for your interest in redoing the entire patch from scratch; you had to rewrite the test suites as well to fit the new pool API. Commons Pool wraps each pooled object in a PooledObject wrapper and keeps all the in-use tracking information inside that per-object wrapper. Since the Guava cache does not do this, we had to add that tracking ourselves earlier (in my previous PR). So this is definitely better than using Guava for the tracking.

Producers are cached by the Kafka parameters they are created with; in other words, if the Kafka params change, we get a fresh instance of KafkaProducer from the cache. What happens to the old instance? It sits in the cache until it expires and gets evicted. Since KafkaProducer is thread safe, it is shared across all the threads on the executor.

Q. Is there a case where we would use more than one Kafka producer at a time? If no, then why do we need object pooling? If yes, when would that happen?
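To make the caching behavior described above concrete, here is a minimal, self-contained sketch of a keyed producer cache where each entry carries its own in-use state, mimicking what Commons Pool's PooledObject wrapper tracks automatically (and what the earlier Guava-based cache had to track by hand). `KeyedProducerCache`, `FakeProducer`, and `Pooled` are hypothetical names for illustration only; `FakeProducer` stands in for KafkaProducer, and the parameter map stands in for the Kafka params the producer was created with. This is not the actual Commons Pool API or the PR's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class KeyedProducerCache {
    // Stand-in for KafkaProducer, keyed by the params it was created with.
    static final class FakeProducer {
        final Map<String, String> params;
        FakeProducer(Map<String, String> params) { this.params = params; }
    }

    // Per-object wrapper holding in-use state, analogous in spirit to
    // Commons Pool's PooledObject (which tracks this for us).
    static final class Pooled {
        final FakeProducer producer;
        final AtomicInteger inUse = new AtomicInteger(0);
        Pooled(FakeProducer p) { this.producer = p; }
    }

    private final ConcurrentHashMap<Map<String, String>, Pooled> cache =
        new ConcurrentHashMap<>();

    // A changed parameter map is a different key, so it yields a fresh
    // producer; the old entry lingers until some eviction policy
    // (not shown here) expires and removes it.
    FakeProducer acquire(Map<String, String> kafkaParams) {
        Pooled p = cache.computeIfAbsent(
            kafkaParams, k -> new Pooled(new FakeProducer(k)));
        p.inUse.incrementAndGet();
        return p.producer;
    }

    void release(Map<String, String> kafkaParams) {
        Pooled p = cache.get(kafkaParams);
        if (p != null) p.inUse.decrementAndGet();
    }

    int size() { return cache.size(); }

    public static void main(String[] args) {
        KeyedProducerCache cache = new KeyedProducerCache();
        Map<String, String> a = Map.of("bootstrap.servers", "broker1:9092");
        Map<String, String> b = Map.of("bootstrap.servers", "broker2:9092");

        FakeProducer p1 = cache.acquire(a);
        FakeProducer p2 = cache.acquire(a); // same params -> same shared instance
        FakeProducer p3 = cache.acquire(b); // changed params -> fresh instance

        System.out.println(p1 == p2);     // true
        System.out.println(p1 == p3);     // false
        System.out.println(cache.size()); // 2
    }
}
```

The key point the sketch illustrates: because the producer is thread safe and shared, only the keyed lookup and the per-entry in-use count matter, and Commons Pool supplies both out of the box.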
