HeartSaVioR commented on issue #25853: [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer
URL: https://github.com/apache/spark/pull/25853#issuecomment-555538298

Looks like Gabor has a good point about SPARK-21869. To elaborate a bit: the root problem with the producer cache is that an instance expires after 10 minutes unless some other task accesses it within the timeout, "regardless of the usage of the producer". That doesn't work like a heartbeat: the producer instance expires even if a task has obtained it and keeps writing messages continuously for more than 10 minutes. With luck, another task accesses the same instance and extends the timeout; if not, the instance times out and the task breaks because its producer is suddenly closed. Things get worse if some code path shares the producer instance directly instead of getting it from the cache.

So the cache doesn't work in an intuitive way: it does not limit expiry to instances that saw no activity during the timeout. A task running for 10 minutes is abnormal for streaming, so this cache behavior won't cause real issues in streaming cases, but I guess the expectations might be different for batch cases.
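
To make the failure mode concrete, here is a minimal simulation in Python. It is only a sketch, not Spark's cache nor the Apache Commons Pool API: `FakeProducer`, `AccessExpiringCache`, `InUseAwareCache`, and `run_task` are hypothetical names, and time is a plain integer counting minutes. The first cache expires an entry purely by the time since it was last fetched; the second only considers entries that no task currently holds.

```python
TIMEOUT_MIN = 10  # the 10-minute expiry discussed above


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; only tracks open/closed."""

    def __init__(self):
        self.closed = False

    def send(self, message):
        if self.closed:
            raise RuntimeError("producer was closed while the task was still writing")

    def close(self):
        self.closed = True


class AccessExpiringCache:
    """Expires an entry once `timeout` has passed since the last acquire(),
    regardless of whether a task is still writing through the producer."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = {}  # key -> [producer, last_access]

    def acquire(self, key, now):
        self.evict(now)
        entry = self.entries.get(key)
        if entry is None:
            entry = self.entries[key] = [FakeProducer(), now]
        entry[1] = now  # only fetching from the cache extends the timeout
        return entry[0]

    def release(self, key, now):
        pass  # this cache has no notion of "in use"

    def evict(self, now):
        for key, (producer, last_access) in list(self.entries.items()):
            if now - last_access >= self.timeout:
                producer.close()
                del self.entries[key]


class InUseAwareCache:
    """Expires an entry only if no task holds it and it has been idle for `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = {}  # key -> [producer, borrowers, idle_since]

    def acquire(self, key, now):
        self.evict(now)
        entry = self.entries.get(key)
        if entry is None:
            entry = self.entries[key] = [FakeProducer(), 0, now]
        entry[1] += 1
        return entry[0]

    def release(self, key, now):
        entry = self.entries[key]
        entry[1] -= 1
        entry[2] = now  # the idle clock starts when the producer is returned

    def evict(self, now):
        for key, (producer, borrowers, idle_since) in list(self.entries.items()):
            if borrowers == 0 and now - idle_since >= self.timeout:
                producer.close()
                del self.entries[key]


def run_task(cache, minutes):
    """One task fetches a producer at minute 0 and writes once per minute.
    Returns True if every send succeeded."""
    producer = cache.acquire("some-producer-config", now=0)
    try:
        for now in range(1, minutes + 1):
            cache.evict(now)  # stand-in for the cache's periodic cleanup
            producer.send(f"record at minute {now}")
        return True
    except RuntimeError:
        return False
    finally:
        cache.release("some-producer-config", now=minutes)
```

Under these assumptions, a 15-minute task fails with `AccessExpiringCache` because nothing re-fetches the instance before minute 10, while the same task completes with `InUseAwareCache`. A 5-minute task succeeds with either, which matches the point that short streaming tasks are unlikely to hit the problem.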
