HeartSaVioR commented on issue #25853: [SPARK-21869][SS] Apply Apache Commons 
Pool to Kafka producer
URL: https://github.com/apache/spark/pull/25853#issuecomment-555538298
 
 
   Looks like Gabor has a good point about SPARK-21869.
   
   Elaborating on the point a bit more: the root problem of the producer cache 
is that an instance expires after 10 mins unless some other task accesses it 
within the timeout, "regardless of the usage of the producer". It doesn't work 
like a heartbeat: the producer instance expires even if a task has obtained the 
instance and has been writing messages continuously for more than 10 mins. With 
luck, another task accesses the same instance and extends the timeout; if not, 
the instance times out and the task breaks because the producer instance was 
suddenly closed. Things get worse if some code path shares the producer 
instance directly instead of getting it from the cache.
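   To make the failure mode concrete, here is a minimal simulation of a cache 
that expires entries by last access time. It is only a sketch: `FakeProducer`, 
`AccessExpiringCache`, and `long_running_task` are hypothetical names, time is 
simulated in seconds, and none of this is the actual Spark code.

```python
class FakeProducer:
    """Hypothetical stand-in for a Kafka producer (not the real client)."""

    def __init__(self):
        self.closed = False

    def send(self, message):
        if self.closed:
            raise RuntimeError("producer already closed")

    def close(self):
        self.closed = True


class AccessExpiringCache:
    """Closes an entry once `timeout` seconds pass since the last get()."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = {}  # key -> (producer, last_access_time)

    def get(self, key, now):
        self.evict_expired(now)
        producer = self.entries[key][0] if key in self.entries else FakeProducer()
        # Only get() refreshes the timestamp; send() on the producer does not.
        self.entries[key] = (producer, now)
        return producer

    def evict_expired(self, now):
        for key, (producer, last_access) in list(self.entries.items()):
            if now - last_access >= self.timeout:
                producer.close()
                del self.entries[key]


def long_running_task(cache, minutes):
    """Fetch the producer once, then send one message per simulated minute.

    Returns the simulated time (seconds) at which send() failed, or None.
    """
    producer = cache.get("kafka-params", now=0)
    for minute in range(1, minutes + 1):
        now = minute * 60
        cache.evict_expired(now)  # cache housekeeping, unaware of the task
        try:
            producer.send(f"message at {now}s")
        except RuntimeError:
            return now
    return None


# A 15-minute task against a 10-minute timeout loses its producer mid-write.
failed_at = long_running_task(AccessExpiringCache(timeout=600), minutes=15)
```

   In this sketch the task keeps writing the whole time, yet the producer is 
closed at the 10-minute mark because nothing called `get()` again.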
   
   So the cache doesn't work in an intuitive way: it doesn't expire a producer 
instance only when there has been no activity during the timeout. A task that 
runs for 10 mins is an abnormal case for streaming, so this cache behavior 
won't cause actual issues for streaming workloads, but I guess the expectations 
might be different for batch workloads.
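
   For contrast, the "intuitive" behavior would expire an instance only after it 
has been idle for the timeout. The sketch below shows one way to model that 
with a borrow/return count. `PooledProducer` and `IdleExpiringPool` are 
hypothetical names and this is not the Apache Commons Pool API; it only 
illustrates the idea under simulated time.

```python
class PooledProducer:
    """Hypothetical producer wrapper that tracks how many tasks hold it."""

    def __init__(self):
        self.closed = False
        self.in_use = 0
        self.last_released = 0


class IdleExpiringPool:
    """Closes an entry only when it is unused AND idle for `idle_timeout`."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.entries = {}  # key -> PooledProducer

    def borrow(self, key, now):
        self.evict_idle(now)
        entry = self.entries.get(key)
        if entry is None:
            entry = PooledProducer()
            self.entries[key] = entry
        entry.in_use += 1
        return entry

    def give_back(self, key, now):
        entry = self.entries[key]
        entry.in_use -= 1
        entry.last_released = now  # the idle clock starts at release time

    def evict_idle(self, now):
        for key, entry in list(self.entries.items()):
            idle_for = now - entry.last_released
            if entry.in_use == 0 and idle_for >= self.idle_timeout:
                entry.closed = True
                del self.entries[key]


pool = IdleExpiringPool(idle_timeout=600)
producer = pool.borrow("kafka-params", now=0)

# 15 minutes later the task still holds the producer, so it must stay open.
pool.evict_idle(now=900)
still_open_while_borrowed = not producer.closed

# After release, the producer is closed only once it has been idle 10 minutes.
pool.give_back("kafka-params", now=900)
pool.evict_idle(now=1200)
open_after_short_idle = not producer.closed
pool.evict_idle(now=1500)
closed_after_idle = producer.closed
```

   With this model a long batch task keeps its producer for as long as it 
holds it, and the timeout only applies to instances nobody is using.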

