HeartSaVioR commented on issue #19096: [SPARK-21869][SS] A cached Kafka 
producer should not be closed if any task is using it - adds inuse tracking.
URL: https://github.com/apache/spark/pull/19096#issuecomment-477846977
 
 
   I'm really sorry to say on the PR which already goes through pretty long 
discussions and actually I asked to resurrect, so blame me please.
   
   First of all, I guess the patch is addressing the issue properly in current 
caching approach, and we should adopt this patch if we stick with current 
approach. (That's why I said the patch is still needed and asked to resurrect 
this.)
   
   Revisiting the current caching approach based on the fact, the caching 
approach now requires concerning about thread-safety, and to handle it we are 
also adding flags and counters to handle, making cached producer instances 
being stateful (except the things Kafka producer itself may have), which would 
make us not easy to understand and maintain the code.
   
   If I remember it correctly, in SPARK-25151 (#22138 - the issue is still open 
and valid I think) someone (I guess Gabor) asked me "why don't you apply 
commons-pool to producer as well", I answered producer is thread-safe so can 
apply current cache approach whereas consumer is not thread-safe so it should 
have to be isolated (good to apply general connection pool implementation).
   
   Now I feel it's worth to consider applying it to producer as well (isolating 
per thread) which keeps producer to be stateless (same here, except the things 
Kafka producer itself may have) - it should require more connections to Kafka 
if there're concurrent usages, but once the number of necessary producers are 
pooled it would not hurt, which we expect for long running query like streaming 
one.
   
   Would like to hear everyone's voices. Thanks in advance!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to