gaborgsomogyi commented on a change in pull request #27146: 
[SPARK-21869][SS][DOCS][FOLLOWUP] Document Kafka producer pool configuration
URL: https://github.com/apache/spark/pull/27146#discussion_r365161056
 
 

 ##########
 File path: docs/structured-streaming-kafka-integration.md
 ##########
 @@ -802,6 +802,31 @@ df.selectExpr("topic", "CAST(key AS STRING)", "CAST(value AS STRING)") \
 </div>
 </div>
 
+### Producer Caching
+
+Given Kafka producer instance is designed to be thread-safe, Spark initializes a Kafka producer instance and co-use across tasks for same caching key.
+
+The caching key is built up from the following information:
 
 Review comment:
   @dongjoon-hyun really good point! If a delegation token is used, then from 
time to time a new producer must be created and the old one evicted, otherwise 
the query will fail. There are multiple ways to achieve that (I have not yet 
analyzed how it's done in the latest change made by @HeartSaVioR, but I'm on 
it):
   * Either the cache key contains the authentication information (dynamic JAAS 
config). This way the creation of the new producer and the eviction of the old 
one would be automatic. Without very deep consideration, that's my suggested 
way.
   * Or the cache key does NOT contain the authentication information (dynamic 
JAAS config). This way additional logic must be added to handle this scenario. 
At first glance I have the feeling it would just add complexity and make this 
part of the code brittle.
   
   As I understand from @HeartSaVioR's comment, the first approach is 
implemented at the moment. If that is so, then I'm fine with it, but I would 
mention 2 things here:
   * The key may contain authentication information
   * There could be situations where more than one producer is instantiated. 
This is important because producers consume a significant amount of memory, 
as @zsxwing pointed out.
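
   To make the first option concrete, here is a minimal sketch of a cache whose 
key includes the dynamic JAAS config. It is not the code in Spark: 
`FakeProducer`, `ProducerCache`, the idle timeout and the injectable clock are 
all hypothetical names chosen for illustration. Only `bootstrap.servers` and 
`sasl.jaas.config` are real Kafka configuration keys. The sketch assumes idle 
entries are evicted by a timeout, and shows why two producers can be alive at 
the same time while the token rolls over.

```python
import time
from typing import Callable, Dict, Tuple

CacheKey = Tuple[Tuple[str, str], ...]


class FakeProducer:
    """Stand-in for a Kafka producer (hypothetical, not a real client class)."""

    def __init__(self, params: Dict[str, str]) -> None:
        self.params = dict(params)
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ProducerCache:
    """Toy cache keyed by the full producer config, auth settings included."""

    def __init__(self, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        # key -> (producer, time of last use)
        self._entries: Dict[CacheKey, Tuple[FakeProducer, float]] = {}

    @staticmethod
    def to_key(params: Dict[str, str]) -> CacheKey:
        # Sort the items so the key does not depend on dict insertion order.
        return tuple(sorted(params.items()))

    def acquire(self, params: Dict[str, str]) -> FakeProducer:
        now = self._clock()
        self._evict_idle(now)
        key = self.to_key(params)
        entry = self._entries.get(key)
        # A changed JAAS config yields a different key, hence a new producer.
        producer = entry[0] if entry else FakeProducer(params)
        self._entries[key] = (producer, now)
        return producer

    def _evict_idle(self, now: float) -> None:
        # Close and drop every producer that has been idle for too long.
        for key, (producer, last_used) in list(self._entries.items()):
            if now - last_used > self._idle_timeout_s:
                producer.close()
                del self._entries[key]

    def size(self) -> int:
        return len(self._entries)


# Usage with a fake clock so the timeline is deterministic.
now = [0.0]
cache = ProducerCache(idle_timeout_s=10.0, clock=lambda: now[0])
base = {"bootstrap.servers": "host:9092"}

old = cache.acquire({**base, "sasl.jaas.config": "token-1"})
same = cache.acquire({**base, "sasl.jaas.config": "token-1"})

now[0] = 5.0  # the token is renewed, so the JAAS config changes
new = cache.acquire({**base, "sasl.jaas.config": "token-2"})
live_during_rollover = cache.size()  # old and new producers coexist here

now[0] = 11.0  # the old entry has now been idle longer than the timeout
still_new = cache.acquire({**base, "sasl.jaas.config": "token-2"})
```

   In this sketch nobody has to detect the token change explicitly: the old 
producer simply stops being requested and ages out. The cost is the window in 
which both producers hold memory, which is the second point above.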
