HeartSaVioR commented on PR #41525:
URL: https://github.com/apache/spark/pull/41525#issuecomment-1588699211

   > Another alternative is to log the timing every time poll() is called. Won't it potentially lead to spamming?
   
   That was what I had been thinking through, but I agree that it could come across as log spam, depending on how many records Kafka returns per fetch.
   
   What about providing statistics for all poll() calls that happened in a single microbatch (more precisely, one cycle of the consumer, from borrow to close/return)? If multiple polls happened, we can calculate simple (or slightly richer) stats and report them at the end of the cycle, as the current change does. If no poll happened in a cycle at all, we can simply log that no record was fetched and hence there are no stats.
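   To make the proposal concrete, here is a rough sketch in Python. It is not Spark code: the real connector is Scala, and `PollStats` and `timed_poll` are hypothetical names. The idea is that every poll() only appends its duration and record count, and a single summary line is produced when the consumer is closed or returned. Including the median is one possible choice for "slightly richer" stats.

```python
import statistics
import time
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class PollStats:
    """Accumulates poll() timings for one consumer cycle (borrow -> close/return).

    Hypothetical helper for illustration; not part of Spark's API.
    """
    durations_ms: List[float] = field(default_factory=list)
    record_counts: List[int] = field(default_factory=list)

    def record_poll(self, duration_ms: float, num_records: int) -> None:
        # Called on every poll(): only accumulates, never logs, so no log spam.
        self.durations_ms.append(duration_ms)
        self.record_counts.append(num_records)

    def summary(self) -> str:
        # Called once, when the consumer is closed or returned to the pool.
        if not self.durations_ms:
            return "No poll() happened in this cycle; no records fetched, hence no stats."
        return (
            f"polls={len(self.durations_ms)} "
            f"records={sum(self.record_counts)} "
            f"total_ms={sum(self.durations_ms):.1f} "
            f"min_ms={min(self.durations_ms):.1f} "
            f"max_ms={max(self.durations_ms):.1f} "
            f"median_ms={statistics.median(self.durations_ms):.1f}"
        )


def timed_poll(stats: PollStats, poll_fn: Callable[[], Sequence[T]]) -> Sequence[T]:
    """Wraps a (hypothetical) poll function and records its wall-clock duration."""
    start = time.monotonic()
    batch = poll_fn()
    stats.record_poll((time.monotonic() - start) * 1000.0, len(batch))
    return batch


if __name__ == "__main__":
    stats = PollStats()
    print(stats.summary())  # empty cycle: no stats
    stats.record_poll(12.0, 500)
    stats.record_poll(30.0, 500)
    stats.record_poll(18.0, 200)
    print(stats.summary())
    # polls=3 records=1200 total_ms=60.0 min_ms=12.0 max_ms=30.0 median_ms=18.0
```

   Emitting one line per cycle keeps the log volume bounded by the number of microbatches rather than the number of fetches.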
   
   I think that approach is much better than expecting end users to add up these metrics across microbatches themselves. Many end users won't know the implementation details.
   
   
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
   
   > Along with consumers, Spark pools the records fetched from Kafka separately, to let Kafka consumers stateless in point of Spark’s view, and maximize the efficiency of pooling. It leverages same cache key with Kafka consumers pool. Note that it doesn’t leverage Apache Commons Pool due to the difference of characteristics.
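   For readers who haven't looked at this part of the code, a very simplified Python model of what the quoted paragraph describes may help. It is not Spark's implementation, and `CacheKey`, `FetchedData`, and `FetchedDataPool` are hypothetical names here. It assumes the key is (group id, topic, partition), shared with the consumer pool. Records already fetched from Kafka are buffered in a pool separate from the consumers, so a consumer holds no buffered state. A later task can pick up the buffer if it resumes at the right offset.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, NamedTuple


class CacheKey(NamedTuple):
    # Assumption: the same key would also be used for the consumer pool.
    group_id: str
    topic: str
    partition: int


@dataclass
class FetchedData:
    # Records fetched from Kafka but not yet consumed, beginning at start_offset.
    records: List[str] = field(default_factory=list)
    start_offset: int = 0


class FetchedDataPool:
    """Hypothetical pool of buffered records, kept separately from consumers."""

    def __init__(self) -> None:
        self._pool: Dict[CacheKey, Deque[FetchedData]] = defaultdict(deque)

    def acquire(self, key: CacheKey, offset: int) -> FetchedData:
        # Reuse a buffer that continues exactly at the requested offset;
        # otherwise hand out an empty buffer (the caller must poll Kafka).
        entries = self._pool[key]
        for i, data in enumerate(entries):
            if data.start_offset == offset:
                del entries[i]
                return data
        return FetchedData(records=[], start_offset=offset)

    def release(self, key: CacheKey, data: FetchedData) -> None:
        # Return leftover records so a later task on the same key can reuse them.
        self._pool[key].append(data)
```

   Under this model, how often the buffer is reused vs. refetched is exactly the kind of number that is invisible to users unless we report it.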
   
   This is pretty much everything we say about the cache for fetched data. I think many users don't even get to that point until they encounter the problem and have to look at the guide doc.
   
   This is definitely a step forward in debuggability, but we will need a way to provide better visibility from an operational perspective.

