HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log 
elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-588592849
 
 
   > I think the information which prints out is not necessary for the users 
   
   I'm not sure I can agree with that. The information is much the same as what 
InMemoryFileIndex logs when it lists leaf files, and that message is at INFO 
level if I remember correctly.
   
   For streaming workloads, latency is a first-class consideration. End users 
would have no idea why the overall latency suddenly increases every N batches 
unless they know the details of the metadata in FileStreamSource / 
FileStreamSink. This is a completely different experience from what they get 
with the Kafka streaming source and sink - they may struggle to find the root 
cause in other places, like their query.
   
   But I'd agree that the information may not be necessary for users if the 
latency added here is not considerable. We could set a threshold (like 1s or 
2s?) and only print when the latency exceeds the threshold (still printing it 
at DEBUG level even if it doesn't reach the threshold), but then the message 
would deserve a higher severity, WARN.
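
   To make the threshold idea concrete, here is a rough, language-neutral sketch 
of the level selection, written in Python rather than against the actual 
CompactibleFileStreamLog code. The names `COMPACT_WARN_THRESHOLD_MS`, 
`level_for_elapsed` and `timed_compact` are hypothetical, and the 2000 ms value 
is only one of the numbers floated above, not a proposed default.

```python
import logging
import time

logger = logging.getLogger("compact_timing_sketch")

# Hypothetical threshold; the comment only floats "1s or 2s?" as candidates.
COMPACT_WARN_THRESHOLD_MS = 2000


def level_for_elapsed(elapsed_ms, threshold_ms=COMPACT_WARN_THRESHOLD_MS):
    """Return WARNING when elapsed time exceeds the threshold, DEBUG otherwise."""
    return logging.WARNING if elapsed_ms > threshold_ms else logging.DEBUG


def timed_compact(batch_id, compact_fn, threshold_ms=COMPACT_WARN_THRESHOLD_MS):
    """Run compact_fn, measure its elapsed time, and log at the chosen level."""
    start = time.monotonic()
    result = compact_fn()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    logger.log(
        level_for_elapsed(elapsed_ms, threshold_ms),
        "Compacting batch %s took %.0f ms (threshold: %d ms)",
        batch_id, elapsed_ms, threshold_ms,
    )
    return result
```

   With this shape, a slow compaction shows up under default log settings, 
while a fast one stays visible only to someone who has enabled DEBUG.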
   
   What do you think?
