HeartSaVioR opened a new pull request #27557: [SPARK-30804] Measure and log 
elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557
 
 
   ### What changes were proposed in this pull request?
   
   This patch adds log messages that report the elapsed time of the "compact" 
operation in FileStreamSourceLog and FileStreamSinkLog (implemented in 
CompactibleFileStreamLog), to help investigate mysterious latency spikes 
during a batch run.
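   
   As a rough illustration of the idea (not the code in this patch), the sketch below times a load step and a write step and builds messages shaped like the ones in the test output further down. `CompactTimingSketch`, `timed`, `loadEntries`, and `writeEntries` are hypothetical names used only for this example; the actual change lives in CompactibleFileStreamLog and goes through Spark's logging.
   
   ```scala
   import java.util.concurrent.TimeUnit

   object CompactTimingSketch {
     // Runs `body` once and returns its result together with the elapsed time in ms.
     def timed[T](body: => T): (T, Long) = {
       val start = System.nanoTime()
       val result = body
       val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
       (result, elapsedMs)
     }

     // Hypothetical stand-ins for reading earlier batches and writing the compact batch.
     def loadEntries(): Seq[String] = Seq("file-a", "file-b", "file-c")
     def writeEntries(entries: Seq[String]): Boolean = entries.nonEmpty

     // Times both steps separately and returns the messages instead of logging them.
     def compact(batchId: Long): Seq[String] = {
       val (entries, loadMs) = timed(loadEntries())
       val (_, writeMs) = timed(writeEntries(entries))
       Seq(
         s"It took $loadMs ms to load ${entries.size} entries for compact batch $batchId.",
         s"It took $writeMs ms to write ${entries.size} entries for compact batch $batchId.")
     }

     def main(args: Array[String]): Unit = compact(20199L).foreach(println)
   }
   ```
   
   Timing the load and the write separately, as the sample output does, makes it easier to tell which half of the compaction dominates.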
   
   ### Why are the changes needed?
   
   Tracking latency is a critical aspect of a streaming query. The "compact" 
operation may add nontrivial latency (it is synchronous, so all of its 
latency is added to the batch run), yet it is not measured, so end users 
have to guess.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   No unit tests were added. Tested manually with a streaming query using file source & file sink.
   
   > grep "for compact batch" <driver log>
   
   ```
   ...
   20/02/12 21:00:59 INFO FileStreamSourceLog: It took 527 ms to load 116003 
entries (33948200 bytes) for compact batch 20199.
   20/02/12 21:00:59 INFO FileStreamSourceLog: It took 469 ms to write 116003 
entries for compact batch 20199.
   20/02/12 21:01:16 INFO FileStreamSinkLog: It took 9523 ms to load 1010000 
entries (368291864 bytes) for compact batch 20199.
   20/02/12 21:01:23 INFO FileStreamSinkLog: It took 6568 ms to write 1010000 
entries for compact batch 20199.
   ...
   ```
   
   NOTE: The output may differ slightly from the code, as I used the patch 
while debugging on Spark 2.4.5 and have adjusted the messages several 
times since.
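   
   To summarize such lines rather than reading the grep output by eye, a small parser along the following lines might help. It assumes the message format shown in the sample output above, which, per the note, may not match the merged code exactly. `CompactLogParser` and `CompactTiming` are hypothetical names for this sketch, not part of Spark.
   
   ```scala
   object CompactLogParser {
     // One parsed "It took ... for compact batch N." message.
     final case class CompactTiming(
         component: String, phase: String, millis: Long, entries: Long, batchId: Long)

     // The "(N bytes)" part is optional: the sample shows it for "load" but not for "write".
     private val Pattern =
       """(\w+): It took (\d+) ms to (load|write) (\d+) entries(?: \(\d+ bytes\))? for compact batch (\d+)\.""".r.unanchored

     // Returns None for lines that do not match the assumed format.
     def parse(line: String): Option[CompactTiming] = line match {
       case Pattern(component, ms, phase, entries, batch) =>
         Some(CompactTiming(component, phase, ms.toLong, entries.toLong, batch.toLong))
       case _ => None
     }
   }
   ```
   
   Mapping `CompactLogParser.parse` over the driver log lines and filtering on `millis` would then surface the batches where compaction took longest.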

