HeartSaVioR opened a new pull request #27557: [SPARK-30804] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557

### What changes were proposed in this pull request?

This patch adds log messages that record the elapsed time of the "compact" operation in FileStreamSourceLog and FileStreamSinkLog (the change itself is made in CompactibleFileStreamLog), to help investigate otherwise mysterious latency spikes during a batch run.

### Why are the changes needed?

Tracking latency is a critical aspect of a streaming query. The "compact" operation may add nontrivial latency (it is synchronous, so all of its latency is added to the batch run), yet it is not measured, and end users have to guess.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A for UT. Manually tested with a streaming query using the file source and the file sink.

> grep "for compact batch" <driver log>

```
...
20/02/12 21:00:59 INFO FileStreamSourceLog: It took 527 ms to load 116003 entries (33948200 bytes) for compact batch 20199.
20/02/12 21:00:59 INFO FileStreamSourceLog: It took 469 ms to write 116003 entries for compact batch 20199.
20/02/12 21:01:16 INFO FileStreamSinkLog: It took 9523 ms to load 1010000 entries (368291864 bytes) for compact batch 20199.
20/02/12 21:01:23 INFO FileStreamSinkLog: It took 6568 ms to write 1010000 entries for compact batch 20199.
...
```

NOTE: The output may differ slightly from the code, as I used the patch while debugging on Spark 2.4.5 and have adjusted the message several times since.
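
To illustrate the kind of measurement described above, here is a minimal, self-contained sketch of timing the "load" and "write" phases of a compaction separately and logging each. It is not the code from this patch: `timed`, `loadEntries`, `writeEntries`, and `compact` are hypothetical names invented for the example, and `println` stands in for the real logger.

```scala
import java.util.concurrent.TimeUnit

object CompactTimingSketch {
  // Hypothetical helper (not taken from the patch): evaluates `body` once and
  // returns its result together with the elapsed wall-clock time in milliseconds.
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    (result, elapsedMs)
  }

  // Hypothetical stand-in for reading all entries that go into a compact batch.
  def loadEntries(n: Int): Seq[String] = Seq.tabulate(n)(i => s"entry-$i")

  // Hypothetical stand-in for writing the compacted batch; it only walks the entries.
  def writeEntries(entries: Seq[String]): Unit = entries.foreach(_.length)

  // Times the two phases separately, so a slow load can be told apart from a slow write.
  def compact(batchId: Long, numEntries: Int): Unit = {
    val (entries, loadMs) = timed(loadEntries(numEntries))
    println(s"It took $loadMs ms to load ${entries.size} entries for compact batch $batchId.")

    val (_, writeMs) = timed(writeEntries(entries))
    println(s"It took $writeMs ms to write ${entries.size} entries for compact batch $batchId.")
  }

  def main(args: Array[String]): Unit =
    compact(batchId = 20199L, numEntries = 1000)
}
```

Because compaction runs synchronously within the batch, logging the two phases separately makes it possible to see from the driver log which phase accounts for a spike.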
