HeartSaVioR opened a new pull request #25996: [SPARK-29322][CORE] Enable closeFrameOnFlush on ZstdOutputStream for event log file
URL: https://github.com/apache/spark/pull/25996

### What changes were proposed in this pull request?

This patch enables `closeFrameOnFlush` on the ZstdOutputStream used by the event logger, so that a continuous zstd input stream does not get stuck when reading an "inprogress" event log file.

The issue appears to have been introduced by [SPARK-26283](https://issues.apache.org/jira/browse/SPARK-26283), which fixed a bug by reading the event log file in continuous mode. That change also made the input stream read open frames, and it seems to wait for the frame to be closed. With `closeFrameOnFlush` enabled, the frame is closed whenever flush is called, so the input stream can read the frame sooner.

As the counterpart of `compressedContinuousInputStream`, this patch adds `compressedContinuousOutputStream`, which is used only for event logging.

### Why are the changes needed?

Without this patch, the reader thread in SHS is stuck reading an inprogress event log file compressed with zstd until the application finishes.

### Does this PR introduce any user-facing change?

It might add some overhead to each flush when writing a zstd-compressed event log, so some performance hit could be introduced. I've restricted the change to event logging only.

### How was this patch tested?

Manually tested by setting the Spark configuration as below:

```
spark.eventLog.enabled true
spark.eventLog.compress true
spark.eventLog.compression.codec zstd
```

and starting a Spark application. While the application is running, load the application in the SHS web page. Before this patch, replaying the event log may succeed, but it is highly likely to get stuck, and the page load hangs with it. After this patch, SHS can properly read the inprogress event log file.
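
The close-frame-on-flush idea from the proposal can be illustrated with a toy model. This is only a sketch and not the patch itself. It uses Python's standard `zlib` module as a stand-in for zstd, treating each finished zlib stream as one "frame". It also assumes a reader that surfaces data only from closed frames, which is the behavior the description above suspects. `FramedWriter`, `read_closed_frames`, and `visible_after_flush` are hypothetical names made up for this example.

```python
import io
import zlib


class FramedWriter:
    """Toy writer; each finished zlib stream plays the role of a zstd frame."""

    def __init__(self, sink, close_frame_on_flush):
        self.sink = sink
        self.close_frame_on_flush = close_frame_on_flush
        self._comp = zlib.compressobj()

    def write(self, data):
        self.sink.write(self._comp.compress(data))

    def flush(self):
        if self.close_frame_on_flush:
            # Finish the current stream ("close the frame") and start a new one.
            self.sink.write(self._comp.flush(zlib.Z_FINISH))
            self._comp = zlib.compressobj()
        else:
            # Push buffered bytes out but leave the stream ("frame") open.
            self.sink.write(self._comp.flush(zlib.Z_SYNC_FLUSH))

    def close(self):
        self.sink.write(self._comp.flush(zlib.Z_FINISH))


def read_closed_frames(raw):
    """Toy reader (assumption): return data only from frames that are closed."""
    out = []
    while raw:
        decomp = zlib.decompressobj()
        chunk = decomp.decompress(raw)
        if not decomp.eof:
            break  # open frame: this reader waits until it is closed
        out.append(chunk)
        raw = decomp.unused_data
    return b"".join(out)


def visible_after_flush(close_frame_on_flush, finish=False):
    """Write one event, flush, optionally close, and report what a reader sees."""
    sink = io.BytesIO()
    writer = FramedWriter(sink, close_frame_on_flush)
    writer.write(b"event 1\n")
    writer.flush()
    if finish:
        writer.close()  # models the application finishing
    return read_closed_frames(sink.getvalue())
```

In this model, `visible_after_flush(True)` returns the event right after the flush, while `visible_after_flush(False)` returns nothing until the writer is closed. That matches the symptom of the reader being stuck until the application finishes. The extra end-of-frame bytes written on every flush are the model's analogue of the per-flush overhead mentioned above.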
