HeartSaVioR opened a new pull request #25996: [SPARK-29322][CORE] Enable closeFrameOnFlush on ZstdOutputStream for event log file
URL: https://github.com/apache/spark/pull/25996
 
 
   ### What changes were proposed in this pull request?
   
   This patch proposes enabling `closeFrameOnFlush` on `ZstdOutputStream` for the event logger only, so that the continuous zstd input stream does not get stuck when reading an "inprogress" event log file.
   
   The issue appears to have been introduced by [SPARK-26283](https://issues.apache.org/jira/browse/SPARK-26283), which fixed a bug by reading the event log file with continuous mode enabled. However, that change also altered how the input stream handles an open frame: it appears to wait for the frame to be closed. Enabling `closeFrameOnFlush` closes the current frame whenever `flush` is called, so the input stream can read the frame sooner.
   
   As a counterpart to `compressedContinuousInputStream`, this patch adds `compressedContinuousOutputStream`, which is used only for event logging.
   
   ### Why are the changes needed?
   
   Without this patch, the reader thread in the Spark History Server (SHS) gets stuck reading an "inprogress" event log file compressed with zstd until the application finishes.
   
   ### Does this PR introduce any user-facing change?
   
   Closing a frame on every flush may add some overhead when writing a zstd-compressed event log, so there could be a performance hit. To limit the impact, I've restricted the change to event logging only.
   
   ### How was this patch tested?
   
   Manually tested with the following Spark configuration:
   
   ```
   spark.eventLog.enabled            true
   spark.eventLog.compress           true
   spark.eventLog.compression.codec  zstd
   ```
   
   Then I started a Spark application and, while it was running, loaded the application in the SHS web UI.
   
   Before this patch, replaying the event log might succeed, but most likely the replay gets stuck and the page never finishes loading. After this patch, SHS properly reads the "inprogress" event log file.
   
