gaborgsomogyi commented on issue #25670: [SPARK-28869][CORE] Roll over event 
log files
URL: https://github.com/apache/spark/pull/25670#issuecomment-550356572
 
 
   @HeartSaVioR I've just gone through this PR and plan to join later developments; feel free to ping me if this feature goes forward. One question I have now: do I see correctly that the current implementation measures the event size before compression? If yes, maybe my suggestion can be considered.
   
   Namely, I can see two possibilities to overcome this, and in both cases the basic idea is the same: the [dstream](https://github.com/apache/spark/blob/8353000b47e41d46fba68e2288769ef8ba77bf47/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala#L93-L100) variable can be wrapped with a `CountingOutputStream`, which would measure the file size after compression.
   
   1. If we can NOT treat `spark.eventLog.rolling.maxFileSize` as a soft threshold, we can `listen` at this point (with a custom `CountingOutputStream`) for writes and initiate rolling there.
   2. If we can treat `spark.eventLog.rolling.maxFileSize` as a soft threshold, we can just use `dstream.getBytesWritten()` in the actual condition (see the sketch after this list).
   
   I've tested the second approach with lz4, lzf, snappy and zstd, and only lz4 didn't flush its buffer immediately. Of course this doesn't mean the second approach is advised; I just wanted to give more info...
   
