gaborgsomogyi commented on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-550356572

@HeartSaVioR I've just gone through this PR and plan to join the later developments. You can ping me if this feature goes forward.

One question I have now: do I see correctly that the current implementation measures the size of events before compression? If yes, maybe my suggestion can be considered. I see two possibilities to address this, and in both cases the basic idea is the same: the [dstream](https://github.com/apache/spark/blob/8353000b47e41d46fba68e2288769ef8ba77bf47/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala#L93-L100) variable can be wrapped with a `CountingOutputStream`, which would measure the file size after compression.

1. If we can NOT treat `spark.eventLog.rolling.maxFileSize` as a soft threshold, we can `listen` at this point (with a custom `CountingOutputStream`) for writes and initiate rolling there.
2. If we can treat `spark.eventLog.rolling.maxFileSize` as a soft threshold, we can just use `dstream.getBytesWritten()` in the rolling condition.

I've tested the second approach with lz4, lzf, snappy, and zstd, and only lz4 didn't flush its buffer immediately. Of course this doesn't mean the second approach is advised, I just wanted to give more info...
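To make the idea concrete, here is a minimal sketch of both variants. It assumes Commons IO's `CountingOutputStream` for the soft-threshold variant and a hypothetical custom counting stream for the listening variant; `RollingWriterSketch`, `maxFileSize`, `writeEvent`, and `rollEventLogFile` are made-up names for illustration, not the PR's actual code, and `CompressionCodec` is Spark's internal codec factory:

```scala
import java.io.{FilterOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

import org.apache.commons.io.output.CountingOutputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

// Variant 1: a custom counting stream that notifies a callback on every write,
// so rolling can be initiated exactly when the (hard) limit is crossed.
class NotifyingCountingOutputStream(out: OutputStream, onBytesWritten: Long => Unit)
    extends FilterOutputStream(out) {
  private var count = 0L
  override def write(b: Int): Unit = { out.write(b); count += 1; onBytesWritten(count) }
  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    out.write(b, off, len); count += len; onBytesWritten(count)
  }
}

// Variant 2: wrap the compressed stream in a CountingOutputStream and check the
// post-compression byte count as a soft threshold in the rolling condition.
class RollingWriterSketch(logPath: Path, sparkConf: SparkConf, hadoopConf: Configuration) {
  // stands in for spark.eventLog.rolling.maxFileSize
  private val maxFileSize: Long = 128L * 1024 * 1024

  private val fs = logPath.getFileSystem(hadoopConf)
  private val codec = CompressionCodec.createCodec(sparkConf)

  // raw (HDFS) stream -> compression -> byte counter; events are written to `counting`
  private var counting: CountingOutputStream = _

  def start(): Unit = {
    counting = new CountingOutputStream(codec.compressedOutputStream(fs.create(logPath)))
  }

  def writeEvent(eventJson: String): Unit = {
    counting.write((eventJson + "\n").getBytes(StandardCharsets.UTF_8))
    // Soft threshold: only meaningful if the codec flushes its buffer often
    // enough (observed for lzf/snappy/zstd, not for lz4).
    if (counting.getByteCount >= maxFileSize) {
      rollEventLogFile()
    }
  }

  private def rollEventLogFile(): Unit = {
    counting.close()
    // opening the next indexed event log file is omitted in this sketch
  }
}
```

For the first variant, the same compressed stream would instead be wrapped in `NotifyingCountingOutputStream` and the roll triggered from the callback once the limit is reached, rather than checking the count after each event.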
