HeartSaVioR commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-558929106

I just updated the commit to reflect the idea, along with the description of the PR. Quoting the addition to the description again here:

> Note that compaction is very effective for streaming queries, since each batch launches and finishes its job(s) quickly. We don't expect a batch in a streaming query to run for tens of minutes, so no single job should be gigantic, and we expect most events to be filtered out of each event log file when compaction is applied. Conversely, compaction doesn't help batch queries; it just adds huge overhead with no value.
>
> To remedy the issue above, SHS will do a dry run of compaction on the first available event log file to determine the rate of events being filtered in, and will compact the event log only if that rate is low. The dry run happens only once per application, since SHS stores the result in LogInfo.
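
To make the dry-run decision concrete, here is a minimal sketch in Python rather than the actual Scala code in the PR. Everything in it is an assumption for illustration: the `LogInfo` stand-in and its `compactible` field, the `retained_ratio` and `should_compact` helpers, the event shape, and the 0.3 threshold are all hypothetical and are not taken from the commit.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical threshold: compact only if at most this fraction of events is kept.
RETAIN_RATIO_THRESHOLD = 0.3


@dataclass
class LogInfo:
    """Stand-in for the per-application record kept by SHS (fields are assumptions)."""
    app_id: str
    compactible: Optional[bool] = None  # None means the dry run has not happened yet


def retained_ratio(events: Iterable[dict], keep: Callable[[dict], bool]) -> float:
    """Return the fraction of events the filter would keep ("filtered in")."""
    total = 0
    kept = 0
    for event in events:
        total += 1
        if keep(event):
            kept += 1
    # An empty file offers nothing to gain, so treat it as "everything kept".
    return kept / total if total else 1.0


def should_compact(
    info: LogInfo,
    first_log_events: Iterable[dict],
    keep: Callable[[dict], bool],
    threshold: float = RETAIN_RATIO_THRESHOLD,
) -> bool:
    """Run the dry run at most once per application and cache the result in LogInfo."""
    if info.compactible is None:
        info.compactible = retained_ratio(first_log_events, keep) <= threshold
    return info.compactible


if __name__ == "__main__":
    live_jobs = {3}  # hypothetical: only job 3 is still running

    def keep(event: dict) -> bool:
        return event["job_id"] in live_jobs

    # Streaming-like log: many short jobs, most already finished.
    streaming_events = [{"job_id": i % 4} for i in range(8)]
    print(should_compact(LogInfo("streaming-app"), streaming_events, keep))

    # Batch-like log: one long job that is still live, so nothing is filtered out.
    batch_events = [{"job_id": 3} for _ in range(8)]
    print(should_compact(LogInfo("batch-app"), batch_events, keep))
```

In this sketch the streaming-like log keeps only a quarter of its events and is judged worth compacting, while the batch-like log keeps all of them and is skipped. Caching the verdict on the `LogInfo` stand-in mirrors the "once per application" behaviour described in the quoted text.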
