HeartSaVioR commented on issue #26416: [SPARK-29779][CORE] Compact old event 
log files and cleanup
URL: https://github.com/apache/spark/pull/26416#issuecomment-558929106
 
 
   I just updated the commit to reflect the idea, and updated the description 
of the PR as well.
   
   Quoting the addition to the description here:
   
   > Note that compaction is very effective for streaming queries, since each 
batch launches and finishes its job(s) quickly. We don't expect a batch in a 
streaming query to run for tens of minutes, so each job won't be gigantic, and 
we expect most events to be filtered out of each event log file when we apply 
compaction. Conversely, compaction is not effective for batch queries and only 
adds significant overhead with no benefit.
   > 
   > To remedy the issue above, SHS will do a dry run of compaction on the 
first available event log file to determine the rate of events being filtered 
in (that is, retained), and will compact the event logs only if that rate is 
low. The dry run happens only once per application, since SHS stores the 
result in LogInfo.
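
   The dry-run decision described in the quote can be sketched roughly as 
follows. This is only an illustration in Python, not the code in the PR: 
`AppLogInfo`, `retained_ratio`, `decide_compaction`, the `keep` predicate and 
the 0.3 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

# Hypothetical event shape: a parsed event log line as a dict.
Event = Dict[str, object]


def retained_ratio(events: Iterable[Event], keep: Callable[[Event], bool]) -> float:
    """Dry run: count how many events the filter would keep, without writing anything."""
    total = 0
    kept = 0
    for event in events:
        total += 1
        if keep(event):
            kept += 1
    # An empty file gives compaction nothing to remove, so treat it as "all retained".
    return kept / total if total else 1.0


@dataclass
class AppLogInfo:
    """Stand-in for per-application metadata; caches the dry-run result."""
    app_id: str
    should_compact: Optional[bool] = None


def decide_compaction(
    info: AppLogInfo,
    first_file_events: Iterable[Event],
    keep: Callable[[Event], bool],
    threshold: float = 0.3,  # assumed value, not taken from the PR
) -> bool:
    """Run the dry run at most once per application and reuse the stored decision."""
    if info.should_compact is None:
        ratio = retained_ratio(first_file_events, keep)
        info.should_compact = ratio <= threshold
    return info.should_compact
```

   With a streaming-like log where only one of ten jobs is still live, the 
retained ratio is low and the sketch chooses to compact. With a batch-like 
log where every event belongs to the single live job, nothing would be 
filtered out and compaction is skipped.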
