HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#discussion_r355822917
##########
File path: docs/configuration.md
##########
@@ -1023,6 +1023,24 @@ Apart from these, the following properties are also available, and may be useful
The max size of event log file before it's rolled over.
</td>
</tr>
+<tr>
+ <td><code>spark.eventLog.rolling.maxFilesToRetain</code></td>
+ <td>Int.MaxValue</td>
+ <td>
+ The maximum number of event log files which will be retained as non-compacted.
+ By default, all event log files will be retained. Please set the configuration and
+ <code>spark.eventLog.rolling.maxFileSize</code> accordingly if you want to control
+ the overall size of event log files. The event log files older than these retained
+ files will be compacted into single file and deleted afterwards.<br/>
+ NOTE 1: Compaction will happen in Spark History Server, which means the same value
+ will be applied across applications which are being loaded in Spark History Server,
+ as well as compaction and cleanup would require running Spark History Server.<br/>
+ NOTE 2: Spark History Server may not compact the old event log files if it figures
+ out compaction on event log for such application won't reduce the size at expected
+ rate threshold. For streaming query (including Structured Streaming) we normally
+ expect compaction will run, but for batch query compaction won't run in most cases.
Review comment:
No, I don't expect compaction to run for batch queries in most cases: we
measure the acceptance rate and skip compaction when the rate is low.
(That's a new change reflecting your suggestion.)
It might run when multiple "short" batch queries are executed in the same
driver process, but apart from jobserver-like setups, I'm not sure that is
a major use case for batch queries.
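To illustrate the reasoning above, here is a minimal sketch of a rate-based
decision. It is not the code in this PR: the function name `should_compact`,
its parameters, and the threshold value 0.5 are all hypothetical, and the
"rate" is assumed here to mean the fraction of events that compaction could
drop from the old event log files.

```python
def should_compact(total_events: int, droppable_events: int,
                   min_rate: float) -> bool:
    """Decide whether compacting old event log files looks worthwhile.

    Hypothetical helper, not Spark's implementation.
    total_events:     number of events in the files considered for compaction
    droppable_events: events compaction could remove (assumed to be events
                      tied to already-finished work)
    min_rate:         illustrative threshold; compaction is skipped below it
    """
    if total_events <= 0:
        # Nothing to compact, so skip.
        return False
    rate = droppable_events / total_events
    return rate >= min_rate


# Long-running streaming query: most old events belong to finished
# micro-batches, so most of them could be dropped.
streaming = should_compact(total_events=10_000, droppable_events=9_500,
                           min_rate=0.5)

# Single batch query: most events are still relevant, so little would be
# dropped and compaction is skipped.
batch = should_compact(total_events=10_000, droppable_events=500,
                       min_rate=0.5)

print(streaming, batch)
```

Under these assumed numbers the streaming case passes the threshold and the
batch case does not, which matches the expectation stated in NOTE 2 of the
doc change.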