HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] 
Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#discussion_r355822917
 
 

 ##########
 File path: docs/configuration.md
 ##########
 @@ -1023,6 +1023,24 @@ Apart from these, the following properties are also 
available, and may be useful
     The max size of event log file before it's rolled over.
   </td>
 </tr>
+<tr>
+  <td><code>spark.eventLog.rolling.maxFilesToRetain</code></td>
+  <td>Int.MaxValue</td>
+  <td>
+    The maximum number of event log files which will be retained as 
non-compacted.
+    By default, all event log files will be retained. Please set the 
configuration and
+    <code>spark.eventLog.rolling.maxFileSize</code> accordingly if you want to 
control
+    the overall size of event log files. The event log files older than these 
retained
+    files will be compacted into single file and deleted afterwards.<br/>
+    NOTE 1: Compaction will happen in Spark History Server, which means the 
same value
+    will be applied across applications which are being loaded in Spark 
History Server,
+    as well as compaction and cleanup would require running Spark History 
Server.<br/>
+    NOTE 2: Spark History Server may not compact the old event log files if it 
figures
+    out compaction on event log for such application won't reduce the size at 
expected
+    rate threshold. For streaming query (including Structured Streaming) we 
normally
+    expect compaction will run, but for batch query compaction won't run in 
most cases.
 
 Review comment:
   No, I don't expect compaction to run for batch queries in most cases, as we 
measure the acceptance rate only once and don't run compaction if the rate is 
low. (That's a new change reflecting your suggestion.)
   It might be possible if multiple "short" batch queries are run in the 
same driver process, but apart from jobserver-like setups, I'm not sure that is 
a major use case for batch queries.
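
   To illustrate the "measure once, skip if the rate is low" behavior described 
above, here is a minimal sketch. It is not the code in this PR: the class name, 
the `threshold` parameter, and the definition of the rate as the fraction of 
events that compaction could drop are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CompactionDecider:
    """Hypothetical helper: decides once per application whether to compact."""

    # Assumed minimum fraction of events that must be droppable to compact.
    threshold: float
    _decision: Optional[bool] = field(default=None, init=False)

    def should_compact(self, total_events: int, droppable_events: int) -> bool:
        # The rate is measured only on the first call; later calls reuse the
        # cached decision, so a low first measurement disables compaction.
        if self._decision is None:
            rate = droppable_events / total_events if total_events else 0.0
            self._decision = rate >= self.threshold
        return self._decision


# Batch-like application: few events can be dropped on the first measurement.
batch = CompactionDecider(threshold=0.5)
batch_first = batch.should_compact(total_events=1000, droppable_events=50)
batch_later = batch.should_compact(total_events=1000, droppable_events=900)

# Streaming-like application: most events belong to finished work.
streaming = CompactionDecider(threshold=0.5)
streaming_first = streaming.should_compact(total_events=1000, droppable_events=800)
```

   Under these assumptions the batch-like application is never compacted, even 
if a later measurement would have passed, while the streaming-like one is.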
