HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE]
Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#discussion_r357460749
##########
File path:
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##########
@@ -963,8 +1007,15 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
}
replayBus.addListener(listener)
try {
+ val eventLogFiles = reader.listEventLogFiles
+ val newEventLogFiles = if (compactible.contains(true)) {
+ logInfo(s"Compacting ${reader.rootPath}...")
+ fileCompactor.compact(eventLogFiles)
Review comment:
> I think tying this to the UI rebuild code is not a great idea, since it means that logs may never be compacted.
You're right. I wanted to keep things simple at first, since decoupling this
raises thread-safety concerns and the code change already exceeds 2,000 lines,
but I have also been feeling that it should be fixed.
> So perhaps that PR that allows tasks to run in parallel in the SHS really can help here (then you can compact during the checkLogs() tasks).
Yes, but given that #25797 is still under review, I'm not sure how I can
leverage it. Should I pick that PR up and try to finalize it first?
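
For illustration only, here is a minimal sketch of what running compaction outside the UI rebuild path could look like. It is not taken from this PR or from #25797. `CompactionScheduler`, `compactFn`, `submit`, and `shutdownAndWait` are hypothetical names, and a single dedicated thread is an assumption made to keep the thread-safety question small.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

// Hypothetical helper (not part of the PR): runs compaction on its own thread
// so that it no longer depends on a UI rebuild being triggered.
class CompactionScheduler(compactFn: String => Unit) {
  // One worker thread, so at most one compaction runs at a time.
  private val pool = Executors.newSingleThreadExecutor()
  // Root paths with a compaction queued or running; prevents duplicate tasks.
  private val inFlight = ConcurrentHashMap.newKeySet[String]()

  /** Returns true if a task was queued, false if one is already pending. */
  def submit(rootPath: String): Boolean = {
    if (inFlight.add(rootPath)) {
      pool.execute(new Runnable {
        override def run(): Unit = {
          // Always clear the marker, even if compaction throws.
          try compactFn(rootPath)
          finally inFlight.remove(rootPath)
        }
      })
      true
    } else {
      false
    }
  }

  /** Stops accepting work and waits for queued tasks to finish. */
  def shutdownAndWait(timeoutSec: Long): Boolean = {
    pool.shutdown()
    pool.awaitTermination(timeoutSec, TimeUnit.SECONDS)
  }
}
```

Under these assumptions, a periodic log check could call `submit(rootPath)` for each compactible log, with `compactFn` wrapping a call along the lines of `fileCompactor.compact(reader.listEventLogFiles)` from the diff above. How that would fit with the parallel tasks in #25797 is left open here.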