HeartSaVioR commented on a change in pull request #27208: [SPARK-30481][CORE] Integrate event log compactor into Spark History Server
URL: https://github.com/apache/spark/pull/27208#discussion_r368276761
 ##########
 File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
 ##########
 @@ -661,26 +691,33 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
       reader: EventLogFileReader,
       scanTime: Long,
       enableOptimizations: Boolean): Unit = {
+    val rootPath = reader.rootPath
     try {
+      val (shouldReload, lastCompactionIndex) = compact(reader)
 
 Review comment:
   I guess what you're suggesting is separating the two kinds of tasks and applying the lock (`processing`) to each, in particular submitting all listing tasks first and all compaction tasks afterwards.
   
   That would work, but we may also want to consider the difference between cleaning logs and compaction: log cleaning has its own interval and is triggered independently, whereas compaction is triggered conditionally, only when the return value of `shouldReload` is true. That means a compaction task should always run regardless of the state of `processing` - we wouldn't want to skip it, but it's also not ideal to have the task block while waiting on `processing`. Maybe the task has to resubmit itself at the end if it cannot proceed due to `processing`? A rough sketch of that idea follows below.
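   
   A minimal sketch of the resubmit-on-contention idea, assuming a hypothetical `processing` set and a shared scheduled executor; the names `submitCompaction` and `compact(rootPath)` are illustrative stand-ins, not the actual `FsHistoryProvider` API:
   
   ```scala
   import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
   
   object CompactionResubmitSketch {
     // Stand-in for FsHistoryProvider's `processing` bookkeeping: root paths
     // currently owned by a listing (or compaction) task.
     private val processing = ConcurrentHashMap.newKeySet[String]()
     private val pool = Executors.newScheduledThreadPool(2)
   
     /** Hypothetical compaction entry point; not the actual Spark API. */
     def submitCompaction(rootPath: String): Unit = {
       pool.execute(new Runnable {
         def run(): Unit = {
           if (processing.add(rootPath)) {
             try {
               compact(rootPath) // placeholder for the real compaction work
             } finally {
               processing.remove(rootPath)
             }
           } else {
             // The path is still held by another task: instead of skipping
             // the compaction or blocking the worker thread, re-enqueue the
             // task with a short delay, as suggested above.
             pool.schedule(new Runnable {
               def run(): Unit = submitCompaction(rootPath)
             }, 1, TimeUnit.SECONDS)
           }
         }
       })
     }
   
     private def compact(rootPath: String): Unit = {
       // The real implementation would rewrite event log files under rootPath.
     }
   }
   ```
   
   Re-enqueueing with a small delay avoids both losing the compaction (which skipping would) and tying up a pool thread in a wait, at the cost of the compaction running slightly later than the triggering `shouldReload` check.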
