thejdeep commented on code in PR #38983:
URL: https://github.com/apache/spark/pull/38983#discussion_r1045377745


##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -272,17 +272,17 @@ private[history] class FsHistoryProvider(conf: SparkConf, 
clock: Clock)
 
     // Disable the background thread during tests.
     if (!conf.contains(IS_TESTING)) {
-      // A task that periodically checks for event log updates on disk.
-      logDebug(s"Scheduling update thread every $UPDATE_INTERVAL_S seconds")
-      pool.scheduleWithFixedDelay(
-        getRunner(() => checkForLogs()), 0, UPDATE_INTERVAL_S, 
TimeUnit.SECONDS)
-
       if (conf.get(CLEANER_ENABLED)) {
         // A task that periodically cleans event logs on disk.
         pool.scheduleWithFixedDelay(
           getRunner(() => cleanLogs()), 0, CLEAN_INTERVAL_S, TimeUnit.SECONDS)
       }
 
+      // A task that periodically checks for event log updates on disk.

Review Comment:
   We can only control this interleaving to a certain extent. For example, when 
`UPDATE_INTERVAL_S` is configured to be less than `CLEAN_INTERVAL_S` (which is 
usually the case), we are still going to be performing the `checkForLogs` 
before we `cleanLogs`. 



##########
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala:
##########
@@ -977,6 +977,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, 
clock: Clock)
         listing.delete(classOf[LogInfo], log.logPath)
       }
     }
+    // Delete log files that don't exist in listing database and exceed the 
configured max age.
+    Option(fs.listStatus(new Path(logDir))).map(_.toSeq).getOrElse(Nil)

Review Comment:
   I think the problem with this is - what would happen if new event logs are 
added into the event logging directory and the `cleanLogs` thread runs before 
the `checkForLogs` thread runs, then we end up deleting the newly added event 
logs. Is my understanding right ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to