Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19770#discussion_r154817100
  
    --- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -643,6 +633,44 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
         } finally {
           iterator.foreach(_.close())
         }
    +
    +    // Clean corrupt or empty files that may have accumulated.
    +    if (AGGRESSIVE_CLEANUP) {
    +      var untracked: Option[KVStoreIterator[LogInfo]] = None
    +      try {
    +        untracked = Some(listing.view(classOf[LogInfo])
    +          .index("lastModifiedTime")
    --- End diff --
    
    Because this is based on `FileStatus.getModificationTime`, couldn't this 
end up deleting event log files for long-running applications (i.e. those that 
run for longer than the configured cleanup period)? That will probably cause 
weird issues if those applications are actually still running.
    
    You refer to this issue in your tests, but the tests don't seem to cover 
that case explicitly.
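    To make the concern concrete, here is a minimal sketch (not Spark's actual 
API; `LogEntry`, `naiveCandidates`, and `safeCandidates` are illustrative names) 
of how selecting deletion candidates purely by last-modified time can sweep up 
the event log of an application that is still running, and how filtering out 
in-progress logs avoids that:

```scala
// Hypothetical model of a history-server log entry. In real Spark, a live
// application's event log carries an ".inprogress" suffix; here that state
// is modeled with a boolean flag for simplicity.
case class LogEntry(path: String, lastModified: Long, inProgress: Boolean)

object CleanupSketch {
  // Naive selection: anything whose last-modified time is older than the
  // cutoff becomes a deletion candidate -- including a long-running
  // application's log that simply hasn't been flushed recently.
  def naiveCandidates(logs: Seq[LogEntry], now: Long, maxAgeMs: Long): Seq[LogEntry] =
    logs.filter(l => now - l.lastModified > maxAgeMs)

  // Safer selection: additionally skip logs still marked in progress,
  // so a live application's log is never deleted out from under it.
  def safeCandidates(logs: Seq[LogEntry], now: Long, maxAgeMs: Long): Seq[LogEntry] =
    naiveCandidates(logs, now, maxAgeMs).filterNot(_.inProgress)

  def main(args: Array[String]): Unit = {
    val now = 100000L
    val logs = Seq(
      LogEntry("app-1", now - 90000, inProgress = false),            // old, finished
      LogEntry("app-2.inprogress", now - 90000, inProgress = true),  // old, but still live
      LogEntry("app-3", now - 1000, inProgress = false)              // recent
    )
    // The naive pass would delete app-2's log even though it is running.
    assert(naiveCandidates(logs, now, maxAgeMs = 60000).map(_.path) ==
      Seq("app-1", "app-2.inprogress"))
    assert(safeCandidates(logs, now, maxAgeMs = 60000).map(_.path) == Seq("app-1"))
    println(safeCandidates(logs, now, maxAgeMs = 60000).map(_.path).mkString(","))
  }
}
```

    A test for exactly this scenario (an in-progress log older than the cleanup 
period surviving a cleanup pass) would cover the case raised above.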

