Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19770#discussion_r154817100
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -643,6 +633,44 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
} finally {
iterator.foreach(_.close())
}
+
+ // Clean corrupt or empty files that may have accumulated.
+ if (AGGRESSIVE_CLEANUP) {
+ var untracked: Option[KVStoreIterator[LogInfo]] = None
+ try {
+ untracked = Some(listing.view(classOf[LogInfo])
+ .index("lastModifiedTime")
--- End diff --
Because this is based on `FileStatus.getModificationTime`, couldn't this
end up deleting event log files for long-running applications (i.e. those that
run longer than the configured cleanup period)? That would likely cause odd
behavior if those applications are still running.
You refer to this issue in your tests, but the code doesn't seem to test
that case explicitly.
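To make the concern concrete, here is a minimal, hypothetical sketch (not the PR's actual code) of why a cleanup keyed only on modification time can flag logs of still-running applications, and one possible guard: Spark's event logger marks active logs with an `.inprogress` suffix, which a cleanup pass could check before deleting. The names and threshold below are illustrative assumptions.

```scala
// Hypothetical sketch: staleness checks for event-log cleanup.
object CleanupSketch {
  // Illustrative retention window, e.g. 7 days in milliseconds.
  val maxLogAgeMs: Long = 7L * 24 * 60 * 60 * 1000

  // Naive check: anything not modified within the window looks stale,
  // including a long-running app that simply hasn't flushed events recently.
  def staleByMtimeOnly(lastModifiedMs: Long, nowMs: Long): Boolean =
    nowMs - lastModifiedMs > maxLogAgeMs

  // Guarded check: skip logs still marked in-progress, so a running
  // application's event log is never deleted by age alone.
  def safeToDelete(path: String, lastModifiedMs: Long, nowMs: Long): Boolean =
    !path.endsWith(".inprogress") && staleByMtimeOnly(lastModifiedMs, nowMs)
}
```

With this guard, `safeToDelete("app-1.inprogress", 0L, CleanupSketch.maxLogAgeMs + 1)` is `false` even though the log is past the age threshold, while a finished log at the same age would be deleted.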
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]