Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19770#discussion_r154814962
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -643,6 +633,44 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
} finally {
iterator.foreach(_.close())
}
+
+ // Clean corrupt or empty files that may have accumulated.
+ if (AGGRESSIVE_CLEANUP) {
+ var untracked: Option[KVStoreIterator[LogInfo]] = None
+ try {
+ untracked = Some(listing.view(classOf[LogInfo])
--- End diff --
So I spent some time re-reading my own patch, and it covers a slightly
different case. My patch covers deleting SHS state when the underlying files are
deleted; this change covers deleting files that the SHS decides are broken. I
still think some code and state can be saved by handling both cases in a similar
way - still playing with my code, though.
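To make the pattern in the hunk above concrete: the new block iterates a
`KVStore` view of `LogInfo` entries and closes the iterator in a `finally`, so
the store resource is released even if cleanup throws. Here is a minimal,
hedged sketch of that iterate-filter-close shape. It does not use Spark's
actual `KVStore` API; `LogInfoIterator`, `cleanUntracked`, and the
`String`-path entries are hypothetical stand-ins for illustration only:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a store-backed iterator that must be closed
// (Spark's real KVStoreIterator is closeable for the same reason).
class LogInfoIterator(entries: Seq[String]) extends Iterator[String] with AutoCloseable {
  private val it = entries.iterator
  var closed = false
  def hasNext: Boolean = it.hasNext
  def next(): String = it.next()
  def close(): Unit = closed = true
}

// Collect entries that are not in the tracked set, always closing the view.
// In the real provider this is where both the file and its listing entry
// would be deleted, keeping on-disk state and store state in sync.
def cleanUntracked(view: LogInfoIterator, tracked: Set[String]): Seq[String] = {
  val untracked = mutable.Buffer[String]()
  try {
    view.foreach { path =>
      if (!tracked.contains(path)) untracked += path
    }
  } finally {
    view.close() // release the underlying store iterator even on failure
  }
  untracked.toSeq
}
```

The point of the sketch is the resource-safety shape, not the exact API: the
`finally` mirrors the `iterator.foreach(_.close())` already present in the
surrounding code.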
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]