GitHub user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/21895#discussion_r205948923
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -973,6 +973,38 @@ private[history] object FsHistoryProvider {
       private[history] val CURRENT_LISTING_VERSION = 1L
     }
    +private[history] trait CachedFileSystemHelper extends Logging {
    +  protected def fs: FileSystem
    +
    +  /**
    +   * Cache containing the result for the already checked files.
    +   */
    +  // Visible for testing.
    +  private[history] val cache = new mutable.HashMap[String, Boolean]
--- End diff --
For a long-running history server on a busy cluster (particularly where
`spark.history.fs.cleaner.maxAge` is configured to be low), this map will keep
growing and can eventually cause an OOM.
Either an LRU cache or a disk-backed map with periodic cleanup (based on
maxAge) might be better?
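
A rough sketch of the LRU option, assuming a simple bound on the number of entries is acceptable. The `LruCache` name and the `maxEntries` parameter are made up for illustration and are not part of this PR; it only relies on `java.util.LinkedHashMap` in access order:

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical helper (not in the PR): a fixed-capacity LRU map.
class LruCache[K, V](maxEntries: Int) {
  require(maxEntries > 0, "maxEntries must be positive")

  // accessOrder = true orders entries from least to most recently used,
  // so the "eldest" entry is always the least recently accessed one.
  private val underlying = new JLinkedHashMap[K, V](16, 0.75f, true) {
    // Called by LinkedHashMap after each insertion; returning true
    // drops the eldest entry, which keeps the map at maxEntries.
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      this.size() > maxEntries
  }

  // LinkedHashMap is not thread-safe, and get() reorders entries in
  // access-order mode, so reads are synchronized as well as writes.
  def get(key: K): Option[V] = synchronized { Option(underlying.get(key)) }

  def put(key: K, value: V): Unit = synchronized {
    underlying.put(key, value)
    ()
  }

  def size: Int = synchronized { underlying.size() }
}
```

With something like `new LruCache[String, Boolean](10000)` in place of the `HashMap`, memory use would be bounded by the entry count. The trade-off is that an evicted path has to be checked against the file system again the next time it is seen.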