GitHub user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21895#discussion_r205948923
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -973,6 +973,38 @@ private[history] object FsHistoryProvider {
       private[history] val CURRENT_LISTING_VERSION = 1L
     }
     
    +private[history] trait CachedFileSystemHelper extends Logging {
    +  protected def fs: FileSystem
    +
    +  /**
    +   * Cache containing the result for the already checked files.
    +   */
    +  // Visible for testing.
    +  private[history] val cache = new mutable.HashMap[String, Boolean]
    --- End diff --
    
    For a long-running history server on a busy cluster (particularly where
    `spark.history.fs.cleaner.maxAge` is configured to be low), this unbounded
    map can keep growing until it causes an OOM.
    Would either an LRU cache or a disk-backed map with periodic cleanup (based
    on maxAge) be better?
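A minimal sketch of the LRU option, assuming the cache only needs to map a file path to the result of its earlier check, like the `mutable.HashMap[String, Boolean]` in the diff. `BoundedCheckCache`, `maxEntries` and `entryCount` are hypothetical names, not part of the PR. The sketch relies on `java.util.LinkedHashMap` in access order, whose `removeEldestEntry` hook drops the least recently used entry once the bound is exceeded.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical bounded replacement for the unbounded
// `mutable.HashMap[String, Boolean]` in the diff (sketch only).
// Keys are file paths; values are the cached check results.
class BoundedCheckCache(maxEntries: Int) {
  require(maxEntries > 0, "maxEntries must be positive")

  // accessOrder = true: iteration runs from least to most recently
  // accessed, so the "eldest" entry is the LRU one.
  private val underlying =
    new JLinkedHashMap[String, java.lang.Boolean](16, 0.75f, true) {
      // Called after each put; returning true evicts the LRU entry.
      override protected def removeEldestEntry(
          eldest: JMap.Entry[String, java.lang.Boolean]): Boolean =
        this.size() > maxEntries
    }

  // Reads also update recency because of access ordering, so they
  // must be synchronized as well (LinkedHashMap is not thread-safe).
  def get(path: String): Option[Boolean] = synchronized {
    Option(underlying.get(path)).map(_.booleanValue)
  }

  def put(path: String, result: Boolean): Unit = synchronized {
    underlying.put(path, java.lang.Boolean.valueOf(result))
  }

  def entryCount: Int = synchronized { underlying.size() }
}
```

An evicted entry should only mean that the file gets checked again, which assumes the check is safe to repeat. A size bound caps memory but does not tie entries to `spark.history.fs.cleaner.maxAge`; Guava's `CacheBuilder` (`maximumSize` plus `expireAfterWrite`) could cover both, at the cost of a Guava dependency.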

