LantaoJin commented on a change in pull request #29378:
URL: https://github.com/apache/spark/pull/29378#discussion_r467675243



##########
File path: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
##########
@@ -46,18 +46,35 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolea
     System.exit(ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR)
   }
 
+  private def containerDirEnabled: Boolean = Utils.isRunningInYarnContainer(conf)
+
+  /* Create container directories on YARN to persist the temporary files.
+   * (temp_local, temp_shuffle)
+   * These files have no opportunity to be cleaned before application end on YARN.
+   * This is a real issue, especially for long-lived Spark application like Spark thrift-server.
+   * So we persist these files in YARN container directories which could be cleaned by YARN when
+   * the container exists. */
+  private[spark] val containerDirs: Array[File] =
+    if (containerDirEnabled) createContainerDirs(conf) else Array.empty[File]
+
   private[spark] val localDirsString: Array[String] = localDirs.map(_.toString)
 
   // The content of subDirs is immutable but the content of subDirs(i) is mutable. And the content
   // of subDirs(i) is protected by the lock of subDirs(i)
   private val subDirs = Array.fill(localDirs.length)(new Array[File](subDirsPerLocalDir))
 
+  private val subContainerDirs = if (containerDirEnabled) {
+    Array.fill(containerDirs.length)(new Array[File](subDirsPerLocalDir))
+  } else {
+    Array.empty[Array[File]]
+  }
+
   private val shutdownHook = addShutdownHook()
 
-  /** Looks up a file by hashing it into one of our local subdirectories. */
   // This method should be kept in sync with
   // org.apache.spark.network.shuffle.ExecutorDiskUtils#getFile().

Review comment:
So the `getFile` in `DiskBlockManager` has 4 parameters:
```scala
private def getFile(localDirs: Array[File], subDirs: Array[Array[File]],
    subDirsPerLocalDir: Int, filename: String): File
```
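
For illustration, here is a rough sketch of what a body for that 4-parameter signature could look like, assuming it keeps the usual hash-into-subdirectory scheme so that the same helper can serve both `localDirs`/`subDirs` and `containerDirs`/`subContainerDirs`. The body, the `GetFileSketch` wrapper, and the `nonNegativeHash` stand-in are assumptions for this example, not code taken from the PR.

```scala
import java.io.{File, IOException}

object GetFileSketch {
  // Stand-in (assumption) for Spark's internal hash helper:
  // maps a string's hashCode to a non-negative Int.
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h == Int.MinValue) 0 else math.abs(h)
  }

  // Hypothetical body for the 4-parameter signature quoted above.
  def getFile(localDirs: Array[File], subDirs: Array[Array[File]],
      subDirsPerLocalDir: Int, filename: String): File = {
    // Pick a root directory, then a subdirectory slot, from the filename hash.
    val hash = nonNegativeHash(filename)
    val dirId = hash % localDirs.length
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir

    // Create the subdirectory lazily, guarded by the lock of subDirs(dirId).
    val subDir = subDirs(dirId).synchronized {
      val old = subDirs(dirId)(subDirId)
      if (old != null) {
        old
      } else {
        val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
        if (!newDir.exists() && !newDir.mkdir()) {
          throw new IOException(s"Failed to create local dir in $newDir.")
        }
        subDirs(dirId)(subDirId) = newDir
        newDir
      }
    }

    new File(subDir, filename)
  }
}
```

With this shape, a lookup for a `temp_shuffle` file would pass `containerDirs` and `subContainerDirs` when container directories are enabled, and `localDirs` and `subDirs` otherwise, without duplicating the hashing logic.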



