Stove-hust commented on code in PR #40412:
URL: https://github.com/apache/spark/pull/40412#discussion_r1198448075


##########
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala:
##########
@@ -273,7 +273,7 @@ private[spark] class DiskBlockManager(
       Utils.getConfiguredLocalDirs(conf).foreach { rootDir =>
         try {
           val mergeDir = new File(rootDir, mergeDirName)
-          if (!mergeDir.exists()) {
+          if (!mergeDir.exists() || mergeDir.listFiles().length < subDirsPerLocalDir) {

Review Comment:
   It is very common to run multiple Executors for the same Application on the same machine, and as you say, the `listFiles` call is then almost unavoidable, so your point is worth thinking through.
   First, we could cache the list of subdirectories in memory after the initial `listFiles`, and then consult the cache when checking whether a subdirectory needs to be created. This would cost some memory, though only a small amount, and we could set the cache to null once all the directories have been created.
   If I understand your concern correctly, it is that stale ("dirty") directories under the merge directory could make the count misleading, so that some of the subdirectories would not be created. I don't think this worry is necessary: under normal circumstances, permission restrictions prevent dirty directories from appearing in the merge directory, so it is safe to compare the number of subdirectories directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

