Stove-hust commented on code in PR #40412:
URL: https://github.com/apache/spark/pull/40412#discussion_r1198448075
##########
core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala:
##########
@@ -273,7 +273,7 @@ private[spark] class DiskBlockManager(
Utils.getConfiguredLocalDirs(conf).foreach { rootDir =>
try {
val mergeDir = new File(rootDir, mergeDirName)
- if (!mergeDir.exists()) {
+ if (!mergeDir.exists() || mergeDir.listFiles().length < subDirsPerLocalDir) {
Review Comment:
It is very common to run multiple executors for the same application on the
same machine, and as you say, the `listFiles` call is then almost unavoidable,
so I think your point deserves consideration.
One option would be to cache the list of subdirectories in memory after the
`listFiles` call and check against that cache before creating each
subdirectory. This would use some memory, although the amount is small, and we
could set the cache to null once the directories have been created.
I am not sure I fully understand your concern, but is it that stale ("dirty")
directories might be left under the merge directory? That could indeed cause
some of the subdirectories not to be created. However, I think this worry is
unnecessary: under normal circumstances, permission-related restrictions
prevent dirty directories from appearing in the merge directory, so it is safe
to compare the number of subdirectories directly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]