otterc commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r512427399



##########
File path: 
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -94,6 +95,9 @@
   static final String STOP_ON_FAILURE_KEY = "spark.yarn.shuffle.stopOnFailure";
   private static final boolean DEFAULT_STOP_ON_FAILURE = false;
 
+  // Used by shuffle merge manager to create merged shuffle files.
+  protected static final String APP_BASE_RELATIVE_PATH = 
"usercache/%s/appcache/%s/";

Review comment:
       While making changes to support sub-dir, I realized that 
`RemoteBlockPushResolver` doesn't really need this 
   `usercache/%s/appcache/%s` format.
   In one of our older versions when we were **not** using the  
`executorShuffleInfo` from the `registerExecutor` message, we added this. 
However, now it uses the `localDirs` from `executorShuffleInfo` which are paths 
of the `blockManagerDirs`. The current implementation finds the root local dir 
by `localDir.substring(0, "usercache/{userId}/appcache/{appId/")` which seems 
un-necessary. What we need is to find the parent directory of each 
`blockManagerDir` and construct the path of each `merge_manager` directory, 
which is a sibling of `blockManagerDir`. Even though this is not that clean but 
it is similar to what we have now and doesn't change the `registerExecutor` 
protocol.
   
   Although this may increase the memory footprint of `appPathsInfo` map 
slightly.
   In an `appPathInfo`, instead of saving the complete local paths, for example 
   `[localDir1/usercache/{userId}/appcache/{appId}/merge_manager, 
localDir2/usercache/{userId}/appcache/{appId}/merge_manager/]`, currently we 
only save
   `[localDir1, localDir2]`.  While constructing the target file path we add 
the sub-part `usercache/{userId}/appcache/{appId}`.
   The target file path would be constructing using these parts: 
`{localDir}/{usercache/%s/appcache/%s/merge_manager}/filename`. 
   I do think this increase will not be that significant. 
   It would be much better to remove this assumption on the path format.
   
   @Victsm @zhouyejoe  @tgravescs @Ngone51 @attilapiros @mridulm 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to