otterc opened a new pull request #33378: URL: https://github.com/apache/spark/pull/33378
### What changes were proposed in this pull request?
The following two bugs were introduced with https://github.com/apache/spark/pull/32140:
1. Instead of requesting the local-dirs for push-merged-local blocks from the ESS (external shuffle service), `PushBasedFetchHelper` requests them from other executors. Push-based shuffle is only enabled when the ESS is enabled, so it should always fetch the dirs from the ESS and not from other executors, which is not yet supported.
2. The size of the push-merged blocks is logged incorrectly.

### Why are the changes needed?
This fixes the bugs mentioned above and is needed for push-based shuffle to work properly.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested this by running an application on the cluster. The UTs mock the call `hostLocalDirManager.getHostLocalDirs`, which is why they didn't catch (1). However, the fix is trivial and checking this in the UT would require a lot more effort, so I haven't modified the UT.

Logs of the executor with the bug:
```
21/07/15 15:42:46 WARN ExternalBlockStoreClient: Error while trying to get the host local dirs for [shuffle-push-merger]
21/07/15 15:42:46 WARN PushBasedFetchHelper: Error while fetching the merged dirs for push-merged-local blocks: shuffle_0_-1_13. Fetch the original blocks instead
java.lang.RuntimeException: java.lang.IllegalStateException: Invalid executor id: shuffle-push-merger, expected 92.
	at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:130)
	at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
```
After the fix, the executors were able to fetch the push-merged blocks:
```
21/07/15 16:37:36 INFO PushBasedFetchHelper: Received the meta of push-merged block for (0, 4993) from ltx1-hcl3406.grid.linkedin.com:7338
21/07/15 16:37:40 INFO PushBasedFetchHelper: Received the meta of push-merged block for (0, 4991) from ltx1-hcl3406.grid.linkedin.com:7338
```
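
To make bug (1) easier to picture, here is a minimal, self-contained Java sketch of why the request fails when it goes to another executor. It is a simplified model, not the Spark code: `DirsEndpoint`, `ExecutorEndpoint`, `ShuffleServiceEndpoint`, `mergedDirs` and the directory paths are all hypothetical names introduced for illustration. Only the `shuffle-push-merger` identifier and the wording of the rejection message are taken from the executor log above. The assumption being modelled is that an executor answers local-dir requests only for its own id, while the ESS also knows the dirs registered under the merger identifier.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified model of bug (1); every type and path here is hypothetical, not Spark's API.
final class LocalDirsLookupSketch {
    // Identifier taken from the executor log; everything else is illustrative.
    static final String MERGER_ID = "shuffle-push-merger";

    interface DirsEndpoint {
        String[] getLocalDirs(String execId);
    }

    // Models an executor's block RPC endpoint: it answers only for its own executor id.
    static final class ExecutorEndpoint implements DirsEndpoint {
        private final String ownId;
        private final String[] dirs;

        ExecutorEndpoint(String ownId, String... dirs) {
            this.ownId = ownId;
            this.dirs = dirs;
        }

        @Override
        public String[] getLocalDirs(String execId) {
            if (!ownId.equals(execId)) {
                // Message shape mirrors the IllegalStateException in the log.
                throw new IllegalStateException(
                    "Invalid executor id: " + execId + ", expected " + ownId + ".");
            }
            return dirs.clone();
        }
    }

    // Models the ESS: it tracks dirs for every registered id, including the merger.
    static final class ShuffleServiceEndpoint implements DirsEndpoint {
        private final Map<String, String[]> dirsById = new HashMap<>();

        void register(String execId, String... dirs) {
            dirsById.put(execId, dirs);
        }

        @Override
        public String[] getLocalDirs(String execId) {
            String[] dirs = dirsById.get(execId);
            if (dirs == null) {
                throw new IllegalStateException("Unknown executor id: " + execId);
            }
            return dirs.clone();
        }
    }

    // The lookup for push-merged-local blocks always uses the merger identifier,
    // so it only succeeds against an endpoint that knows that identifier.
    static String[] mergedDirs(DirsEndpoint endpoint) {
        return endpoint.getLocalDirs(MERGER_ID);
    }

    public static void main(String[] args) {
        ExecutorEndpoint executor = new ExecutorEndpoint("92", "/hypothetical/blockmgr-92");
        ShuffleServiceEndpoint ess = new ShuffleServiceEndpoint();
        ess.register(MERGER_ID, "/hypothetical/merge_manager");

        try {
            mergedDirs(executor); // buggy target: another executor
        } catch (IllegalStateException e) {
            System.out.println("executor rejected the request: " + e.getMessage());
        }
        // Fixed target: the ESS, which holds the merger's dirs.
        System.out.println("ESS returned: " + Arrays.toString(mergedDirs(ess)));
    }
}
```

In this model the only difference between the failing and the working call is which endpoint receives the request, which matches the description of the fix as trivial.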
