otterc opened a new pull request #33378:
URL: https://github.com/apache/spark/pull/33378


   ### What changes were proposed in this pull request?
   The following two bugs were introduced by https://github.com/apache/spark/pull/32140:
   1. `PushBasedFetchHelper` requests the local dirs for push-merged-local 
blocks from other executors instead of from the external shuffle service 
(ESS). Push-based shuffle is only enabled when the ESS is enabled, so the dirs 
should always be fetched from the ESS; fetching them from other executors is 
not yet supported.
   2. The size of the push-merged blocks is logged incorrectly.
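
To illustrate bug (1), here is a minimal Python sketch of the routing problem. It is a toy model, not Spark code: the class names, the `fetch_merged_dirs` helper, and the directory paths are all hypothetical. The only details taken from this PR are the `shuffle-push-merger` id and the "Invalid executor id" error shown in the executor logs. The assumption is that an executor endpoint only answers local-dirs requests for its own id, while the ESS can answer for the merger id.

```python
MERGER_ID = "shuffle-push-merger"  # id seen in the executor logs of this PR


class ExecutorEndpoint:
    """Hypothetical stand-in for an executor's RPC server.

    Assumption: it only serves local dirs for its own executor id.
    """

    def __init__(self, executor_id, local_dirs):
        self.executor_id = executor_id
        self.local_dirs = local_dirs

    def get_local_dirs(self, requested_id):
        if requested_id != self.executor_id:
            # Mirrors the error message from the logs.
            raise RuntimeError(
                f"Invalid executor id: {requested_id}, expected {self.executor_id}."
            )
        return {requested_id: self.local_dirs}


class ShuffleServiceEndpoint:
    """Hypothetical stand-in for the ESS, which knows the merger's dirs."""

    def __init__(self, dirs_by_id):
        self.dirs_by_id = dirs_by_id

    def get_local_dirs(self, requested_id):
        return {requested_id: self.dirs_by_id[requested_id]}


def fetch_merged_dirs(endpoint):
    """Ask the given endpoint for the merged-shuffle dirs."""
    return endpoint.get_local_dirs(MERGER_ID)[MERGER_ID]


if __name__ == "__main__":
    executor = ExecutorEndpoint("92", ["/tmp/executor-92"])             # made-up path
    ess = ShuffleServiceEndpoint({MERGER_ID: ["/tmp/merge_manager"]})  # made-up path

    try:
        fetch_merged_dirs(executor)  # buggy routing: asks another executor
    except RuntimeError as err:
        print("executor rejected the request:", err)

    print("ESS returned:", fetch_merged_dirs(ess))  # fixed routing: asks the ESS
```

In this model the request itself is identical in both cases; only the endpoint it is sent to changes, which is why the fix is small.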
   
   ### Why are the changes needed?
   This fixes the above-mentioned bugs and is needed for push-based shuffle to 
work properly.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Tested by running an application on the cluster. The UTs mock the call 
`hostLocalDirManager.getHostLocalDirs`, which is why they didn't catch (1). 
However, the fix is trivial and checking this in the UTs would require a lot 
more effort, so I haven't modified them.
   Logs of the executor with the bug:
   ```
   21/07/15 15:42:46 WARN ExternalBlockStoreClient: Error while trying to get 
the host local dirs for [shuffle-push-merger]
   21/07/15 15:42:46 WARN PushBasedFetchHelper: Error while fetching the merged 
dirs for push-merged-local blocks: shuffle_0_-1_13. Fetch the original blocks 
instead
   java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
executor id: shuffle-push-merger, expected 92.
        at 
org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:130)
        at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:163)
   ```
   After the fix, the executors were able to fetch the push-merged blocks:
   ```
   21/07/15 16:37:36 INFO PushBasedFetchHelper: Received the meta of 
push-merged block for (0, 4993)  from ltx1-hcl3406.grid.linkedin.com:7338
   21/07/15 16:37:40 INFO PushBasedFetchHelper: Received the meta of 
push-merged block for (0, 4991)  from ltx1-hcl3406.grid.linkedin.com:7338
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



