mridulm edited a comment on pull request #34043:
URL: https://github.com/apache/spark/pull/34043#issuecomment-924492762


   Thanks for the jira and stack trace - that was really helpful !
   Looking at it more, the issue here is that we are relying on 
`BlockManagerMasterEndpoint` to update state in `MapOutputTracker` (which is 
usually done via `DAGScheduler`).
   
   Given there is no state mod for shuffle blocks when `UpdateBlockInfo` is 
received, it would simply be better to move just that codepath to a different 
thread - instead of trying to add fine grained locking for rest of 
`BlockManagerMasterEndpoint` (the change is insufficient to do that 
unfortunately).
   
   I am still testing locally, and making sure there are no issues - but the 
gist is:
   
   * `updateBlockInfo` returns `Future[Boolean]`
   * When `blockId` is a shuffle block - return a `Future { }` around the 
existing `blockId match`
   * For all other return in that method, use `Future.successful` to complete 
immediately.
   * As with the current PR, chain the future to return response.
   
   For shuffle blocks, it will should executed outside of the 
`BlockManagerMasterEndpoint`, while for the rest, it will get executed within 
the lock - and so no need to change rest of the state's MT safety.
   
   
   Thoughts ?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to