zhouyejoe opened a new pull request #35906:
URL: https://github.com/apache/spark/pull/35906


   ### What changes were proposed in this pull request?
   This PR adds the capability to store the information required for push-based 
shuffle in LevelDB.
   
   ### Why are the changes needed?
   Without this PR, all the information for push-based shuffle is stored only in 
memory in the shuffle services, so it is lost when a NodeManager restarts. After 
a restart, the previously merged shuffle data can no longer serve fetch requests, 
and the shuffle services cannot merge any newly pushed blocks from existing 
applications. With this patch, the information is stored in LevelDB and is 
recovered when the NodeManager restarts.
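
The persist-and-recover idea described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: it is written in Python rather than the shuffle service's own language, a JSON file stands in for LevelDB, and every name in it (`PersistentMergeState`, `register`, `finalize_shuffle`, `simulate_restart`, the `merge_dir` field) is hypothetical.

```python
import json
import os
import tempfile


class PersistentMergeState:
    """Hypothetical stand-in for shuffle-service state backed by a durable store.

    Assumption: a JSON file plays the role of LevelDB here.
    """

    def __init__(self, path):
        self.path = path
        self.state = {}
        # On startup, recover whatever was persisted before the last shutdown.
        if os.path.exists(path):
            with open(path) as f:
                self.state = json.load(f)

    def register(self, app_id, merge_dir):
        # Record the application, then persist immediately.
        self.state[app_id] = {"merge_dir": merge_dir, "finalized": []}
        self._flush()

    def finalize_shuffle(self, app_id, shuffle_id):
        # Record that a shuffle's merged data is complete, then persist.
        self.state[app_id]["finalized"].append(shuffle_id)
        self._flush()

    def _flush(self):
        # Write to a temporary file and rename it over the old one, so a
        # crash mid-write cannot leave a half-written state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.path)


def simulate_restart():
    """Write state, drop the in-memory object, and reload from disk."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        before = PersistentMergeState(path)
        before.register("app-1", "/tmp/merge_manager")
        before.finalize_shuffle("app-1", 0)
        del before  # the in-memory copy is gone, as after a restart
        after = PersistentMergeState(path)
        return after.state
```

Calling `simulate_restart()` returns the state that the second instance read back from disk, which is the property the restart scenarios in the testing section are meant to exercise.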
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Unit testing.
   Deployed to clusters and restarted NMs in the magnet partition while a Spark 
shell application was running, in three different scenarios:
   1. After Spark-shell starts but before any script runs. This tests an NM 
restart after the application and executors have registered with the NMs.
   2. While there is an ongoing shuffle push to the shuffle services.
   3. While there is an ongoing merged shuffle fetch from the shuffle services.
   The results of the large shuffle testing scripts are identical to the case 
with no NM restart.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


