davidnavas opened a new pull request #24616: [SPARK-27726] [Core] Fix performance of ElementTrackingStore deletes when using InMemoryStore under high loads
URL: https://github.com/apache/spark/pull/24616
 
 
   ## What changes were proposed in this pull request?
   
   The details of this PR are explored in depth in the sub-tasks of the umbrella JIRA SPARK-27726.
   Briefly:
     1. Stop issuing asynchronous requests to clean up elements in the tracking store when a request is already pending
     2. Fix a couple of thread-safety issues (mutable state and mis-ordered updates)
     3. Move Summary deletion out of the Stage deletion loop, as is already done for Tasks
     4. Reimplement multi-delete as a removeAllKeys call, which allows InMemoryStore to implement it efficiently
     5. Clean up some generic typing and exception handling
   
   We see roughly a five-orders-of-magnitude improvement in the deletion code, which for us is the difference between a server that needs restarting daily and one that stays stable for weeks.
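   The gap between deleting elements one at a time and deleting them in bulk (item 4) can be illustrated with a toy store. This is a minimal sketch under stated assumptions: `ToyStore`, `removeOneByOne`, and this `removeAllKeys` signature are hypothetical and are not Spark's `InMemoryStore` or `KVStore` API, and the store is a plain, non-thread-safe `HashMap`. The naive path rescans every element once per requested index value (O(n * k)), while the bulk path hashes the requested values once and makes a single pass (O(n + k)).

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical toy store (not Spark's InMemoryStore). Elements are kept by
// natural key; `indexer` extracts a secondary value such as a stage ID.
// Not thread-safe: it is backed by a plain HashMap.
final class ToyStore<K, V> {
  private final Map<K, V> data = new HashMap<>();
  private final Function<V, ?> indexer;

  ToyStore(Function<V, ?> indexer) {
    this.indexer = indexer;
  }

  void write(K key, V value) {
    data.put(key, value);
  }

  int size() {
    return data.size();
  }

  // Naive multi-delete: one full scan per requested index value, O(n * k).
  int removeOneByOne(Collection<?> indexValues) {
    int removed = 0;
    for (Object target : indexValues) {
      Iterator<V> it = data.values().iterator();
      while (it.hasNext()) {
        if (target.equals(indexer.apply(it.next()))) {
          it.remove();
          removed++;
        }
      }
    }
    return removed;
  }

  // Bulk multi-delete: hash the requested values once, then make a single
  // pass over the store, O(n + k).
  int removeAllKeys(Collection<?> indexValues) {
    Set<Object> targets = new HashSet<>(indexValues);
    int removed = 0;
    Iterator<V> it = data.values().iterator();
    while (it.hasNext()) {
      if (targets.contains(indexer.apply(it.next()))) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }
}
```

   For example, with 1,000 tasks spread evenly over 10 stage IDs, removing stages 1, 3, and 5 deletes 300 elements with either method, but the one-by-one version walks the store three times. The actual speedup in Spark depends on InMemoryStore's internals, which this sketch does not model.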
   
   ## How was this patch tested?
   
   Unit tests are supplied for the fire-once asynchronous code and for the removeAll calls in both LevelDB and InMemoryStore. The test code for LevelDB and InMemoryStore is highly repetitive and should probably be merged, but we did not attempt that in this PR.
   
   A version of this code ran in our production Spark 2.3.3 deployment, where it sustained higher throughput without going into GC overload (which, some weeks ago, was happening daily).
   
   A version of this code was also run against a purpose-built performance test suite covering both Store implementations, the code both before and after this change, and both total and partial deletes (this suite is not included in this PR).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
