taroplus opened a new pull request #34090:
URL: https://github.com/apache/spark/pull/34090


   ### What changes were proposed in this pull request?
   This PR fixes a performance issue in `AppStatusListener.cleanupStages`. When there is a large number of stages in the store, the logic below runs in roughly N*M time, since each of the N stages being removed triggers a scan over the stages remaining in the store.
   
   ```
       val stageIds = stages.map { s =>
         val key = Array(s.info.stageId, s.info.attemptId)
         kvstore.delete(s.getClass(), key)
   
          // Check whether there are remaining attempts for the same stage. If there aren't, then
          // also delete the RDD graph data.
         val remainingAttempts = kvstore.view(classOf[StageDataWrapper])
           .index("stageId")
           .first(s.info.stageId)
           .last(s.info.stageId)
           .closeableIterator()
           ...
   ```
   Instead of accessing the view to check the remaining attempts for each stage individually, this change moves that logic to after the stages have been removed, so the view (`kvstore.view(classOf[StageDataWrapper])`) only needs to be accessed once.
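   Schematically, the reworked flow looks roughly like the sketch below. This is illustrative only: it reuses the `stages` and `kvstore` names from the snippet above, and the actual patch may differ in details such as iterator handling.
   
   ```
       import scala.collection.JavaConverters._

       // First pass: delete every evicted stage attempt and remember its stageId.
       val stageIds = stages.map { s =>
         val key = Array(s.info.stageId, s.info.attemptId)
         kvstore.delete(s.getClass(), key)
         s.info.stageId
       }.toSet

       // Single scan of the view: any stageId still present has remaining attempts.
       val stagesWithAttempts = kvstore.view(classOf[StageDataWrapper]).asScala
         .map(_.info.stageId)
         .toSet

       // Delete the RDD graph data only for stages with no remaining attempts.
       (stageIds -- stagesWithAttempts).foreach { stageId =>
         kvstore.delete(classOf[RDDOperationGraphWrapper], stageId)
       }
   ```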
   
   ### Why are the changes needed?
   When more stages than the retention limit are kept in memory, the cleanup process cannot keep up with the rate of incoming stages because of this performance issue, which leads to behavior that looks like a memory leak and eventually causes an OutOfMemoryError.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   The behavior should be identical before and after the change, and the existing tests should verify that. The change has also been applied to an environment where a constant memory leak was observed; under the same load, the services are now running healthily.

