craigcondit opened a new pull request #375:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/375


   ### What is this PR for?
   Fix leaks and inconsistencies in the shim scheduler cache.
   
   Many of these issues have been shown to lead to scheduling failures (such as hitting max Pod limits) because we don't properly clean up allocations.
   
   - Made the Add/Update/Remove Pod/Node handlers idempotent and removed error returns wherever possible
   - Ensured RemoveNode properly removes any associated Pod allocations
   - Added debugging hooks to the pre/post handlers to dump the state of the scheduler cache
   - Handled terminated Pods similarly to removed Pods
   - Fixed leaking of assumed Pods
   - Fixed duplicate addition of Pods to the NodeInfo.Pods structure
   - Tracked Pod -> Node assignments separately from NodeInfo, because NodeInfo stores Pods in a list and doesn't check for duplicates
   - Refactored the Add/Update methods to share logic wherever possible
   - Logged both podKey and podName when updates are made
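   The bookkeeping described in the list above can be illustrated with a minimal Go sketch. This is not the shim's actual code. It only shows the idea of keeping Pod -> Node assignments in a map keyed by pod, so that a repeated add cannot create a duplicate entry, that removing an unknown pod is a harmless no-op, and that removing a node also drops every pod assigned to it. All names here (`schedulerCache`, `UpdatePod`, `RemovePod`, `RemoveNode`, `podToNode`, `nodePods`) are hypothetical, and pods and nodes are reduced to plain string keys instead of Kubernetes objects or the scheduler framework's NodeInfo.

```go
package main

import (
	"fmt"
	"sync"
)

// schedulerCache is a HYPOTHETICAL, simplified stand-in for the shim's
// scheduler cache; pods and nodes are plain string keys here.
type schedulerCache struct {
	lock sync.RWMutex
	// podToNode is the single source of truth for assignments, so
	// re-adding a pod overwrites rather than duplicates its entry.
	podToNode map[string]string
	// nodePods is the reverse index: node name -> set of pod keys.
	nodePods map[string]map[string]struct{}
}

func newSchedulerCache() *schedulerCache {
	return &schedulerCache{
		podToNode: make(map[string]string),
		nodePods:  make(map[string]map[string]struct{}),
	}
}

// AddNode registers a node; calling it again for the same node is a no-op.
func (c *schedulerCache) AddNode(node string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if _, ok := c.nodePods[node]; !ok {
		c.nodePods[node] = make(map[string]struct{})
	}
}

// UpdatePod adds or moves a pod. Add and update share this path, and
// repeating the same call leaves the cache unchanged (idempotent).
func (c *schedulerCache) UpdatePod(podKey, node string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.removePodLocked(podKey) // drop any previous assignment first
	if _, ok := c.nodePods[node]; !ok {
		c.nodePods[node] = make(map[string]struct{})
	}
	c.nodePods[node][podKey] = struct{}{}
	c.podToNode[podKey] = node
}

// RemovePod deletes a pod; removing an unknown pod is not an error.
// A terminated pod could be routed through here as well.
func (c *schedulerCache) RemovePod(podKey string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.removePodLocked(podKey)
}

// removePodLocked assumes the caller already holds the write lock.
func (c *schedulerCache) removePodLocked(podKey string) {
	node, ok := c.podToNode[podKey]
	if !ok {
		return
	}
	delete(c.podToNode, podKey)
	delete(c.nodePods[node], podKey) // delete on a nil map is a no-op
}

// RemoveNode deletes a node together with every pod assigned to it,
// so no orphaned assignments are left behind.
func (c *schedulerCache) RemoveNode(node string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	for podKey := range c.nodePods[node] {
		delete(c.podToNode, podKey)
	}
	delete(c.nodePods, node)
}

// podCount reports how many pods are currently assigned to a node.
func (c *schedulerCache) podCount(node string) int {
	c.lock.RLock()
	defer c.lock.RUnlock()
	return len(c.nodePods[node])
}

func main() {
	c := newSchedulerCache()
	c.AddNode("node-1")
	c.UpdatePod("ns/pod-a", "node-1")
	c.UpdatePod("ns/pod-a", "node-1") // duplicate add: still one entry
	fmt.Println(c.podCount("node-1")) // prints 1
	c.RemoveNode("node-1")
	c.RemovePod("ns/pod-a")       // already gone: silently ignored
	fmt.Println(len(c.podToNode)) // prints 0
}
```

   Using a map keyed by pod, rather than appending to a per-node list, is what makes the handlers safe to call more than once for the same event.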
   
   ### What type of PR is it?
   * [x] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-1100
   
   ### How should this be tested?
   Unit tests updated and debug logs show that we are no longer leaking objects.
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The license files need updating.
   * [ ] - There are breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.