yangwwei opened a new pull request, #411:
URL: https://github.com/apache/yunikorn-core/pull/411

   With the latest 1.0 version, we observed that the scheduler occasionally 
crashed and restarted. The health checker accessed the application map of the 
partition context without holding a lock, so a concurrent map read/write error 
could occur when an app was added to or removed from the partition at the same 
time. The health checker needs to hold a read lock while it accesses the map.
   
   ### What is this PR for?
   A few sentences describing the overall goals of the pull request's commits.
   First time? Check out the contributing guide - 
http://yunikorn.apache.org/community/how_to_contribute   
   
   
   ### What type of PR is it?
   * [x] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### What is the Jira issue?
   * https://issues.apache.org/jira/browse/YUNIKORN-1218
   
   ### How should this be tested?
   A patch that reproduces this issue is attached to the JIRA 
https://issues.apache.org/jira/browse/YUNIKORN-1218. Without this fix, the 
scheduler panics after running for a while; with this fix, the crash no longer 
happens.
   

