ycr-oss opened a new pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276


   ### What is this PR for?
   When upgrading YuniKorn with `helm upgrade`, the admission controller is deleted. For example:
   ```
   $ k get po -w
   NAME                                             READY   STATUS              RESTARTS   AGE
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   1/1     Running             0          2d5h
   yunikorn-scheduler-64f8746476-4mqjl              2/2     Running             0          2d5h
   yunikorn-scheduler-dcdd789b4-rs4km               0/2     Pending             0          0s
   yunikorn-scheduler-dcdd789b4-rs4km               0/2     Pending             0          0s
   yunikorn-scheduler-dcdd789b4-rs4km               0/2     ContainerCreating   0          0s
   yunikorn-scheduler-dcdd789b4-rs4km               2/2     Running             0          6s
   yunikorn-scheduler-64f8746476-4mqjl              2/2     Terminating         0          2d5h
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   1/1     Terminating         0          2d5h
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   0/1     Terminating         0          2d5h
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   0/1     Terminating         0          2d5h
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   0/1     Terminating         0          2d5h
   yunikorn-scheduler-64f8746476-4mqjl              0/2     Terminating         0          2d5h
   yunikorn-scheduler-64f8746476-4mqjl              0/2     Terminating         0          2d5h
   yunikorn-scheduler-64f8746476-4mqjl              0/2     Terminating         0          2d5h
   ```
   As shown above, once the upgrade completes only the new scheduler pod remains; the admission controller pod is gone. This happens because the `preStop` hook of the old scheduler pod deletes the admission controller while that pod terminates, even though the new scheduler pod is already running. This PR adds a simple check to this deletion step: if a scheduler is still healthy and running, the admission controller is not deleted.
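   
   A minimal sketch of such a check, written as shell functions that a `preStop` hook could call. It is not the code from this PR: the label selector `app=yunikorn,component=yunikorn-scheduler`, the deployment name `yunikorn-admission-controller`, and the `POD_NAME`/`NAMESPACE` environment variables (e.g. injected through the Downward API) are assumptions for illustration. The decision reads one `<pod-name> <Ready-status>` line per scheduler pod and allows the deletion only if no *other* scheduler pod reports `Ready=True`:
   
   ```shell
   #!/bin/sh
   # Hypothetical preStop helper: NOT the actual script changed in this PR.
   
   # count_other_ready reads "<pod-name> <Ready-status>" lines on stdin and prints
   # how many pods other than "$1" (the pod being stopped) report Ready=True.
   count_other_ready() {
     awk -v self="$1" '$1 != self && $2 == "True" { n++ } END { print n + 0 }'
   }
   
   # should_delete prints "yes" when no other scheduler pod is ready, else "no".
   should_delete() {
     if [ "$(count_other_ready "$1")" -eq 0 ]; then
       echo yes
     else
       echo no
     fi
   }
   
   # pre_stop wires the decision to kubectl. The label selector, deployment name,
   # NAMESPACE and POD_NAME are assumptions; adjust them to the actual chart.
   pre_stop() {
     decision=$(kubectl get pods -n "$NAMESPACE" -l app=yunikorn,component=yunikorn-scheduler \
       -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
       should_delete "$POD_NAME")
     if [ "$decision" = "yes" ]; then
       kubectl delete deployment yunikorn-admission-controller -n "$NAMESPACE"
     else
       echo "another scheduler pod is ready; keeping the admission controller"
     fi
   }
   ```
   
   With `--replicas=0` no other scheduler pod exists, so `should_delete` answers `yes` and the deletion proceeds; during a rolling update the new, already-ready pod makes it answer `no`. The terminating pod is excluded by name because it may still report `Ready=True` while its hook runs.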
   
   Note that after this change, commands like
   ```
   kubectl scale deployment yunikorn-scheduler --replicas=[0|1]
   ```
   still work exactly as before.
   
   ### What type of PR is it?
   * [ ] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-696
   
   ### How should this be tested?
   Install YuniKorn, upgrade it with `helm upgrade`, then check that both the scheduler pod and the admission controller pod are running.
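   
   The manual check can be scripted. The sketch below is an illustration, not part of the PR: the release name, chart reference and namespace in the comments are assumptions, and the pod-name prefixes are taken from the log in the description. `pods_ok` reads `kubectl get pods --no-headers` output and reports `ok` only if both a scheduler pod and an admission controller pod are `Running`:
   
   ```shell
   #!/bin/sh
   # Hypothetical test helper, not part of this PR. Typical usage (release name,
   # chart reference and namespace are assumptions):
   #   helm upgrade yunikorn yunikorn/yunikorn -n yunikorn
   #   # wait until the old scheduler pod has fully terminated, then:
   #   kubectl get pods -n yunikorn --no-headers | pods_ok
   
   # pods_ok prints "ok" if a Running yunikorn-scheduler pod and a Running
   # yunikorn-admission-controller pod are both listed, otherwise "missing".
   pods_ok() {
     awk '
       $1 ~ /^yunikorn-scheduler-/ && $3 == "Running" { sched = 1 }
       $1 ~ /^yunikorn-admission-controller-/ && $3 == "Running" { adm = 1 }
       END { if (sched && adm) print "ok"; else print "missing" }
     '
   }
   ```
   
   Waiting for the old scheduler pod to disappear matters because its `preStop` hook runs during termination, so checking too early could miss a late deletion.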
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There are breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

