Good Morning,

I’ve continued researching how this could have happened, but I’m still left 
with one remaining question.

What I can’t seem to find is information about how the replication controller 
and the evacuate command interact. If I mimic what the evac command does with 
this awesome bash one-liner:

oadm manage-node node-002.ose.bld.f4tech.com --list-pods -o json | tail -n +4 | \
  jq -r '.items[].metadata.name' | xargs oc delete pod
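
For comparison, the real evacuation path I’m trying to mimic is roughly the 
two-step sequence below (mark the node unschedulable, then evacuate). I’m using 
node-002 here only because that’s the node from my test, and the exact flags 
can vary between releases, so treat this as a sketch rather than the canonical 
invocation:

oadm manage-node node-002.ose.bld.f4tech.com --schedulable=false
oadm manage-node node-002.ose.bld.f4tech.com --evacuate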


Running the delete pipeline above, I’m able to recreate the problem. This makes 
me think that when a lot of delete commands are executed at once, the 
replication controller is not able to keep up with the needs of the 
application. Something I found in the events log during this scenario is a 
little unnerving:

7:35:23 AM  sample-jvm-app-30-xunju Pod Normal  Scheduled   Successfully 
assigned sample-jvm-app-30-xunju to node-001.ose.bld.f4tech.com
7:35:23 AM  sample-jvm-app-30-g4drx Pod Normal  Scheduled   Successfully 
assigned sample-jvm-app-30-g4drx to node-003.ose.bld.f4tech.com
7:35:22 AM  sample-jvm-app-30-362hb Pod Normal  Scheduled   Successfully 
assigned sample-jvm-app-30-362hb to node-003.ose.bld.f4tech.com
7:35:19 AM  sample-jvm-app-30-qn5nt Pod Normal  Killing     Killing container 
with docker id 99a673abe7e3: Need to kill pod.
7:35:19 AM  sample-jvm-app-30-xo9w6 Pod Normal  Killing     Killing container 
with docker id 33c23ef1e7ac: Need to kill pod.
7:35:19 AM  sample-jvm-app-30-pcxlr Pod Normal  Killing     Killing container 
with docker id f1b3ce10a5c1: Need to kill pod.
7:34:22 AM  sample-jvm-app-30-362hb Pod Warning Failed scheduling   node 
'node-002.ose.bld.f4tech.com' is not in cache
7 times in the last 2 minutes
7:34:22 AM  sample-jvm-app-30-xunju Pod Warning Failed scheduling   node 
'node-002.ose.bld.f4tech.com' is not in cache
7 times in the last 2 minutes
7:34:22 AM  sample-jvm-app-30-g4drx Pod Warning Failed scheduling   node 
'node-002.ose.bld.f4tech.com' is not in cache
7 times in the last 2 minutes


As seen above, the newly created pods initially appear destined for node-002, 
but node-002 is not found in the cache, which suggests it is failing the 
scheduler’s predicate checks for available nodes. That much is understandable, 
since it has been marked unschedulable. What I don’t understand is that during 
this period node-001 and node-003 are available and more than willing to accept 
these pods. I wonder whether the replication controller doesn’t get updated 
information about node availability until after the old pods are finally 
killed off.
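
One thing I still want to confirm is what the master thinks each node’s state 
is while the old pods are terminating. Something along these lines, run during 
that window, should show node-002 as SchedulingDisabled while node-001 and 
node-003 simply report Ready (the describe/grep is just a quick way to check 
the unschedulable flag):

oc get nodes
oc describe node node-001.ose.bld.f4tech.com | grep -i unschedulable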

I’m still researching how I can prevent all three pods from ending up on a 
single node.
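
One option I’m looking at for that is pod anti-affinity, which asks the 
scheduler to avoid co-locating pods of the same app on one host. The patch 
below is only a sketch: it assumes a cluster version that exposes the 
podAntiAffinity field in the pod spec (on older versions it was only available 
as an alpha annotation), and it assumes the deployment config is named 
sample-jvm-app with an app=sample-jvm-app label on its pods, which I’m guessing 
from the pod names above:

oc patch dc/sample-jvm-app -p '{
  "spec": {
    "template": {
      "spec": {
        "affinity": {
          "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [{
              "weight": 100,
              "podAffinityTerm": {
                "labelSelector": {"matchLabels": {"app": "sample-jvm-app"}},
                "topologyKey": "kubernetes.io/hostname"
              }
            }]
          }
        }
      }
    }
  }
}'

Because this uses the “preferred” rather than the “required” form, the pods can 
still end up on one node when nothing else is schedulable, but the scheduler 
should spread them whenever it can.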


--
John Skarbek


On June 7, 2016 at 16:05:12, Skarbek, John ([email protected]) wrote:

Good Morning,

I’d like to ask a question about evacuating pods and how openshift/kubernetes 
schedules the replacements.

We have 3 nodes configured to run applications, and we went through a cycle of 
applying patches. We’ve created an ansible playbook that goes through each 
node, one at a time, evacuates its pods, and restarts the node.

Prior to starting, we had an application running 3 pods, one on each node. 
When node1 was forced to evac its pods, kubernetes scheduled the replacement 
pod on node3. Node2 was next in line; when ansible forced the evac of its pods, 
the final pod was placed on node3. So at this point, all pods were on the same 
physical node.

When ansible forced the evac of pods on node3, I then had an outage. The three 
pods were put in a “terminating” state, while 3 others were in a “pending” 
state. It took approximately 30 seconds to terminate the pods. The new 
‘pending’ pods sat pending for about 65 seconds, after which they were finally 
scheduled on nodes 1 and 2, plus X time to start the containers.

Is this expected behavior? I was hoping that the replication controller would 
handle this a bit better when scheduling, to ensure pods don’t get shifted to 
the same physical box when there are two other boxes available. I’m also hoping 
that before pods are term’ed, replacements are brought online.


--
John Skarbek
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
