I wonder if this is related to the cache issue abhisheck was working on
fixing.

On Jun 9, 2016, at 7:47 AM, Skarbek, John <[email protected]> wrote:

Good Morning,

I’ve continued researching how this could have happened, but I’m still left
with one remaining question.

What I can’t seem to find is information about how the replication
controller and the evacuate command interact. If I mimic what the evacuate
command does via this awesome bash one-liner:

oadm manage-node node-002.ose.bld.f4tech.com --list-pods -o json |
tail -n +4 | jq '.items[].metadata.name' | xargs oc delete pod

I’m able to recreate the problem. This makes me think that when a lot of
delete commands are executed at once, the replication controller is not able
to keep up with the needs of the application. Something I found in the events
log during this scenario is a little unnerving.

7:35:23 AM  sample-jvm-app-30-xunju  Pod  Normal   Scheduled          Successfully assigned sample-jvm-app-30-xunju to node-001.ose.bld.f4tech.com
7:35:23 AM  sample-jvm-app-30-g4drx  Pod  Normal   Scheduled          Successfully assigned sample-jvm-app-30-g4drx to node-003.ose.bld.f4tech.com
7:35:22 AM  sample-jvm-app-30-362hb  Pod  Normal   Scheduled          Successfully assigned sample-jvm-app-30-362hb to node-003.ose.bld.f4tech.com
7:35:19 AM  sample-jvm-app-30-qn5nt  Pod  Normal   Killing            Killing container with docker id 99a673abe7e3: Need to kill pod.
7:35:19 AM  sample-jvm-app-30-xo9w6  Pod  Normal   Killing            Killing container with docker id 33c23ef1e7ac: Need to kill pod.
7:35:19 AM  sample-jvm-app-30-pcxlr  Pod  Normal   Killing            Killing container with docker id f1b3ce10a5c1: Need to kill pod.
7:34:22 AM  sample-jvm-app-30-362hb  Pod  Warning  Failed scheduling  node 'node-002.ose.bld.f4tech.com' is not in cache  (7 times in the last 2 minutes)
7:34:22 AM  sample-jvm-app-30-xunju  Pod  Warning  Failed scheduling  node 'node-002.ose.bld.f4tech.com' is not in cache  (7 times in the last 2 minutes)
7:34:22 AM  sample-jvm-app-30-g4drx  Pod  Warning  Failed scheduling  node 'node-002.ose.bld.f4tech.com' is not in cache  (7 times in the last 2 minutes)

As seen above, the newly created pods initially appear destined for
node-002, but node-002 is not found in the cache, which suggests it is
failing the predicate search for available nodes. That much is
understandable, since it has been marked unschedulable. What I don’t
understand is that during this period node-001 and node-003 are available
and more than willing to accept these pods. I wonder whether the replication
controller has stale information about node availability until after the old
pods are finally killed off.
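
For what it’s worth, the nodes’ schedulability can be checked directly while
the evacuation is in flight. A rough sketch of what I’d run (standard oc
commands; the jsonpath expression assumes the usual .spec.unschedulable
field):

# Show all nodes; a cordoned node is reported as SchedulingDisabled
oc get nodes

# Print the unschedulable flag for one node (outputs "true" while cordoned)
oc get node node-002.ose.bld.f4tech.com -o jsonpath='{.spec.unschedulable}'

# Watch events arrive in real time to see when the scheduler catches up
oc get events --watch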

I’m still researching how I can prevent all three pods from ending up on a
single node.
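
One idea I’m toying with is a pod anti-affinity rule so replicas of the same
app must land on different hosts. This is only a sketch: it assumes a cluster
version where the pod-level affinity API is available and that the pods carry
an app=sample-jvm-app label (both need checking against reality):

# Hypothetical patch; verify the label selector and affinity support first
oc patch dc/sample-jvm-app -p '{
  "spec": {
    "template": {
      "spec": {
        "affinity": {
          "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
              {
                "labelSelector": {"matchLabels": {"app": "sample-jvm-app"}},
                "topologyKey": "kubernetes.io/hostname"
              }
            ]
          }
        }
      }
    }
  }
}'

A hard (required) rule like this means a replica can sit Pending during a
drain if no other node is free, so the
preferredDuringSchedulingIgnoredDuringExecution variant may be the safer
choice here.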



-- 
John Skarbek

On June 7, 2016 at 16:05:12, Skarbek, John ([email protected]) wrote:

Good Morning,

I’d like to ask a question regarding evacuating pods and how
OpenShift/Kubernetes schedules the replacements.

We have 3 nodes configured to run applications, and we went through a cycle
of applying patches. So we’ve created an Ansible playbook that works through
the nodes one at a time, evacuating each node’s pods and then restarting it.
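
For context, the per-node maintenance amounts to roughly this oadm sequence
(a sketch only; exact flags can vary by version, and the node name is just an
example):

oadm manage-node node-001.ose.bld.f4tech.com --schedulable=false   # keep new pods off the node
oadm manage-node node-001.ose.bld.f4tech.com --evacuate            # move the existing pods elsewhere
# ... apply patches and reboot the node here ...
oadm manage-node node-001.ose.bld.f4tech.com --schedulable=true    # let it take pods again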

Prior to starting, we had an application running 3 pods, one on each node.
When node1 was forced to evacuate its pods, Kubernetes scheduled the
replacement pod on node3. Node2 was next in line; when Ansible forced the
evacuation of its pods, that pod was also placed on node3. So at this point,
all pods were on the same physical node.

When Ansible forced the evacuation of pods on node3, I then had an outage.
The three pods were put in a “terminating” state, while 3 others were in a
“pending” state. It took approximately 30 seconds to terminate the old pods.
The new “pending” pods sat pending for about 65 seconds, after which they
were finally scheduled on nodes 1 and 2, plus however long it took to start
the containers.

Is this expected behavior? I was hoping that the replication controller
would handle scheduling a bit better and ensure pods don’t all get shifted to
the same physical box when two other boxes are available. I’m also hoping
that replacements can be brought online before the old pods are terminated.
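
One crude workaround I’m considering (a sketch only, nothing I’ve validated)
is to surge the deployment before each evacuation so replacements exist
before the old pods are killed; this assumes the dc is actually named
sample-jvm-app:

oc scale dc/sample-jvm-app --replicas=4                     # add a spare replica up front
oadm manage-node node-003.ose.bld.f4tech.com --evacuate     # then evacuate the node
oc scale dc/sample-jvm-app --replicas=3                     # scale back once the node is healthy again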


-- 
John Skarbek

_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
