Thanks! Hopefully we don't hit this too much until 1.5.0 is released.

On Fri, 21 Apr 2017 at 01:26 Patrick Tescher <[email protected]> wrote:
> We upgraded to 1.5.0 and that error went away.
>
> --
> Patrick Tescher
>
> On Apr 19, 2017, at 10:59 PM, Andrew Lau <[email protected]> wrote:
>
> The thin_ls error has been happening for quite some time
> https://github.com/openshift/origin/issues/10940
>
> On Thu, 20 Apr 2017 at 15:55 Tero Ahonen <[email protected]> wrote:
>
>> It seems that error is related to docker storage on that VM
>>
>> .t
>>
>> Sent from my iPhone
>>
>> On 20 Apr 2017, at 8.53, Andrew Lau <[email protected]> wrote:
>>
>> Unfortunately I did not. I dumped the logs and just removed the node in
>> order to quickly restore the current containers on another node.
>>
>> At the exact time it failed I saw a lot of the following:
>>
>> ===
>> thin_pool_watcher.go:72] encountered error refreshing thin pool watcher:
>> error performing thin_ls on metadata device
>> /dev/mapper/docker_vg-docker--pool_tmeta: Error running command `thin_ls
>> --no-headers -m -o DEV,EXCLUSIVE_BYTES
>> /dev/mapper/docker_vg-docker--pool_tmeta`: exit status 127
>>
>> failed (failure): rpc error: code = 2 desc = shim error: context deadline
>> exceeded#015
>>
>> Error running exec in container: rpc error: code = 2 desc = shim error:
>> context deadline exceeded
>> ===
>>
>> Seems to match https://bugzilla.redhat.com/show_bug.cgi?id=1427212
>>
>> On Thu, 20 Apr 2017 at 15:41 Tero Ahonen <[email protected]> wrote:
>>
>>> Hi
>>>
>>> Did you try to ssh to that node and execute sudo docker run for some
>>> container?
>>>
>>> .t
>>>
>>> Sent from my iPhone
>>>
>>> > On 20 Apr 2017, at 8.18, Andrew Lau <[email protected]> wrote:
>>> >
>>> > I'm trying to debug a weird scenario where a node has had every pod
>>> > crash with the error:
>>> > "rpc error: code = 2 desc = shim error: context deadline exceeded"
>>> >
>>> > The pods stayed in the Ready 0/1 state.
>>> > The docker daemon was responding, and the kubelet and all its services
>>> > were running. The node was reporting the OK status.
>>> >
>>> > No resource limits were hit, with CPU almost idle and memory at 25%
>>> > utilisation.
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > [email protected]
>>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>
> _______________________________________________
> dev mailing list
> [email protected]
> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
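
For anyone who hits the thin_pool_watcher message quoted above: exit status 127 usually means the shell could not find the thin_ls binary at all, not that thin_ls ran and failed. Below is a rough sketch of how one might check on the affected node (assuming a RHEL/CentOS host where the thin provisioning tools come from the device-mapper-persistent-data package, and using busybox only as an example image for Tero's docker run test):

===
# confirm whether the binary the watcher is trying to run exists at all
$ command -v thin_ls || echo "thin_ls not found on PATH"

# on RHEL/CentOS the thin tools ship in device-mapper-persistent-data;
# thin_ls itself only appears in the newer 0.6.x builds of that package
$ sudo yum install -y device-mapper-persistent-data

# re-run the exact command from the log by hand
$ sudo thin_ls --no-headers -m -o DEV,EXCLUSIVE_BYTES \
    /dev/mapper/docker_vg-docker--pool_tmeta

# Tero's suggestion: verify docker can still start a container on the node
$ sudo docker run --rm busybox true && echo "docker can start containers"
===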
