Re: Node report OK but every pod marked unready

2017-04-20 Thread Andrew Lau
Thanks! Hopefully we don't hit this too much until 1.5.0 is released

On Fri, 21 Apr 2017 at 01:26 Patrick Tescher wrote:

> We upgraded to 1.5.0 and that error went away.
>
> --
> Patrick Tescher
>
> On Apr 19, 2017, at 10:59 PM, Andrew Lau  wrote:
>
> The thin_ls error has been happening for quite some time:
> https://github.com/openshift/origin/issues/10940
>
> On Thu, 20 Apr 2017 at 15:55 Tero Ahonen  wrote:
>
>> It seems that error is related to docker storage on that vm
>>
>> .t
>>
>> Sent from my iPhone
>>
>> On 20 Apr 2017, at 8.53, Andrew Lau  wrote:
>>
>> Unfortunately I did not. I dumped the logs and just removed the node in
>> order to quickly restore the current containers on another node.
>>
>> At the exact time it failed I saw a lot of the following:
>>
>> ===
>> thin_pool_watcher.go:72] encountered error refreshing thin pool watcher:
>> error performing thin_ls on metadata device
>> /dev/mapper/docker_vg-docker--pool_tmeta: Error running command `thin_ls
>> --no-headers -m -o DEV,EXCLUSIVE_BYTES
>> /dev/mapper/docker_vg-docker--pool_tmeta`: exit status 127
>>
>> failed (failure): rpc error: code = 2 desc = shim error: context deadline
>> exceeded#015
>>
>> Error running exec in container: rpc error: code = 2 desc = shim error:
>> context deadline exceeded
>> ===
>>
>> Seems to match https://bugzilla.redhat.com/show_bug.cgi?id=1427212
>>
>>
>> On Thu, 20 Apr 2017 at 15:41 Tero Ahonen  wrote:
>>
>>> Hi
>>>
>>> Did you try to ssh to that node and execute sudo docker run for some
>>> container?
>>>
>>> .t
>>>
>>> Sent from my iPhone
>>>
>>> > On 20 Apr 2017, at 8.18, Andrew Lau  wrote:
>>> >
>>> > I'm trying to debug a weird scenario where every pod on a node has
>>> > crashed with the error:
>>> > "rpc error: code = 2 desc = shim error: context deadline exceeded"
>>> >
>>> > The pods stayed in the state Ready 0/1.
>>> > The docker daemon was responding, and the kubelet and all its services
>>> > were running. The node was reporting the OK status.
>>> >
>>> > No resource limits were hit: CPU was almost idle and memory was at 25%
>>> > utilisation.
>>> >
>>> >
>>> >
>>> >
>>> > ___
>>> > users mailing list
>>> > us...@lists.openshift.redhat.com
>>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Node report OK but every pod marked unready

2017-04-20 Thread Patrick Tescher
We upgraded to 1.5.0 and that error went away. 

--
Patrick Tescher

> On Apr 19, 2017, at 10:59 PM, Andrew Lau  wrote:
> 
> The thin_ls error has been happening for quite some time:
> https://github.com/openshift/origin/issues/10940


Re: Node report OK but every pod marked unready

2017-04-20 Thread Andrew Lau
The thin_ls error has been happening for quite some time:
https://github.com/openshift/origin/issues/10940

On Thu, 20 Apr 2017 at 15:55 Tero Ahonen  wrote:

> It seems that error is related to docker storage on that vm
>
> .t
>
> Sent from my iPhone


Re: Node report OK but every pod marked unready

2017-04-19 Thread Andrew Lau
Unfortunately I did not. I dumped the logs and just removed the node in
order to quickly restore the current containers on another node.

At the exact time it failed I saw a lot of the following:

===
thin_pool_watcher.go:72] encountered error refreshing thin pool watcher:
error performing thin_ls on metadata device
/dev/mapper/docker_vg-docker--pool_tmeta: Error running command `thin_ls
--no-headers -m -o DEV,EXCLUSIVE_BYTES
/dev/mapper/docker_vg-docker--pool_tmeta`: exit status 127

failed (failure): rpc error: code = 2 desc = shim error: context deadline
exceeded#015

Error running exec in container: rpc error: code = 2 desc = shim error:
context deadline exceeded
===

Seems to match https://bugzilla.redhat.com/show_bug.cgi?id=1427212
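
For anyone hitting the same log line, here is a minimal sketch of a first
check, assuming Python 3 is available on the node. Exit status 127 is what a
POSIX shell conventionally reports when a command cannot be found, so one
thing worth ruling out is that thin_ls is missing from the PATH of the
process that invokes it. The helper names (explain_exit_status, find_tool)
are made up for illustration and are not part of OpenShift or cAdvisor, and
127 can have other causes, so this only narrows things down.

```python
import shutil

# Conventional meanings of a few exit statuses as reported by POSIX shells.
# These are conventions, not guarantees: a program may exit with any code.
SHELL_EXIT_HINTS = {
    126: "command found but not executable",
    127: "command not found",
}


def explain_exit_status(code):
    """Return a human-readable hint for an exit status (illustrative helper)."""
    if code == 0:
        return "success"
    if code in SHELL_EXIT_HINTS:
        return SHELL_EXIT_HINTS[code]
    if code > 128:
        # Shells report "killed by signal N" as 128 + N.
        return "terminated by signal %d" % (code - 128)
    return "command-specific failure"


def find_tool(name="thin_ls"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("thin_ls")
    print("thin_ls:", path if path else "not found on PATH")
    print("exit status 127 usually means:", explain_exit_status(127))
```

If the tool turns out to be present, the 127 most likely comes from somewhere
else in the call chain and the bugzilla entry above is the better lead.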


On Thu, 20 Apr 2017 at 15:41 Tero Ahonen  wrote:

> Hi
>
> Did you try to ssh to that node and execute sudo docker run for some
> container?
>
> .t
>
> Sent from my iPhone


Node report OK but every pod marked unready

2017-04-19 Thread Andrew Lau
I'm trying to debug a weird scenario where every pod on a node has crashed
with the error:
"rpc error: code = 2 desc = shim error: context deadline exceeded"

The pods stayed in the state Ready 0/1.
The docker daemon was responding, and the kubelet and all its services were
running. The node was reporting the OK status.

No resource limits were hit: CPU was almost idle and memory was at 25%
utilisation.
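
A minimal sketch of how the symptom could be confirmed from the API side,
assuming a pod list saved as JSON (for example from
`oc get pods --all-namespaces -o json`). It lists the pods on a given node
whose Ready condition is not True. The function name unready_pods, the node
names, and the sample data are hypothetical; the field names (spec.nodeName,
status.conditions) follow the Kubernetes Pod API.

```python
def unready_pods(pod_list, node_name):
    """Return names of pods on node_name whose Ready condition is not True."""
    result = []
    for pod in pod_list.get("items", []):
        if pod.get("spec", {}).get("nodeName") != node_name:
            continue
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            result.append(pod["metadata"]["name"])
    return result


# Hypothetical sample shaped like the output of `oc get pods -o json`.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "web"},
            "spec": {"nodeName": "node-1"},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
        {
            "metadata": {"name": "db"},
            "spec": {"nodeName": "node-1"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
        {
            "metadata": {"name": "cache"},
            "spec": {"nodeName": "node-2"},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
    ]
}

if __name__ == "__main__":
    print(unready_pods(SAMPLE, "node-1"))
```

A node that reports OK while this returns every pod scheduled on it matches
the situation described above.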