Re: Node report OK but every pod marked unready
Thanks! Hopefully we don't hit this too much until 1.5.0 is released.

On Fri, 21 Apr 2017 at 01:26 Patrick Tescher wrote:
> We upgraded to 1.5.0 and that error went away.
>
> --
> Patrick Tescher

___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
Re: Node report OK but every pod marked unready
We upgraded to 1.5.0 and that error went away.

--
Patrick Tescher

> On Apr 19, 2017, at 10:59 PM, Andrew Lau wrote:
>
> thin_ls has been happening for quite some time
> https://github.com/openshift/origin/issues/10940
Re: Node report OK but every pod marked unready
The thin_ls error has been happening for quite some time:
https://github.com/openshift/origin/issues/10940

On Thu, 20 Apr 2017 at 15:55 Tero Ahonen wrote:
> It seems that error is related to docker storage on that vm
>
> .t
Re: Node report OK but every pod marked unready
Unfortunately I did not. I dumped the logs and just removed the node in order to quickly restore the current containers on another node.

At the exact time it failed I saw a lot of the following:

===
thin_pool_watcher.go:72] encountered error refreshing thin pool watcher: error performing thin_ls on metadata device /dev/mapper/docker_vg-docker--pool_tmeta: Error running command `thin_ls --no-headers -m -o DEV,EXCLUSIVE_BYTES /dev/mapper/docker_vg-docker--pool_tmeta`: exit status 127

failed (failure): rpc error: code = 2 desc = shim error: context deadline exceeded

Error running exec in container: rpc error: code = 2 desc = shim error: context deadline exceeded
===

Seems to match https://bugzilla.redhat.com/show_bug.cgi?id=1427212

On Thu, 20 Apr 2017 at 15:41 Tero Ahonen wrote:
> Hi
>
> Did u try to ssh to that node and execute sudo docker run to some container?
>
> .t
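
For anyone triaging similar logs: by shell convention, exit status 127 usually means the command was not found or could not be loaded, which would point at the thin_ls binary itself rather than at the thin pool. That reading is an assumption about this log, not something confirmed in the thread. A minimal Python sketch that pulls the command and exit status out of such a line is below; the function name and the table of meanings are hypothetical and only illustrate the convention.

```python
import re
from typing import Optional

# Conventional shell meanings. These are conventions, not guarantees:
# a tool is free to return these codes for its own reasons.
CONVENTIONAL_MEANINGS = {
    126: "command found but not executable",
    127: "command not found (or could not be loaded)",
}

# Matches the "Error running command `...`: exit status N" fragment
# that appears in the log line quoted above.
EXIT_STATUS_RE = re.compile(
    r"Error running command `(?P<cmd>[^`]+)`: exit status (?P<code>\d+)"
)


def explain_exit_status(log_line: str) -> Optional[str]:
    """Return a short hint for an 'exit status N' log line, or None if no match."""
    match = EXIT_STATUS_RE.search(log_line)
    if match is None:
        return None
    binary = match.group("cmd").split()[0]  # first word of the command
    code = int(match.group("code"))
    meaning = CONVENTIONAL_MEANINGS.get(code, "tool-specific failure")
    return f"{binary}: exit status {code}: {meaning}"
```

If the hint says "command not found", checking on the node whether thin_ls is on the PATH of the process that runs it would be a reasonable next step.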
Node report OK but every pod marked unready
I'm trying to debug a weird scenario where a node has had every pod crash with the error:
"rpc error: code = 2 desc = shim error: context deadline exceeded"

The pods stayed in the state Ready 0/1.
The docker daemon was responding, and the kubelet and all its services were running. The node was reporting the OK status.

No resource limits were hit: CPU was almost idle and memory was at 25% utilisation.
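
A daemon that answers and a node that reports OK do not by themselves show that commands inside containers still complete, which is what "context deadline exceeded" suggests was failing here. One way to check this on the node is to run a container command under an explicit deadline. The Python sketch below is a hedged illustration only: `probe` is a hypothetical helper, and the 10-second default is an arbitrary assumption, not a value taken from the kubelet or docker.

```python
import subprocess
from typing import Sequence


def probe(cmd: Sequence[str], timeout_s: float = 10.0) -> str:
    """Run cmd with a deadline and classify the outcome as a short string."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"   # the command hung past the deadline
    except FileNotFoundError:
        return "missing"   # the executable is not on PATH
    return "ok" if result.returncode == 0 else f"exit {result.returncode}"
```

On the affected node this could be called as `probe(["docker", "exec", container_id, "true"])`, where `container_id` is a placeholder for a running container. A "timeout" result while `docker ps` still answers would be consistent with the symptoms described above.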