Thanks! Hopefully we don't hit this too much until 1.5.0 is released.

On Fri, 21 Apr 2017 at 01:26 Patrick Tescher <[email protected]> wrote:
> We upgraded to 1.5.0 and that error went away.
>
> --
> Patrick Tescher
>
> On Apr 19, 2017, at 10:59 PM, Andrew Lau <[email protected]> wrote:
>
> The thin_ls error has been happening for quite some time
> https://github.com/openshift/origin/issues/10940
>
> On Thu, 20 Apr 2017 at 15:55 Tero Ahonen <[email protected]> wrote:
>
>> It seems that error is related to docker storage on that VM
>>
>> .t
>>
>> Sent from my iPhone
>>
>> On 20 Apr 2017, at 8.53, Andrew Lau <[email protected]> wrote:
>>
>> Unfortunately I did not. I dumped the logs and just removed the node in
>> order to quickly restore the current containers on another node.
>>
>> At the exact time it failed I saw a lot of the following:
>>
>> ===
>> thin_pool_watcher.go:72] encountered error refreshing thin pool watcher:
>> error performing thin_ls on metadata device
>> /dev/mapper/docker_vg-docker--pool_tmeta: Error running command `thin_ls
>> --no-headers -m -o DEV,EXCLUSIVE_BYTES
>> /dev/mapper/docker_vg-docker--pool_tmeta`: exit status 127
>>
>> failed (failure): rpc error: code = 2 desc = shim error: context deadline
>> exceeded#015
>>
>> Error running exec in container: rpc error: code = 2 desc = shim error:
>> context deadline exceeded
>> ===
>>
>> Seems to match https://bugzilla.redhat.com/show_bug.cgi?id=1427212
>>
>> On Thu, 20 Apr 2017 at 15:41 Tero Ahonen <[email protected]> wrote:
>>
>>> Hi
>>>
>>> Did you try to ssh to that node and execute sudo docker run for some
>>> container?
>>>
>>> .t
>>>
>>> Sent from my iPhone
>>>
>>> > On 20 Apr 2017, at 8.18, Andrew Lau <[email protected]> wrote:
>>> >
>>> > I'm trying to debug a weird scenario where a node has had every pod
>>> > crash with the error:
>>> > "rpc error: code = 2 desc = shim error: context deadline exceeded"
>>> >
>>> > The pods stayed in the Ready 0/1 state.
>>> > The docker daemon was responding, and the kubelet and all its services
>>> > were running. The node was reporting the OK status.
>>> >
>>> > No resource limits were hit, with CPU almost idle and memory at 25%
>>> > utilisation.
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > [email protected]
>>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>
> _______________________________________________
> dev mailing list
> [email protected]
> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
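
For anyone who hits the thin_pool_watcher message quoted above: exit status 127 usually means the shell could not find the thin_ls binary at all, not that thin_ls ran and failed. Below is a rough sketch of how one might check on the affected node (assuming a RHEL/CentOS host where the thin provisioning tools come from the device-mapper-persistent-data package, and using busybox only as an example image for Tero's docker run test):

===
# confirm whether the binary the watcher is trying to run exists at all
$ command -v thin_ls || echo "thin_ls not found on PATH"

# on RHEL/CentOS the thin tools ship in device-mapper-persistent-data;
# thin_ls itself only appears in the newer 0.6.x builds of that package
$ sudo yum install -y device-mapper-persistent-data

# re-run the exact command from the log by hand
$ sudo thin_ls --no-headers -m -o DEV,EXCLUSIVE_BYTES \
    /dev/mapper/docker_vg-docker--pool_tmeta

# Tero's suggestion: verify docker can still start a container on the node
$ sudo docker run --rm busybox true && echo "docker can start containers"
===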
