Re: [ceph-users] Not timing out watcher

2017-12-21 Thread Ilya Dryomov
On Thu, Dec 21, 2017 at 3:04 PM, Serguei Bezverkhi (sbezverk) wrote:
> Hi Ilya,
>
> Here you go, no k8s services running this time:
>
> sbezverk@kube-4:~$ sudo rbd map raw-volume --pool kubernetes --id admin -m
> 192.168.80.233 --key=AQCeHO1ZILPPDRAA7zw3d76bplkvTwzoosybvA==

Re: [ceph-users] Not timing out watcher

2017-12-21 Thread Serguei Bezverkhi (sbezverk)
Hi Ilya,
Here you go, no k8s services running this time:
sbezverk@kube-4:~$ sudo rbd map raw-volume --pool kubernetes --id admin -m 192.168.80.233 --key=AQCeHO1ZILPPDRAA7zw3d76bplkvTwzoosybvA==
/dev/rbd0
sbezverk@kube-4:~$ sudo rbd status raw-volume --pool kubernetes --id admin -m

Re: [ceph-users] Not timing out watcher

2017-12-21 Thread Ilya Dryomov
On Wed, Dec 20, 2017 at 6:20 PM, Serguei Bezverkhi (sbezverk) wrote:
> It took 30 minutes for the watcher to time out after an ungraceful restart. Is
> there a way to limit it to something a bit more reasonable? Like 1-3 minutes?
>
> On 2017-12-20, 12:01 PM, "Serguei Bezverkhi

Re: [ceph-users] Not timing out watcher

2017-12-21 Thread Ilya Dryomov
On Wed, Dec 20, 2017 at 6:56 PM, Jason Dillaman wrote:
> ... looks like this watch "timeout" was introduced in the kraken
> release [1] so if you don't see this issue with a Jewel cluster, I
> suspect that's the cause.
>
> [1] https://github.com/ceph/ceph/pull/11378
Strictly

Re: [ceph-users] Not timing out watcher

2017-12-20 Thread Serguei Bezverkhi (sbezverk)
On 2017-12-20, 11:17 AM, "Jason Dillaman" wrote:
On Wed, Dec 20, 2017 at 11:01 AM, Serguei Bezverkhi (sbezverk) wrote:
> Hello Jason, thank you for your prompt reply.
>
> My setup is very simple, I have 1 CentOS 7.4 VM which is a

Re: [ceph-users] Not timing out watcher

2017-12-20 Thread Jason Dillaman
... looks like this watch "timeout" was introduced in the kraken release [1] so if you don't see this issue with a Jewel cluster, I suspect that's the cause.
[1] https://github.com/ceph/ceph/pull/11378
On Wed, Dec 20, 2017 at 12:53 PM, Jason Dillaman wrote:
> The OSDs will

Re: [ceph-users] Not timing out watcher

2017-12-20 Thread Jason Dillaman
The OSDs will optionally take a "timeout" parameter on the watch request [1][2]. However, the kernel doesn't have this timeout field in its watch op [3] so perhaps it's defaulting to a random value. Ilya? [1] https://github.com/ceph/ceph/blob/v12.2.2/src/osd/PrimaryLogPG.cc#L6034 [2]

Re: [ceph-users] Not timing out watcher

2017-12-20 Thread Serguei Bezverkhi (sbezverk)
It took 30 minutes for the watcher to time out after an ungraceful restart. Is there a way to limit it to something a bit more reasonable? Like 1-3 minutes?
On 2017-12-20, 12:01 PM, "Serguei Bezverkhi (sbezverk)" wrote:
Ok, here is what I found out. If I gracefully kill a

Re: [ceph-users] Not timing out watcher

2017-12-20 Thread Serguei Bezverkhi (sbezverk)
Ok, here is what I found out. If I gracefully kill a pod, the watcher gets properly cleared. If it is done ungracefully, without "rbd unmap", then even after a node reboot the watcher stays up for a long time: it has been more than 20 minutes and it is still active (no Kubernetes services

Re: [ceph-users] Not timing out watcher

2017-12-20 Thread Jason Dillaman
On Wed, Dec 20, 2017 at 11:01 AM, Serguei Bezverkhi (sbezverk) wrote:
> Hello Jason, thank you for your prompt reply.
>
> My setup is very simple, I have 1 CentOS 7.4 VM which is a storage node which
> is running latest 12.2.2 Luminous and 2nd VM is Ubuntu 16.04.3

Re: [ceph-users] Not timing out watcher

2017-12-20 Thread Serguei Bezverkhi (sbezverk)
Hello Jason, thank you for your prompt reply. My setup is very simple: I have one CentOS 7.4 VM, a storage node running the latest Luminous (12.2.2), and a second VM, Ubuntu 16.04.3 at 192.168.80.235, where I run a local Kubernetes cluster based on the master. On the client side I have ceph-common

Re: [ceph-users] Not timing out watcher

2017-12-20 Thread Jason Dillaman
Can you please provide steps to repeat this scenario? What is/was the client running on the host at 192.168.80.235 and how did you shut down that client? In your PR [1], it showed a different client as a watcher ("192.168.80.235:0/34739158 client.64354 cookie=1"), so how did the previous entry get

[ceph-users] Not timing out watcher

2017-12-20 Thread Serguei Bezverkhi (sbezverk)
Hello, I hit an issue with the latest Luminous where a watcher does not time out even though the image is no longer mapped. It seems something similar was reported in 2016, here is the link: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-August/012140.html Has it been fixed? I would appreciate some help here.
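For anyone reproducing the behaviour discussed in this thread, the following is a minimal sketch of how the watcher list could be checked from a script. It only parses text in the shape of the watcher entry quoted earlier in the thread ("192.168.80.235:0/34739158 client.64354 cookie=1"). The `watcher=` prefix and the "Watchers:" header are assumptions about what `rbd status` prints, and `parse_watchers` is a hypothetical helper name, not part of any Ceph tool.

```python
import re
from typing import Dict, List

# Assumed line shape (not confirmed by the thread):
#   watcher=<address> client.<id> cookie=<n>
WATCHER_RE = re.compile(
    r"watcher=(?P<address>\S+)\s+client\.(?P<client_id>\d+)\s+cookie=(?P<cookie>\d+)"
)


def parse_watchers(status_output: str) -> List[Dict[str, object]]:
    """Return one dict per watcher line found in the given text.

    Hypothetical helper: it does not run any command itself, it only
    parses text that the caller has already captured.
    """
    watchers = []
    for line in status_output.splitlines():
        match = WATCHER_RE.search(line)
        if match:
            watchers.append(
                {
                    "address": match.group("address"),
                    "client_id": int(match.group("client_id")),
                    "cookie": int(match.group("cookie")),
                }
            )
    return watchers


if __name__ == "__main__":
    # Sample text built from the watcher entry quoted in the thread.
    sample = (
        "Watchers:\n"
        "\twatcher=192.168.80.235:0/34739158 client.64354 cookie=1\n"
    )
    for watcher in parse_watchers(sample):
        print(watcher)
```

An empty result means no watcher lines were recognised in the text, so a script could poll the captured output after an ungraceful shutdown and record how long the stale entry takes to disappear.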