Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-12-05 Thread Florian Haas
On 02/12/2019 16:48, Florian Haas wrote: > Doc patch PR is here, for anyone who feels inclined to review: > > https://github.com/ceph/ceph/pull/31893 Landed, here's the new documentation: https://docs.ceph.com/docs/master/rbd/rbd-exclusive-locks/ Thanks everyone for chiming in, and

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-12-02 Thread Florian Haas
On 19/11/2019 22:42, Florian Haas wrote: > On 19/11/2019 22:34, Jason Dillaman wrote: >>> Oh totally, I wasn't arguing it was a bad idea for it to do what it >>> does! I just got confused by the fact that our mon logs showed what >>> looked like a (failed) attempt to blacklist an entire client IP

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 4:42 PM Florian Haas wrote: > > On 19/11/2019 22:34, Jason Dillaman wrote: > >> Oh totally, I wasn't arguing it was a bad idea for it to do what it > >> does! I just got confused by the fact that our mon logs showed what > >> looked like a (failed) attempt to blacklist an

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Florian Haas
On 19/11/2019 22:34, Jason Dillaman wrote: >> Oh totally, I wasn't arguing it was a bad idea for it to do what it >> does! I just got confused by the fact that our mon logs showed what >> looked like a (failed) attempt to blacklist an entire client IP address. > > There should have been an

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 4:31 PM Florian Haas wrote: > > On 19/11/2019 22:19, Jason Dillaman wrote: > > On Tue, Nov 19, 2019 at 4:09 PM Florian Haas wrote: > >> > >> On 19/11/2019 21:32, Jason Dillaman wrote: > What, exactly, is the "reasonably configured hypervisor" here, in other >

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Florian Haas
On 19/11/2019 22:19, Jason Dillaman wrote: > On Tue, Nov 19, 2019 at 4:09 PM Florian Haas wrote: >> >> On 19/11/2019 21:32, Jason Dillaman wrote: What, exactly, is the "reasonably configured hypervisor" here, in other words, what is it that grabs and releases this lock? It's evidently

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 4:09 PM Florian Haas wrote: > > On 19/11/2019 21:32, Jason Dillaman wrote: > >> What, exactly, is the "reasonably configured hypervisor" here, in other > >> words, what is it that grabs and releases this lock? It's evidently not > >> Nova that does this, but is it libvirt,

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Florian Haas
On 19/11/2019 21:32, Jason Dillaman wrote: >> What, exactly, is the "reasonably configured hypervisor" here, in other >> words, what is it that grabs and releases this lock? It's evidently not >> Nova that does this, but is it libvirt, or Qemu/KVM, and if so, what >> magic in there makes this

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 2:49 PM Florian Haas wrote: > > On 19/11/2019 20:03, Jason Dillaman wrote: > > On Tue, Nov 19, 2019 at 1:51 PM shubjero wrote: > >> > >> Florian, > >> > >> Thanks for posting about this issue. This is something that we have > >> been experiencing (stale exclusive locks)

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Florian Haas
On 19/11/2019 20:03, Jason Dillaman wrote: > On Tue, Nov 19, 2019 at 1:51 PM shubjero wrote: >> >> Florian, >> >> Thanks for posting about this issue. This is something that we have >> been experiencing (stale exclusive locks) with our OpenStack and Ceph >> cloud more frequently as our datacentre

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread Jason Dillaman
On Tue, Nov 19, 2019 at 1:51 PM shubjero wrote: > > Florian, > > Thanks for posting about this issue. This is something that we have > been experiencing (stale exclusive locks) with our OpenStack and Ceph > cloud more frequently as our datacentre has had some reliability > issues recently with

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-19 Thread shubjero
Florian, Thanks for posting about this issue. This is something that we have been experiencing (stale exclusive locks) with our OpenStack and Ceph cloud more frequently as our datacentre has had some reliability issues recently with power and cooling causing several unexpected shutdowns. At this

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Paul Emmerich
To clear up a few misconceptions here: * RBD keyrings should use the "profile rbd" permissions, everything else is *wrong* and should be fixed asap * Manually adding the blacklist permission might work but isn't future-proof, fix the keyring instead * The suggestion to mount them elsewhere to fix
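For readers who want to act on the "profile rbd" advice above, here is a minimal sketch of assembling the corresponding `ceph auth caps` invocation. The client name `client.nova` and pool name `vms` are hypothetical placeholders, not taken from this thread, and the helper function is purely illustrative. Note that `ceph auth caps` replaces all existing caps for the entity, so review the printed command against your current keyring before running it.

```python
import shlex


def rbd_profile_caps_command(client, pool):
    """Build the argv for a `ceph auth caps` call that gives `client`
    the rbd profile on the mons and on one pool.

    `client` and `pool` are supplied by the caller; the values used in
    the example below are assumptions for illustration only.
    """
    return [
        "ceph", "auth", "caps", client,
        "mon", "profile rbd",
        "osd", f"profile rbd pool={pool}",
    ]


if __name__ == "__main__":
    # Dry run: print the command rather than executing it.
    argv = rbd_profile_caps_command("client.nova", "vms")
    print(shlex.join(argv))
```

The sketch only prints the command; passing the list to `subprocess.run` on a node with an admin keyring would apply it.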

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Joshua M. Boniface
Thanks Simon! I've implemented it, I guess I'll test it out next time my homelab's power dies :-) On 2019-11-15 10:54 a.m., Simon Ironside wrote: On 15/11/2019 15:44, Joshua M. Boniface wrote: Hey All: I've also quite frequently experienced this sort of issue with my Ceph RBD-backed

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Simon Ironside
On 15/11/2019 15:44, Joshua M. Boniface wrote: Hey All: I've also quite frequently experienced this sort of issue with my Ceph RBD-backed QEMU/KVM cluster (not OpenStack specifically). Should this workaround of allowing the 'osd blacklist' command in the caps help in that scenario as well, or

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Joshua M. Boniface
Hey All: I've also quite frequently experienced this sort of issue with my Ceph RBD-backed QEMU/KVM cluster (not OpenStack specifically). Should this workaround of allowing the 'osd blacklist' command in the caps help in that scenario as well, or is this an OpenStack-specific functionality?

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread EDH - Manuel Rios Fernandez
Hi Florian, On 15/11/2019 12:32, Florian Haas wrote: > I received this off-list but then subsequently saw this message pop up > in the list archive, so I hope it's OK to reply on-list? Of course, I just clicked the wr

[ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Florian Haas
On 15/11/2019 14:27, Simon Ironside wrote: > Hi Florian, > > On 15/11/2019 12:32, Florian Haas wrote: > >> I received this off-list but then subsequently saw this message pop up >> in the list archive, so I hope it's OK to reply on-list? > > Of course, I just clicked the wrong reply button the

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Simon Ironside
Hi Florian, On 15/11/2019 12:32, Florian Haas wrote: I received this off-list but then subsequently saw this message pop up in the list archive, so I hope it's OK to reply on-list? Of course, I just clicked the wrong reply button the first time. So that cap was indeed missing, thanks for

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Florian Haas
On 15/11/2019 11:23, Simon Ironside wrote: > Hi Florian, > > Any chance the key your compute nodes are using for the RBD pool is > missing 'allow command "osd blacklist"' from its mon caps? > > Simon Hi Simon, I received this off-list but then subsequently saw this message pop up in the list

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Wido den Hollander
On 11/15/19 11:24 AM, Simon Ironside wrote: > Hi Florian, > > Any chance the key your compute nodes are using for the RBD pool is > missing 'allow command "osd blacklist"' from its mon caps? > Added to this I recommend to use the 'profile rbd' for the mon caps. As also stated in the

Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Simon Ironside
Hi Florian, Any chance the key your compute nodes are using for the RBD pool is missing 'allow command "osd blacklist"' from its mon caps? Simon On 15/11/2019 08:19, Florian Haas wrote: Hi everyone, I'm trying to wrap my head around an issue we recently saw, as it relates to RBD locks,

[ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Florian Haas
Hi everyone, I'm trying to wrap my head around an issue we recently saw, as it relates to RBD locks, Qemu/KVM, and libvirt. Our data center graced us with a sudden and complete dual-feed power failure that affected both a Ceph cluster (Luminous, 12.2.12), and OpenStack compute nodes that used