Re: [ceph-users] Blacklisting during boot storm

2019-08-03 Thread Paul Emmerich
The usual reason for blacklisting RBD clients is that an exclusive lock was broken because the previous owner seemed to have crashed. Blacklisting the old owner is necessary in case the cause was a network partition rather than a crash. Note that this is entirely normal and no reason to worry. Paul -- Paul
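
The reasoning above can be illustrated with a toy model. This is only a sketch and not librbd or any Ceph API: the class ToyImage, its methods, and the client names are hypothetical. It also assumes the lock is cooperative, meaning the storage side checks writes against the blacklist only and trusts clients to write when they believe they hold the lock.

```python
# Toy model only: ToyImage and its methods are hypothetical names,
# not part of librbd or any Ceph API.
class ToyImage:
    def __init__(self):
        self.lock_owner = None   # client recorded as holding the exclusive lock
        self.blacklist = set()   # clients whose requests are refused
        self.writes = []         # accepted (client, data) pairs

    def acquire(self, client):
        """Take the lock if it is free; return True if client now owns it."""
        if client in self.blacklist:
            return False
        if self.lock_owner is None:
            self.lock_owner = client
        return self.lock_owner == client

    def break_lock(self, new_owner, blacklist_old_owner=True):
        """Take over from an owner that appears dead, fencing it by default."""
        old_owner = self.lock_owner
        if blacklist_old_owner and old_owner not in (None, new_owner):
            self.blacklist.add(old_owner)
        self.lock_owner = new_owner

    def write(self, client, data):
        """Assumption: only the blacklist is enforced on the write path."""
        if client in self.blacklist:
            return False
        self.writes.append((client, data))
        return True


image = ToyImage()
image.acquire("client-a")
image.write("client-a", "block-1")

# client-a stops responding (crash or partition; the cluster cannot tell).
image.break_lock("client-b")
image.write("client-b", "block-2")

# If client-a was only partitioned and still believes it owns the lock,
# its late write is refused because it has been blacklisted.
stale_write_accepted = image.write("client-a", "block-3")
```

Calling break_lock with blacklist_old_owner=False in this model lets the stale write through, which is the situation the blacklist entry is there to prevent.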

[ceph-users] Blacklisting during boot storm

2019-08-03 Thread Kees Meijs
Hi list, Yesterday afternoon we experienced a compute node outage in our OpenStack (obviously Ceph-backed) cluster. We tried to (re)start the compute instances as fast as possible, which resulted in some KVM/RBD clients getting blacklisted. The problem was spotted very quickly so we could remove

Re: [ceph-users] How to add 100 new OSDs...

2019-08-03 Thread Robert LeBlanc
It does better because it is a fair share queue and doesn't let recovery ops take priority over client ops at any point, for any length of time. It allows clients to have a much more predictable latency to the storage. Sent from a mobile device, please excuse any typos. On Sat, Aug 3, 2019, 1:10 PM Alex

[ceph-users] Bluestore caching oddities, again

2019-08-03 Thread Christian Balzer
Hello, while preparing the first production BlueStore cluster, based on Nautilus (latest), I've run into the same things other people and I ran into before. First, the hardware: 3 nodes with 12 SATA HDDs each, IT mode LSI 3008, WAL/DB on 40GB SSD partitions. (boy do I hate the inability of ceph-volume to

Re: [ceph-users] Ceph Nautilus - can't balance due to degraded state

2019-08-03 Thread David Herselman
Hi, Problem: The balancer still refuses to run and distribution is not what it was on Luminous. I presume a flag somewhere wasn't unset whilst the autoscaling was reducing PGs? I wasted a lot of time this last week after having enabled pg_autoscale_mode on pools in a cluster. I had to

Re: [ceph-users] How to add 100 new OSDs...

2019-08-03 Thread Alex Gorbachev
On Fri, Aug 2, 2019 at 6:57 PM Robert LeBlanc wrote:
>
> On Fri, Jul 26, 2019 at 1:02 PM Peter Sabaini wrote:
>>
>> On 26.07.19 15:03, Stefan Kooman wrote:
>> > Quoting Peter Sabaini (pe...@sabaini.at):
>> >> What kind of commit/apply latency increases have you seen when adding a
>> >> large