Re: [ceph-users] Decreasing the impact of reweighting osds

2019-10-25 Thread Robert LeBlanc
You can try adding osd op queue = wpq and osd op queue cut off = high to all the OSD ceph configs and restarting. That has made reweighting pretty painless for us. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Tue, Oct 22, 2019 at 8:36 PM Davi
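In ceph.conf form that would look something like this (a minimal sketch; placing the options in the [osd] section is the usual convention, and as noted above the OSDs need a restart for the op queue settings to take effect):

    [osd]
    osd op queue = wpq
    osd op queue cut off = high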

Re: [ceph-users] Decreasing the impact of reweighting osds

2019-10-22 Thread David Turner
Most times you are better served with simpler settings like osd_recovery_sleep, which has 3 variants if you have multiple types of OSDs in your cluster (osd_recovery_sleep_hdd, osd_recovery_sleep_ssd, osd_recovery_sleep_hybrid). Using those you can tweak a specific type of OSD that might be having
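As an illustration only (the values are made up, not a recommendation), on a Luminous cluster these settings can be changed at runtime with injectargs, for example:

    # throttle recovery on spinning OSDs: seconds to sleep between recovery ops
    ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.1'
    # SSD-backed OSDs usually tolerate a much smaller (or zero) sleep
    ceph tell osd.* injectargs '--osd_recovery_sleep_ssd 0.0'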

Re: [ceph-users] Decreasing the impact of reweighting osds

2019-10-22 Thread Mark Kirkwood
Thanks - that's a good suggestion! However, I'd still like to know the answers to my 2 questions. Regards, Mark. On 22/10/19 11:22 pm, Paul Emmerich wrote: getting rid of filestore solves most latency spike issues during recovery because they are often caused by random XFS hangs (splitting dirs

Re: [ceph-users] Decreasing the impact of reweighting osds

2019-10-22 Thread Paul Emmerich
getting rid of filestore solves most latency spike issues during recovery because they are often caused by random XFS hangs (splitting dirs or just XFS having a bad day). Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h
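If you're not sure which OSDs are still on filestore, one way to check (assuming the standard ceph CLI; the grep just pulls the relevant key out of the JSON output) is:

    # show the objectstore backend reported by every OSD
    ceph osd metadata | grep '"osd_objectstore"'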

[ceph-users] Decreasing the impact of reweighting osds

2019-10-21 Thread Mark Kirkwood
We recently needed to reweight a couple of OSDs on one of our clusters (Luminous on Ubuntu, 8 hosts, 8 OSDs/host). I think we reweighted by approx 0.2. This was perhaps too much, as IO latency on RBD drives spiked to several seconds at times. We'd like to lessen this effect as much as we can
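For context, a more gradual reweight might look something like the sketch below (the OSD id and weights are purely illustrative; stepping in smaller increments such as 0.05 and letting backfill settle between steps spreads the data movement out):

    # reduce the override weight of osd.12 in small steps rather than one 0.2 jump
    ceph osd reweight 12 0.95
    # ... wait for backfill/recovery to finish, then continue
    ceph osd reweight 12 0.90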