Re: [ceph-users] strange backfill delay after outing one node

2019-08-14 Thread Simon Oosthoek
On 14/08/2019 10:44, Wido den Hollander wrote:
> 
> 
> On 8/14/19 9:48 AM, Simon Oosthoek wrote:
>> Is it a good idea to give the above commands or other commands to speed
>> up the backfilling? (e.g. like increasing "osd max backfills")
>>
> 
> Yes, as right now the OSDs aren't doing that many backfills. You still
> have a large queue of PGs which need to be backfilled.
> 
> $ ceph tell osd.* config set osd_max_backfills 5
> 
> The default is that only one (1) backfill runs at the same time per
> OSD. By setting it to 5 you speed up the process by increasing the
> concurrency. This will, however, add load to the system and thus reduce
> the available I/O for clients.
> 

Currently the backfilling is the main user of the cluster, so that's OK
for now ;-)

It seems to cut the wait time by a factor of 3-5, so this will help us
make the changes we need. We can always lower osd_max_backfills again
later, once we have actual users on the system...
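
For reference, roughly what we plan to run, using the "ceph tell ...
config set" form you showed (the exact value is just a starting point):

  # raise backfill concurrency while the cluster has no real users yet
  ceph tell 'osd.*' config set osd_max_backfills 5

  # and drop it back to the default once client workloads arrive
  ceph tell 'osd.*' config set osd_max_backfills 1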

Cheers

/Simon


Re: [ceph-users] strange backfill delay after outing one node

2019-08-14 Thread Janne Johansson
On Wed 14 Aug 2019 at 09:49, Simon Oosthoek wrote:

> Hi all,
>
> Yesterday I marked out all the osds on one node in our new cluster to
> reconfigure them with WAL/DB on their NVMe devices, but it is taking
> ages to rebalance.
>
> > ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
> > ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
> Since the cluster is currently hardly loaded, backfilling can take up
> all the unused bandwidth as far as I'm concerned...
> Is it a good idea to give the above commands or other commands to speed
> up the backfilling? (e.g. like increasing "osd max backfills")
>
>
OSD max backfills is going to have a very large effect on recovery time,
so that would be the obvious knob to twist first. Check what it defaults
to now, raise it in steps (4, 8, 12, 16) and see that it doesn't slow
rebalancing down too much. Spinning drives without any ssd/nvme
journal/wal/db should perhaps have 1 or 2 at most; ssds can take more
than that, and nvme even more, before diminishing gains occur.
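
A rough sketch of that stepwise approach, using the injectargs form
quoted above; the per-device-class lines assume a release with the
centralized config store and crush device classes in place, so treat
them as illustrative rather than exact:

  # see what one OSD is currently configured with
  ceph config get osd.0 osd_max_backfills

  # raise it for all OSDs in steps, watching recovery speed and client
  # I/O in "ceph -s"
  ceph tell 'osd.*' injectargs '--osd-max-backfills 4'
  ceph tell 'osd.*' injectargs '--osd-max-backfills 8'

  # optionally keep spinning disks conservative while ssd/nvme OSDs go higher
  ceph config set osd/class:hdd osd_max_backfills 2
  ceph config set osd/class:ssd osd_max_backfills 8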

-- 
May the most significant bit of your life be positive.


Re: [ceph-users] strange backfill delay after outing one node

2019-08-14 Thread Wido den Hollander



On 8/14/19 9:48 AM, Simon Oosthoek wrote:
> Hi all,
> 
> Yesterday I marked out all the osds on one node in our new cluster to
> reconfigure them with WAL/DB on their NVMe devices, but it is taking
> ages to rebalance. The whole cluster (and thus the osds) is only ~1%
> full, so the full ratio is nowhere in sight.
> 
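The thresholds behind the backfill_toofull flags you see further down
can be checked directly; something along these lines should print them
(exact output varies a bit per release):

  ceph osd dump | grep ratio   # full_ratio / backfillfull_ratio / nearfull_ratio
  ceph osd df                  # per-OSD utilisation, to spot a single outlier OSD
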
> We have 14 osd nodes with 12 disks each; one of them was marked out
> yesterday around noon. It has still not completed, and all the while
> the cluster is in ERROR state, even though this is a normal maintenance
> operation.
> 
> We are still experimenting with the cluster, and it remains operational
> while in ERROR state. However, it is slightly worrying to consider that
> it could take even (50x?) longer once the cluster holds 50x the amount
> of data. And the OSDs are mostly flatlined in the dashboard graphs, so
> it could potentially do this much faster, I think.
> 
> below are a few outputs of ceph -s(w):
> 
> Yesterday afternoon (~16:00)
> # ceph -w
>   cluster:
> id: b489547c-ba50-4745-a914-23eb78e0e5dc
> health: HEALTH_ERR
> Degraded data redundancy (low space): 139 pgs backfill_toofull
> 
>   services:
> mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 4h)
> mgr: cephmon1(active, since 4h), standbys: cephmon2, cephmon3
> mds: cephfs:1 {0=cephmds1=up:active} 1 up:standby
> osd: 168 osds: 168 up (since 3h), 156 in (since 3h); 1588 remapped pgs
> rgw: 1 daemon active (cephs3.rgw0)
> 
>   data:
> pools:   12 pools, 4116 pgs
> objects: 14.04M objects, 11 TiB
> usage:   20 TiB used, 1.7 PiB / 1.8 PiB avail
> pgs: 16188696/109408503 objects misplaced (14.797%)
>  2528 active+clean
>  1422 active+remapped+backfill_wait
>  139  active+remapped+backfill_wait+backfill_toofull
>  27   active+remapped+backfilling
> 
>   io:
> recovery: 205 MiB/s, 198 objects/s
> 
>   progress:
> Rebalancing after osd.47 marked out
> Rebalancing after osd.5 marked out
> Rebalancing after osd.132 marked out
> Rebalancing after osd.90 marked out
> Rebalancing after osd.76 marked out
> Rebalancing after osd.157 marked out
> Rebalancing after osd.19 marked out
> Rebalancing after osd.118 marked out
> Rebalancing after osd.146 marked out
> Rebalancing after osd.104 marked out
> Rebalancing after osd.62 marked out
> Rebalancing after osd.33 marked out
> 
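If you want to see exactly which PGs are being held back as
backfill_toofull and on which OSDs they are waiting, "ceph health
detail" lists them; on recent releases filtering by state should work
as well:

  ceph health detail
  ceph pg ls backfill_toofull
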
> 
> This morning:
> # ceph -s
>   cluster:
> id: b489547c-ba50-4745-a914-23eb78e0e5dc
> health: HEALTH_ERR
> Degraded data redundancy (low space): 8 pgs backfill_toofull
> 
>   services:
> mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 22h)
> mgr: cephmon1(active, since 22h), standbys: cephmon2, cephmon3
> mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
> osd: 168 osds: 168 up (since 22h), 156 in (since 21h); 189 remapped pgs
> rgw: 1 daemon active (cephs3.rgw0)
> 
>   data:
> pools:   12 pools, 4116 pgs
> objects: 14.11M objects, 11 TiB
> usage:   21 TiB used, 1.7 PiB / 1.8 PiB avail
> pgs: 4643284/110159565 objects misplaced (4.215%)
>  3927 active+clean
>  162  active+remapped+backfill_wait
>  19   active+remapped+backfilling
>  8active+remapped+backfill_wait+backfill_toofull
> 
>   io:
> client:   32 KiB/s rd, 0 B/s wr, 31 op/s rd, 21 op/s wr
> recovery: 198 MiB/s, 149 objects/s
> 

It seems it is still recovering, at 149 objects/second.
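
Back of the envelope, assuming that rate holds and ignoring that the
backfill_toofull PGs have to wait for earlier backfills to free up their
targets: 4,643,284 misplaced objects / ~149 objects/s is roughly 31,000
seconds, so on the order of 8-9 hours of backfilling left.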

>   progress:
> Rebalancing after osd.47 marked out
> Rebalancing after osd.5 marked out
> Rebalancing after osd.132 marked out
> Rebalancing after osd.90 marked out
> Rebalancing after osd.76 marked out
> Rebalancing after osd.157 marked out
> Rebalancing after osd.19 marked out
> Rebalancing after osd.146 marked out
> Rebalancing after osd.104 marked out
> Rebalancing after osd.62 marked out
> 
> 
> I found some hints, though I'm not sure they're right for us, at this url:
>