[ceph-users] Re: Impact of large PG splits

2024-04-29 Thread Eugen Block
The split process completed over the weekend, and the balancer did a great job:

MIN PGs | MAX PGs | MIN USE % | MAX USE %
322     | 338     | 73,3      | 75,5

Although the number of PGs per OSD differs a bit, the usage per OSD is quite good (and that is the more important metric). The new hardware also arrived, so
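For reference, the per-OSD PG count and utilization figures quoted above can be read directly from the cluster; a minimal sketch using standard commands (nothing pool-specific assumed):

```
# Per-OSD utilization (%USE) and PG count (PGS) columns
ceph osd df tree

# Current balancer mode and whether it is active
ceph balancer status
```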

[ceph-users] Re: Impact of large PG splits

2024-04-25 Thread Eugen Block
No, we didn’t change much, just increased the max PGs per OSD to avoid warnings and inactive PGs in case a node failed during this process. And the max backfills, of course.

Quoting Frédéric Nass: Hello Eugen, Thanks for sharing the good news. Did you have to raise
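The settings referred to here are presumably mon_max_pg_per_osd and osd_max_backfills; a hedged sketch of what such a change could look like (the values are illustrative, not the ones actually used in this thread):

```
# Raise the PG-per-OSD limit so a node failure mid-split does not leave PGs inactive
# (example value)
ceph config set global mon_max_pg_per_osd 500

# Allow more concurrent backfills per OSD (example value)
ceph config set osd osd_max_backfills 4
```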

[ceph-users] Re: Impact of large PG splits

2024-04-25 Thread Frédéric Nass
Hello Eugen, Thanks for sharing the good news. Did you have to raise mon_osd_nearfull_ratio temporarily? Frédéric.

On 25 Apr 24, at 12:35, Eugen Block ebl...@nde.ag wrote: > For those interested, just a short update: the split process is > approaching its end, two days ago there
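For context, the nearfull/backfillfull/full ratios can be inspected and, if necessary, raised temporarily; a sketch (the 0.90 value is only an example, and the change should be reverted afterwards):

```
# Show the current ratios
ceph osd dump | grep -E 'nearfull_ratio|backfillfull_ratio|full_ratio'

# Temporarily raise the nearfull ratio (example value)
ceph osd set-nearfull-ratio 0.90
```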

[ceph-users] Re: Impact of large PG splits

2024-04-25 Thread Eugen Block
For those interested, just a short update: the split process is approaching its end; two days ago there were around 230 PGs left to split (the target is 4096 PGs). So far there have been no complaints and no cluster impact has been reported (the cluster load is quite moderate, but still sensitive). Every now and
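A split's remaining work can be followed via the pool's pg_num versus its target and the cluster-wide misplaced count; a minimal sketch (the pool name is a placeholder):

```
# pg_num, pgp_num and their *_target values show how far the split has progressed
ceph osd pool ls detail | grep '<poolname>'

# Misplaced objects and recovery/backfill throughput
ceph status
```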

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Anthony D'Atri
One can raise the ratios temporarily, but it's all too easy to forget to reduce them later, or to think that it's okay to run all the time with reduced headroom. That works until a host blows up and you don't have enough space to recover into.

> On Apr 12, 2024, at 05:01, Frédéric Nass wrote: > Oh, and
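If the ratios were raised, reverting them once there is headroom again is a one-liner each; the values below are the stock Ceph defaults, assuming nothing else was customized:

```
# Restore the default ratios
ceph osd set-nearfull-ratio 0.85
ceph osd set-backfillfull-ratio 0.90
ceph osd set-full-ratio 0.95
```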

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Eugen Block
Thanks for chiming in. They are on version 16.2.13 (I was already made aware of the bug you mentioned, thanks!) with the wpq scheduler. So far I haven't gotten an emergency call, so I assume everything is calm (I hope). New hardware has been ordered, but it will take a couple of weeks until it's delivered,

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Frédéric Nass
Oh, and yeah, considering "The fullest OSD is already at 85% usage", the best move for now would be to add new hardware/OSDs (to avoid reaching the backfillfull limit) before starting to split PGs, either before or after enabling the upmap balancer depending on how well the PGs get rebalanced (well enough
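Enabling the upmap balancer mentioned here would look roughly like this (a sketch; it requires all clients to be luminous or newer):

```
# upmap needs clients that understand pg-upmap entries
ceph osd set-require-min-compat-client luminous

# Switch the balancer to upmap mode and turn it on
ceph balancer mode upmap
ceph balancer on
ceph balancer status
```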

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Frédéric Nass
Hello Eugen, Is this cluster using the WPQ or the mClock scheduler? (cephadm shell ceph daemon osd.0 config show | grep osd_op_queue) If WPQ, you might want to tune the osd_recovery_sleep* values, as they do have a real impact on recovery/backfill speed. Just lower osd_max_backfills to 1 before
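A sketch of the check and the knobs mentioned above (the sleep value is purely illustrative; with mClock the osd_recovery_sleep* settings are ignored):

```
# Which scheduler is in use?
cephadm shell -- ceph daemon osd.0 config show | grep osd_op_queue

# With WPQ: keep backfills low and adjust the HDD recovery sleep
# (example value; lower means faster recovery but more client impact)
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_sleep_hdd 0.05
```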

[ceph-users] Re: Impact of large PG splits

2024-04-10 Thread Gregory Orange
Setting osd_max_backfills to much more than 1 on HDD spinners seems anathema to me, and I recall reading others saying the same thing. That's because seek time is a major constraint on them, so keeping activity as contiguous as possible is going to help performance. Maybe pushing it to 2-3 is
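If spinners should stay conservative while flash OSDs get more parallelism, device-class config masks can express that; a sketch assuming centralized config masks are available in the running release (example values):

```
# Keep HDD OSDs at a single backfill, allow SSD OSDs a bit more
ceph config set osd/class:hdd osd_max_backfills 1
ceph config set osd/class:ssd osd_max_backfills 3
```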

[ceph-users] Re: Impact of large PG splits

2024-04-10 Thread Eugen Block
Thank you for the input! We started the split with max_backfills = 1 and watched for a few minutes, then gradually increased it to 8. Now it's backfilling at around 180 MB/s, which is not much, but since client impact has to be avoided if possible, we decided to let that run for a couple of
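The gradual ramp-up described above can be done at runtime; a minimal sketch (the step values mirror the ones mentioned, nothing else is implied):

```
# Raise backfills step by step while watching client latency and recovery rate
ceph config set osd osd_max_backfills 2    # then 4, then 8 if clients stay unaffected

# Recovery throughput (e.g. "180 MiB/s") shows up under the "io:" section
ceph -s
```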

[ceph-users] Re: Impact of large PG splits

2024-04-10 Thread Konstantin Shalygin
> On 10 Apr 2024, at 01:00, Eugen Block wrote: > > I appreciate your message, it really sounds tough (9 months, really?!). But > thanks for the reassurance :-)

Yes, the total "make this project great again" effort took 16 months, I think. This was my work. The first problem after 1M objects in a PG was a

[ceph-users] Re: Impact of large PG splits

2024-04-10 Thread Gregory Orange
We are in the middle of splitting 16k EC 8+3 PGs on 2600x 16TB OSDs with NVMe RocksDB, used exclusively for RGWs, holding about 60 billion objects. We are splitting for the same reason as you: improved balance. We also thought long and hard before we began, concerned about impact, stability, etc.

[ceph-users] Re: Impact of large PG splits

2024-04-10 Thread Eugen Block
Thank you, Janne. I believe the default 5% target_max_misplaced_ratio would work as well; we've had good experience with that in the past, without the autoscaler. I just haven't dealt with such large PGs before. I've been warning them about this for two years (when the PGs were only about half this size)
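For reference, the ratio that throttles pgp_num increases and the split itself would be set roughly like this (the pool name is a placeholder; 4096 is the target mentioned in this thread):

```
# The mgr limits how much data may be misplaced at once (default 0.05 = 5%)
ceph config get mgr target_max_misplaced_ratio

# Trigger the split; pgp_num is then raised gradually, respecting the ratio above
ceph osd pool set <poolname> pg_num 4096
```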

[ceph-users] Re: Impact of large PG splits

2024-04-10 Thread Janne Johansson
On Tue 9 Apr 2024 at 10:39, Eugen Block wrote:
> I'm trying to estimate the possible impact when large PGs are split. Here's one example of such a PG:
>
> PG_STAT  OBJECTS  BYTES          OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  UP
> 86.3ff   277708   4144030984090  0
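Per-PG object and byte counts like the row quoted above can be pulled directly from the cluster; a minimal sketch (the pool name is a placeholder):

```
# OBJECTS, BYTES, OMAP_BYTES* and LOG columns per PG of a single pool
ceph pg ls-by-pool <poolname>
```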

[ceph-users] Re: Impact of large PG splits

2024-04-09 Thread Eugen Block
Hi, I appreciate your message, it really sounds tough (9 months, really?!). But thanks for the reassurance :-) They don’t have any other options, so we’ll have to start that process anyway, probably tomorrow. We’ll see how it goes…

Quoting Konstantin Shalygin: Hi Eugene! I have a case,

[ceph-users] Re: Impact of large PG splits

2024-04-09 Thread Konstantin Shalygin
Hi Eugene! I have a case where PGs hold millions of objects, like this:

```
root@host# ./show_osd_pool_pg_usage.sh | less | head
id      used_mbytes  used_objects  omap_used_mbytes  omap_used_keys
--      -----------  ------------  ----------------  --------------
17.c91