The split process completed over the weekend, and the balancer did a great job:
MIN PGs | MAX PGs | MIN USE % | MAX USE %
322     | 338     | 73.3      | 75.5
Although the number of PGs per OSD differs a bit, the usage per OSD is
quite even (which is more important). The new hardware also arrived, so
No, we didn't change much, just increased the max PGs per OSD to avoid
warnings and inactive PGs in case a node failed during this
process. And the max backfills, of course.
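For reference, the two settings mentioned can be raised cluster-wide like this; this is a sketch with assumed example values, not the values actually used in this thread:

```shell
# Raise the per-OSD PG limit so PGs don't go inactive if a node
# fails mid-split (the default is 250; 500 is an assumed example).
ceph config set global mon_max_pg_per_osd 500

# Allow more concurrent backfills per OSD (example value; the
# WPQ default is 1).
ceph config set osd osd_max_backfills 4
```

Both are config fragments that only take effect on a live cluster; `ceph config get osd osd_max_backfills` can confirm the active value.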
Quoting Frédéric Nass:
Hello Eugen,
Thanks for sharing the good news. Did you have to raise mon_osd_nearfull_ratio
temporarily?
Frédéric.
- On 25 Apr 24, at 12:35, Eugen Block ebl...@nde.ag wrote:
For those interested, just a short update: the split process is
approaching its end; two days ago there were around 230 PGs left
(the target is 4096 PGs). So far there have been no complaints, and no
cluster impact was reported (the cluster load is quite moderate, but
still sensitive). Every now and
One can raise the ratios temporarily, but it's all too easy to forget to
reduce them later, or to think that it's okay to run all the time with
reduced headroom. Until a host blows up and you don't have enough space
to recover into.
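For completeness, raising and (crucially) reverting those ratios looks like this; the raised values below are illustrative assumptions, not recommendations:

```shell
# Temporarily raise the ratios during the operation (example values).
ceph osd set-nearfull-ratio 0.90
ceph osd set-backfillfull-ratio 0.92

# ...and afterwards revert to the defaults so the headroom is restored.
ceph osd set-nearfull-ratio 0.85
ceph osd set-backfillfull-ratio 0.90
```

These are cluster config commands, so they only apply against a running Ceph cluster.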
> On Apr 12, 2024, at 05:01, Frédéric Nass
> wrote:
Thanks for chiming in.
They are on version 16.2.13 (I was already made aware of the bug you
mentioned, thanks!) with wpq.
So far I haven't received an emergency call, so I assume everything is
calm (I hope). New hardware has been ordered, but it will take a couple
of weeks until it's delivered,
Oh, and yeah, considering "The fullest OSD is already at 85% usage", the best move
for now would be to add new hardware/OSDs (to avoid hitting the backfillfull
limit) prior to starting the PG split, either before or after enabling the upmap
balancer depending on how the PGs get rebalanced (well enough
Hello Eugen,
Is this cluster using the WPQ or mClock scheduler? (cephadm shell ceph daemon osd.0
config show | grep osd_op_queue)
If WPQ, you might want to tune osd_recovery_sleep* values as they do have a
real impact on the recovery/backfilling speed. Just lower osd_max_backfills to
1 before
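With WPQ, the knobs mentioned above can be set like this; the sleep value is an assumed starting point for HDDs with NVMe DB/WAL, not a value given in this thread:

```shell
# Start conservatively with a single concurrent backfill per OSD.
ceph config set osd osd_max_backfills 1

# Lower the recovery sleep to speed up backfill; 0.05s is an assumed
# example (the HDD default is 0.1s, SSD default is 0).
ceph config set osd osd_recovery_sleep_hdd 0.05
```

Watch client latency after each change; these are config fragments that only take effect on a live cluster.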
Setting osd_max_backfills to much more than 1 on HDD spinners seems like
anathema to me, and I recall reading others saying the same thing.
That's because seek time is a major constraint on them, so keeping
activity as contiguous as possible helps performance. Maybe
pushing it to 2-3 is
Thank you for the input!
We started the split with max_backfills = 1 and watched for a few
minutes, then gradually increased it to 8. Now it's backfilling at
around 180 MB/s, which isn't much, but since client impact has to be
avoided if possible, we decided to let that run for a couple of
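The gradual ramp-up described above could be sketched as follows; the step values and pause length are assumptions, not the exact procedure used:

```shell
# Ramp osd_max_backfills up in steps, watching cluster and client
# impact between each step before continuing.
for n in 1 2 4 8; do
  ceph config set osd osd_max_backfills "$n"
  sleep 300   # observe 'ceph -s' and client latency before the next step
done

ceph -s       # recovery throughput appears under the 'io:' section
</imports></imports>
```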
> On 10 Apr 2024, at 01:00, Eugen Block wrote:
>
> I appreciate your message, it really sounds tough (9 months, really?!). But
> thanks for the reassurance :-)
Yes, the total "make this project great again" effort took 16 months, I think. This
is my work.
The first problem after 1M objects in a PG was a
We are in the middle of splitting 16k EC 8+3 PGs on 2600x 16TB OSDs with
NVMe RocksDB, used exclusively for RGW, holding about 60 billion objects. We
are splitting for the same reason as you - improved balance. We also
thought long and hard before we began, concerned about impact, stability
etc.
Thank you, Janne.
I believe the default 5% target_max_misplaced_ratio would work as
well; we've had good experience with that in the past, without the
autoscaler. I just haven't dealt with such large PGs; I've been
warning them for two years (when the PGs were only about half this
size)
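Relying on that ratio when kicking off a split could look like this; the pool name is a placeholder and the values are the documented defaults, not settings confirmed in this thread:

```shell
# Cap how much of the cluster's data may be misplaced at once
# (0.05, i.e. 5%, is the default for this mgr option).
ceph config set mgr target_max_misplaced_ratio 0.05

# Raise pg_num; the mgr then splits PGs gradually, staying under
# the misplaced ratio. <pool-name> is a placeholder.
ceph osd pool set <pool-name> pg_num 4096
```

These commands are config fragments that require a live cluster and an existing pool.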
On Tue, 9 Apr 2024 at 10:39, Eugen Block wrote:
> I'm trying to estimate the possible impact when large PGs are
> split. Here's one example of such a PG:
>
> PG_STAT  OBJECTS  BYTES          OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  UP
> 86.3ff   277708   4144030984090  0
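Per-PG statistics like the quoted row can be listed for a whole pool to find the largest PGs; a sketch, with the pool name left as a placeholder:

```shell
# List the PGs of a pool sorted by object count, largest first
# (column 2 of 'ceph pg ls' output is OBJECTS; skip the header row).
ceph pg ls-by-pool <pool-name> | awk 'NR>1' | sort -k2 -n -r | head
```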
Hi,
I appreciate your message, it really sounds tough (9 months,
really?!). But thanks for the reassurance :-)
They don’t have any other options so we’ll have to start that process
anyway, probably tomorrow. We’ll see how it goes…
Quoting Konstantin Shalygin:
Hi Eugene!
I have a case where a PG has millions of objects, like this
```
root@host# ./show_osd_pool_pg_usage.sh | less | head
id     used_mbytes  used_objects  omap_used_mbytes  omap_used_keys
-----  -----------  ------------  ----------------  --------------
17.c91
```