I don't have much experience in recovering struggling EC pools,
unfortunately. It looks like it can't find OSDs for 2 out of the 6
shards. Since you run EC 4+2 the data isn't lost, but I'm not 100%
sure how to make it healthy.
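For what it's worth, a couple of commands that can help when looking
at an EC PG with missing shards (the PG id and pool name below are
placeholders):

    ceph pg <pgid> query                           # up/acting sets; shows which shards map to no OSD
    ceph osd pool get <pool> erasure_code_profile  # confirm the profile's k/m (4+2 here)
    ceph pg dump_stuck unclean                     # list PGs stuck in a non-clean state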
There was a thread a while back that had a similar issue, albeit
possibly
That said, one thing you could do is rate limit PUT requests on your
haproxy down to a level at which your cluster is stable. At least that
gives you a chance to finish the PG scaling without OSDs constantly
dying on you.
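Roughly what that could look like on the haproxy side (frontend/backend
names, the port and the threshold are placeholders, not values from
your setup):

    frontend rgw_frontend
        bind *:8080
        acl is_put method PUT
        # track source IPs of PUT requests and reject them above ~100 requests per 10s
        stick-table type ip size 100k expire 30s store http_req_rate(10s)
        http-request track-sc0 src if is_put
        http-request deny deny_status 429 if is_put { sc_http_req_rate(0) gt 100 }
        default_backend rgw_backend

Another option is to send PUTs to a dedicated backend whose servers
have a low maxconn, so excess requests queue in haproxy instead of
piling onto the RGWs.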
On Fri, 1 Oct 2021 at 11:56, Christian Wuerdig wrote:
Ok, so I guess there are several things coming together that end up
making your life a bit miserable at the moment:
- PG scaling causing increased IO
- Ingesting a large number of objects into RGW, causing lots of IOPS
- Usual client traffic
- Your NVMe that's being used for WAL/DB has only half the
5.2.14 because that one has buffered_io enabled by default.
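(Assuming "buffered_io" here means the bluefs_buffered_io option, you
can check what your OSDs are currently running with:)

    ceph config get osd bluefs_buffered_io
    ceph config show osd.0 | grep bluefs_buffered_io   # effective value on a running daemon; osd.0 is just an example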
ty
From: Frédéric Nass
Sent: Thursday, September 30, 2021 4:43 PM
To: Szabo, Istvan (Agoda) ; Christian Wuerdig
Cc: Ceph Users
Subject: Re: [ceph-users] Re: osd_memory_target=level0 ?
Hi Christian,
Yes, I know very well what spillover is; I've read that GitHub leveled
compaction document multiple times a day over the last couple of days.
(Answers to your questions are after the cluster background information.)
About the cluster:
- users are continuously doing put/head/delete
Hi,
As Christian said, osd_memory_target has nothing to do with RocksDB
levels and will certainly not decide when spillover occurs. That said,
I doubt any of us here has ever given 32GB of RAM to any OSD, so in
case you're not sure that OSDs can handle that much memory correctly, I
would
BlueStore memory targets have nothing to do with spillover. It's
already been said several times: the spillover warning is simply
telling you that instead of writing data to your supposedly fast
wal/blockdb device, it's now hitting your slow device.
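In case it's useful, a quick way to see whether (and how much) the DB
has spilled onto the slow device; osd.0 is just an example, and the
daemon command has to be run on the host that OSD lives on:

    ceph health detail | grep -i spillover
    ceph daemon osd.0 perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'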
You've stated previously that your fast device