Hey Konstantin,
Forgot to mention: indeed, clusters with a 4K bluestore min_alloc_size
are more likely to be exposed to the issue. The key point is the
difference between the bluestore and bluefs allocation sizes. The issue
is likely to pop up when user and DB data are collocated but different
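For anyone who wants to check whether their OSDs fall into this category, the relevant sizes can be inspected from the cluster; a rough sketch (osd.0 is a placeholder, and the exact metadata field names can vary by release):

```shell
# Inspect the allocation sizes recorded for an OSD at creation time
ceph osd metadata 0 | grep -i alloc

# BlueFS allocation size used when the DB shares the main device with user data
ceph config get osd.0 bluefs_shared_alloc_size
```

A large gap between a 4K bluestore min_alloc_size and a 64K BlueFS allocation size is the combination being discussed here.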
Hi Igor,
> On 12 Sep 2023, at 15:28, Igor Fedotov wrote:
>
> The default hybrid allocator (as well as the AVL one it's based on) could take
> a dramatically long time to allocate pretty large (hundreds of MBs) 64K-aligned
> chunks for BlueFS. At the original cluster it was exposed as 20-30 sec OSD
>
Hi All,
As promised, here is a postmortem analysis of what happened.
The following ticket (https://tracker.ceph.com/issues/62815) and its
accompanying materials provide a low-level overview of the issue.
In a few words, it is as follows:
Default hybrid allocator (as well as AVL one it's based
The bluestore configuration was 100% default when we did the upgrade and
the issue happened. We have provided Igor with an OSD dump and a db dump
last Friday, so hopefully you can figure out something from it.
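For reference, the usual way to take the hybrid allocator out of the equation while a fix is pending is to switch to the bitmap allocator. A hedged sketch (try it on a test OSD first; an OSD restart is required for the change to take effect):

```shell
# Check which allocator the OSDs are currently using (hybrid is the default)
ceph config get osd bluestore_allocator

# Switch to the bitmap allocator as a temporary workaround
ceph config set osd bluestore_allocator bitmap

# Restart OSDs one at a time, e.g.:
# systemctl restart ceph-osd@<id>.service
```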
On 9/8/23 02:48, Konstantin Shalygin wrote:
Does this cluster use the default settings, or was something changed for Bluestore?
You can check this via `ceph config diff`
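The check mentioned above can be run cluster-wide or against a single daemon; for example (osd.0 is a placeholder):

```shell
# Show all settings that differ from the built-in defaults
ceph config diff

# Or ask a specific daemon via its admin socket, on the OSD host
ceph daemon osd.0 config diff
```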
As Mark said, it would be nice to have a tracker ticket, if this really is a release problem.
Thanks,
k
Sent from my iPhone
> On 7 Sep 2023, at 20:22, J-P Methot wrote:
>
> We went from
On 07-09-2023 19:20, J-P Methot wrote:
We went from 16.2.13 to 16.2.14
Also, the timeout is 15 seconds because that's the default in Ceph. Basically,
after 15 seconds Ceph shows a warning that the OSD is timing out.
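The timeout being referred to is presumably `osd_op_thread_timeout`, whose default is 15 seconds; it can be confirmed on a running cluster:

```shell
# The op thread timeout behind the 'had timed out' warning; default is 15 s
ceph config get osd osd_op_thread_timeout
```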
We may have found the solution, but it would be, in fact, related to
I also see the dreaded timeout. I find this is a bcache problem. You can use the blktrace tool to capture I/O data for analysis.
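For reference, a minimal blktrace session along those lines might look like this (the device name is a placeholder; requires root):

```shell
# Capture block-layer I/O events from the bcache device for 30 seconds
blktrace -d /dev/bcache0 -w 30 -o osd_trace

# Decode the capture into a human-readable log and a binary dump for btt
blkparse -i osd_trace -d osd_trace.bin -o osd_trace.txt

# Aggregate per-request latency statistics (Q2C, D2C, etc.)
btt -i osd_trace.bin
```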
Sent from my Xiaomi. On 7 Sep 2023 at 22:52, Stefan Kooman wrote: On 07-09-2023 09:05, J-P Methot wrote:
> Hi,
>
> We're running latest Pacific on our production cluster and we've been
> seeing the dreaded
Oh that's very good to know. I'm sure Igor will respond here, but do
you know which PR this was related to? (possibly
https://github.com/ceph/ceph/pull/50321)
If we think there's a regression here we should get it into the tracker
ASAP.
Mark
On 9/7/23 13:45, J-P Methot wrote:
To be quite honest, I will not pretend I have a low-level understanding
of what was going on. There is very little documentation on what the
bluestore allocator actually does, and we had to rely on Igor's help to
find the solution, so my understanding of the situation is limited. What
I
Ok, good to know. Please feel free to update us here with what you are
seeing in the allocator. It might also be worth opening a tracker
ticket. I did some work in the AVL allocator a while back where
we were repeating the linear search from the same offset on every
allocation, getting
Hi,
By this point, we're 95% sure that, contrary to our previous beliefs,
it's an issue with changes to the bluestore_allocator and not the
compaction process. That said, I will keep this email in mind as we will
want to test optimizations to compaction on our test environment.
On 9/7/23
We went from 16.2.13 to 16.2.14
Also, the timeout is 15 seconds because that's the default in Ceph. Basically,
after 15 seconds Ceph shows a warning that the OSD is timing out.
We may have found the solution, but it would be, in fact, related to
bluestore_allocator and not the compaction process.
Hello,
There are two things that might help you here. One is to try the new
"rocksdb_cf_compaction_on_deletion" feature that I added in Reef and we
backported to Pacific in 16.2.13. So far this appears to be a huge win
for avoiding tombstone accumulation during iteration which is often the
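Enabling the feature should amount to a single config change; a hedged sketch (verify the exact option name for your release with `ceph config help`, since it may be spelled slightly differently, e.g. `rocksdb_cf_compact_on_deletion`):

```shell
# Enable deletion-triggered compaction for the OSD RocksDB column families
ceph config set osd rocksdb_cf_compaction_on_deletion true
```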
Hi,
> On 7 Sep 2023, at 18:21, J-P Methot wrote:
>
> Since my post, we've been speaking with a member of the Ceph dev team. He
> did, at first, believe it was an issue linked to the common performance
> degradation after huge delete operations. So we did do offline compactions on
> all our
Hi,
Since my post, we've been speaking with a member of the Ceph dev team.
He did, at first, believe it was an issue linked to the common
performance degradation after huge delete operations. So we did do
offline compactions on all our OSDs. It fixed nothing and we are going
through the logs
On 07-09-2023 09:05, J-P Methot wrote:
Hi,
We're running the latest Pacific on our production cluster and we've been
seeing the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700 had timed out
after 15.00954s' error. We have reasons to believe this happens each
time the RocksDB compaction
On an HDD-based Quincy 17.2.5 cluster (with DB/WAL on datacenter-class
NVMe with enhanced power loss protection), I sometimes (once or twice
per week) see log entries similar to what I reproduced below (a bit
trimmed):
Wed 2023-09-06 22:41:54 UTC ceph-osd09 ceph-osd@39.service[5574]:
We're talking about automatic online compaction here, not running the
command.
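For contrast with the automatic online compaction being discussed, operator-triggered compaction is invoked explicitly; for example (osd.0 and the data path are placeholders):

```shell
# Trigger compaction of a running OSD's RocksDB
ceph tell osd.0 compact

# Offline compaction, with the OSD stopped
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
```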
On 9/7/23 04:04, Konstantin Shalygin wrote:
Hi,
On 7 Sep 2023, at 10:05, J-P Methot wrote:
We're running the latest Pacific on our production cluster and we've been
seeing the dreaded 'OSD::osd_op_tp thread
Hi,
> On 7 Sep 2023, at 10:05, J-P Methot wrote:
>
> We're running the latest Pacific on our production cluster and we've been seeing
> the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700 had timed out after
> 15.00954s' error. We have reasons to believe this happens each time the
> RocksDB