[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Mark Nelson
February 2, 2024 5:41 AM To: ceph-users Subject: [ceph-users] Re: OSD read latency grows over time I found the internal note I made about it, see below. When we trim thousands of OMAP keys in RocksDB, this calls SingleDelete() in the RocksDBStore in Ceph, which causes tombstones in the RocksDB

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Cory Snyder
1024 PGs on NVMe. From: Anthony D'Atri Sent: Friday, February 2, 2024 2:37 PM To: Cory Snyder Subject: Re: [ceph-users] OSD read latency grows over time Thanks. What type of media are your index OSDs? How many PGs? > On Feb 2, 2024, at 2:32 PM, Cory Snyder wrote: > > Yes, we changed
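The index pool's PG count and the device class of its OSDs can be checked as below; the pool name shown is the common default and may differ per zone.

   ceph osd pool get default.rgw.buckets.index pg_num
   ceph osd df tree    # the CLASS column shows hdd/ssd/nvme for each OSD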

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Cory Snyder
Yes, we changed osd_memory_target to 10 GB on just our index OSDs. These OSDs have over 300 GB of lz4 compressed bucket index omap data. Here is a graph showing the latencies before/after that single change: https://pasteboard.co/IMCUWa1t3Uau.png Cory Snyder From: Anthony D'Atri Sent:
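A sketch of how such a targeted change can be made via the centralized config; the OSD IDs are placeholders, and the device-class variant assumes the index OSDs share a CRUSH device class.

   ceph config set osd.10 osd_memory_target 10737418240   # 10 GiB, in bytes
   ceph config set osd.11 osd_memory_target 10737418240
   # or for every OSD of one device class:
   ceph config set osd/class:nvme osd_memory_target 10737418240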

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Anthony D'Atri
You adjusted osd_memory_target? Higher than the default 4GB? > > > Another thing that we've found is that rocksdb can become quite slow if it > doesn't have enough memory for internal caches. As our cluster usage has > grown, we've needed to increase OSD memory in accordance with bucket
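The value an OSD is actually running with, versus what is stored in the centralized config, can be checked with:

   ceph config get osd osd_memory_target       # configured value (default 4294967296 = 4 GiB)
   ceph config show osd.0 osd_memory_target    # value the running daemon reports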

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Cory Snyder
that increasing OSD memory improved rocksdb latencies by over 10x. Hope this helps! Cory Snyder From: Tobias Urdin Sent: Friday, February 2, 2024 5:41 AM To: ceph-users Subject: [ceph-users] Re: OSD read latency grows over time   I found the internal note I made about it, see below. When we
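One way to watch these latencies directly, assuming your release exposes the usual rocksdb perf counters on OSDs (counter names can differ between releases):

   ceph tell osd.10 perf dump | jq '.rocksdb'
   # compare the *_latency entries (get/submit) before and after the memory change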

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Tobias Urdin
I found the internal note I made about it, see below. When we trim thousands of OMAP keys in RocksDB, this calls SingleDelete() in the RocksDBStore in Ceph, which causes tombstones in the RocksDB database. These thousands of tombstones, each of which needs to be iterated over
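Until those tombstones are compacted away, iteration stays slow. A manual compaction clears them; a minimal sketch, with osd.123 as a placeholder ID:

   ceph tell osd.123 compact     # online compaction of the OSD's RocksDB
   # or offline, with the OSD stopped:
   ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-123 compact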

[ceph-users] Re: OSD read latency grows over time

2024-02-02 Thread Tobias Urdin
Chiming in here, just so that it’s indexed in the archives. We’ve had a lot of issues with tombstones when running RGW usage logging, and when we trim those, the sheer number of tombstones basically kills the performance of the Ceph OSD hosting that usage.X object; restarting the OSD
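For reference, the trimming in question is the RGW usage-log trim; the dates below are illustrative.

   radosgw-admin usage trim --start-date=2023-01-01 --end-date=2023-06-30
   # every trimmed omap key leaves a RocksDB tombstone on the OSD holding that usage.X object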

[ceph-users] Re: OSD read latency grows over time

2024-01-26 Thread Mark Nelson
On 1/26/24 11:26, Roman Pashin wrote: Unfortunately they cannot. You'll want to set them in centralized conf and then restart OSDs for them to take effect. Got it. Thank you Josh! Will put it in the config of the affected OSDs and restart them. Just curious, can decreasing

[ceph-users] Re: OSD read latency grows over time

2024-01-26 Thread Josh Baergen
> Just curious, can decreasing rocksdb_cf_compact_on_deletion_trigger from 16384 > to 4096 hurt performance of HDD OSDs in any way? I have no growing latency on > HDD OSDs, where data is stored, but it would be easier to set it in the [osd] > section without cherry-picking only SSD/NVMe OSDs, but for all at
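For reference: if you would rather not apply it cluster-wide, the centralized config also accepts CRUSH device-class masks; a sketch, assuming the class names match your CRUSH device classes.

   ceph config set osd rocksdb_cf_compact_on_deletion_trigger 4096             # all OSDs
   ceph config set osd/class:ssd rocksdb_cf_compact_on_deletion_trigger 4096   # SSD OSDs only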

[ceph-users] Re: OSD read latency grows over time

2024-01-26 Thread Roman Pashin
> Unfortunately they cannot. You'll want to set them in centralized conf > and then restart OSDs for them to take effect. > Got it. Thank you Josh! Will put it in the config of the affected OSDs and restart them. Just curious, can decreasing rocksdb_cf_compact_on_deletion_trigger from 16384 to 4096 hurt

[ceph-users] Re: OSD read latency grows over time

2024-01-26 Thread Josh Baergen
> Do you know if rocksdb_cf_compact_on_deletion_trigger and > rocksdb_cf_compact_on_deletion_sliding_window can be changed at runtime > without an OSD restart? Unfortunately they cannot. You'll want to set them in centralized conf and then restart OSDs for them to take effect. Josh On Fri, Jan
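In practice that looks roughly like the following; the sliding-window value is illustrative, and the restart command depends on how the cluster is deployed.

   ceph config set osd rocksdb_cf_compact_on_deletion_trigger 4096
   ceph config set osd rocksdb_cf_compact_on_deletion_sliding_window 32768
   # then restart the OSDs, e.g. per daemon on a cephadm cluster:
   ceph orch daemon restart osd.10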

[ceph-users] Re: OSD read latency grows over time

2024-01-26 Thread Roman Pashin
Hi Mark, In v17.2.7 we enabled a feature that automatically performs a compaction >> if too many tombstones are present during iteration in RocksDB. It >> might be worth upgrading to see if it helps (you might have to try >> tweaking the settings if the defaults aren't helping enough). The PR
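After upgrading, the defaults, and whether the options exist at all on your release, can be checked with:

   ceph config help rocksdb_cf_compact_on_deletion_trigger
   ceph config help rocksdb_cf_compact_on_deletion_sliding_window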

[ceph-users] Re: OSD read latency grows over time

2024-01-22 Thread Roman Pashin
> > Hi Mark, thank you for the prompt answer. The fact that changing the pg_num for the index pool drops the latency > back down might be a clue. Do you have a lot of deletes happening on > this cluster? If you have a lot of deletes and long pauses between > writes, you could be accumulating
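For context, the pg_num change referred to is the usual pool resize; the pool name and target count below are placeholders.

   ceph osd pool set default.rgw.buckets.index pg_num 128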

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Mark Nelson
Hi Roman, The fact that changing the pg_num for the index pool drops the latency back down might be a clue. Do you have a lot of deletes happening on this cluster? If you have a lot of deletes and long pauses between writes, you could be accumulating tombstones that you have to keep

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Roman Pashin
Hi Stefan, Do you make use of a separate db partition as well? And if so, where is > it stored? > No, only the WAL partition is on a separate NVMe partition. Not sure if ceph-ansible can install Ceph with a db partition on a separate device on v17.2.6. Do you only see latency increase in reads? And not
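For reference, with ceph-volume a separate DB and/or WAL device is specified at OSD creation time; a minimal sketch with placeholder devices.

   ceph-volume lvm create --bluestore --data /dev/sdb \
       --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2
   # if --block.db is omitted, the DB stays on the data device and only the WAL is split out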

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Roman Pashin
Hi Eugen, How is the data growth in your cluster? Is the pool size rather stable or > is it constantly growing? > Pool size is fairly constant with a tiny upward trend. Its growth doesn't correlate with the increase in OSD read latency. I've combined pool usage with OSD read latency on one graph to

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Stefan Kooman
On 16-01-2024 11:22, Roman Pashin wrote: Hello Ceph users, we see a strange issue on our most recent Ceph installation, v17.2.6. We store data on an HDD pool; the index pool is on SSD. Each OSD stores its WAL on an NVMe partition. Do you make use of a separate db partition as well? And if so, where is it

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Eugen Block
Hi, I checked two production clusters which don't use RGW too heavily, both on Pacific though. There's no latency increase visible there. How is the data growth in your cluster? Is the pool size rather stable or is it constantly growing? Thanks, Eugen Quoting Roman Pashin: Hello