[ceph-users] Re: [Octopus] OSD overloading

2020-04-13 Thread Xiaoxi Chen
I am not sure if any change in Octopus makes this worse, but on Nautilus we are also seeing that the RocksDB overhead during snaptrim is huge; we work around it by throttling the snaptrim speed to a minimum as well as throttling deep-scrub, see https://www.spinics.net/lists/dev-ceph/msg01277.html for
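
For reference, a rough sketch of the kind of throttling described above, using standard OSD options (the values are placeholders and should be tuned per cluster):

  ceph config set osd osd_snap_trim_sleep 30   # seconds to sleep between snap trim work, slows snaptrim down
  ceph config set osd osd_scrub_sleep 0.5      # adds a pause between scrub chunks
  ceph osd set nodeep-scrub                    # temporarily stop scheduling new deep-scrubs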

[ceph-users] Re: [Octopus] OSD overloading

2020-04-13 Thread Igor Fedotov
Given the symptoms, the high CPU usage within RocksDB and the corresponding slowdown were presumably caused by RocksDB fragmentation. A temporary workaround would be to do a manual DB compaction using ceph-kvstore-tool's compact command. Thanks, Igor On 4/13/2020 1:01 AM, Jack wrote: Yep I am
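
A minimal sketch of that workaround, assuming the default OSD data path /var/lib/ceph/osd/ceph-<id> (the OSD has to be stopped while the offline compaction runs):

  systemctl stop ceph-osd@<id>
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
  systemctl start ceph-osd@<id>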

[ceph-users] Re: [Octopus] OSD overloading

2020-04-12 Thread Jack
Yep, I am. The issue is solved now .. and by solved, brace yourselves, I mean I had to recreate all OSDs. And as the cluster would not heal itself (because of the original issue), I had to drop every rados pool, stop all OSDs, and destroy & recreate them .. Yeah, well, hum. There is definitely an
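
For context, rebuilding a single OSD from scratch typically looks roughly like this (the id and device are placeholders, and this destroys the data held by that OSD):

  systemctl stop ceph-osd@<id>
  ceph osd purge <id> --yes-i-really-mean-it
  ceph-volume lvm zap /dev/<device> --destroy
  ceph-volume lvm create --data /dev/<device>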

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Ashley Merrick
Are you sure you're not being hit by: ceph config set osd bluestore_fsck_quick_fix_on_mount false @ https://docs.ceph.com/docs/master/releases/octopus/ Have all your OSDs successfully completed the fsck? The reason I say that is that I can see "20 OSD(s) reporting legacy (not per-pool) BlueStore
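
If that is the case, one way to defer the conversion and confirm the warning is to set the option before restarting the OSDs and then look at the health output, e.g.:

  ceph config set osd bluestore_fsck_quick_fix_on_mount false
  ceph health detail | grep -i -e bluestore -e omap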

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Jack
Just to confirm this does not get better:
root@backup1:~# ceph status
  cluster:
    id: 9cd41f0f-936d-4b59-8e5d-9b679dae9140
    health: HEALTH_WARN
            20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
            4/50952060 objects unfound (0.000%)

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Jack
The CPU is used by userspace, not kernelspace. Here is the perf top output, see attachment. RocksDB eats everything :/ On 4/8/20 3:14 PM, Paul Emmerich wrote: > What's the CPU busy with while spinning at 100%? > Check "perf top" for a quick overview > Paul > Samples: 1M of event

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Paul Emmerich
What's the CPU busy with while spinning at 100%? Check "perf top" for a quick overview. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Wed, Apr 8, 2020 at 3:09 PM
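
For example, attached to a single busy OSD process (the PID below is a placeholder):

  perf top -p <pid-of-busy-ceph-osd>
  # or, on a host running a single OSD:
  perf top -p $(pidof ceph-osd)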

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Jack
I do:
root@backup1:~# ceph config dump | grep snap_trim_sleep
global   advanced  osd_snap_trim_sleep      60.00
global   advanced  osd_snap_trim_sleep_hdd  60.00
(cluster is fully rusty)
On 4/8/20 2:53 PM, Dan van der Ster wrote: > Do you have a custom value for osd_snap_trim_sleep
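
One way to double-check the value an individual OSD is actually running with (osd.0 is just an example id, and the second command has to run on that OSD's host):

  ceph config get osd.0 osd_snap_trim_sleep
  ceph daemon osd.0 config get osd_snap_trim_sleep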

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Dan van der Ster
Do you have a custom value for osd_snap_trim_sleep? On Wed, Apr 8, 2020 at 2:03 PM Jack wrote: > > I set the nosnaptrim flag during the upgrade because I saw high CPU usage and > thought it was somehow related to the upgrade process > However, all my daemons are now running Octopus, and the issue is still

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Jack
I set the nosnaptrim flag during the upgrade because I saw high CPU usage and thought it was somehow related to the upgrade process. However, all my daemons are now running Octopus, and the issue is still here, so I was wrong. On 4/8/20 1:58 PM, Wido den Hollander wrote: > On 4/8/20 1:38 PM, Jack

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Wido den Hollander
On 4/8/20 1:38 PM, Jack wrote: > Hello, > > I have an issue since my Nautilus -> Octopus upgrade > > My cluster has many rbd images (~3k or something) > Each of them has ~30 snapshots > Each day, I create and remove at least one snapshot per image > > Since Octopus, when I remove the "nosnaptrim"
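
For reference, the flag being referred to is toggled cluster-wide with:

  ceph osd set nosnaptrim
  ceph osd unset nosnaptrim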