I am not sure if any change in Octopus makes this worse, but on
Nautilus we are also seeing that the RocksDB overhead during snaptrim is
huge. We work around it by throttling the snaptrim speed to a minimum, as
well as throttling deep-scrub; see
https://www.spinics.net/lists/dev-ceph/msg01277.html for details.
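A minimal sketch of that kind of throttling (the option names are real OSD settings; the values shown are example assumptions, not the ones from the linked thread):

```shell
# Slow snaptrim down: sleep (in seconds) between trim operations
ceph config set osd osd_snap_trim_sleep 60
ceph config set osd osd_snap_trim_sleep_hdd 60

# Throttle scrubbing similarly: sleep between scrub chunks
ceph config set osd osd_scrub_sleep 0.5

# Stretch the deep-scrub interval (seconds; one week shown)
ceph config set osd osd_deep_scrub_interval 604800
```

These take effect cluster-wide via the config database; per-OSD overrides are also possible with `ceph config set osd.N ...`.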
Given the symptoms, the high CPU usage within RocksDB and the
corresponding slowdown were presumably caused by RocksDB fragmentation.
A temporary workaround would be to do a manual DB compaction using
ceph-kvstore-tool's compact command.
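For reference, an offline compaction might look like the following (osd.0 and the data path are examples; the OSD must be stopped while the tool runs):

```shell
# Stop the OSD before touching its store (osd.0 is an example)
systemctl stop ceph-osd@0

# Compact the RocksDB embedded in BlueStore
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact

# Bring the OSD back
systemctl start ceph-osd@0
```

Recent releases can also trigger an online compaction with `ceph tell osd.N compact`, avoiding the restart.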
Thanks,
Igor
On 4/13/2020 1:01 AM, Jack wrote:
Yep I am
The issue is solved now .. and by solved, brace yourselves, I mean I had
to recreate all OSDs.
And since the cluster would not heal itself (because of the original
issue), I had to drop every rados pool, stop all OSDs, and destroy &
recreate them ..
Yeah, well, hum
There is definitely an
Are you sure you're not being hit by:
ceph config set osd bluestore_fsck_quick_fix_on_mount false
(see https://docs.ceph.com/docs/master/releases/octopus/)
Have all your OSDs successfully completed the fsck?
The reason I say that is I can see "20 OSD(s) reporting legacy (not
per-pool) BlueStore omap usage stats" in your status.
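One way to verify, assuming the default data paths (osd.0 is an example), is to run the check manually against a stopped OSD:

```shell
# With the OSD stopped, run a consistency check on its store
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0

# Or apply the repair (including the omap usage-stats conversion)
# explicitly, instead of letting it run at mount time
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
```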
Just to confirm this does not get better:
root@backup1:~# ceph status
  cluster:
    id:     9cd41f0f-936d-4b59-8e5d-9b679dae9140
    health: HEALTH_WARN
            20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
            4/50952060 objects unfound (0.000%)
The CPU is used by userspace, not kernelspace
Here is the perf top, see attachment
Rocksdb eats everything :/
On 4/8/20 3:14 PM, Paul Emmerich wrote:
> What's the CPU busy with while spinning at 100%?
>
> Check "perf top" for a quick overview
>
>
> Paul
>
What's the CPU busy with while spinning at 100%?
Check "perf top" for a quick overview
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Wed, Apr 8, 2020 at 3:09 PM
I do:
root@backup1:~# ceph config dump | grep snap_trim_sleep
global  advanced  osd_snap_trim_sleep      60.00
global  advanced  osd_snap_trim_sleep_hdd  60.00
(cluster is fully rusty)
On 4/8/20 2:53 PM, Dan van der Ster wrote:
> Do you have a custom value for osd_snap_trim_sleep
Do you have a custom value for osd_snap_trim_sleep?
On Wed, Apr 8, 2020 at 2:03 PM Jack wrote:
>
> I put the nosnaptrim during upgrade because I saw high CPU usage and
> thought it was somehow related to the upgrade process
> However, all my daemon are now running Octopus, and the issue is still
I put the nosnaptrim during upgrade because I saw high CPU usage and
thought it was somehow related to the upgrade process.
However, all my daemons are now running Octopus, and the issue is still
here, so I was wrong.
On 4/8/20 1:58 PM, Wido den Hollander wrote:
On 4/8/20 1:38 PM, Jack wrote:
> Hello,
>
> I've an issue, since my Nautilus -> Octopus upgrade.
>
> My cluster has many rbd images (~3k or something)
> Each of them has ~30 snapshots
> Each day, I create and remove at least one snapshot per image
>
> Since Octopus, when I remove the "nosnaptrim"