[ceph-users] octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-02 Thread Boris Behrens
Hi, maybe someone here can help me debug an issue we faced today: one of our clusters came to a grinding halt with 2/3 of our OSDs reporting slow ops. The only option to get it back to work fast was to restart all OSD daemons. The cluster is an octopus cluster with 150 enterprise SSD

[ceph-users] OMAP data growth

2022-12-02 Thread Wyll Ingersoll
We have a large cluster (10PB) which is about 30% full at this point. We recently fixed a configuration issue that then triggered the pg autoscaler to start moving around massive amounts of data (85% misplaced objects - about 7.5B objects). The misplaced % is dropping slowly (about 10% each

[ceph-users] Re: radosgw octopus - how to cleanup orphan multipart uploads

2022-12-02 Thread Boris Behrens
I am currently going over all our buckets, which takes some time:
# for BUCKET in `radosgw-admin bucket stats | jq -r '.[] | .bucket'`; do radosgw-admin bi list --bucket ${BUCKET} | jq -r '.[] | select(.idx? | match("_multipart.*")) | .idx + ", " + .entry.meta.mtime' > ${BUCKET}.multiparts; done
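Written out over multiple lines, the same loop is a bit easier to read; this is just a reflowed sketch of the command above and assumes the bucket names are shell-safe and the current directory is writable:

  # dump all "_multipart..." index entries of every bucket into <bucket>.multiparts
  for BUCKET in $(radosgw-admin bucket stats | jq -r '.[] | .bucket'); do
      radosgw-admin bi list --bucket "${BUCKET}" \
          | jq -r '.[] | select(.idx? | match("_multipart.*")) | .idx + ", " + .entry.meta.mtime' \
          > "${BUCKET}.multiparts"
  done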

[ceph-users] Re: Tuning CephFS on NVME for HPC / IO500

2022-12-02 Thread Manuel Holtgrewe
Dear Mark, thank you very much for all of this information. I learned a lot, in particular that I need to learn more about pinning. In the end, I want to run the whole thing in production with real-world workloads. My main aim in running the benchmark is to ensure that my hardware and OS is

[ceph-users] Re: How to replace or add a monitor in stretch cluster?

2022-12-02 Thread Sake Paulusma
The instructions work great; the monitor is added to the monmap now. I asked about the Tiebreaker because there is a special command to replace the current one. But this manual intervention is probably still needed to first set the correct location. Will report back later when I replace the
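For reference, the stretch-mode monitor commands being discussed look roughly like the sketch below; the monitor name "newmon", the tiebreaker name "arbiter" and the datacenter bucket are placeholders, not values from this thread:

  # assign a crush location to a (new) monitor
  ceph mon set_location newmon datacenter=site1
  # promote another monitor to be the tiebreaker
  ceph mon set_new_tiebreaker arbiter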

[ceph-users] Re: How to replace or add a monitor in stretch cluster?

2022-12-02 Thread Adam King
Yes, I think so. I think the context in which this originally came up was somebody trying to replace the tiebreaker mon.
On Fri, Dec 2, 2022 at 9:08 AM Sake Paulusma wrote:
> That isn't a great solution indeed, but I'll try the solution. Would this
> also be necessary to replace the Tiebreaker?

[ceph-users] Re: How to replace or add a monitor in stretch cluster?

2022-12-02 Thread Sake Paulusma
That isn't a great solution indeed, but I'll try it. Would this also be necessary to replace the Tiebreaker?

[ceph-users] Re: How to replace or add a monitor in stretch cluster?

2022-12-02 Thread Adam King
This can't be done in a very nice way currently. There's actually an open PR against main to allow setting the crush location for mons in the service spec, specifically because others found this annoying as well. What I think should work as a workaround is to go to the host where the mon

[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
Thanks for the hint, I tried turning that off:
$ sudo ceph osd pool get cephfs_data pg_autoscale_mode
pg_autoscale_mode: on
$ sudo ceph osd pool set cephfs_data pg_autoscale_mode off
set pool 9 pg_autoscale_mode to off
$ sudo ceph osd pool get cephfs_data pg_autoscale_mode
pg_autoscale_mode: off

[ceph-users] Re: proxmox hyperconverged pg calculations in ceph pacific, pve 7.2

2022-12-02 Thread Anthony D'Atri
> Hello,
>
> still do not really understand why this error message comes up.
> The error message contains two significant numbers. The first one which is
> easy to understand is the maximal value of pgs for each osd a precompiled
> config variable (mon_max_pg_per_osd). The value on my cluster

[ceph-users] How to replace or add a monitor in stretch cluster?

2022-12-02 Thread Sake Paulusma
I successfully set up a stretched cluster, except the CRUSH rule mentioned in the docs wasn't correct. The parameters "min_size" and "max_size" should be removed, or else the rule can't be imported. Second, there should be a mention that setting the monitor crush location takes some time and
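For comparison, the stretch rule from the docs with the "min_size"/"max_size" lines stripped out looks roughly like this; the rule id and the site bucket names are illustrative placeholders:

  rule stretch_rule {
      id 1
      type replicated
      step take site1
      step chooseleaf firstn 2 type host
      step emit
      step take site2
      step chooseleaf firstn 2 type host
      step emit
  }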

[ceph-users] Re: Ceph commands hang + no CephFS or RBD access

2022-12-02 Thread Eugen Block
Hi, can you elaborate a bit on what happened and why "a few reboots" were required? 64% inactive PGs and 700 unknown PGs don't look too good. Has this improved a bit since your post? If ceph orch commands are not responding, it could point to a broken mgr; do you see anything in the logs of
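A few generic checks for a possibly stuck mgr, as a rough sketch; the daemon name is a placeholder and the cephadm call assumes a cephadm/containerized deployment:

  # which mgr is active, and are standbys available?
  ceph mgr stat
  # inspect the active mgr's logs on its host (placeholder daemon name)
  cephadm logs --name mgr.host1.abcdef
  # as a last resort, fail over to a standby mgr
  ceph mgr fail host1.abcdef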

[ceph-users] Re: proxmox hyperconverged pg calculations in ceph pacific, pve 7.2

2022-12-02 Thread Frank Schilder
Hi Rainer, there is indeed a bit of a mess in terminology. The number mon_max_pg_per_osd means "the maximum number of PGs an OSD is a member of", which is equal to "the number of PG shards an OSD holds". Unfortunately, this confusion is endemic in the entire documentation and one needs to look
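As a back-of-the-envelope check (all numbers below are made up for illustration), the number of PG shards an OSD holds is roughly the sum over pools of pg_num times the replica count (or k+m for EC pools), divided by the number of OSDs:

  # illustrative only: one replicated pool with pg_num=4096 and size=3 on 48 OSDs
  echo $(( 4096 * 3 / 48 ))   # -> 256 shards per OSD, above a mon_max_pg_per_osd of 250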

[ceph-users] Re: proxmox hyperconverged pg calculations in ceph pacific, pve 7.2

2022-12-02 Thread Rainer Krienke
Hello, I still do not really understand why this error message comes up. The error message contains two significant numbers. The first one, which is easy to understand, is the maximum number of PGs for each OSD, a precompiled config variable (mon_max_pg_per_osd). The value on my cluster is 250.

[ceph-users] Re: radosgw-octopus latest - NoSuchKey Error - some buckets lose their rados objects, but not the bucket index

2022-12-02 Thread Boris Behrens
Hi Eric, sadly it took too long from the customer complaining until it reached my desk, so there are no RGW client logs. We are currently improving our logging situation to move the logs to graylog. Currently it looks like the GC removed rados objects it should not have removed due to

[ceph-users] radosgw octopus - how to cleanup orphan multipart uploads

2022-12-02 Thread Boris Behrens
Hi, we are currently encountering a lot of broken / orphan multipart uploads. When I try to fetch the multipart uploads via s3cmd, it just never finishes. The debug output looks like this and basically never changes:
DEBUG: signature-v4 headers: {'x-amz-date': '20221202T105838Z', 'Authorization':

[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
Can I get rid of PGs after trying to decrease the number on the pool again? Doing a backup and nuking the cluster seems a little too much work for me :)
$ sudo ceph osd pool get cephfs_data pg_num
pg_num: 128
$ sudo ceph osd pool set cephfs_data pg_num 16
$ sudo ceph osd pool get cephfs_data

[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Janne Johansson
> my OSDs are running odroid-hc4's and they only have about 4GB of memory,
> and every 10 minutes a random OSD crashes due to out of memory. Sadly the
> whole machine gets unresponsive when the memory gets completely full, so no
> ssh access or prometheus output in the meantime.
> I've set the
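For reference, the OSD memory target can be lowered and then verified per daemon as sketched below; the 1.5 GiB value is only an illustration for a 4 GB host, and note that osd_memory_target is a best-effort target for the BlueStore caches, not a hard cap on process RSS:

  # set a cluster-wide memory target for all OSDs (in bytes)
  ceph config set osd osd_memory_target 1610612736
  # check what a specific OSD resolves it to
  ceph config get osd.0 osd_memory_target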

[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
For example, one of my latest OSD crashes looks like this in dmesg:
[Dec 2 08:26] bstore_mempool invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), nodemask=0, order=0, oom_score_adj=0
[ +0.06] bstore_mempool cpuset=ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc

[ceph-users] OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
Hi, my OSDs are running odroid-hc4's and they only have about 4GB of memory, and every 10 minutes a random OSD crashes due to out of memory. Sadly the whole machine gets unresponsive when the memory gets completely full, so no ssh access or prometheus output in the meantime. After the osd

[ceph-users] Re: Cache modes libvirt

2022-12-02 Thread Dominique Ramaekers
I use the same technique for my normal snapshot backups. But this is regarding an Autodesk database. In order to have full support from Autodesk in case things go wrong, I need to follow Autodesk's recommendations. That is to do a data backup (db-dump + file store copy) with their tool being the ADMS