[ceph-users] Re: ceph orch upgrade to 18.2.1 seems stuck on MDS?

2024-02-07 Thread Nigel Williams
On Wed, 7 Feb 2024 at 20:00, Nigel Williams wrote: > > and just MDS left to do but upgrade has been sitting for hours on this > > resolved by rebooting a single host...still not sure why this fixed it other than it had a standby MDS that would not upgrade?
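If a standby MDS refuses to pick up the new image, a less drastic step than rebooting the whole host may be to redeploy just that daemon. A minimal sketch, assuming cephadm/orch manages the MDS and using a placeholder daemon name:

    ceph orch ps --daemon-type mds                       # find the daemon still on the old version
    ceph orch daemon redeploy mds.cephfs.host1.abcdef    # placeholder name; redeploys with the target image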

[ceph-users] Re: Snapshot automation/scheduling for rbd?

2024-02-07 Thread Jayanth Reddy
Right. IIUC, disk snapshots are disabled in the global settings, and I believe they also warn you that it cannot produce crash-consistent snapshots. I believe the snapshots can be taken like that, but I am not sure if a pause or fs freeze is involved. AFAIK, you'll have to initiate a snapshot for each volum
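A minimal sketch of a per-volume snapshot with a filesystem freeze around it, assuming the volume is mounted somewhere it can be frozen and using placeholder pool/image/mountpoint names:

    fsfreeze --freeze /mnt/volume-data            # quiesce the filesystem first
    rbd snap create volumes/volume-1234@manual-2024-02-07
    fsfreeze --unfreeze /mnt/volume-data          # resume I/O as soon as the snapshot exists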

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Jasper Tan
Hi Anthony and everyone else, we have found the issue. Because the new 20x 14 TiB OSDs were onboarded onto a single node, there was not only an imbalance in the capacity of each OSD but also between the nodes (the other nodes each have around 15x 1.7 TiB). Furthermore, the CRUSH rule sets the default failure do
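A quick way to see both the per-host capacity skew and the failure domain the rule uses (a sketch; the rule name is a placeholder for whatever the pool actually uses):

    ceph osd df tree                           # utilisation rolled up per host
    ceph osd crush rule dump replicated_rule   # the "type" field shows the failure domain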

[ceph-users] Re: PG stuck at recovery

2024-02-07 Thread Kai Stian Olstad
You don't say anything about the Ceph version you are running. I had a similar issue with 17.2.7, and it seems to be an issue with mclock; when I switched to wpq everything worked again. You can read more about it here https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IPHBE3DLW5ABCZH
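A sketch of the mclock-to-wpq switch Kai describes, assuming osd_op_queue can be set centrally with ceph config; the OSDs need a restart before it takes effect, and the daemon name below is a placeholder:

    ceph config set osd osd_op_queue wpq
    ceph orch daemon restart osd.12        # repeat per OSD, ideally one failure domain at a time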

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Anthony D'Atri
> I have recently onboarded new OSDs into my Ceph Cluster. Previously, I had > 44 OSDs of 1.7TiB each and was using it for about a year. About 1 year ago, > we onboarded an additional 20 OSDs of 14TiB each. That's a big difference in size. I suggest increasing mon_max_pg_per_osd to 1000 --
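A sketch of the setting Anthony mentions, assuming it is applied cluster-wide via ceph config (pick a value appropriate to your cluster):

    ceph config set global mon_max_pg_per_osd 1000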

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Dan van der Ster
Hi Jasper, I suggest disabling all the crush-compat and reweighting approaches. They rarely work out. The state of the art is: ceph balancer on ceph balancer mode upmap ceph config set mgr mgr/balancer/upmap_max_deviation 1 Cheers, Dan -- Dan van der Ster CTO Clyso GmbH p: +49 89 215252722 |
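The balancer settings from Dan's message, one per line for readability (the upmap_max_deviation value of 1 is his recommendation):

    ceph balancer on
    ceph balancer mode upmap
    ceph config set mgr mgr/balancer/upmap_max_deviation 1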

[ceph-users] PG stuck at recovery

2024-02-07 Thread LeonGao
Hi community, we have a new Ceph cluster deployment with 100 nodes. While draining an OSD host from the cluster, we see a small number of PGs that cannot make any progress to completion. From the logs and metrics, it seems like recovery is stuck (0 recovery ops for several days).
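A sketch of how such stuck PGs might be inspected, using a placeholder PG id; the query output normally shows which OSDs recovery is waiting on:

    ceph pg ls recovering     # list PGs still in recovery
    ceph pg 2.1f query        # placeholder pgid; look at the recovery_state section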

[ceph-users] Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Jasper Tan
Hi, I have recently onboarded new OSDs into my Ceph cluster. Previously, I had 44 OSDs of 1.7 TiB each and had been using them for about a year. About 1 year ago, we onboarded an additional 20 OSDs of 14 TiB each. However, I observed that much of the data was still being written onto the original 1.7 TiB OS
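A sketch of how the skew can be observed before rebalancing; ceph osd df reports %USE and PG counts per OSD, which should make the imbalance between the 1.7 TiB and 14 TiB devices visible:

    ceph osd df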

[ceph-users] Re: RBD Image Returning 'Unknown Filesystem LVM2_member' On Mount - Help Please

2024-02-07 Thread Gilles Mocellin
On Sunday, 4 February 2024 at 09:29:04 CET, duluxoz wrote: > Hi Cedric, > > That's what I thought - the access method shouldn't make a difference. > > No, no lvs details at all - I mean, yes, the osds show up with the lvs > command on the ceph node(s), but not on the individual pools/images (on
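An "unknown filesystem LVM2_member" error normally means the RBD image holds an LVM physical volume, so the logical volume inside it has to be activated and mounted rather than the raw device. A minimal sketch on the client, with placeholder VG/LV names:

    pvs                               # the mapped RBD device should appear as a PV
    vgchange -ay                      # activate the volume group(s) found on it
    lvs                               # note the VG/LV names
    mount /dev/myvg/mylv /mnt/point   # mount the LV, not /dev/rbd0 itself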

[ceph-users] Accumulation of removed_snaps_queue After Deleting Snapshots in Ceph RBD

2024-02-07 Thread localhost Liam
Hello, I'm encountering an issue with Ceph when using it as the backend storage for OpenStack Cinder. Specifically, after deleting RBD snapshots through Cinder, I've noticed a significant increase in the removed_snaps_queue entries within the corresponding Ceph pool. It seems to affect the pool
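The queue can reportedly be inspected per pool; a sketch, assuming a recent release where removed_snaps_queue appears in the pool detail output:

    ceph osd pool ls detail    # look for removed_snaps_queue on the affected pool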

[ceph-users] ceph error connecting to the cluster

2024-02-07 Thread arimbidhea3
Hello, I was trying to create an OSD, but when I run a ceph command the output is like this: root@pod-deyyaa-ceph1:~# sudo ceph -s 2024-02-02T16:01:23.627+0700 7fc762f37640 0 monclient(hunting): authenticate timed out after 300 [errno 110] RADOS timed out (error connecting to the cluster) can anyon
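An authenticate timeout usually means the client cannot reach (or authenticate to) any monitor. A minimal checklist on the node running the command, assuming default paths and ports and a placeholder monitor IP:

    cat /etc/ceph/ceph.conf                       # does mon_host list reachable monitor addresses?
    ls -l /etc/ceph/ceph.client.admin.keyring     # is the admin keyring present and readable?
    nc -vz 10.0.0.1 3300                          # placeholder mon IP; ports 3300/6789 must be open
    ceph -s --connect-timeout 10                  # fail fast instead of waiting 300s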

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-07 Thread Yuri Weinstein
We are still working through the remaining issues and will do a full cycle of testing soon. Adam, the issues mentioned by Ilya below require some response and resolution, pls take a look > rbd looks good overall but we are missing iSCSI coverage due to > https://tracker.ceph.com/issues/64126 On

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-07 Thread Konstantin Shalygin
> > On Feb 7, 2024, at 16:59, Zakhar Kirpichenko wrote: > > Indeed, it looks like it's been recently reopened. Thanks for this! Hi, it was merged yesterday. Thanks for the right noise, k

[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-07 Thread Eugen Block
I still don't have an explanation or other ideas, but I was able to add a Rocky Linux 9 host to my existing quincy cluster based on openSUSE (I don't have pacific in this environment) quite quickly and easily. It is a fresh Rocky install; I only added the cephadm and podman packages and copied the ceph.pub
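For reference, the usual cephadm steps for adding such a host look roughly like this (hostname and IP are placeholders):

    ssh-copy-id -f -i /etc/ceph/ceph.pub root@rocky9-host
    ceph orch host add rocky9-host 192.168.1.50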

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-07 Thread Zakhar Kirpichenko
Indeed, it looks like it's been recently reopened. Thanks for this! /Z On Wed, 7 Feb 2024 at 15:43, David Orman wrote: > That tracker's last update indicates it's slated for inclusion. > > On Thu, Feb 1, 2024, at 10:47, Zakhar Kirpichenko wrote: > > Hi, > > > > Please consider not leaving this

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-07 Thread David Orman
That tracker's last update indicates it's slated for inclusion. On Thu, Feb 1, 2024, at 10:47, Zakhar Kirpichenko wrote: > Hi, > > Please consider not leaving this behind: > https://github.com/ceph/ceph/pull/55109 > > It's a serious bug, which potentially affects a whole node stability if > the

[ceph-users] Re: Direct ceph mount on desktops

2024-02-07 Thread Tim Holloway
Followup. Desktop system went to sleep overnight. I woke up to this: HEALTH_WARN 1 client(s) laggy due to laggy OSDs; 1 clients failing to respond to capability release; 1 MDSs report slow requests [WRN] MDS_CLIENTS_LAGGY: 1 client(s) laggy due to laggy OSDs mds.ceefs.www2.lzjqgd(mds.0): Cli
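A sketch of how the laggy client might be identified from the MDS named in the warning; eviction is shown only as a last resort, and the exact filter syntax should be double-checked first:

    ceph health detail
    ceph tell mds.ceefs.www2.lzjqgd session ls                # look for the client holding caps
    ceph tell mds.ceefs.www2.lzjqgd session evict id=12345    # placeholder client id; use with care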

[ceph-users] ceph orch upgrade to 18.2.1 seems stuck on MDS?

2024-02-07 Thread Nigel Williams
Kicked off ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.1 and just the MDS are left to do, but the upgrade has been sitting for hours on this: root@rdx-00:~# ceph orch upgrade status { "target_image": "quay.io/ceph/ceph@sha256:a4e86c750cc11a8c93453ef5682acfa543e3ca08410efefa30f520b54f41831f",
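A few read-only commands that may help show where the upgrade is stuck, assuming cephadm/orch:

    ceph orch upgrade status
    ceph versions                      # which daemons still report the old version
    ceph orch ps --daemon-type mds     # per-MDS image/version as the orchestrator sees it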