[ceph-users] Re: 'ceph fs status' no longer works?

2024-05-02 Thread Eugen Block
Yeah, I knew it was something trivial; I just checked my notes but didn't have anything written down. I agree, it's not a big deal, but it shouldn't be necessary. Quoting Erich Weiler: Excellent! Restarting all the MDS daemons fixed it. Thank you. This kinda feels like a bug. -erich On

[ceph-users] Re: 'ceph fs status' no longer works?

2024-05-02 Thread Erich Weiler
Excellent! Restarting all the MDS daemons fixed it. Thank you. This kinda feels like a bug. -erich On 5/2/24 12:44 PM, Bandelow, Gunnar wrote: Hi Erich, I'm not sure about this specific error message, but "ceph fs status" sometimes did fail for me at the end of last year/the beginning of this year.

[ceph-users] Re: 'ceph fs status' no longer works?

2024-05-02 Thread Bandelow, Gunnar
Hi Erich, I'm not sure about this specific error message, but "ceph fs status" sometimes did fail for me at the end of last year/the beginning of this year. Restarting ALL mon, mgr AND mds daemons fixed it at the time. Best regards, Gunnar === Gunnar
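For reference, on a cephadm-managed cluster the restarts described above (all mon, mgr and mds daemons) would look roughly like this; the service names are illustrative and depend on the deployment:

    ceph mgr fail                  # fail over to the standby mgr
    ceph orch restart mon          # restart the mon service
    ceph orch restart mds.cephfs   # restart the mds service for the 'cephfs' filesystem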

[ceph-users] NVME node disks maxed out during rebalance after adding to existing cluster

2024-05-02 Thread Szabo, Istvan (Agoda)
Hi, I have slow heartbeats on the front and back networks with the extra node added to the cluster, and this is occasionally causing slow ops and failed OSD reports. I'm extending our cluster with 3 servers configured rather differently from the original 12. Our cluster (latest Octopus) is an
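For context, a common first step when a newly added node saturates its disks during rebalance is to throttle backfill/recovery; a rough sketch for an Octopus cluster (values are illustrative, not tuned for this setup):

    ceph config set osd osd_max_backfills 1          # limit concurrent backfills per OSD
    ceph config set osd osd_recovery_max_active 1    # limit concurrent recovery ops per OSD
    ceph config set osd osd_recovery_sleep_ssd 0.1   # add a small sleep between recovery ops on flash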

[ceph-users] Re: 'ceph fs status' no longer works?

2024-05-02 Thread Erich Weiler
Hi Eugen, Thanks for the tip! I just ran: ceph orch daemon restart mgr.pr-md-01.jemmdf (my specific mgr instance) And it restarted my primary mgr daemon, and in the process failed over to my standby mgr daemon on another server. That went smoothly. Unfortunately, I still cannot get 'ceph

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-05-02 Thread Murilo Morais
On Thu, 2 May 2024 at 06:20, Matthew Vernon wrote: > On 24/04/2024 13:43, Bailey Allison wrote: > > > A simple ceph-volume lvm activate should get all of the OSDs back up and > > running once you install the proper packages/restore the ceph config > > file/etc., > > What's the

[ceph-users] Re: 'ceph fs status' no longer works?

2024-05-02 Thread Eugen Block
Yep, seen this a couple of times during upgrades. I'll have to check my notes to see whether I wrote anything down for that. But try a mgr failover first, that could help. Quoting Erich Weiler: Hi All, For a while now I've been using 'ceph fs status' to show current MDS active servers,
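For reference, the suggested mgr failover is a single command; a minimal sketch:

    ceph mgr fail        # hand the active role to a standby mgr
    ceph fs status       # retry once the standby has taken over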

[ceph-users] 'ceph fs status' no longer works?

2024-05-02 Thread Erich Weiler
Hi All, For a while now I've been using 'ceph fs status' to show current MDS active servers, filesystem status, etc. I recently took down my MDS servers and added RAM to them (one by one, so the filesystem stayed online). After doing that with my four MDS servers (I had two active and two
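While 'ceph fs status' is misbehaving, roughly the same information can usually still be read from the MDS map directly, e.g.:

    ceph mds stat    # compact summary of active ranks and standbys
    ceph fs dump     # full FSMap, including which daemons are active/standby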

[ceph-users] Re: service:mgr [ERROR] "Failed to apply:

2024-05-02 Thread Eugen Block
Can you please paste the output of the following command? ceph orch host ls Quoting "Roberto Maggi @ Debian": Hi all, I've been facing this problem for a couple of days. Although I have already destroyed the cluster a couple of times, I continuously get these errors. I instruct ceph to place

[ceph-users] Re: Ceph Day NYC 2024 Slides

2024-05-02 Thread Laura Flores
I was notified that my attachment may be stripped out of the email in some cases. Here is a link to the presentation in GitHub: https://github.com/ljflores/ceph_user_dev_monthly_meeting/blob/main/Launch%20of%20Ceph%20User%20Council%20.pdf Hopefully that works better for some people. Thanks,

[ceph-users] Re: cephadm custom crush location hooks

2024-05-02 Thread Eugen Block
Thank you very much for the quick response! I will take a look first thing tomorrow and try that in a test cluster. But I agree, it would be helpful to have a way with cephadm to apply these hooks without these workarounds. I'll check if there's a tracker issue for that, and create one if

[ceph-users] Re: cephadm custom crush location hooks

2024-05-02 Thread Wyll Ingersoll
I've found the crush location hook script code to be problematic in the containerized/cephadm world. Our workaround is to place the script in a common place on each OSD node, such as /etc/crush/crushhook.sh, and then make a link from /rootfs -> /, and set the configuration value so that the
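For readers following along, one plausible reading of that workaround (the exact paths and config value are assumptions, since the message is truncated here):

    # on each OSD host
    install -D -m 0755 crushhook.sh /etc/crush/crushhook.sh
    ln -s / /rootfs    # so the configured path also resolves on the bare host
    ceph config set osd crush_location_hook /rootfs/etc/crush/crushhook.sh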

[ceph-users] cephadm custom crush location hooks

2024-05-02 Thread Eugen Block
Hi, we've been using custom crush location hooks for some OSDs [1] for years. Since we moved to cephadm, we always have to manually edit the unit.run file of those OSDs because the path to the script is not mapped into the containers. I don't want to define custom location hooks for all
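For context, the manual workaround mentioned above (editing unit.run) essentially amounts to adding a bind mount for the hook script to the container invocation; a hedged sketch of what such a fragment might look like (paths are illustrative, and the edit is lost whenever cephadm regenerates the file):

    # added to the podman/docker run command in /var/lib/ceph/<fsid>/osd.<id>/unit.run
    -v /usr/local/bin/crush_hook.sh:/usr/local/bin/crush_hook.sh:ro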

[ceph-users] service:mgr [ERROR] "Failed to apply:

2024-05-02 Thread Roberto Maggi @ Debian
Hi all, I've been facing this problem for a couple of days. Although I have already destroyed the cluster a couple of times, I continuously get these errors. I instruct ceph to place 3 daemons: ceph orch apply mgr 3
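For reference, a common way to debug such a placement failure is to compare the placement against the hosts the orchestrator actually knows about, and to pin the placement explicitly if needed (hostnames are illustrative):

    ceph orch host ls                                    # hosts and labels known to the orchestrator
    ceph orch apply mgr --placement="host1 host2 host3"  # pin mgr daemons to specific hosts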

[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-02 Thread Wesley Dillingham
In our case it was with an EC pool as well. I believe the PG state was degraded+recovering / recovery_wait, and iirc the PGs simply sat in the recovering state without any progress (the degraded PG object count did not decline). A repeer of the PG was attempted but with no success. A restart of

[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-02 Thread Sridhar Seshasayee
> > Multiple people -- including me -- have also observed backfill/recovery > stop completely for no apparent reason. > > In some cases poking the lead OSD for a PG with `ceph osd down` restores, > in other cases it doesn't. > > Anecdotally this *may* only happen for EC pools on HDDs but that
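For reference, the "poking" described above is usually along these lines (PG and OSD IDs are illustrative):

    ceph pg ls recovering    # list PGs stuck in the recovering state
    ceph pg 2.1f query       # inspect the PG's recovery_state and acting set
    ceph osd down 123        # mark the PG's primary down to force re-peering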

[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-02 Thread Anthony D'Atri
>> For our customers we are still disabling mclock and using wpq. Might be >> worth trying. >> >> > Could you please elaborate a bit on the issue(s) preventing the > use of mClock. Is this specific to only the slow backfill rate and/or other > issue? > > This feedback would help prioritize
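For anyone wanting to try the wpq suggestion, a minimal sketch (the scheduler is only picked at OSD start-up, so a restart is required afterwards):

    ceph config set osd osd_op_queue wpq
    # then restart the OSDs, e.g. per daemon:
    # ceph orch daemon restart osd.<id>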

[ceph-users] Re: After dockerized ceph cluster to Pacific, the fsid changed in the output of 'ceph -s'

2024-05-02 Thread Eugen Block
Hi, did you maybe have some test cluster leftovers on the hosts, so that cephadm might have picked up the wrong FSID? Does that mean that you adopted all daemons and only afterwards looked at ceph -s? I would have adopted the first daemon and checked immediately whether everything was still as
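For reference, a few places to cross-check which FSID the cluster and the adopted daemons report (no guarantee this covers the adoption case exactly):

    ceph fsid            # fsid reported by the MONs
    cephadm ls           # per-daemon metadata on the host, including the fsid each daemon runs under
    ls /var/lib/ceph/    # one directory per fsid present on the host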

[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-02 Thread Mark Nelson
Hi Sridhar, (Very!) Slow backfill was one issue, but if I recall we hit a case where backfill wasn't completing at all until we reverted to WPQ. I was getting hammered with other stuff at the time so I don't quite remember the details, but Dan might. I think this was in Quincy after the

[ceph-users] Re: Unable to add new OSDs

2024-05-02 Thread Eugen Block
Hi, is the cluster healthy? Sometimes a degraded state prevents the orchestrator from doing its work. Then I would fail the mgr (ceph mgr fail), this seems to be necessary lots of times. Then keep an eye on the active mgr log as well as the cephadm.log locally on the host where the OSDs
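For reference, the steps suggested above roughly translate to:

    ceph mgr fail                        # fail over to the standby mgr
    ceph -W cephadm                      # follow the orchestrator/cephadm log from the active mgr
    tail -f /var/log/ceph/cephadm.log    # on the host where the OSDs should be created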

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-05-02 Thread Matthew Vernon
On 24/04/2024 13:43, Bailey Allison wrote: A simple ceph-volume lvm activate should get all of the OSDs back up and running once you install the proper packages/restore the ceph config file/etc., What's the equivalent procedure in a cephadm-managed cluster? Thanks, Matthew
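If I recall correctly, recent cephadm releases expose an activate command for exactly this case; a hedged sketch (hostname illustrative, worth verifying against the docs for your release):

    ceph cephadm osd activate <host>   # scan the host and recreate OSD containers from existing LVs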

[ceph-users] RBD Mirroring with Journaling and Snapshot mechanism

2024-05-02 Thread V A Prabha
Dear Eugen, We have a scenario of DC and DR replication, and plan to explore RBD mirroring with both the journaling and snapshot mechanisms. I have 5 TB of storage at the primary DC and 5 TB of storage at the DR site, with 2 different Ceph clusters configured. Please clarify the following queries: 1. With One
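For context, the two mechanisms are enabled per image; a minimal sketch with illustrative pool/image names (not a recommendation for either mode):

    rbd mirror pool enable mypool image                 # per-image mirroring on the pool
    # journal-based:
    rbd feature enable mypool/myimage journaling
    rbd mirror image enable mypool/myimage journal
    # snapshot-based:
    rbd mirror image enable mypool/myimage snapshot
    rbd mirror snapshot schedule add --pool mypool --image myimage 1h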

[ceph-users] Re: Unable to add new OSDs

2024-05-02 Thread Bogdan Adrian Velica
Hi, I would suggest wiping the disks first with "wipefs -af /dev/_your_disk" or "sgdisk --zap-all /dev/your_disk" and trying again. Try only one disk first. Is the host visible when running the command "ceph orch host ls"? Is the FQDN correct? If so, does the following command return any errors?
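For reference, the orchestrator-side checks and cleanup would look roughly like this (host and device names are illustrative):

    ceph orch device ls --refresh                    # does the disk show up as available?
    ceph orch device zap host1 /dev/sdX --force      # orchestrator-side wipe of the device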

[ceph-users] Re: Ceph client cluster compatibility

2024-05-02 Thread Konstantin Shalygin
Hi, Yes, like it always does. k Sent from my iPhone > On 2 May 2024, at 07:09, Nima AbolhassanBeigi > wrote: > > We are trying to upgrade our OS version from Ubuntu 18.04 to Ubuntu 22.04. > Our Ceph cluster version is 16.2.13 (Pacific). > > The problem is that the Ubuntu packages for the Ceph