[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-10 Thread Robert Sander
On 5/9/24 07:22, Xiubo Li wrote: We are discussing the same issue in the Slack thread https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1715189877518529. Why is there a discussion about a bug off-list on a proprietary platform? Regards -- Robert Sander Heinlein Consulting GmbH Schwedter Str.

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-09 Thread Dejan Lesjak
Hi Xiubo, Thanks. We'll check that. Cheers, Dejan On 9. 05. 24 07:22, Xiubo Li wrote: On 5/8/24 17:36, Dejan Lesjak wrote: Hi Xiubo, On 8. 05. 24 09:53, Xiubo Li wrote: Hi Dejan, This is a known issue and please see https://tracker.ceph.com/issues/61009. For the workaround please see

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Xiubo Li
On 5/8/24 17:36, Dejan Lesjak wrote: Hi Xiubo, On 8. 05. 24 09:53, Xiubo Li wrote: Hi Dejan, This is a known issue and please see https://tracker.ceph.com/issues/61009. For the workaround please see https://tracker.ceph.com/issues/61009#note-26. Thank you for the links. Unfortunately

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Dejan Lesjak
Hi Xiubo, On 8. 05. 24 09:53, Xiubo Li wrote: Hi Dejan, This is a known issue and please see https://tracker.ceph.com/issues/61009. For the workaround please see https://tracker.ceph.com/issues/61009#note-26. Thank you for the links. Unfortunately I'm not sure I understand the workaround:

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Xiubo Li
Hi Dejan, This is a known issue and please see https://tracker.ceph.com/issues/61009. For the workaround please see https://tracker.ceph.com/issues/61009#note-26. Thanks - Xiubo On 5/8/24 06:49, Dejan Lesjak wrote: Hello, We have cephfs with two active MDS. Currently rank 1 is repeatedly

[ceph-users] Re: MDS crash

2024-04-28 Thread Eugen Block
Hi, can you share the current 'ceph status'? Do you have any inconsistent PGs or something? What are the cephfs data pool's min_size and size? Quoting Alexey GERASIMOV: Colleagues, thank you for the advice to check the operability of the MGRs. In fact, it is also strange: we checked our
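
A minimal sketch of the checks Eugen asks for here (the pool name cephfs_data is a placeholder for the actual CephFS data pool):

    # overall cluster state, including any inconsistent or stale PGs
    ceph status
    ceph health detail
    # replication settings of the CephFS data pool
    ceph osd pool get cephfs_data size
    ceph osd pool get cephfs_data min_size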

[ceph-users] Re: MDS crash

2024-04-27 Thread Alexey GERASIMOV
I don't know why, but my replies keep missing the topic when I reply to it. Moderators, please delete the unnecessary topics and move my answer to the correct one. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS crash

2024-04-27 Thread Alexey GERASIMOV
Colleagues, thank you for the advice to check the operability of the MGRs. In fact, it is also strange: we checked our nodes for network issues (IP connectivity, sockets, ACLs, DNS) and found nothing wrong - but suddenly just restarting all MGRs solved the problem with the stale PGs and with ceph

[ceph-users] Re: MDS crash

2024-04-26 Thread Frédéric Nass
Hello, 'almost all diagnostic ceph subcommands hang!' -> this rang a bell. We've had a similar issue with many ceph commands hanging due to a missing L3 ACL between the MGRs and a new MDS machine that we added to the cluster. I second Eugen's analysis: a network issue, whatever the OSI layer.

[ceph-users] Re: MDS crash

2024-04-26 Thread Eugen Block
Hi, it's unlikely that all OSDs fail at the same time, it seems like a network issue. Do you have an active MGR? Just a couple of days ago someone reported incorrect OSD stats because no MGR was up. Although your 'ceph health detail' output doesn't mention that, there are still issues when
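
One way to verify that an active MGR is present, as Eugen suggests (a sketch, not taken from the thread itself):

    # shows the active manager daemon and any standbys
    ceph mgr stat
    # the mgr also appears under "services" in the status output
    ceph -s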

[ceph-users] Re: MDS crash

2024-04-25 Thread Alexey GERASIMOV
Colleagues, I have an update. Starting from yesterday the situation with ceph health is much worse than it was previously. We found that - ceph -s informs us that some PGs are in stale state - almost all diagnostic ceph subcommands hang! For example, "ceph osd ls" , "ceph osd dump", "ceph

[ceph-users] Re: MDS crash

2024-04-22 Thread Eugen Block
Right, I just figured from the health output that you would have a couple of seconds or so to query the daemon: mds: 1/1 daemons up Quoting Alexey GERASIMOV: Ok, we will create the ticket. Eugen Block - the ceph tell command needs to communicate with a running MDS daemon, but it is
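
Since 'ceph tell' only reaches a daemon while it is running, a hedged sketch of catching the MDS in that short window (rank 0 is illustrative):

    # confirm which MDS currently holds rank 0 and whether it is up
    ceph fs status
    # query it immediately, before it crashes again
    ceph tell mds.0 damage ls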

[ceph-users] Re: MDS crash

2024-04-22 Thread Alexey GERASIMOV
Ok, we will create the ticket. Eugen Block - the ceph tell command needs to communicate with a running MDS daemon, but it has crashed. So all I have is the error showing that it is impossible to get information from the daemon: ceph tell mds.0 damage ls Error ENOENT: problem getting command

[ceph-users] Re: MDS crash

2024-04-21 Thread Xiubo Li
Hi Alexey, This looks like a new issue to me. Please create a tracker for it and provide the detailed call trace there. Thanks - Xiubo On 4/19/24 05:42, alexey.gerasi...@opencascade.com wrote: Dear colleagues, hope that anybody can help us. The initial point: Ceph cluster v15.2 (installed and

[ceph-users] Re: MDS crash

2024-04-21 Thread Eugen Block
What's the output of: ceph tell mds.0 damage ls Quoting alexey.gerasi...@opencascade.com: Dear colleagues, hope that anybody can help us. The initial point: Ceph cluster v15.2 (installed and controlled by the Proxmox) with 3 nodes based on physical servers rented from a cloud

[ceph-users] Re: MDS crash after Disaster Recovery

2023-09-13 Thread Eugen Block
Hi, I would try to finish the upgrade first and bring all daemons to the same ceph version before trying any recovery. Was it a failed upgrade attempt? Can you please share 'ceph -s', 'ceph versions' and 'ceph orch upgrade status'? Quoting Sasha BALLET: Hi, I'm struggling with a
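
A sketch of gathering the information Eugen asks for; the resume step is an assumption that only applies if a cephadm-managed upgrade was interrupted:

    ceph -s
    # shows which daemons still run the old release
    ceph versions
    # state of a cephadm-managed upgrade
    ceph orch upgrade status
    # if the upgrade was paused or failed, it can be resumed
    ceph orch upgrade resume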

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2023-05-04 Thread Peter van Heusden
Hi Emmanuel It was a while ago, but as I recall I evicted all clients and that allowed me to restart the MDS servers. There was something clearly "broken" in how at least one of the clients was interacting with the system. Peter On Thu, 4 May 2023 at 07:18, Emmanuel Jaep wrote: > Hi, > > did
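
A hedged sketch of the client eviction Peter describes; the rank and session id are placeholders, and eviction is disruptive for the affected client:

    # list sessions on the MDS to find the suspect client
    ceph tell mds.0 client ls
    # evict a single client by its session id
    ceph tell mds.0 client evict id=12345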

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2023-05-03 Thread Emmanuel Jaep
Hi, did you finally figure out what happened? I do have the same behavior and we can't get the mds to start again... Thanks, Emmanuel ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Mds crash at cscs

2023-01-24 Thread Venky Shankar
On Thu, Jan 19, 2023 at 9:07 PM Lo Re Giuseppe wrote: > > Dear all, > > We have started to use more intensively cephfs for some wlcg related workload. > We have 3 active mds instances spread on 3 servers, > mds_cache_memory_limit=12G, most of the other configs are default ones. > One of them has
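
The cache limit mentioned here can be inspected and changed at runtime; a sketch, with 12G expressed in bytes (12884901888):

    # current value
    ceph config get mds mds_cache_memory_limit
    # set it for all MDS daemons
    ceph config set mds mds_cache_memory_limit 12884901888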

[ceph-users] Re: MDS crash due to seemingly unrecoverable metadata error

2022-02-23 Thread Xiubo Li
Have you tried to back up and then remove the 'mds%d_openfiles.%x' object to see whether you can start the MDS? Thanks. On 2/23/22 7:07 PM, Wolfgang Mair wrote: Update: I managed to clear the inode errors by deleting the parent directory entry from the metadata pool. However the MDS still refuses
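
The object Xiubo refers to lives in the CephFS metadata pool; a hedged sketch for rank 0 (the pool name cephfs_metadata is an assumption - verify the real names with 'rados ls' first):

    # back up the openfiles object of rank 0 before touching it
    rados -p cephfs_metadata get mds0_openfiles.0 /root/mds0_openfiles.0.bak
    # remove it so the MDS rebuilds it on the next startup
    rados -p cephfs_metadata rm mds0_openfiles.0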

[ceph-users] Re: MDS crash when unlink file

2022-02-15 Thread Venky Shankar
On Mon, Feb 14, 2022 at 5:33 PM Arnaud MARTEL wrote: > > Hi Venky, > > Thanks a lot for your answer. I needed to reduce the number of running MDS daemons > before setting debug_mds to 20 but, now, I was able to reproduce the crash and > generate the full logfile. > You can download it with the following

[ceph-users] Re: MDS crash when unlink file

2022-02-11 Thread Venky Shankar
Hi Arnaud, On Fri, Feb 11, 2022 at 2:42 PM Arnaud MARTEL wrote: > > Hi, > > MDSs are crashing on my production cluster when trying to unlink some files > and I need help :-). > When looking into the log files, I have identified some associated files and > I ran a scrub on the parent directory

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2021-08-06 Thread Yann Dupont
On 06/08/2021 at 10:46, Peter van Heusden wrote: Hi Yann So I resolved the problem by taking clients offline till I found the one - I think the one where conda had been running - that seemed to be causing the problem. Hi Peter, thanks for the answer, I'm afraid I'll have to do this too. I

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2021-08-06 Thread Peter van Heusden
Hi Yann So I resolved the problem by taking clients offline till I found the one - I think the one where conda had been running - that seemed to be causing the problem. I then was able to restart the MDS daemons and things came right. It is certainly a troubling issue. And sorry, I didn't open a

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2021-08-06 Thread Yann Dupont
On 28/06/2021 at 10:52, Peter van Heusden wrote: I am running Ceph 15.2.13 on CentOS 7.9.2009 and recently my MDS servers have started failing with the error message In function 'void Server::handle_client_open(MDRequestRef&)' thread 7f0ca9908700 time 2021-06-28T09:21:11.484768+0200

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2021-06-28 Thread Stefan Kooman
On 6/28/21 3:52 PM, Peter van Heusden wrote: Yes it keeps crashing in a loop. I ran again with debug set to 20 and the last 100,000 lines of that log are here: https://gist.github.com/pvanheus/33eb22b179a9cbd68a460984de8ef24a

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2021-06-28 Thread Stefan Kooman
On 6/28/21 10:52 AM, Peter van Heusden wrote: I am running Ceph 15.2.13 on CentOS 7.9.2009 and recently my MDS servers have started failing with the error message Do they keep crashing (in a loop)? Can you set ms / mds debug to ... say 20/20? debug_ms = 20/20 debug_mds = 20/20 Gr. Stefan
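
Those debug levels can be raised without restarting the daemons; a sketch (the extra logging is very verbose, so remember to lower it again afterwards):

    # persistent settings via the config database
    ceph config set mds debug_mds 20/20
    ceph config set mds debug_ms 20/20
    # or inject into the running daemons only
    ceph tell 'mds.*' injectargs '--debug_mds=20 --debug_ms=20'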

[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2021-06-28 Thread Peter van Heusden
Yes it keeps crashing in a loop. I ran again with debug set to 20 and the last 100,000 lines of that log are here: https://gist.github.com/pvanheus/33eb22b179a9cbd68a460984de8ef24a On Mon, 28 Jun 2021 at 15:29, Stefan Kooman wrote: > On 6/28/21 10:52 AM, Peter van Heusden wrote: > > I am

[ceph-users] Re: mds crash loop

2019-12-05 Thread Karsten Nielsen
:Re: [ceph-users] Re: mds crash loop To: Karsten Nielsen ; CC: ceph-users@ceph.io; > On Tue, Nov 12, 2019 at 6:18 PM Karsten Nielsen wrote: > > > > -Original message- > > From: Karsten Nielsen > > Sent: Tue 12-11-2019 10:30 > > Subject:

[ceph-users] Re: mds crash loop

2019-11-12 Thread Yan, Zheng
On Tue, Nov 12, 2019 at 6:18 PM Karsten Nielsen wrote: > > -Original message- > From: Karsten Nielsen > Sent: Tue 12-11-2019 10:30 > Subject: [ceph-users] Re: mds crash loop > To: Yan, Zheng ; > CC: ceph-users@ceph.io; > > -Original

[ceph-users] Re: mds crash loop

2019-11-12 Thread Karsten Nielsen
-Original message- From: Karsten Nielsen Sent: Tue 12-11-2019 10:30 Subject:[ceph-users] Re: mds crash loop To: Yan, Zheng ; CC: ceph-users@ceph.io; > -Original message- > From: Yan, Zheng > Sent: Mon 11-11-2019 15:09 > Subject: Re: [ceph-us

[ceph-users] Re: mds crash loop

2019-11-12 Thread Karsten Nielsen
-Original message- From: Yan, Zheng Sent: Mon 11-11-2019 15:09 Subject:Re: [ceph-users] Re: mds crash loop To: Karsten Nielsen ; CC: ceph-users@ceph.io; > On Mon, Nov 11, 2019 at 5:09 PM Karsten Nielsen wrote: > > > > I started a job that moved so

[ceph-users] Re: mds crash loop

2019-11-11 Thread Yan, Zheng
> -Original message- > From: Yan, Zheng > Sent: Thu 07-11-2019 14:20 > Subject: Re: [ceph-users] Re: mds crash loop > To: Karsten Nielsen ; > CC: ceph-users@ceph.io; > > On Thu, Nov 7, 2019 at 6:40 PM Karsten Nielsen wrote:

[ceph-users] Re: mds crash loop

2019-11-11 Thread Karsten Nielsen
-2019 14:20 Subject:Re: [ceph-users] Re: mds crash loop To: Karsten Nielsen ; CC: ceph-users@ceph.io; > On Thu, Nov 7, 2019 at 6:40 PM Karsten Nielsen wrote: > > > > That is awesome. > > > > Now I just need to figure out where the lost+found files needs t

[ceph-users] Re: mds crash loop

2019-11-08 Thread Karsten Nielsen
-Original message- From: Yan, Zheng Sent: Thu 07-11-2019 14:20 Subject:Re: [ceph-users] Re: mds crash loop To: Karsten Nielsen ; CC: ceph-users@ceph.io; > On Thu, Nov 7, 2019 at 6:40 PM Karsten Nielsen wrote: > > > > That is awesome. > >

[ceph-users] Re: mds crash loop

2019-11-07 Thread Yan, Zheng
s/missing_obj_dirs > Any tool that is able to do that ? > > Thanks > - Karsten > > -Original message- > From: Yan, Zheng > Sent: Thu 07-11-2019 09:22 > Subject:Re: [ceph-users] Re: mds crash loop > To: Karsten Nielsen ; > CC: ceph-users@c

[ceph-users] Re: mds crash loop

2019-11-07 Thread Karsten Nielsen
: [ceph-users] Re: mds crash loop To: Karsten Nielsen ; CC: ceph-users@ceph.io; > I have tracked down the root cause. See https://tracker.ceph.com/issues/42675 > > Regards > Yan, Zheng > > On Thu, Nov 7, 2019 at 4:01 PM Karsten Nielsen wrote: > > > > -Origina

[ceph-users] Re: mds crash loop

2019-11-07 Thread Yan, Zheng
I have tracked down the root cause. See https://tracker.ceph.com/issues/42675 Regards Yan, Zheng On Thu, Nov 7, 2019 at 4:01 PM Karsten Nielsen wrote: > > -Original message- > From: Yan, Zheng > Sent: Thu 07-11-2019 07:21 > Subject: Re: [ceph-users] Re:

[ceph-users] Re: mds crash loop

2019-11-07 Thread Karsten Nielsen
-Original message- From: Yan, Zheng Sent: Thu 07-11-2019 07:21 Subject: Re: [ceph-users] Re: mds crash loop To: Karsten Nielsen ; CC: ceph-users@ceph.io; > On Thu, Nov 7, 2019 at 5:50 AM Karsten Nielsen wrote: > > > > -Original message-

[ceph-users] Re: mds crash loop

2019-11-06 Thread Karsten Nielsen
-Original message- From: Yan, Zheng Sent: Wed 06-11-2019 14:16 Subject:Re: [ceph-users] mds crash loop To: Karsten Nielsen ; CC: ceph-users@ceph.io; > On Wed, Nov 6, 2019 at 4:42 PM Karsten Nielsen wrote: > > > > -Original message- > > From: Yan, Zheng >

[ceph-users] Re: mds crash loop

2019-11-06 Thread Yan, Zheng
On Wed, Nov 6, 2019 at 4:42 PM Karsten Nielsen wrote: > > -Original message- > From: Yan, Zheng > Sent: Wed 06-11-2019 08:15 > Subject:Re: [ceph-users] mds crash loop > To: Karsten Nielsen ; > CC: ceph-users@ceph.io; > > On Tue, Nov 5, 2019 at 5:29 PM Karsten Nielsen

[ceph-users] Re: mds crash loop

2019-11-06 Thread Karsten Nielsen
-Original message- From: Yan, Zheng Sent: Wed 06-11-2019 08:15 Subject:Re: [ceph-users] mds crash loop To: Karsten Nielsen ; CC: ceph-users@ceph.io; > On Tue, Nov 5, 2019 at 5:29 PM Karsten Nielsen wrote: > > > > Hi, > > > > Last week I upgraded my ceph cluster from

[ceph-users] Re: mds crash loop

2019-11-05 Thread Yan, Zheng
On Tue, Nov 5, 2019 at 5:29 PM Karsten Nielsen wrote: > > Hi, > > Last week I upgraded my ceph cluster from Luminous to Mimic 13.2.6 > It was running fine for a while but yesterday my mds went into a crash loop. > > I have 1 active and 1 standby mds for my cephfs, both of which are running the >

[ceph-users] Re: mds crash loop

2019-11-05 Thread Karsten Nielsen
from ceph -w:

[root@k8s-node-01 /]# ceph -w
  cluster:
    id:     571d4bfe-2c5d-45ca-8da1-91dcaf69942c
    health: HEALTH_WARN
            1 filesystem is degraded

  services:
    mon: 3 daemons, quorum k8s-node-00,k8s-node-01,k8s-node-02
    mgr: k8s-node-01(active)
    mds: cephfs-1/1/1 up