Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2019-06-12 Thread Stefan Kooman
Quoting Patrick Donnelly (pdonn...@redhat.com):
> Hi Stefan,
>
> Sorry I couldn't get back to you sooner.

NP.

> Looks like you hit the infinite loop bug in OpTracker. It was fixed in
> 12.2.11: https://tracker.ceph.com/issues/37977
>
> The problem was introduced in 12.2.8.

We've been quite
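Going by Patrick's description, the OpTracker loop affects Luminous releases from 12.2.8 up to, but not including, 12.2.11. Below is a minimal Python sketch for checking a version string against that window. The function names are hypothetical, only the leading `major.minor.patch` of the string is considered (packaging suffixes are ignored), and only the 12.2.x window from the thread is encoded; it says nothing about other release series.

```python
import re
from typing import Tuple

# Window described in the thread: introduced in 12.2.8, fixed in 12.2.11.
FIRST_BAD = (12, 2, 8)
FIRST_FIXED = (12, 2, 11)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the start of a version string.

    Anything after the third number (e.g. "-1xenial") is ignored.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if m is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)))


def affected_by_optracker_loop(version: str) -> bool:
    """True if the version falls in [12.2.8, 12.2.11), per the thread."""
    return FIRST_BAD <= parse_version(version) < FIRST_FIXED


if __name__ == "__main__":
    for ver in ["12.2.7", "12.2.8", "12.2.10", "12.2.11"]:
        print(ver, affected_by_optracker_loop(ver))
```

Tuple comparison keeps the range check to a single expression; it only makes sense for versions within the same release series.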

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2019-05-31 Thread Patrick Donnelly
Hi Stefan,

Sorry I couldn't get back to you sooner.

On Mon, May 27, 2019 at 5:02 AM Stefan Kooman wrote:
>
> Quoting Stefan Kooman (ste...@bit.nl):
> > Hi Patrick,
> >
> > Quoting Stefan Kooman (ste...@bit.nl):
> > > Quoting Stefan Kooman (ste...@bit.nl):
> > > > Quoting Patrick Donnelly

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2019-05-27 Thread Stefan Kooman
Quoting Stefan Kooman (ste...@bit.nl):
> Hi Patrick,
>
> Quoting Stefan Kooman (ste...@bit.nl):
> > Quoting Stefan Kooman (ste...@bit.nl):
> > > Quoting Patrick Donnelly (pdonn...@redhat.com):
> > > > Thanks for the detailed notes. It looks like the MDS is stuck
> > > > somewhere it's not even

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2019-01-16 Thread Stefan Kooman
Hi Patrick,

Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Stefan Kooman (ste...@bit.nl):
> > Quoting Patrick Donnelly (pdonn...@redhat.com):
> > > Thanks for the detailed notes. It looks like the MDS is stuck
> > > somewhere it's not even outputting any log messages. If possible, it'd
> > > be

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-11-15 Thread Stefan Kooman
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Patrick Donnelly (pdonn...@redhat.com):
> > Thanks for the detailed notes. It looks like the MDS is stuck
> > somewhere it's not even outputting any log messages. If possible, it'd
> > be helpful to get a coredump (e.g. by sending SIGQUIT to the

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-23 Thread Stefan Kooman
Quoting Patrick Donnelly (pdonn...@redhat.com):
> Thanks for the detailed notes. It looks like the MDS is stuck
> somewhere it's not even outputting any log messages. If possible, it'd
> be helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or,
> if you're comfortable with gdb, a

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-08 Thread Patrick Donnelly
On Thu, Oct 4, 2018 at 3:58 PM Stefan Kooman wrote:
> A couple of hours later we hit the same issue. We restarted with
> debug_mds=20 and debug_journaler=20 on the standby-replay node. Eight
> hours later (an hour ago) we hit the same issue. We captured ~4.7 GB of
> logging I skipped to the

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-08 Thread Stefan Kooman
Quoting Stefan Kooman (ste...@bit.nl):
> > From what you've described here, it's most likely that the MDS is trying to
> > read something out of RADOS which is taking a long time, and which we
> > didn't expect to cause a slow down. You can check via the admin socket to
> > see if there are

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-05 Thread Stefan Kooman
Quoting Gregory Farnum (gfar...@redhat.com):
>
> Ah, there's a misunderstanding here: the output isn't terribly clear.
> "is_healthy" is the name of a *function* in the source code. The line
>
> heartbeat_map is_healthy 'MDSRank' had timed out after 15
>
> is telling you that the
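Greg's point is that `is_healthy` is a health-check function. The log line reports that the 'MDSRank' worker had not checked in within its timeout; it does not report a healthy state. Below is a toy Python model of that general watchdog pattern. Each worker registers a handle with a grace period and resets its deadline whenever it makes progress, and a periodic check reports any handle whose deadline has passed. This is a simplified illustration, not Ceph's HeartbeatMap code. The class and method names are hypothetical, and treating the 15 as seconds is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Handle:
    name: str
    grace: float          # seconds allowed between progress reports
    deadline: float = 0.0


class ToyHeartbeatMap:
    """Illustrative watchdog: workers reset their deadline; a checker scans them."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._handles: Dict[str, Handle] = {}

    def add_worker(self, name: str, grace: float) -> Handle:
        h = Handle(name, grace, self._clock() + grace)
        self._handles[name] = h
        return h

    def reset_timeout(self, h: Handle) -> None:
        # Called by the worker each time it completes a unit of work.
        h.deadline = self._clock() + h.grace

    def timed_out(self) -> List[str]:
        """One message per worker whose deadline has passed."""
        now = self._clock()
        return [
            f"heartbeat_map is_healthy '{h.name}' had timed out after {h.grace:g}"
            for h in self._handles.values()
            if now > h.deadline
        ]

    def is_healthy(self) -> bool:
        return not self.timed_out()


if __name__ == "__main__":
    now = [0.0]  # fake clock so the demo is deterministic
    hb = ToyHeartbeatMap(clock=lambda: now[0])
    mds = hb.add_worker("MDSRank", grace=15)
    now[0] = 10.0
    hb.reset_timeout(mds)   # progress at t=10, so the deadline moves to t=25
    now[0] = 30.0           # no progress since t=10
    print(hb.is_healthy())
    print(hb.timed_out())
```

In this model, a thread that is stuck (for example spinning in a loop) never calls `reset_timeout`. The checker keeps reporting it while the thread itself logs nothing, which matches the "not even outputting any log messages" symptom described elsewhere in the thread.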

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-05 Thread Gregory Farnum
On Thu, Oct 4, 2018 at 3:58 PM Stefan Kooman wrote:
> Dear list,
>
> Today we hit our first Ceph MDS issue. Out of the blue the active MDS
> stopped working:
>
> mon.mon1 [WRN] daemon mds.mds1 is not responding, replacing it as rank 0
> with standby
> daemon mds.mds2.
>
> Logging of ceph-mds1:
>
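The monitor warning names the reporting monitor, the unresponsive MDS, the rank, and the standby that took over. Below is a small Python sketch that extracts those fields from a line in exactly the format quoted above, which could be used to count failovers when searching cluster logs. The regex is written against this single sample, so other Ceph versions may word the message differently. `parse_failover` and `MdsFailover` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class MdsFailover(NamedTuple):
    reporter: str
    failed: str
    rank: int
    standby: str


# Matches the wording of the warning quoted in the thread.
_FAILOVER_RE = re.compile(
    r"^(?P<reporter>\S+) \[WRN\] daemon (?P<failed>\S+) is not responding, "
    r"replacing it as rank (?P<rank>\d+) with standby daemon (?P<standby>\S+?)\.?$"
)


def parse_failover(line: str) -> Optional[MdsFailover]:
    # Collapse runs of whitespace so a line re-wrapped by a mail client still matches.
    text = " ".join(line.split())
    m = _FAILOVER_RE.match(text)
    if m is None:
        return None
    return MdsFailover(m["reporter"], m["failed"], int(m["rank"]), m["standby"])


if __name__ == "__main__":
    sample = ("mon.mon1 [WRN] daemon mds.mds1 is not responding, "
              "replacing it as rank 0 with standby daemon mds.mds2.")
    print(parse_failover(sample))
```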