Quoting Patrick Donnelly (pdonn...@redhat.com):
> Hi Stefan,
>
> Sorry I couldn't get back to you sooner.
NP.
> Looks like you hit the infinite loop bug in OpTracker. It was fixed in
> 12.2.11: https://tracker.ceph.com/issues/37977
>
> The problem was introduced in 12.2.8.
We've been quite
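Since the thread pins the OpTracker fix to 12.2.11, one quick sanity check is to confirm what version each daemon is actually running. A sketch (requires admin access; the daemon name `mds1` is taken from later in this thread):

```
# Version reported by each daemon type, cluster-wide
ceph versions

# Version of one specific MDS daemon
ceph tell mds.mds1 version
```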
Hi Stefan,
Sorry I couldn't get back to you sooner.
On Mon, May 27, 2019 at 5:02 AM Stefan Kooman wrote:
Quoting Patrick Donnelly (pdonn...@redhat.com):
> Thanks for the detailed notes. It looks like the MDS is stuck
> somewhere it's not even outputting any log messages. If possible, it'd
> be helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or,
> if you're comfortable with gdb, a
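The two capture options Patrick mentions can be sketched as follows (the `pidof` lookup is an assumption; SIGQUIT aborts the daemon, so only use it when a restart is acceptable, and core dumps must be enabled via `ulimit -c unlimited` and a usable `kernel.core_pattern`):

```
# Option 1: abort the MDS and dump core
kill -QUIT "$(pidof ceph-mds)"

# Option 2: attach gdb non-intrusively and collect all thread backtraces
gdb -p "$(pidof ceph-mds)" -batch -ex 'thread apply all bt'
```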
On Thu, Oct 4, 2018 at 3:58 PM Stefan Kooman wrote:
> A couple of hours later we hit the same issue. We restarted with
> debug_mds=20 and debug_journaler=20 on the standby-replay node. Eight
> hours later (an hour ago) we hit the same issue. We captured ~ 4.7 GB of
> logging. I skipped to the
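The debug levels used above can also be raised on a running daemon, without a restart. A sketch (the daemon name `mds2` is an assumption based on the standby-replay node named earlier in this thread):

```
# Raise MDS and journaler logging at runtime
ceph tell mds.mds2 injectargs '--debug_mds 20 --debug_journaler 20'
```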
Quoting Stefan Kooman (ste...@bit.nl):
> > From what you've described here, it's most likely that the MDS is trying to
> > read something out of RADOS which is taking a long time, and which we
> > didn't expect to cause a slow down. You can check via the admin socket to
> > see if there are
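The admin-socket check referred to here can be sketched like this (daemon name assumed; `objecter_requests` lists outstanding RADOS operations the MDS is waiting on, and `dump_ops_in_flight` shows client requests tracked by the OpTracker):

```
# Outstanding RADOS reads/writes issued by the MDS
ceph daemon mds.mds1 objecter_requests

# In-flight MDS operations
ceph daemon mds.mds1 dump_ops_in_flight
```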
Quoting Gregory Farnum (gfar...@redhat.com):
>
> Ah, there's a misunderstanding here — the output isn't terribly clear.
> "is_healthy" is the name of a *function* in the source code. The line
>
> heartbeat_map is_healthy 'MDSRank' had timed out after 15
>
> is telling you that the
On Thu, Oct 4, 2018 at 3:58 PM Stefan Kooman wrote:
> Dear list,
>
> Today we hit our first Ceph MDS issue. Out of the blue the active MDS
> stopped working:
>
> mon.mon1 [WRN] daemon mds.mds1 is not responding, replacing it as rank 0 with standby daemon mds.mds2.
>
> Logging of ceph-mds1:
>