=
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> From: Yan, Zheng
> Sent: 20 May 2019 13:34
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS
> bug?)
On Sat, May 18, 2019 at 5:47 PM Frank Schilder wrote:
>
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I pulled the
> ops list and the cache. Please find a zip file here:
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l" . It's a bit
> more than 100MB.
9 17:41
To: Frank Schilder
Cc: Yan, Zheng; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS
bug?)
Quoting Frank Schilder (fr...@dtu.dk):
>
> [root@ceph-01 ~]# ceph status # before the MDS failed over
> cluster:
> id: ###
> health: HEALTH_WARN
> 1 MDSs report slow requests
>
> services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), st
Hi Stefan, cc Yan,
thanks for your quick reply.
> I am pretty sure you hit bug #26982: https://tracker.ceph.com/issues/26982
> "mds: crash when dumping ops in flight".
Everything is fine, the daemon did not crash. The dump cache operation seems to
be a blocking operation. It simply blocked the
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I
> pulled the ops list and the cache. Please find a zip file here:
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l" .
> It's a bit more than 100MB.
Dear Yan and Stefan,
it happened again and there were only very few ops in the queue. I pulled the
ops list and the cache. Please find a zip file here:
"https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l" . It's a bit
more than 100MB.
The active MDS failed over to the standby afte
e it doesn't take too long.
>
> Thanks for your input!
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> From: Yan, Zheng
> Sent: 16 May 2019 09:35
> To: Frank Schilder
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS
> bug?)
On Thu, May 16, 2019 at 2:52 PM Frank Schilder wrote:
>
> Dear Yan,
>
> OK, I will try to trigger the problem again and dump the
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS
> bug?)
>
> > [...]
> > This time I captured the MDS ops list (log output does not really contain
> > more info than this list). It contains 12 ops and I will include it here in
> > full length (hope this is acceptable):
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Stefan,
>
> thanks for the fast reply. We encountered the problem again, this time in a
> much simpler situation; please see below. However, let me start with your
> questions first:
>
> What bug? -- In a single-active MDS set-up, should there ever
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS
bug?)
Your issues
> "age": 23.462997,
> "duration": 23.463467,
> "type_data": {
> "flag_point": "failed to authpin, dir is being fragmented",
> "reqid": "client.377552:5446
},
{
"time": "2019-05-15 11:38:36.511392",
"event": "all_read"
},
{
"time": "2019-05-15 11:38:36.511561"
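An ops dump like the fragment above can also be inspected offline to pick out the requests blocked on authpins. A minimal sketch, assuming the JSON layout of `ceph daemon mds.<name> dump_ops_in_flight` quoted above; the helper names and the `min_age` threshold are my own, not from the thread:

```python
import json

def load_dump(path):
    """Load the JSON written by `ceph daemon mds.<name> dump_ops_in_flight`."""
    with open(path) as f:
        return json.load(f)

def stuck_ops(dump, min_age=10.0):
    """Return reqids of ops older than min_age whose flag_point mentions authpin."""
    stuck = []
    for op in dump.get("ops", []):
        info = op.get("type_data", {})
        if op.get("age", 0.0) > min_age and "authpin" in info.get("flag_point", ""):
            stuck.append(info.get("reqid"))
    return stuck

# A dump shaped like the fragment quoted above:
sample = {"ops": [{"age": 23.462997,
                   "duration": 23.463467,
                   "type_data": {"flag_point": "failed to authpin, dir is being fragmented",
                                 "reqid": "client.377552:5446"}}]}
print(stuck_ops(sample))  # -> ['client.377552:5446']
```

This only filters; correlating the stuck reqids back to clients and directories still has to be done against the cache dump by hand.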
Quoting Frank Schilder (fr...@dtu.dk):
If at all possible I would:
Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2).
Use more recent kernels on the clients.
Below settings for [mds] might help with trimming (you might already
have changed mds_log_max_segments to 128 accordi
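Stefan's trimming advice would translate into a `[mds]` section roughly like this; only the `mds_log_max_segments` value of 128 is taken from the thread, the comment wording is mine:

```ini
[mds]
# Raise the journal segment limit so log trimming can keep up
# (128 is the value mentioned in the thread).
mds_log_max_segments = 128
```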
Short story:
We have a new HPC installation with file systems provided by cephfs (home,
apps, ...). We have one cephfs and all client file systems are sub-directory
mounts. On this ceph file system, we have a bit more than 500 nodes with
currently 2 ceph fs mounts each, resulting in
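With the kernel client, the per-node sub-directory mounts described above would look roughly like this in fstab; the host names, sub-directory paths, client name, and secret file are assumptions for illustration, only the single-filesystem/sub-directory layout follows the thread:

```
# /etc/fstab — cephfs sub-directory mounts (hosts, paths and secretfile are hypothetical)
ceph-01,ceph-02,ceph-03:/hpc/home  /home  ceph  name=hpc,secretfile=/etc/ceph/hpc.secret,noatime,_netdev  0 0
ceph-01,ceph-02,ceph-03:/hpc/apps  /apps  ceph  name=hpc,secretfile=/etc/ceph/hpc.secret,ro,_netdev       0 0
```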