Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-05-01 Thread Daniel Gryniewicz

On 05/01/2018 01:43 PM, Oliver Freyermuth wrote:

Hi all,

Am 17.04.2018 um 19:38 schrieb Oliver Freyermuth:

Am 17.04.2018 um 19:35 schrieb Daniel Gryniewicz:

On 04/17/2018 11:40 AM, Oliver Freyermuth wrote:

Am 17.04.2018 um 17:34 schrieb Paul Emmerich:



[...]


We are right now using the packages from https://eu.ceph.com/nfs-ganesha/
since we would like not to have to build NFS Ganesha against Ceph ourselves,
but would love to just follow upstream to save the maintenance burden.
Are you building packages yourself, or using a repo maintained upstream?


We are building it ourselves. We plan to soon publish our own repository for it.


This is certainly interesting for us!
Ideally, we would love something with a similar maintenance promise to the
eu.ceph.com repositories, always in sync with upstream's ceph releases.
I still hope NFS Ganesha will play a larger role in Mimic and later releases
and maybe become a more integral part of the Ceph ecosystem.

This is also the main problem preventing us from building it ourselves - if we
had to do it, we would have to build Ceph and NFS Ganesha, e.g. on the Open
Build Service, and closely monitor upstream ourselves.

Many thanks, and looking forward to the publishing of your repo,
 Oliver


2.6 packages are now up on download.ceph.com, thanks to Ali.


Many thanks to Ali also from my side!
We'll schedule the upgrade within the next few days and observe closely whether the 
problem reappears.

Many thanks and all the best,
Oliver


Finally, after watching the situation for ~2 weeks after the update, I can say:
Upgrading NFS Ganesha to 2.6 appears to have solved the issue.
I have not observed any further lockups like this, even though our users 
tried their best to re-trigger the issue!
So it seems this has been solved for good by the update :-).

So many thanks and all the best,
Oliver


Thanks, that's good to know.

Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-05-01 Thread Oliver Freyermuth
Hi all,

Am 17.04.2018 um 19:38 schrieb Oliver Freyermuth:
> Am 17.04.2018 um 19:35 schrieb Daniel Gryniewicz:
>> On 04/17/2018 11:40 AM, Oliver Freyermuth wrote:
>>> Am 17.04.2018 um 17:34 schrieb Paul Emmerich:
>> 
>>> [...]

 We are right now using the packages from https://eu.ceph.com/nfs-ganesha/
 since we would like not to have to build NFS Ganesha against Ceph ourselves,
 but would love to just follow upstream to save the maintenance burden.
 Are you building packages yourself, or using a repo maintained upstream?


 We are building it ourselves. We plan to soon publish our own repository 
 for it.
>>>
>>> This is certainly interesting for us!
>>> Ideally, we would love something with a similar maintenance promise to the
>>> eu.ceph.com repositories, always in sync with upstream's ceph releases.
>>> I still hope NFS Ganesha will play a larger role in Mimic and later
>>> releases and maybe become a more integral part of the Ceph ecosystem.
>>>
>>> This is also the main problem preventing us from building it ourselves - if
>>> we had to do it, we would have to build Ceph and NFS Ganesha, e.g. on the
>>> Open Build Service, and closely monitor upstream ourselves.
>>>
>>> Many thanks, and looking forward to the publishing of your repo,
>>> Oliver
>>
>> 2.6 packages are now up on download.ceph.com, thanks to Ali.
> 
> Many thanks to Ali also from my side! 
> We'll schedule the upgrade within the next few days and observe closely whether 
> the problem reappears. 
> 
> Many thanks and all the best,
>   Oliver

Finally, after watching the situation for ~2 weeks after the update, I can say:
Upgrading NFS Ganesha to 2.6 appears to have solved the issue. 
I have not observed any further lockups like this, even though our users 
tried their best to re-trigger the issue! 
So it seems this has been solved for good by the update :-). 
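
(For anyone double-checking after the upgrade: a quick way to confirm which
Ganesha build is actually running - the package names below are the ones used
by the download.ceph.com repository and may differ elsewhere:)
---
# installed package versions (CentOS/RHEL)
rpm -q nfs-ganesha nfs-ganesha-ceph
# the running daemon should also report its release string itself
ganesha.nfsd -v
---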

So many thanks and all the best,
Oliver

> 
>>
>> Daniel
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-17 Thread Oliver Freyermuth
Am 17.04.2018 um 19:35 schrieb Daniel Gryniewicz:
> On 04/17/2018 11:40 AM, Oliver Freyermuth wrote:
>> Am 17.04.2018 um 17:34 schrieb Paul Emmerich:
> 
>> [...]
>>>
>>> We are right now using the packages from https://eu.ceph.com/nfs-ganesha/
>>> since we would like not to have to build NFS Ganesha against Ceph ourselves,
>>> but would love to just follow upstream to save the maintenance burden.
>>> Are you building packages yourself, or using a repo maintained upstream?
>>>
>>>
>>> We are building it ourselves. We plan to soon publish our own repository 
>>> for it.
>>
>> This is certainly interesting for us!
>> Ideally, we would love something with a similar maintenance promise to the
>> eu.ceph.com repositories, always in sync with upstream's ceph releases.
>> I still hope NFS Ganesha will play a larger role in Mimic and later releases
>> and maybe become a more integral part of the Ceph ecosystem.
>>
>> This is also the main problem preventing us from building it ourselves - if
>> we had to do it, we would have to build Ceph and NFS Ganesha, e.g. on the
>> Open Build Service, and closely monitor upstream ourselves.
>>
>> Many thanks, and looking forward to the publishing of your repo,
>> Oliver
> 
> 2.6 packages are now up on download.ceph.com, thanks to Ali.

Many thanks to Ali also from my side! 
We'll schedule the upgrade within the next few days and observe closely whether the 
problem reappears. 
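
(For reference, a rough sketch of how we would point yum at the new 2.6
packages - the baseurl below is an assumption on our side and needs to be
checked against what is actually published under download.ceph.com/nfs-ganesha/:)
---
cat > /etc/yum.repos.d/nfs-ganesha.repo <<'EOF'
[nfs-ganesha]
name=NFS Ganesha 2.6 built against Ceph Luminous
# baseurl is an assumption - verify the real path before enabling
baseurl=https://download.ceph.com/nfs-ganesha/rpm-V2.6-stable/luminous/$basearch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc
EOF

yum clean metadata
yum update nfs-ganesha nfs-ganesha-ceph
---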

Many thanks and all the best,
Oliver

> 
> Daniel
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-17 Thread Daniel Gryniewicz

On 04/17/2018 11:40 AM, Oliver Freyermuth wrote:

Am 17.04.2018 um 17:34 schrieb Paul Emmerich:



[...]


 We are right now using the packages from https://eu.ceph.com/nfs-ganesha/
 since we would like not to have to build NFS Ganesha against Ceph ourselves,
 but would love to just follow upstream to save the maintenance burden.
 Are you building packages yourself, or using a repo maintained upstream?


We are building it ourselves. We plan to soon publish our own repository for it.


This is certainly interesting for us!
Ideally, we would love something with a similar maintenance promise to the
eu.ceph.com repositories, always in sync with upstream's ceph releases.
I still hope NFS Ganesha will play a larger role in Mimic and later releases
and maybe become a more integral part of the Ceph ecosystem.

This is also the main problem preventing us from building it ourselves - if we
had to do it, we would have to build Ceph and NFS Ganesha, e.g. on the Open
Build Service, and closely monitor upstream ourselves.

Many thanks, and looking forward to the publishing of your repo,
Oliver


2.6 packages are now up on download.ceph.com, thanks to Ali.

Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-17 Thread Oliver Freyermuth
Am 17.04.2018 um 17:34 schrieb Paul Emmerich:
> 
> 
> 2018-04-16 18:24 GMT+02:00 Oliver Freyermuth:
> 
> Hi Paul,
> 
> Am 16.04.2018 um 17:51 schrieb Paul Emmerich:
> > Hi,
> > 
> > can you try to get a stack trace from ganesha (with gdb or from procfs) 
> when it's stuck?
> 
> I can try, as soon as it happens again. The problem is that it's not 
> fully stuck - only the other clients are stuck when trying to access the 
> directory in question,
> since it seems the Ganesha server holds a lock and never gives it back. 
> Not sure if a backtrace can shed much light on this.
> It seems the NFS server itself is completely happy all the time, it only 
> gets problems when I want to stop / restart it.
> 
> > Oh, I assumed the whole process got stuck because you mentioned it not 
> > terminating on SIGTERM. But then it just doesn't handle stuck requests 
> > gracefully there.

Yes, sorry, my statement was unclear - this only happened when I actually 
wanted to issue:
systemctl stop nfs-ganesha
Only then, nfs-ganesha did hang - before, it was running and just not 
relinquishing the lock. 

>  
[...]  
> 
> We are right now using the packages from https://eu.ceph.com/nfs-ganesha/
> since we would like not to have to build NFS Ganesha against Ceph ourselves,
> but would love to just follow upstream to save the maintenance burden. 
> Are you building packages yourself, or using a repo maintained upstream?
> 
> 
> We are building it ourselves. We plan to soon publish our own repository for 
> it.

This is certainly interesting for us!
Ideally, we would love something with a similar maintenance promise to the
eu.ceph.com repositories, always in sync with upstream's ceph releases.
I still hope NFS Ganesha will play a larger role in Mimic and later releases
and maybe become a more integral part of the Ceph ecosystem.

This is also the main problem preventing us from building it ourselves - if we
had to do it, we would have to build Ceph and NFS Ganesha, e.g. on the Open
Build Service, and closely monitor upstream ourselves.

Many thanks, and looking forward to the publishing of your repo,
Oliver
> 
> 
> Paul
>  
> 
> 
> I see there have been some locking fixes upstream:
> 
> https://github.com/nfs-ganesha/nfs-ganesha/search?q=ceph+lock=Commits 
> 
> but I'm not expert enough to deduce whether they apply to our observation.
> 
> Cheers and thanks,
>         Oliver
> 
> >
> > Paul
> >
> > 2018-04-16 17:30 GMT+02:00 Oliver Freyermuth:
> > 
> >     Am 16.04.2018 um 08:58 schrieb Oliver Freyermuth:
> >     > Am 16.04.2018 um 02:43 schrieb Oliver Freyermuth:
> >     >> Am 15.04.2018 um 23:04 schrieb John Spray:
> >     >>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth wrote:
> >      Dear Cephalopodians,
> >     
> >      in our cluster (CentOS 7.4, EC Pool, Snappy compression, 
> Luminous 12.2.4),
> >      we often have all (~40) clients accessing one file in readonly 
> mode, even with multiple processes per client doing that.
> >     
> >      Sometimes (I do not yet know when, nor why!) the MDS ends up 
> in a situation like:
> >      ---
> >      2018-04-13 18:08:34.37 7f1ce4472700  0 
> log_channel(cluster) log [WRN] : 292 slow requests, 5 included below; oldest 
> blocked for > 1745.864417 secs
> >      2018-04-13 18:08:34.378900 7f1ce4472700  0 
> log_channel(cluster) log [WRN] : slow request 960.563534 seconds old, 
> received at 2018-04-13 17:52:33.815273: client_request(client.34720:16487379 
> getattr pAsLsXsFs #0x109ff6d 2018-04-13 17:52:33.814904 caller_uid=94894, 
> caller_gid=513{513,}) currently failed to rdlock, waiting
> >      2018-04-13 18:08:34.378904 7f1ce4472700  0 
> log_channel(cluster) log [WRN] : slow request 30.636678 seconds old, received 
> at 2018-04-13 18:08:03.742128: client_request(client.34302:16453640 getattr 
> pAsLsXsFs #0x109ff6d 2018-04-13 18:08:03.741630 caller_uid=94894, 
> caller_gid=513{513,}) currently failed to rdlock, waiting
> >      2018-04-13 18:08:34.378908 7f1ce4472700  0 
> log_channel(cluster) log [WRN] : slow request 972.648926 seconds old, 
> received at 2018-04-13 17:52:21.729881: client_request(client.34720:16487334 
> lookup #0x101fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 
> 

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-17 Thread Paul Emmerich
2018-04-16 18:24 GMT+02:00 Oliver Freyermuth:

> Hi Paul,
>
> Am 16.04.2018 um 17:51 schrieb Paul Emmerich:
> > Hi,
> >
> > can you try to get a stack trace from ganesha (with gdb or from procfs)
> when it's stuck?
>
> I can try, as soon as it happens again. The problem is that it's not fully
> stuck - only the other clients are stuck when trying to access the
> directory in question,
> since it seems the Ganesha server holds a lock and never gives it back.
> Not sure if a backtrace can shed much light on this.
> It seems the NFS server itself is completely happy all the time, it only
> gets problems when I want to stop / restart it.
>

Oh, I assumed the whole process got stuck because you mentioned it not
terminating on SIGTERM. But then it just doesn't handle stuck requests
gracefully there.


>
> But of course, I will try to get a stacktrace next time.
>
> > Also, try to upgrade to ganesha 2.6. I'm running a bigger deployment
> with ~30 ganesha 2.6
> > gateways that are quite stable so far.
>
> That's good news! Do you also have parallel access from ceph-fuse clients
> to the same CephFS? Is this also Luminous?
>

yes and yes


> We are right now using the packages from https://eu.ceph.com/nfs-ganesha/
> since we would like not to have to build NFS Ganesha against Ceph ourselves,
> but would love to just follow upstream to save the maintenance burden. Are
> you building packages yourself, or using a repo maintained upstream?
>

We are building it ourselves. We plan to soon publish our own repository
for it.


Paul


>
> I see there have been some locking fixes upstream:
> https://github.com/nfs-ganesha/nfs-ganesha/search?q=ceph+lock=Commits
> but I'm not expert enough to deduce whether they apply to our observation.
>
> Cheers and thanks,
> Oliver
>
> >
> > Paul
> >
> > 2018-04-16 17:30 GMT+02:00 Oliver Freyermuth <
> freyerm...@physik.uni-bonn.de >:
> >
> > Am 16.04.2018 um 08:58 schrieb Oliver Freyermuth:
> > > Am 16.04.2018 um 02:43 schrieb Oliver Freyermuth:
> > >> Am 15.04.2018 um 23:04 schrieb John Spray:
> > >>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth wrote:
> >  Dear Cephalopodians,
> > 
> >  in our cluster (CentOS 7.4, EC Pool, Snappy compression,
> Luminous 12.2.4),
> >  we often have all (~40) clients accessing one file in readonly
> mode, even with multiple processes per client doing that.
> > 
> >  Sometimes (I do not yet know when, nor why!) the MDS ends up in
> a situation like:
> >  ---
> >  2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster)
> log [WRN] : 292 slow requests, 5 included below; oldest blocked for >
> 1745.864417 secs
> >  2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster)
> log [WRN] : slow request 960.563534 seconds old, received at 2018-04-13
> 17:52:33.815273: client_request(client.34720:16487379 getattr pAsLsXsFs
> #0x109ff6d 2018-04-13 17:52:33.814904 caller_uid=94894,
> caller_gid=513{513,}) currently failed to rdlock, waiting
> >  2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster)
> log [WRN] : slow request 30.636678 seconds old, received at 2018-04-13
> 18:08:03.742128: client_request(client.34302:16453640 getattr pAsLsXsFs
> #0x109ff6d 2018-04-13 18:08:03.741630 caller_uid=94894,
> caller_gid=513{513,}) currently failed to rdlock, waiting
> >  2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster)
> log [WRN] : slow request 972.648926 seconds old, received at 2018-04-13
> 17:52:21.729881: client_request(client.34720:16487334 lookup
> #0x101fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450
> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> >  2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster)
> log [WRN] : slow request 1685.953657 seconds old, received at 2018-04-13
> 17:40:28.425149: client_request(client.34810:16564864 lookup
> #0x101fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961
> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> >  2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster)
> log [WRN] : slow request 1552.743795 seconds old, received at 2018-04-13
> 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs
> #0x109ff6d 2018-04-13 17:42:41.634726 caller_uid=94894,
> caller_gid=513{513,}) currently failed to rdlock, waiting
> >  ---
> >  As you can see (oldest blocked for > 1745.864417 secs) it stays
> in that situation for quite a while.
> >  The number of blocked requests is also not decreasing, but
> instead slowly increasing whenever a new request is added to the queue.

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-16 Thread Oliver Freyermuth
Hi Paul,

Am 16.04.2018 um 17:51 schrieb Paul Emmerich:
> Hi,
> 
> can you try to get a stack trace from ganesha (with gdb or from procfs) when 
> it's stuck?

I can try, as soon as it happens again. The problem is that it's not fully 
stuck - only the other clients are stuck when trying to access the directory in 
question,
since it seems the Ganesha server holds a lock and never gives it back. Not 
sure if a backtrace can shed much light on this. 
It seems the NFS server itself is completely happy all the time, it only gets 
problems when I want to stop / restart it. 

But of course, I will try to get a stacktrace next time. 
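
(Something along these lines should capture the state of all Ganesha threads
next time - assuming a single ganesha.nfsd process and that gdb plus debuginfo
are available on the gateway:)
---
# userspace backtrace of every thread via gdb
gdb -p $(pidof ganesha.nfsd) --batch -ex "thread apply all bt" > /tmp/ganesha-bt.txt

# kernel-side view of where each thread sleeps, via procfs (run as root)
for t in /proc/$(pidof ganesha.nfsd)/task/*; do
    echo "== thread ${t##*/} =="
    cat "$t/stack"
done > /tmp/ganesha-kernel-stacks.txt
---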

> Also, try to upgrade to ganesha 2.6. I'm running a bigger deployment with ~30 
> ganesha 2.6
> gateways that are quite stable so far.

That's good news! Do you also have parallel access from ceph-fuse clients to 
the same CephFS? Is this also Luminous? 
We are right now using the packages from https://eu.ceph.com/nfs-ganesha/ since 
we would like not to have to build NFS Ganesha against Ceph ourselves,
but would love to just follow upstream to save the maintenance burden. Are you 
building packages yourself, or using a repo maintained upstream? 

I see there have been some locking fixes upstream:
https://github.com/nfs-ganesha/nfs-ganesha/search?q=ceph+lock=Commits
but I'm not expert enough to deduce whether they apply to our observation. 

Cheers and thanks,
Oliver

> 
> Paul
> 
> 2018-04-16 17:30 GMT+02:00 Oliver Freyermuth:
> 
> Am 16.04.2018 um 08:58 schrieb Oliver Freyermuth:
> > Am 16.04.2018 um 02:43 schrieb Oliver Freyermuth:
> >> Am 15.04.2018 um 23:04 schrieb John Spray:
> >>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth wrote:
>  Dear Cephalopodians,
> 
>  in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 
> 12.2.4),
>  we often have all (~40) clients accessing one file in readonly mode, 
> even with multiple processes per client doing that.
> 
>  Sometimes (I do not yet know when, nor why!) the MDS ends up in a 
> situation like:
>  ---
>  2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log 
> [WRN] : 292 slow requests, 5 included below; oldest blocked for > 1745.864417 
> secs
>  2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log 
> [WRN] : slow request 960.563534 seconds old, received at 2018-04-13 
> 17:52:33.815273: client_request(client.34720:16487379 getattr pAsLsXsFs 
> #0x109ff6d 2018-04-13 17:52:33.814904 caller_uid=94894, 
> caller_gid=513{513,}) currently failed to rdlock, waiting
>  2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log 
> [WRN] : slow request 30.636678 seconds old, received at 2018-04-13 
> 18:08:03.742128: client_request(client.34302:16453640 getattr pAsLsXsFs 
> #0x109ff6d 2018-04-13 18:08:03.741630 caller_uid=94894, 
> caller_gid=513{513,}) currently failed to rdlock, waiting
>  2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log 
> [WRN] : slow request 972.648926 seconds old, received at 2018-04-13 
> 17:52:21.729881: client_request(client.34720:16487334 lookup 
> #0x101fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 
> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>  2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log 
> [WRN] : slow request 1685.953657 seconds old, received at 2018-04-13 
> 17:40:28.425149: client_request(client.34810:16564864 lookup 
> #0x101fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961 
> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>  2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log 
> [WRN] : slow request 1552.743795 seconds old, received at 2018-04-13 
> 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs 
> #0x109ff6d 2018-04-13 17:42:41.634726 caller_uid=94894, 
> caller_gid=513{513,}) currently failed to rdlock, waiting
>  ---
>  As you can see (oldest blocked for > 1745.864417 secs) it stays in 
> that situation for quite a while.
>  The number of blocked requests is also not decreasing, but instead 
> slowly increasing whenever a new request is added to the queue.
> 
>  We have a setup of one active MDS, a standby-replay, and a standby.
>  Triggering a failover does not help, it only resets the "oldest 
> blocked" time.
> >>>
> >>> Sounds like a client issue (a client is holding a lock on a file but
> >>> failing to relinquish it for another client's request to be
> >>> processed).
> >>>
> >>> Are these kernel (and what version?) or fuse clients?

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-16 Thread Paul Emmerich
Hi,

can you try to get a stack trace from ganesha (with gdb or from procfs)
when it's stuck?
Also, try to upgrade to ganesha 2.6. I'm running a bigger deployment with
~30 ganesha 2.6
gateways that are quite stable so far.

Paul

2018-04-16 17:30 GMT+02:00 Oliver Freyermuth:

> Am 16.04.2018 um 08:58 schrieb Oliver Freyermuth:
> > Am 16.04.2018 um 02:43 schrieb Oliver Freyermuth:
> >> Am 15.04.2018 um 23:04 schrieb John Spray:
> >>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth wrote:
>  Dear Cephalopodians,
> 
>  in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous
> 12.2.4),
>  we often have all (~40) clients accessing one file in readonly mode,
> even with multiple processes per client doing that.
> 
>  Sometimes (I do not yet know when, nor why!) the MDS ends up in a
> situation like:
>  ---
>  2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log
> [WRN] : 292 slow requests, 5 included below; oldest blocked for >
> 1745.864417 secs
>  2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log
> [WRN] : slow request 960.563534 seconds old, received at 2018-04-13
> 17:52:33.815273: client_request(client.34720:16487379 getattr pAsLsXsFs
> #0x109ff6d 2018-04-13 17:52:33.814904 caller_uid=94894,
> caller_gid=513{513,}) currently failed to rdlock, waiting
>  2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log
> [WRN] : slow request 30.636678 seconds old, received at 2018-04-13
> 18:08:03.742128: client_request(client.34302:16453640 getattr pAsLsXsFs
> #0x109ff6d 2018-04-13 18:08:03.741630 caller_uid=94894,
> caller_gid=513{513,}) currently failed to rdlock, waiting
>  2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log
> [WRN] : slow request 972.648926 seconds old, received at 2018-04-13
> 17:52:21.729881: client_request(client.34720:16487334 lookup
> #0x101fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450
> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>  2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log
> [WRN] : slow request 1685.953657 seconds old, received at 2018-04-13
> 17:40:28.425149: client_request(client.34810:16564864 lookup
> #0x101fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961
> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>  2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log
> [WRN] : slow request 1552.743795 seconds old, received at 2018-04-13
> 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs
> #0x109ff6d 2018-04-13 17:42:41.634726 caller_uid=94894,
> caller_gid=513{513,}) currently failed to rdlock, waiting
>  ---
>  As you can see (oldest blocked for > 1745.864417 secs) it stays in
> that situation for quite a while.
>  The number of blocked requests is also not decreasing, but instead
> slowly increasing whenever a new request is added to the queue.
> 
>  We have a setup of one active MDS, a standby-replay, and a standby.
>  Triggering a failover does not help, it only resets the "oldest
> blocked" time.
> >>>
> >>> Sounds like a client issue (a client is holding a lock on a file but
> >>> failing to relinquish it for another client's request to be
> >>> processed).
> >>>
> >>> Are these kernel (and what version?) or fuse clients?
> >>
> >> The full cluster is running with Fuse clients, all on 12.2.4.
> >> Additionally, there are 6 NFS Ganesha servers using FSAL_CEPH, i.e.
> libcephfs.
> >>
> >> Cheers,
> >>  Oliver
> >
> > Related to that: In case it happens again, which it surely will sooner
> or later, how can I diagnose
> > which client is holding a lock and not relinquishing it?
> > Is there a way to dump all held locks, ideally with the time period how
> long they were held?
> >
> > ceph daemon mds.XXX help
> > does not yield anything obvious.
> >
> > Cheers and thanks,
> >   Oliver
>
> It happened again - and I managed to track down the client causing it:
> Dumping all operations in progress on the MDS and looking for machines that
> had no operations waiting for an rdlock, I found that one NFS Ganesha server
> was not affected.
> Unmounting the NFS share on the clients did not have any effect - things
> stayed in "stuck" state.
>
> I then stopped the NFS Ganesha server. This took quite a while, until
> systemd noticed it's taking too long, and force-killed the server.
> Shortly after, even before the NFS Ganesha libcephfs client was evicted,
> the MDS got unstuck, processed all requests within about 2 seconds, and
> everything was fine again.
>
> After that, the dead NFS Ganesha client was evicted, and I just restarted
> the NFS Ganesha server.
>
> Since this appears to be reproducible (even though we still do not know how
> exactly), this is a rather ugly issue.

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-16 Thread Oliver Freyermuth
Am 16.04.2018 um 08:58 schrieb Oliver Freyermuth:
> Am 16.04.2018 um 02:43 schrieb Oliver Freyermuth:
>> Am 15.04.2018 um 23:04 schrieb John Spray:
>>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth wrote:
 Dear Cephalopodians,

 in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 12.2.4),
 we often have all (~40) clients accessing one file in readonly mode, even 
 with multiple processes per client doing that.

 Sometimes (I do not yet know when, nor why!) the MDS ends up in a 
 situation like:
 ---
 2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log [WRN] 
 : 292 slow requests, 5 included below; oldest blocked for > 1745.864417 
 secs
 2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log [WRN] 
 : slow request 960.563534 seconds old, received at 2018-04-13 
 17:52:33.815273: client_request(client.34720:16487379 getattr pAsLsXsFs 
 #0x109ff6d 2018-04-13 17:52:33.814904 caller_uid=94894, 
 caller_gid=513{513,}) currently failed to rdlock, waiting
 2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log [WRN] 
 : slow request 30.636678 seconds old, received at 2018-04-13 
 18:08:03.742128: client_request(client.34302:16453640 getattr pAsLsXsFs 
 #0x109ff6d 2018-04-13 18:08:03.741630 caller_uid=94894, 
 caller_gid=513{513,}) currently failed to rdlock, waiting
 2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log [WRN] 
 : slow request 972.648926 seconds old, received at 2018-04-13 
 17:52:21.729881: client_request(client.34720:16487334 lookup 
 #0x101fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 
 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
 2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log [WRN] 
 : slow request 1685.953657 seconds old, received at 2018-04-13 
 17:40:28.425149: client_request(client.34810:16564864 lookup 
 #0x101fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961 
 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
 2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log [WRN] 
 : slow request 1552.743795 seconds old, received at 2018-04-13 
 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs 
 #0x109ff6d 2018-04-13 17:42:41.634726 caller_uid=94894, 
 caller_gid=513{513,}) currently failed to rdlock, waiting
 ---
 As you can see (oldest blocked for > 1745.864417 secs) it stays in that 
 situation for quite a while.
 The number of blocked requests is also not decreasing, but instead slowly 
 increasing whenever a new request is added to the queue.

 We have a setup of one active MDS, a standby-replay, and a standby.
 Triggering a failover does not help, it only resets the "oldest blocked" 
 time.
>>>
>>> Sounds like a client issue (a client is holding a lock on a file but
>>> failing to relinquish it for another client's request to be
>>> processed).
>>>
>>> Are these kernel (and what version?) or fuse clients?
>>
>> The full cluster is running with Fuse clients, all on 12.2.4. 
>> Additionally, there are 6 NFS Ganesha servers using FSAL_CEPH, i.e. 
>> libcephfs. 
>>
>> Cheers,
>>  Oliver
> 
> Related to that: In case it happens again, which it surely will sooner or 
> later, how can I diagnose
> which client is holding a lock and not relinquishing it? 
> Is there a way to dump all held locks, ideally with the time period how long 
> they were held? 
> 
> ceph daemon mds.XXX help
> does not yield anything obvious. 
> 
> Cheers and thanks,
>   Oliver

It happened again - and I managed to track down the client causing it: 
Dumping all operations in progress on the MDS and looking for machines that had
no operations waiting for an rdlock, I found that one NFS Ganesha server was
not affected.
Unmounting the NFS share on the clients did not have any effect - things stayed 
in "stuck" state. 

I then stopped the NFS Ganesha server. This took quite a while, until systemd 
noticed it's taking too long, and force-killed the server. 
Shortly after, even before the NFS Ganesha libcephfs client was evicted, the 
MDS got unstuck, processed all requests within about 2 seconds, and everything 
was fine again. 
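
(Side note: if the hang on "systemctl stop nfs-ganesha" keeps recurring, a
systemd drop-in can shorten how long systemd waits before force-killing the
service - a sketch only, the 30s value is arbitrary:)
---
mkdir -p /etc/systemd/system/nfs-ganesha.service.d
cat > /etc/systemd/system/nfs-ganesha.service.d/stop-timeout.conf <<'EOF'
[Service]
TimeoutStopSec=30s
EOF
systemctl daemon-reload
---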

After that, the dead NFS Ganesha client was evicted, and I just restarted the 
NFS Ganesha server. 
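
(In case someone needs to do the same cleanup by hand, a sketch of the
corresponding Ceph-side commands - the MDS name matches the dumps above, the
client id is a placeholder:)
---
# list in-flight MDS requests and note which clients are (not) waiting on rdlocks
ceph daemon mds.mon001 ops | grep -B2 "failed to rdlock"

# map client ids to hostnames via the session list
ceph daemon mds.mon001 session ls

# evict the suspect libcephfs client (placeholder id); on Luminous this also
# blacklists the client by default
ceph tell mds.mon001 client evict id=12345
---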

Since this appears to be reproducible (even though we still do not know how 
exactly), this is a rather ugly issue. It seems NFS Ganesha with libcephfs is 
holding locks
and not returning them. We are using the packages from:
https://eu.ceph.com/nfs-ganesha/
Newer versions are available upstream, but I don't know if they fix this 
critical issue. 

Is there 

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-16 Thread Oliver Freyermuth
Am 16.04.2018 um 02:43 schrieb Oliver Freyermuth:
> Am 15.04.2018 um 23:04 schrieb John Spray:
>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth wrote:
>>> Dear Cephalopodians,
>>>
>>> in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 12.2.4),
>>> we often have all (~40) clients accessing one file in readonly mode, even 
>>> with multiple processes per client doing that.
>>>
>>> Sometimes (I do not yet know when, nor why!) the MDS ends up in a situation 
>>> like:
>>> ---
>>> 2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>>> 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
>>> 2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>>> slow request 960.563534 seconds old, received at 2018-04-13 
>>> 17:52:33.815273: client_request(client.34720:16487379 getattr pAsLsXsFs 
>>> #0x109ff6d 2018-04-13 17:52:33.814904 caller_uid=94894, 
>>> caller_gid=513{513,}) currently failed to rdlock, waiting
>>> 2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>>> slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: 
>>> client_request(client.34302:16453640 getattr pAsLsXsFs #0x109ff6d 
>>> 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) 
>>> currently failed to rdlock, waiting
>>> 2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>>> slow request 972.648926 seconds old, received at 2018-04-13 
>>> 17:52:21.729881: client_request(client.34720:16487334 lookup 
>>> #0x101fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 
>>> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>>> 2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>>> slow request 1685.953657 seconds old, received at 2018-04-13 
>>> 17:40:28.425149: client_request(client.34810:16564864 lookup 
>>> #0x101fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961 
>>> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>>> 2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>>> slow request 1552.743795 seconds old, received at 2018-04-13 
>>> 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs 
>>> #0x109ff6d 2018-04-13 17:42:41.634726 caller_uid=94894, 
>>> caller_gid=513{513,}) currently failed to rdlock, waiting
>>> ---
>>> As you can see (oldest blocked for > 1745.864417 secs) it stays in that 
>>> situation for quite a while.
>>> The number of blocked requests is also not decreasing, but instead slowly 
>>> increasing whenever a new request is added to the queue.
>>>
>>> We have a setup of one active MDS, a standby-replay, and a standby.
>>> Triggering a failover does not help, it only resets the "oldest blocked" 
>>> time.
>>
>> Sounds like a client issue (a client is holding a lock on a file but
>> failing to relinquish it for another client's request to be
>> processed).
>>
>> Are these kernel (and what version?) or fuse clients?
> 
> The full cluster is running with Fuse clients, all on 12.2.4. 
> Additionally, there are 6 NFS Ganesha servers using FSAL_CEPH, i.e. 
> libcephfs. 
> 
> Cheers,
>   Oliver

Related to that: In case it happens again, which it surely will sooner or 
later, how can I diagnose
which client is holding a lock and not relinquishing it? 
Is there a way to dump all held locks, ideally with the time period how long 
they were held? 

ceph daemon mds.XXX help
does not yield anything obvious. 
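
(A few admin-socket commands that can at least help narrow this down - the
daemon name is a placeholder, and none of them reports lock hold times directly:)
---
# requests currently in flight, including those stuck on "failed to rdlock"
ceph daemon mds.XXX dump_ops_in_flight

# all client sessions, with client id, address and client metadata (hostname etc.)
ceph daemon mds.XXX session ls

# full cache dump (can be huge) - shows which client holds which capabilities
ceph daemon mds.XXX dump cache /tmp/mds-cache.txt
---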

Cheers and thanks,
Oliver

> 
>>
>> John
>>
>>>
>>> I checked the following things on the active MDS:
>>> ---
>>> # ceph daemon mds.mon001 objecter_requests
>>> {
>>> "ops": [],
>>> "linger_ops": [],
>>> "pool_ops": [],
>>> "pool_stat_ops": [],
>>> "statfs_ops": [],
>>> "command_ops": []
>>> }
>>> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v 
>>> "failed to rdlock" | grep -v events
>>> => no output, only "initiated" and "rdlock" are in the queue.
>>> ---
>>>
>>> There's also almost no CPU load, almost no other I/O, and ceph is 
>>> deep-scrubbing ~1 pg (this also finishes and the next pg is scrubbed fine),
>>> and the scrubbing is not even happening in the metadata pool (easy to see 
>>> in the Luminous dashboard):
>>> ---
>>> # ceph -s
>>>   cluster:
>>> id: some_funny_hash
>>> health: HEALTH_WARN
>>> 1 MDSs report slow requests
>>>
>>>   services:
>>> mon: 3 daemons, quorum mon003,mon001,mon002
>>> mgr: mon001(active), standbys: mon002, mon003
>>> mds: cephfs_baf-1/1/1 up  {0=mon001=up:active}, 1 up:standby-replay, 1 up:standby

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-15 Thread Oliver Freyermuth
Am 15.04.2018 um 23:04 schrieb John Spray:
> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth wrote:
>> Dear Cephalopodians,
>>
>> in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 12.2.4),
>> we often have all (~40) clients accessing one file in readonly mode, even 
>> with multiple processes per client doing that.
>>
>> Sometimes (I do not yet know when, nor why!) the MDS ends up in a situation 
>> like:
>> ---
>> 2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
>> 2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 960.563534 seconds old, received at 2018-04-13 17:52:33.815273: 
>> client_request(client.34720:16487379 getattr pAsLsXsFs #0x109ff6d 
>> 2018-04-13 17:52:33.814904 caller_uid=94894, caller_gid=513{513,}) currently 
>> failed to rdlock, waiting
>> 2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: 
>> client_request(client.34302:16453640 getattr pAsLsXsFs #0x109ff6d 
>> 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) currently 
>> failed to rdlock, waiting
>> 2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 972.648926 seconds old, received at 2018-04-13 17:52:21.729881: 
>> client_request(client.34720:16487334 lookup 
>> #0x101fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 
>> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>> 2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 1685.953657 seconds old, received at 2018-04-13 
>> 17:40:28.425149: client_request(client.34810:16564864 lookup 
>> #0x101fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961 
>> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>> 2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 1552.743795 seconds old, received at 2018-04-13 
>> 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs 
>> #0x109ff6d 2018-04-13 17:42:41.634726 caller_uid=94894, 
>> caller_gid=513{513,}) currently failed to rdlock, waiting
>> ---
>> As you can see (oldest blocked for > 1745.864417 secs) it stays in that 
>> situation for quite a while.
>> The number of blocked requests is also not decreasing, but instead slowly 
>> increasing whenever a new request is added to the queue.
>>
>> We have a setup of one active MDS, a standby-replay, and a standby.
>> Triggering a failover does not help, it only resets the "oldest blocked" 
>> time.
> 
> Sounds like a client issue (a client is holding a lock on a file but
> failing to relinquish it for another client's request to be
> processed).
> 
> Are these kernel (and what version?) or fuse clients?

The full cluster is running with Fuse clients, all on 12.2.4. 
Additionally, there are 6 NFS Ganesha servers using FSAL_CEPH, i.e. libcephfs. 
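
(For context, a minimal way to add such a CephFS export to a gateway's
/etc/ganesha/ganesha.conf would look roughly like this - export id and paths
are placeholders, not our actual configuration:)
---
cat >> /etc/ganesha/ganesha.conf <<'EOF'
EXPORT {
    Export_ID = 1;
    Path = /;              # CephFS path exported via libcephfs
    Pseudo = /cephfs;      # NFSv4 pseudo-filesystem path
    Access_Type = RW;
    FSAL {
        Name = CEPH;       # the libcephfs-backed FSAL
    }
}
EOF
---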

Cheers,
Oliver

> 
> John
> 
>>
>> I checked the following things on the active MDS:
>> ---
>> # ceph daemon mds.mon001 objecter_requests
>> {
>> "ops": [],
>> "linger_ops": [],
>> "pool_ops": [],
>> "pool_stat_ops": [],
>> "statfs_ops": [],
>> "command_ops": []
>> }
>> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v 
>> "failed to rdlock" | grep -v events
>> => no output, only "initiated" and "rdlock" are in the queue.
>> ---
>>
>> There's also almost no CPU load, almost no other I/O, and ceph is 
>> deep-scrubbing ~1 pg (this also finishes and the next pg is scrubbed fine),
>> and the scrubbing is not even happening in the metadata pool (easy to see in 
>> the Luminous dashboard):
>> ---
>> # ceph -s
>>   cluster:
>> id: some_funny_hash
>> health: HEALTH_WARN
>> 1 MDSs report slow requests
>>
>>   services:
>> mon: 3 daemons, quorum mon003,mon001,mon002
>> mgr: mon001(active), standbys: mon002, mon003
>> mds: cephfs_baf-1/1/1 up  {0=mon001=up:active}, 1 up:standby-replay, 1 
>> up:standby
>> osd: 196 osds: 196 up, 196 in
>>
>>   data:
>> pools:   2 pools, 4224 pgs
>> objects: 15649k objects, 61761 GB
>> usage:   114 TB used, 586 TB / 700 TB avail
>> pgs: 4223 active+clean
>>      1 active+clean+scrubbing+deep
>>
>>   io:
>> client:   175 kB/s rd, 3 op/s rd, 0 op/s wr
>> ---
>>
>> Does anybody have any idea what's going on here?
>>
> Yesterday, this also happened, but resolved itself after about 1 hour.

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-15 Thread John Spray
On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth wrote:
> Dear Cephalopodians,
>
> in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 12.2.4),
> we often have all (~40) clients accessing one file in readonly mode, even 
> with multiple processes per client doing that.
>
> Sometimes (I do not yet know when, nor why!) the MDS ends up in a situation 
> like:
> ---
> 2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
> 2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 960.563534 seconds old, received at 2018-04-13 17:52:33.815273: 
> client_request(client.34720:16487379 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 17:52:33.814904 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: 
> client_request(client.34302:16453640 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 972.648926 seconds old, received at 2018-04-13 17:52:21.729881: 
> client_request(client.34720:16487334 lookup #0x101fcab/sometarball.tar.gz 
> 2018-04-13 17:52:21.729450 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 1685.953657 seconds old, received at 2018-04-13 17:40:28.425149: 
> client_request(client.34810:16564864 lookup #0x101fcab/sometarball.tar.gz 
> 2018-04-13 17:40:28.424961 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 1552.743795 seconds old, received at 2018-04-13 17:42:41.635012: 
> client_request(client.34302:16453566 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 17:42:41.634726 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> ---
> As you can see (oldest blocked for > 1745.864417 secs) it stays in that 
> situation for quite a while.
> The number of blocked requests is also not decreasing, but instead slowly 
> increasing whenever a new request is added to the queue.
>
> We have a setup of one active MDS, a standby-replay, and a standby.
> Triggering a failover does not help, it only resets the "oldest blocked" time.

Sounds like a client issue (a client is holding a lock on a file but
failing to relinquish it for another client's request to be
processed).

Are these kernel (and what version?) or fuse clients?

John

>
> I checked the following things on the active MDS:
> ---
> # ceph daemon mds.mon001 objecter_requests
> {
> "ops": [],
> "linger_ops": [],
> "pool_ops": [],
> "pool_stat_ops": [],
> "statfs_ops": [],
> "command_ops": []
> }
> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v 
> "failed to rdlock" | grep -v events
> => no output, only "initiated" and "rdlock" are in the queue.
> ---
>
> There's also almost no CPU load, almost no other I/O, and ceph is 
> deep-scrubbing ~1 pg (this also finishes and the next pg is scrubbed fine),
> and the scrubbing is not even happening in the metadata pool (easy to see in 
> the Luminous dashboard):
> ---
> # ceph -s
>   cluster:
> id: some_funny_hash
> health: HEALTH_WARN
> 1 MDSs report slow requests
>
>   services:
> mon: 3 daemons, quorum mon003,mon001,mon002
> mgr: mon001(active), standbys: mon002, mon003
> mds: cephfs_baf-1/1/1 up  {0=mon001=up:active}, 1 up:standby-replay, 1 
> up:standby
> osd: 196 osds: 196 up, 196 in
>
>   data:
> pools:   2 pools, 4224 pgs
> objects: 15649k objects, 61761 GB
> usage:   114 TB used, 586 TB / 700 TB avail
> pgs: 4223 active+clean
>      1 active+clean+scrubbing+deep
>
>   io:
> client:   175 kB/s rd, 3 op/s rd, 0 op/s wr
> ---
>
> Does anybody have any idea what's going on here?
>
> Yesterday, this also happened, but resolved itself after about 1 hour.
> Right now, it's going on for about half an hour...
>
> Cheers,
> Oliver
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-13 Thread Oliver Freyermuth
Dear Cephalopodians,

a small addition. 

As far as I know, the I/O the user is performing is based on the following 
directory structure:
datafolder/some_older_tarball.tar.gz
datafolder/sometarball.tar.gz
datafolder/processing_number_2/
datafolder/processing_number_3/
datafolder/processing_number_4/

The problem appeared to start when:
- many clients were reading from datafolder/some_older_tarball.tar.gz, but 
extracting somewhere else (to another filesystem). 
- then, one single client started to create datafolder/sometarball.tar.gz, 
packaging files from another filesystem. 
Can this cause such a lockup? If so, can we prevent it somehow? 
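
(A rough shell sketch of that access pattern, in case someone wants to try to
reproduce it - the /cephfs and /scratch paths are placeholders:)
---
# on ~40 clients: repeatedly read the existing tarball from CephFS and
# extract it to another filesystem
while true; do
    tar xzf /cephfs/datafolder/some_older_tarball.tar.gz -C /scratch/extract
done

# meanwhile, on one client: create a new tarball directly in the same CephFS
# directory, packing files that live on another filesystem
tar czf /cephfs/datafolder/sometarball.tar.gz -C /scratch/input .
---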

Cheers,
Oliver

Am 13.04.2018 um 18:16 schrieb Oliver Freyermuth:
> Dear Cephalopodians,
> 
> in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 12.2.4), 
> we often have all (~40) clients accessing one file in readonly mode, even 
> with multiple processes per client doing that. 
> 
> Sometimes (I do not yet know when, nor why!) the MDS ends up in a situation 
> like:
> ---
> 2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
> 2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 960.563534 seconds old, received at 2018-04-13 17:52:33.815273: 
> client_request(client.34720:16487379 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 17:52:33.814904 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: 
> client_request(client.34302:16453640 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 972.648926 seconds old, received at 2018-04-13 17:52:21.729881: 
> client_request(client.34720:16487334 lookup #0x101fcab/sometarball.tar.gz 
> 2018-04-13 17:52:21.729450 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 1685.953657 seconds old, received at 2018-04-13 17:40:28.425149: 
> client_request(client.34810:16564864 lookup #0x101fcab/sometarball.tar.gz 
> 2018-04-13 17:40:28.424961 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 1552.743795 seconds old, received at 2018-04-13 17:42:41.635012: 
> client_request(client.34302:16453566 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 17:42:41.634726 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> ---
> As you can see (oldest blocked for > 1745.864417 secs) it stays in that 
> situation for quite a while. 
> The number of blocked requests is also not decreasing, but instead slowly 
> increasing whenever a new request is added to the queue. 
> 
> We have a setup of one active MDS, a standby-replay, and a standby. 
> Triggering a failover does not help, it only resets the "oldest blocked" 
> time. 
> 
> I checked the following things on the active MDS:
> ---
> # ceph daemon mds.mon001 objecter_requests
> {
> "ops": [],
> "linger_ops": [],
> "pool_ops": [],
> "pool_stat_ops": [],
> "statfs_ops": [],
> "command_ops": []
> }
> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v 
> "failed to rdlock" | grep -v events
> => no output, only "initiated" and "rdlock" are in the queue. 
> ---
> 
> There's also almost no CPU load, almost no other I/O, and ceph is 
> deep-scrubbing ~1 pg (this also finishes and the next pg is scrubbed fine),
> and the scrubbing is not even happening in the metadata pool (easy to see in 
> the Luminous dashboard):
> ---
> # ceph -s
>   cluster:
> id: some_funny_hash
> health: HEALTH_WARN
> 1 MDSs report slow requests
>  
>   services:
> mon: 3 daemons, quorum mon003,mon001,mon002
> mgr: mon001(active), standbys: mon002, mon003
> mds: cephfs_baf-1/1/1 up  {0=mon001=up:active}, 1 up:standby-replay, 1 
> up:standby
> osd: 196 osds: 196 up, 196 in
>  
>   data:
> pools:   2 pools, 4224 pgs
> objects: 15649k objects, 61761 GB
> usage:   114 TB used, 586 TB / 700 TB avail
> pgs: 4223 active+clean
>      1 active+clean+scrubbing+deep
>  
>   io:
> client:   175 kB/s rd, 3 op/s rd, 0 op/s wr
>