Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-15 Thread Oliver Freyermuth
Am 15.04.2018 um 23:04 schrieb John Spray:
> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth
>  wrote:
>> Dear Cephalopodians,
>>
>> in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 12.2.4),
>> we often have all (~40) clients accessing one file in readonly mode, even 
>> with multiple processes per client doing that.
>>
>> Sometimes (I do not yet know when, nor why!) the MDS ends up in a situation 
>> like:
>> ---
>> 2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
>> 2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 960.563534 seconds old, received at 2018-04-13 17:52:33.815273: 
>> client_request(client.34720:16487379 getattr pAsLsXsFs #0x109ff6d 
>> 2018-04-13 17:52:33.814904 caller_uid=94894, caller_gid=513{513,}) currently 
>> failed to rdlock, waiting
>> 2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: 
>> client_request(client.34302:16453640 getattr pAsLsXsFs #0x109ff6d 
>> 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) currently 
>> failed to rdlock, waiting
>> 2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 972.648926 seconds old, received at 2018-04-13 17:52:21.729881: 
>> client_request(client.34720:16487334 lookup 
>> #0x101fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 
>> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>> 2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 1685.953657 seconds old, received at 2018-04-13 
>> 17:40:28.425149: client_request(client.34810:16564864 lookup 
>> #0x101fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961 
>> caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
>> 2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
>> slow request 1552.743795 seconds old, received at 2018-04-13 
>> 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs 
>> #0x109ff6d 2018-04-13 17:42:41.634726 caller_uid=94894, 
>> caller_gid=513{513,}) currently failed to rdlock, waiting
>> ---
>> As you can see (oldest blocked for > 1745.864417 secs) it stays in that 
>> situation for quite a while.
>> The number of blocked requests is also not decreasing, but instead slowly 
>> increasing whenever a new request is added to the queue.
>>
>> We have a setup of one active MDS, a standby-replay, and a standby.
>> Triggering a failover does not help, it only resets the "oldest blocked" 
>> time.
> 
> Sounds like a client issue (a client is holding a lock on a file but
> failing to relinquish it for another client's request to be
> processed).
> 
> Are these kernel (and what version?) or fuse clients?

The full cluster is running FUSE clients (ceph-fuse), all on 12.2.4.
Additionally, there are 6 NFS Ganesha servers using FSAL_CEPH, i.e. libcephfs.
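
In the meantime, to check which sessions hold the most caps (and to confirm the
client versions), I plan to run something like the following against the admin
socket on the active MDS host. This is only an untested sketch: the field names
("num_caps", "client_metadata", "ceph_version") are what I remember from the
"session ls" JSON and may differ on 12.2.4, and "mds.mon001" is our active MDS.
---
# Untested sketch: summarize MDS client sessions by caps held and version.
# Run on the host of the active MDS; replace "mds.mon001" as needed.
import json
import subprocess

out = subprocess.check_output(["ceph", "daemon", "mds.mon001", "session", "ls"])
sessions = json.loads(out)

# Sessions holding the most caps are the most likely to block others.
for s in sorted(sessions, key=lambda x: x.get("num_caps", 0), reverse=True):
    meta = s.get("client_metadata", {})
    print("client.%s  caps=%s  host=%s  version=%s" % (
        s.get("id"), s.get("num_caps"),
        meta.get("hostname", "?"), meta.get("ceph_version", "?")))
---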

Cheers,
Oliver

> 
> John
> 
>>
>> I checked the following things on the active MDS:
>> ---
>> # ceph daemon mds.mon001 objecter_requests
>> {
>> "ops": [],
>> "linger_ops": [],
>> "pool_ops": [],
>> "pool_stat_ops": [],
>> "statfs_ops": [],
>> "command_ops": []
>> }
>> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v 
>> "failed to rdlock" | grep -v events
>> => no output, only "initiated" and "rdlock" are in the queue.
>> ---
>>
>> There's also almost no CPU load and almost no other I/O; ceph is
>> deep-scrubbing one pg (this also finishes, and the next pg is then scrubbed fine),
>> and the scrubbing is not even happening in the metadata pool (easy to see in
>> the Luminous dashboard):
>> ---
>> # ceph -s
>>   cluster:
>> id: some_funny_hash
>> health: HEALTH_WARN
>> 1 MDSs report slow requests
>>
>>   services:
>> mon: 3 daemons, quorum mon003,mon001,mon002
>> mgr: mon001(active), standbys: mon002, mon003
>> mds: cephfs_baf-1/1/1 up  {0=mon001=up:active}, 1 up:standby-replay, 1 
>> up:standby
>> osd: 196 osds: 196 up, 196 in
>>
>>   data:
>> pools:   2 pools, 4224 pgs
>> objects: 15649k objects, 61761 GB
>> usage:   114 TB used, 586 TB / 700 TB avail
>> pgs: 4223 active+clean
>>  1    active+clean+scrubbing+deep
>>
>>   io:
>> client:   175 kB/s rd, 3 op/s rd, 0 op/s wr
>> ---
>>
>> Does anybody have any idea what's going on here?
>>
>> Yesterday, this also happened, but resolved itself after about 1 hour.
>> Right now, it's going on for about half an hour...

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-15 Thread John Spray
On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth
 wrote:
> Dear Cephalopodians,
>
> in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 12.2.4),
> we often have all (~40) clients accessing one file in readonly mode, even 
> with multiple processes per client doing that.
>
> Sometimes (I do not yet know when, nor why!) the MDS ends up in a situation 
> like:
> ---
> 2018-04-13 18:08:34.37 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
> 2018-04-13 18:08:34.378900 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 960.563534 seconds old, received at 2018-04-13 17:52:33.815273: 
> client_request(client.34720:16487379 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 17:52:33.814904 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378904 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: 
> client_request(client.34302:16453640 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378908 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 972.648926 seconds old, received at 2018-04-13 17:52:21.729881: 
> client_request(client.34720:16487334 lookup #0x101fcab/sometarball.tar.gz 
> 2018-04-13 17:52:21.729450 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378913 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 1685.953657 seconds old, received at 2018-04-13 17:40:28.425149: 
> client_request(client.34810:16564864 lookup #0x101fcab/sometarball.tar.gz 
> 2018-04-13 17:40:28.424961 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> 2018-04-13 18:08:34.378918 7f1ce4472700  0 log_channel(cluster) log [WRN] : 
> slow request 1552.743795 seconds old, received at 2018-04-13 17:42:41.635012: 
> client_request(client.34302:16453566 getattr pAsLsXsFs #0x109ff6d 
> 2018-04-13 17:42:41.634726 caller_uid=94894, caller_gid=513{513,}) currently 
> failed to rdlock, waiting
> ---
> As you can see (oldest blocked for > 1745.864417 secs) it stays in that 
> situation for quite a while.
> The number of blocked requests is also not decreasing, but instead slowly 
> increasing whenever a new request is added to the queue.
>
> We have a setup of one active MDS, a standby-replay, and a standby.
> Triggering a failover does not help, it only resets the "oldest blocked" time.

Sounds like a client issue (a client is holding a lock on a file but
failing to relinquish it for another client's request to be
processed).
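
To see where the stuck requests are piling up, something along these lines
(an untested sketch; the exact JSON layout of "ceph daemon mds.<name> ops" may
differ on 12.2.4, and "mds.mon001" is just the active MDS from your output)
groups the in-flight ops by their last recorded event:
---
# Untested sketch: group MDS in-flight requests by their last event,
# e.g. "failed to rdlock, waiting". Assumes the ops dump looks like
# {"ops": [{"type_data": {"events": [{"event": ...}, ...]}, ...}, ...]}.
import collections
import json
import subprocess

MDS = "mds.mon001"  # replace with the active MDS

ops = json.loads(subprocess.check_output(["ceph", "daemon", MDS, "ops"]))

by_event = collections.Counter()
for op in ops.get("ops", []):
    events = op.get("type_data", {}).get("events", [])
    by_event[events[-1]["event"] if events else "unknown"] += 1

for event, count in by_event.most_common():
    print("%6d  %s" % (count, event))
---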

Are these kernel (and what version?) or fuse clients?

John

>
> I checked the following things on the active MDS:
> ---
> # ceph daemon mds.mon001 objecter_requests
> {
> "ops": [],
> "linger_ops": [],
> "pool_ops": [],
> "pool_stat_ops": [],
> "statfs_ops": [],
> "command_ops": []
> }
> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v 
> "failed to rdlock" | grep -v events
> => no output, only "initiated" and "rdlock" are in the queue.
> ---
>
> There's also almost no CPU load and almost no other I/O; ceph is
> deep-scrubbing one pg (this also finishes, and the next pg is then scrubbed fine),
> and the scrubbing is not even happening in the metadata pool (easy to see in
> the Luminous dashboard):
> ---
> # ceph -s
>   cluster:
> id: some_funny_hash
> health: HEALTH_WARN
> 1 MDSs report slow requests
>
>   services:
> mon: 3 daemons, quorum mon003,mon001,mon002
> mgr: mon001(active), standbys: mon002, mon003
> mds: cephfs_baf-1/1/1 up  {0=mon001=up:active}, 1 up:standby-replay, 1 
> up:standby
> osd: 196 osds: 196 up, 196 in
>
>   data:
> pools:   2 pools, 4224 pgs
> objects: 15649k objects, 61761 GB
> usage:   114 TB used, 586 TB / 700 TB avail
> pgs: 4223 active+clean
>  1    active+clean+scrubbing+deep
>
>   io:
> client:   175 kB/s rd, 3 op/s rd, 0 op/s wr
> ---
>
> Does anybody have any idea what's going on here?
>
> Yesterday, this also happened, but resolved itself after about 1 hour.
> Right now, it's going on for about half an hour...
>
> Cheers,
> Oliver
>
>

Re: [ceph-users] High TCP retransmission rates, only with Ceph

2018-04-15 Thread Robert Stanford
 I should have been more clear.  The TCP retransmissions are on the OSD
host.

On Sun, Apr 15, 2018 at 1:48 PM, Paweł Sadowski  wrote:

> On 04/15/2018 08:18 PM, Robert Stanford wrote:
>
>>
>>  Iperf gives about 7Gb/s between a radosgw host and one of my OSD hosts
>> (8 disks, 8 OSD daemons, one of 3 OSD hosts).  When I benchmark radosgw
>> with cosbench I see high TCP retransmission rates (from sar -n ETCP 1).  I
>> don't see this with iperf.  Why would Ceph, but not iperf, cause high TCP
>> retransmission rates?
>>
>
> Most probably your application (radosgw in this case) is not able to
> process requests fast enough and some packets are dropped.
>
> --
> PS


Re: [ceph-users] High TCP retransmission rates, only with Ceph

2018-04-15 Thread Paweł Sadowski

On 04/15/2018 08:18 PM, Robert Stanford wrote:


 Iperf gives about 7Gb/s between a radosgw host and one of my OSD 
hosts (8 disks, 8 OSD daemons, one of 3 OSD hosts).  When I benchmark 
radosgw with cosbench I see high TCP retransmission rates (from sar -n 
ETCP 1).  I don't see this with iperf.  Why would Ceph, but not iperf, 
cause high TCP retransmission rates?


Most probably your application (radosgw in this case) is not able to 
process requests fast enough and some packets are dropped.
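
To put a number on that independently of sar, a quick sketch like the one below
(Linux only; it samples the kernel's TCP counters in /proc/net/snmp, which is
roughly where "sar -n ETCP" gets its retrans/s figure from) prints the
retransmission ratio once per second:
---
# Sketch: print outgoing-segment and retransmitted-segment rates per second
# from the Linux TCP counters in /proc/net/snmp.
import time

def tcp_counters():
    with open("/proc/net/snmp") as f:
        lines = [l.split() for l in f if l.startswith("Tcp:")]
    header, values = lines[0][1:], lines[1][1:]
    stats = dict(zip(header, (int(v) for v in values)))
    return stats["OutSegs"], stats["RetransSegs"]

prev_out, prev_retrans = tcp_counters()
while True:
    time.sleep(1)
    out, retrans = tcp_counters()
    d_out, d_retrans = out - prev_out, retrans - prev_retrans
    ratio = 100.0 * d_retrans / d_out if d_out else 0.0
    print("out=%d/s retrans=%d/s (%.2f%%)" % (d_out, d_retrans, ratio))
    prev_out, prev_retrans = out, retrans
---
Running this on both the radosgw and the OSD hosts while cosbench is loading
should show whether the retransmissions track the Ceph traffic or something
else on the box.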


--
PS


[ceph-users] High TCP retransmission rates, only with Ceph

2018-04-15 Thread Robert Stanford
 Iperf gives about 7Gb/s between a radosgw host and one of my OSD hosts (8
disks, 8 OSD daemons, one of 3 OSD hosts).  When I benchmark radosgw with
cosbench I see high TCP retransmission rates (from sar -n ETCP 1).  I don't
see this with iperf.  Why would Ceph, but not iperf, cause high TCP
retransmission rates?

 Thank you


[ceph-users] ZeroDivisionError: float division by zero in /usr/lib/ceph/mgr/dashboard/module.py (12.2.4)

2018-04-15 Thread Nicolas Huillard
Hi,

I'm not sure whether this has been fixed since 12.2.4. The same code also
appears in a different file on GitHub:
https://github.com/ceph/ceph/blob/50412f7e9c2691ec10132c8bf9310a05a40e9f9d/src/pybind/mgr/status/module.py

The ZeroDivisionError occurs while the dashboard is open and there is a
network outage (the link between the 2 datacenters is broken). I'm not sure
how the actual dashboard UI behaves at the same time.

Syslog trace:

ceph-mgr[1324]: [15/Apr/2018:09:47:12] HTTP Traceback (most recent call last):
ceph-mgr[1324]:   File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
ceph-mgr[1324]: response.body = self.handler()
ceph-mgr[1324]:   File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
ceph-mgr[1324]: self.body = self.oldhandler(*args, **kwargs)
ceph-mgr[1324]:   File "/usr/lib/python2.7/dist-packages/cherrypy/lib/jsontools.py", line 63, in json_handler
ceph-mgr[1324]: value = cherrypy.serving.request._json_inner_handler(*args, **kwargs)
ceph-mgr[1324]:   File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
ceph-mgr[1324]: return self.callable(*self.args, **self.kwargs)
ceph-mgr[1324]:   File "/usr/lib/ceph/mgr/dashboard/module.py", line 991, in list_data
ceph-mgr[1324]: return self._osds_by_server()
ceph-mgr[1324]:   File "/usr/lib/ceph/mgr/dashboard/module.py", line 1040, in _osds_by_server
ceph-mgr[1324]: osd_map.osds_by_id[osd_id])
ceph-mgr[1324]:   File "/usr/lib/ceph/mgr/dashboard/module.py", line 1007, in _osd_summary
ceph-mgr[1324]: result['stats'][s.split(".")[1]] = global_instance().get_rate('osd', osd_spec, s)
ceph-mgr[1324]:   File "/usr/lib/ceph/mgr/dashboard/module.py", line 268, in get_rate
ceph-mgr[1324]: return (data[-1][1] - data[-2][1]) / float(data[-1][0] - data[-2][0])
ceph-mgr[1324]: ZeroDivisionError: float division by zero
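
Until this is fixed upstream, a defensive guard along these lines would avoid
the crash. This is only a sketch of the idea, not the actual Ceph patch: it
paraphrases the failing division in get_rate() and simply returns 0.0 when
there are fewer than two samples or both samples carry the same timestamp
(which seems to be what happens during the outage):
---
# Sketch only -- not the upstream patch. "data" is assumed to be a list of
# (timestamp, value) pairs, oldest first, as suggested by the traceback.
def rate_from_samples(data):
    if len(data) < 2:
        return 0.0
    dt = float(data[-1][0] - data[-2][0])
    if dt == 0.0:
        return 0.0
    return (data[-1][1] - data[-2][1]) / dt

# Example: identical timestamps no longer raise ZeroDivisionError.
print(rate_from_samples([(10.0, 100), (10.0, 120)]))  # -> 0.0
---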

HTH,

-- 
Nicolas Huillard