Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-23 Thread deeepdish
@John-Paul Robinson:

I’ve also experienced nfs being blocked when serving rbd devices (XFS file system).
In my scenario I had an rbd device mapped on an OSD host and exported over NFS (lab
scenario).  Log entries below.  Running CentOS 7 w/ 3.10.0-229.14.1.el7.x86_64.
Next step for me is to compile 3.18.22 and test NFS and SCST (iSCSI / FC).

Oct 22 13:30:01 osdhost01 systemd: Started Session 14 of user root.
Oct 22 13:37:04 osdhost01 kernel: INFO: task nfsd:12672 blocked for more than 120 seconds.
Oct 22 13:37:04 osdhost01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 13:37:04 osdhost01 kernel: nfsd            D 880627c73680     0 12672      2 0x0080
Oct 22 13:37:04 osdhost01 kernel: 880bda763b08 0046 880be73af1c0 880bda763fd8
Oct 22 13:37:04 osdhost01 kernel: 880bda763fd8 880bda763fd8 880be73af1c0 880627c73f48
Oct 22 13:37:04 osdhost01 kernel: 880c3ff98ae8 0002 811562e0 880bda763b80
Oct 22 13:37:04 osdhost01 kernel: Call Trace:
Oct 22 13:37:04 osdhost01 kernel: [] ? wait_on_page_read+0x60/0x60
Oct 22 13:37:04 osdhost01 kernel: [] io_schedule+0x9d/0x130
Oct 22 13:37:04 osdhost01 kernel: [] sleep_on_page+0xe/0x20
Oct 22 13:37:04 osdhost01 kernel: [] __wait_on_bit+0x60/0x90
Oct 22 13:37:04 osdhost01 kernel: [] wait_on_page_bit+0x86/0xb0
Oct 22 13:37:04 osdhost01 kernel: [] ? autoremove_wake_function+0x40/0x40
Oct 22 13:37:04 osdhost01 kernel: [] filemap_fdatawait_range+0x111/0x1b0
Oct 22 13:37:04 osdhost01 kernel: [] filemap_write_and_wait_range+0x3f/0x70
Oct 22 13:37:04 osdhost01 kernel: [] xfs_file_fsync+0x66/0x1f0 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] vfs_fsync_range+0x1d/0x30
Oct 22 13:37:04 osdhost01 kernel: [] nfsd_commit+0xb9/0xe0 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd4_commit+0x57/0x60 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd_dispatch+0xbb/0x200 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] svc_process_common+0x453/0x6f0 [sunrpc]
Oct 22 13:37:04 osdhost01 kernel: [] svc_process+0x103/0x170 [sunrpc]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd+0xe7/0x150 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] ? nfsd_destroy+0x80/0x80 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] kthread+0xcf/0xe0
Oct 22 13:37:04 osdhost01 kernel: [] ? kthread_create_on_node+0x140/0x140
Oct 22 13:37:04 osdhost01 kernel: [] ret_from_fork+0x58/0x90
Oct 22 13:37:04 osdhost01 kernel: [] ? kthread_create_on_node+0x140/0x140
Oct 22 13:37:04 osdhost01 kernel: INFO: task kworker/u50:81:15660 blocked for more than 120 seconds.
Oct 22 13:37:04 osdhost01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 13:37:04 osdhost01 kernel: kworker/u50:81  D 880c3fc73680     0 15660      2 0x0080
Oct 22 13:37:04 osdhost01 kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:0)
Oct 22 13:37:04 osdhost01 kernel: 88086deeb738 0046 880beb6796c0 88086deebfd8
Oct 22 13:37:04 osdhost01 kernel: 88086deebfd8 88086deebfd8 880beb6796c0 880c3fc73f48
Oct 22 13:37:04 osdhost01 kernel: 88061aec0fc0 880c1bb2dea0 88061aec0ff0 88061aec0fc0
Oct 22 13:37:04 osdhost01 kernel: Call Trace:
Oct 22 13:37:04 osdhost01 kernel: [] io_schedule+0x9d/0x130
Oct 22 13:37:04 osdhost01 kernel: [] get_request+0x1b5/0x780
Oct 22 13:37:04 osdhost01 kernel: [] ? wake_up_bit+0x30/0x30
Oct 22 13:37:04 osdhost01 kernel: [] blk_queue_bio+0xc6/0x390
Oct 22 13:37:04 osdhost01 kernel: [] generic_make_request+0xe2/0x130
Oct 22 13:37:04 osdhost01 kernel: [] submit_bio+0x71/0x150
Oct 22 13:37:04 osdhost01 kernel: [] xfs_submit_ioend_bio.isra.12+0x33/0x40 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] xfs_submit_ioend+0xef/0x130 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] xfs_vm_writepage+0x36a/0x5d0 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] __writepage+0x13/0x50
Oct 22 13:37:04 osdhost01 kernel: [] write_cache_pages+0x251/0x4d0
Oct 22 13:37:04 osdhost01 kernel: [] ? global_dirtyable_memory+0x70/0x70
Oct 22 13:37:04 osdhost01 kernel: [] generic_writepages+0x4d/0x80
Oct 22 13:37:04 osdhost01 kernel: [] xfs_vm_writepages+0x43/0x50 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] do_writepages+0x1e/0x40
Oct 22 13:37:04 osdhost01 kernel: [] __writeback_single_inode+0x40/0x220
Oct 22 13:37:04 osdhost01 kernel: [] writeback_sb_inodes+0x25e/0x420
Oct 22 13:37:04 osdhost01 kernel: [] __writeback_inodes_wb+0x9f/0xd0
Oct 22 13:37:04 osdhost01 kernel: [] wb_writeback+0x263/0x2f0
Oct 22 13:37:04 osdhost01 kernel: [] bdi_writeback_workfn+0x1cc/0x460
Oct 22 13:37:04 osdhost01 kernel: [] process_one_work+0x17b/0x470
Oct 22 13:37:04 osdhost01 kernel: [] worker_thread+0x11b/0x400
Oct 22 13:37:04 osdhost01 kernel: [] ? rescuer_thread+0x400/0x400
Oct 22 13:37:04 osdhost01 kernel: [] kthread+0xcf/0xe0
Oct 22 13:37:04 osdhost01 kernel: [] ? kthread_create_on_node+0x140/0x140
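
For what it's worth, the same state can be captured on demand instead of waiting
for the 120 second hung-task timer (standard kernel facilities, nothing
Ceph-specific; the sysrq trigger needs kernel.sysrq to allow it):

  # current hung-task timeout
  sysctl kernel.hung_task_timeout_secs

  # dump all tasks in uninterruptible (D) sleep to dmesg, like the traces above
  echo w > /proc/sysrq-trigger

  # quick list of D-state processes and what they are waiting in
  ps -eo pid,stat,wchan:40,comm | awk '$2 ~ /D/'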

Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread Wido den Hollander
On 10/22/2015 10:57 PM, John-Paul Robinson wrote:
> Hi,
> 
> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
> nfsd server requests when their ceph cluster has a placement group that
> is not servicing I/O for some reason, eg. too few replicas or an osd
> with slow request warnings?
> 
> We have an RBD-NFS gateway that stops responding to NFS clients
> (interaction with RBD-backed NFS shares hangs on the NFS client),
> whenever our ceph cluster has some part of it in an I/O block
> condition.   This issue only affects the ability of the nfsd processes
> to serve requests to the client.  I can look at and access underlying
> mounted RBD containers without issue, although they appear hung from the
> NFS client side.   The gateway node load numbers spike to a number that
> reflects the number of nfsd processes, but the system is otherwise
> untaxed (unlike the case of a normal high OS load, i.e. I can type and
> run commands with normal responsiveness.)
> 

Well, that is normal I think. Certain objects become unresponsive if a
PG is not serving I/O.

With a simple 'ls' or 'df -h' you might not be touching those objects,
so for you it seems like everything is functioning.
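
Which PGs are the problem should show up with something like this (commands from
memory for firefly-era releases; double-check against your version):

  ceph health detail            # degraded/inactive PGs and OSDs with slow requests
  ceph pg dump_stuck inactive   # PGs that are not serving I/O
  ceph pg dump_stuck unclean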

The nfsd process, however, might be hung due to a blocking I/O call. That
is completely normal and to be expected.

That it hangs the complete NFS server might be just a side-effect of how
nfsd was written.

It might be that Ganesha works better for you:
http://blog.widodh.nl/2014/12/nfs-ganesha-with-libcephfs-on-ubuntu-14-04/
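
For CephFS-backed exports that route talks to the cluster through libcephfs and
skips the kernel client entirely. A minimal export block looks roughly like the
sketch below -- from memory, not a tested config; option names can differ
between Ganesha versions, so check the docs linked above:

  # /etc/ganesha/ganesha.conf -- minimal sketch, untested
  EXPORT {
      Export_Id = 1;
      Path = "/";                 # path inside CephFS
      Pseudo = "/cephfs";         # NFSv4 pseudo path the clients mount
      Access_Type = RW;
      Squash = No_Root_Squash;
      FSAL {
          Name = CEPH;            # libcephfs-backed FSAL
      }
  }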

> The behavior comes across like there is some nfsd global lock that an
> nfsd sets before requesting I/O from a backend device.  In the case
> above, the I/O request hangs on one RBD image affected by the I/O block
> caused by the problematic pg or OSD.   The nfsd request blocks on the
> ceph I/O and because it has set a global lock, all other nfsd processes
> are prevented from servicing requests to their clients.  The nfsd
> processes are now all in the wait queue causing the load number on the
> gateway system to spike. Once the Ceph I/O issue is resolved, the nfsd
> I/O request completes and all service returns to normal.  The load on
> the gateway drops to normal immediately and all NFS clients can again
> interact with the nfsd processes.  Throughout this time unaffected ceph
> objects remain available to other clients, eg. OpenStack volumes.
> 
> Our RBD-NFS gateway is running Ubuntu 12.04.5 with kernel
> 3.11.0-15-generic.  The ceph version installed on this client is 0.72.2,
> though I assume only the kernel resident RBD module matters.
> 
> Any thoughts or pointers appreciated.
> 
> ~jpr


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


[ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread John-Paul Robinson
Hi,

Has anyone else experienced a problem with RBD-to-NFS gateways blocking
nfsd server requests when their ceph cluster has a placement group that
is not servicing I/O for some reason, eg. too few replicas or an osd
with slow request warnings?

We have an RBD-NFS gateway that stops responding to NFS clients
(interaction with RBD-backed NFS shares hangs on the NFS client),
whenever our ceph cluster has some part of it in an I/O block
condition.   This issue only affects the ability of the nfsd processes
to serve requests to the client.  I can look at and access underlying
mounted RBD containers without issue, although they appear hung from the
NFS client side.   The gateway node load numbers spike to a number that
reflects the number of nfsd processes, but the system is otherwise
untaxed (unlike the case of a normal high OS load, i.e. I can type and
run commands with normal responsiveness.)

The behavior comes across like there is some nfsd global lock that an
nfsd sets before requesting I/O from a backend device.  In the case
above, the I/O request hangs on one RBD image affected by the I/O block
caused by the problematic pg or OSD.   The nfsd request blocks on the
ceph I/O and because it has set a global lock, all other nfsd processes
are prevented from servicing requests to their clients.  The nfsd
processes are now all in the wait queue causing the load number on the
gateway system to spike. Once the Ceph I/O issue is resolved, the nfsd
I/O request completes and all service returns to normal.  The load on
the gateway drops to normal immediately and all NFS clients can again
interact with the nfsd processes.  Throughout this time unaffected ceph
objects remain available to other clients, eg. OpenStack volumes.
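
A way to check that theory during a hang (assuming /proc/<pid>/stack is
available, which it should be on 3.11) is to dump the kernel stack of every
nfsd and see whether they are all parked in the same place:

  # run on the gateway, as root, while the hang is in progress
  for p in $(pgrep nfsd); do
      echo "== nfsd pid $p"
      cat /proc/$p/stack
  done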

Our RBD-NFS gateway is running Ubuntu 12.04.5 with kernel
3.11.0-15-generic.  The ceph version installed on this client is 0.72.2,
though I assume only the kernel resident RBD module matters.
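
For completeness, the mapped images and the in-kernel rbd module can be checked
with standard commands:

  rbd showmapped        # which images are mapped to which /dev/rbdN
  uname -r              # running kernel
  modinfo rbd | head    # details of the in-kernel rbd module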

Any thoughts or pointers appreciated.

~jpr


Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread John-Paul Robinson


On 10/22/2015 04:03 PM, Wido den Hollander wrote:
> On 10/22/2015 10:57 PM, John-Paul Robinson wrote:
>> Hi,
>>
>> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
>> nfsd server requests when their ceph cluster has a placement group that
>> is not servicing I/O for some reason, eg. too few replicas or an osd
>> with slow request warnings?
>>
>> We have an RBD-NFS gateway that stops responding to NFS clients
>> (interaction with RBD-backed NFS shares hangs on the NFS client),
>> whenever our ceph cluster has some part of it in an I/O block
>> condition.   This issue only affects the ability of the nfsd processes
>> to serve requests to the client.  I can look at and access underlying
>> mounted RBD containers without issue, although they appear hung from the
>> NFS client side.   The gateway node load numbers spike to a number that
>> reflects the number of nfsd processes, but the system is otherwise
>> untaxed (unlike the case of a normal high OS load, i.e. I can type and
>> run commands with normal responsiveness.)
>>
> Well, that is normal I think. Certain objects become unresponsive if a
> PG is not serving I/O.
>
> With a simple 'ls' or 'df -h' you might not be touching those objects,
> so for you it seems like everything is functioning.
>
> The nfsd process however might be hung due to a blocking I/O call. That
> is completely normal and to be expected.

I agree that an nfsd process blocking on a blocked backend I/O request
is expected and normal.

> That it hangs the complete NFS server might be just a side-effect of how
> nfsd was written.

Hanging all nfsd processes is the part I find unexpected.  I'm just
wondering if someone has experience with this or if this is a known nfsd
issue.

> It might be that Ganesha works better for you:
> http://blog.widodh.nl/2014/12/nfs-ganesha-with-libcephfs-on-ubuntu-14-04/

Thanks, Ganesha looks very interesting!



Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread Ryan Tokarek

> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson  wrote:
> 
> Hi,
> 
> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
> nfsd server requests when their ceph cluster has a placement group that
> is not servicing I/O for some reason, eg. too few replicas or an osd
> with slow request warnings?

We have experienced exactly that kind of problem except that it sometimes 
happens even when ceph health reports "HEALTH_OK". This has been incredibly 
vexing for us. 


If the cluster is unhealthy for some reason, then I'd expect your/our symptoms 
as writes can't be completed. 

I'm guessing that you have file systems with barriers turned on. Whichever file 
system has a barrier write stuck on the problem pg will cause any other 
process trying to write anywhere in that FS to block as well. This likely means a 
cascade of nfsd processes will block as they each try to service various client 
writes to that FS. Even though, theoretically, the rest of the "disk" (rbd) and 
other file systems might still be writable, the NFS processes will still be in 
uninterruptible sleep just because of that stuck write request (or such is my 
understanding). 

Disabling barriers on the gateway machine might postpone the problem (never 
tried it and don't want to) until you hit your vm.dirty_bytes or vm.dirty_ratio 
thresholds, but it is dangerous as you could much more easily lose data. You'd 
be better off solving the underlying issues when they happen (too few replicas 
available or overloaded osds). 
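
To see what a gateway is currently running with, something along these lines is
safe to check at any time (read-only):

  # barriers are the XFS default; 'nobarrier' only shows up here if it was set explicitly
  grep xfs /proc/mounts

  # writeback thresholds and how much dirty data is outstanding right now
  sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes
  grep -E '^(Dirty|Writeback):' /proc/meminfo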


For us, even when the cluster reports itself as healthy, we sometimes have this 
problem. All nfsd processes block. sync blocks. echo 3 > 
/proc/sys/vm/drop_caches blocks. There is a persistent 4-8MB "Dirty" in 
/proc/meminfo. None of the osds log slow requests. Everything seems fine on the 
osds and mons. Neither CPU nor I/O load is extraordinary on the ceph nodes, but 
at least one file system on the gateway machine will stop accepting writes. 
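
When it is stuck, these are roughly the checks described above gathered in one
place (a sketch; the osd.N admin-socket path is a placeholder and that command
has to run on the OSD host):

  grep -E '^Dirty:' /proc/meminfo                  # the stuck 4-8MB of dirty pages
  ps -eo pid,stat,wchan:40,comm | awk '$2 ~ /D/'   # which tasks are in D state and where
  ceph health detail                               # still HEALTH_OK for us when this hits
  ceph --admin-daemon /var/run/ceph/ceph-osd.N.asok dump_ops_in_flight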

If we just wait, the situation resolves itself in 10 to 30 minutes. A forced 
reboot of the NFS gateway "solves" the performance problem, but is annoying and 
dangerous (we unmount all of the file systems that will still unmount, but 
the stuck ones lead us to a sysrq-b). 

This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels running Ceph 
Firefly (0.80.10) and XFS file systems exported over NFS and Samba. 

Ryan


Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread Ryan Tokarek

> On Oct 22, 2015, at 10:19 PM, John-Paul Robinson  wrote:
> 
> A few clarifications on our experience:
> 
> * We have 200+ rbd images mounted on our RBD-NFS gateway.  (There's
> nothing easier for a user to understand than "your disk is full".)

Same here, and agreed. It sounds like our situations are similar except for my 
issue of blocking on an apparently healthy cluster. 

> * I'd expect more contention potential with a single shared RBD back
> end, but with many distinct and presumably isolated backend RBD images,
> I've always been surprised that *all* the nfsd tasks hang. This leads me 
> to think it's an nfsd issue rather than an rbd issue.  (I realize this 
> is an rbd list, looking for shared experience. ;) )

It's definitely possible. I've experienced exactly the behavior you're seeing. 
My guess is that when an nfsd thread blocks and goes dark, affected clients 
(even if it's only one) will retransmit their requests thinking there's a 
network issue causing more nfsds to go dark until all the server threads are 
stuck (that could be hogwash, but it fits the behavior). Or perhaps there are 
enough individual clients writing to the affected NFS volume that they consume 
all the available nfsd threads (I'm not sure about your client to FS and nfsd 
thread ratio, but that is plausible in my situation).  I think some testing 
with xfs_freeze and non-critical nfs server/clients is called for. 
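
A rough way to run that test on a sacrificial export (xfs_freeze blocks every
write to that FS until it is thawed, so not on anything you care about;
/export/test is a placeholder path):

  # on the gateway: freeze one exported XFS file system
  xfs_freeze -f /export/test

  # from separate clients, write to the frozen export and to a different export,
  # then watch whether all of the server's nfsd threads pile up in D state
  ps -eo pid,stat,comm | grep '[n]fsd'

  # thaw it again
  xfs_freeze -u /export/test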

I don't think this part is related to ceph except that it happens to be 
providing the underlying storage. I'm fairly certain that my problems with an 
apparently healthy cluster blocking writes are a ceph problem, but I haven't 
figured out what the source of that is. 

> * I haven't seen any difference between reads and writes.  Any access to
> any backing RBD store from the NFS client hangs.

All NFS clients are hung, but in my situation, it's usually only 1-3 local file 
systems that stop accepting writes. NFS is completely unresponsive, but local 
and remote-samba operations on the unaffected file systems are totally happy. 

I don't have a solution to the NFS issue, but I've seen it all too often. I wonder 
whether setting a huge number of threads and/or playing with client retransmit 
times would help, but I suspect this problem is just intrinsic to Linux NFS 
servers. 
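
If anyone wants to experiment, the knobs would be roughly these (RHEL/SL-style
paths; the server name and export paths are placeholders, and I haven't verified
that any of it actually helps):

  # server: more nfsd threads right now; set RPCNFSDCOUNT in /etc/sysconfig/nfs to persist
  rpc.nfsd 128

  # client: longer timeout before retransmit (timeo is in tenths of a second)
  mount -t nfs -o timeo=600,retrans=2 gateway:/export/test /mnt/test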

Ryan


Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway

2015-10-22 Thread John-Paul Robinson
A few clarifications on our experience:

* We have 200+ rbd images mounted on our RBD-NFS gateway.  (There's
nothing easier for a user to understand than "your disk is full".)

* I'd expect more contention potential with a single shared RBD back
end, but with many distinct and presumably isolated backend RBD images,
I've always been surprised that *all* the nfsd tasks hang.  This leads me
to think it's an nfsd issue rather than an rbd issue.  (I realize this
is an rbd list, looking for shared experience. ;) )
 
* I haven't seen any difference between reads and writes.  Any access to
any backing RBD store from the NFS client hangs.

~jpr

On 10/22/2015 06:42 PM, Ryan Tokarek wrote:
>> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson  wrote:
>>
>> Hi,
>>
>> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
>> nfsd server requests when their ceph cluster has a placement group that
>> is not servicing I/O for some reason, eg. too few replicas or an osd
>> with slow request warnings?
> We have experienced exactly that kind of problem except that it sometimes 
> happens even when ceph health reports "HEALTH_OK". This has been incredibly 
> vexing for us. 
>
>
> If the cluster is unhealthy for some reason, then I'd expect your/our 
> symptoms as writes can't be completed. 
>
> I'm guessing that you have file systems with barriers turned on. Whichever 
> file system has a barrier write stuck on the problem pg will cause any 
> other process trying to write anywhere in that FS to block as well. This likely 
> means a cascade of nfsd processes will block as they each try to service 
> various client writes to that FS. Even though, theoretically, the rest of the 
> "disk" (rbd) and other file systems might still be writable, the NFS 
> processes will still be in uninterruptible sleep just because of that stuck 
> write request (or such is my understanding). 
>
> Disabling barriers on the gateway machine might postpone the problem (never 
> tried it and don't want to) until you hit your vm.dirty_bytes or 
> vm.dirty_ratio thresholds, but it is dangerous as you could much more easily 
> lose data. You'd be better off solving the underlying issues when they happen 
> (too few replicas available or overloaded osds). 
>
>
> For us, even when the cluster reports itself as healthy, we sometimes have 
> this problem. All nfsd processes block. sync blocks. echo 3 > 
> /proc/sys/vm/drop_caches blocks. There is a persistent 4-8MB "Dirty" in 
> /proc/meminfo. None of the osds log slow requests. Everything seems fine on 
> the osds and mons. Neither CPU nor I/O load is extraordinary on the ceph 
> nodes, but at least one file system on the gateway machine will stop 
> accepting writes. 
>
> If we just wait, the situation resolves itself in 10 to 30 minutes. A forced 
> reboot of the NFS gateway "solves" the performance problem, but is annoying 
> and dangerous (we unmount all of the file systems that will still unmount, 
> but the stuck ones lead us to a sysrq-b). 
>
> This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels running 
> Ceph Firefly (0.80.10) and XFS file systems exported over NFS and Samba. 
>
> Ryan
