Re: [Nbd] [PATCH] nbd: improve request timeouts handling
Hi Michal,

First of all, thanks for resending this. It works for my test setup but I have some comments inline below.

On Thu, Feb 26, 2015 at 05:37:12PM +0100, Michal Belczyk wrote:
> The main idea behind it is to be able to quickly detect a broken replica
> and switch over to another when used with any sort of mirror-type device
> built on top of any number of nbd devices.
>
> Before this change a request would time out -- causing the socket to be
> shut down and the device to fail in the case of a dead server or a removed
> network cable -- only if:
>
> a) either the timer around kernel_sendmsg() kicked in,
> b) or the TCP failures on retransmission finally caused an error on the
>    socket, likely blocked on kernel_recvmsg() at this time, waiting for
>    replies from the server.
>
> Case a) depends mostly on the size of the requests issued and on the
> maximum size of the socket buffer -- a lot of read request headers or
> small write requests could be sent without triggering the requested
> timeout.
>
> Case b)'s timeout is independent of the nbd-client -t timeout option, as
> there is no TCP_USER_TIMEOUT set on the client socket by default. And even
> if such a timeout were set, it would not solve the problem of an
> nbd-client hung on receiving replies for much longer without also setting
> TCP keep-alives... and that would be the third, independent timeout
> setting required to make it work almost as expected...
> So, instead, take the big hammer approach and:
>
> *) track the number of outstanding requests sent to the server
>    (nbd->inflight)
> *) enable the timer (nbd->req_timer) before the first request is submitted
>    and leave it enabled
> *) when sending the next request, do not touch the timer (it is up to the
>    receiving side to control it at this point)
> *) on the receive side, update the timer every time a response is
>    collected but there are more to read from the server
> *) disable the timer whenever the inflight counter drops to zero or an
>    error (leading to the socket shutdown) is returned
>
> This patch does NOT prevent the server from processing a request for
> longer than the specified timeout, provided it replies to some other
> request submitted within the timeout (the server may still reply to a
> batch of requests in any order).
>
> Only the nbd->xmit_timeout != 0 code path is changed, so the patch should
> not affect nbd connections running without an explicit timeout set on the
> nbd-client command line. There is also no way to enable or disable the
> timeout on an active (nbd->pid != 0) nbd device; it is, however, possible
> to change its value. Otherwise the inflight request counter would have to
> affect nbd devices enabled without nbd-client -t timeout.
>
> Also move nbd->pid modifications behind nbd->tx_lock wherever possible to
> avoid races between concurrent nbd-client invocations.

This commit message is really good.
> Signed-off-by: Michal Belczyk <belc...@bsd.krakow.pl>
> ---
>  drivers/block/nbd.c | 150 +---
>  include/linux/nbd.h |   6 +++
>  2 files changed, 125 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 4bc2a5c..cc4a98a 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -140,11 +140,24 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
>  static void nbd_xmit_timeout(unsigned long arg)
>  {
> -	struct task_struct *task = (struct task_struct *)arg;
> +	struct nbd_device *nbd = (struct nbd_device *)arg;
> +	struct task_struct *task_ary[2];
> +	unsigned long flags;
> +	int i;
>
> -	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
> -		task->comm, task->pid);
> -	force_sig(SIGKILL, task);
> +	spin_lock_irqsave(&nbd->timer_lock, flags);
> +	nbd->timedout = 1;
> +	task_ary[0] = nbd->sender;
> +	task_ary[1] = nbd->receiver;
> +	for (i = 0; i < 2; i++) {
> +		if (task_ary[i] == NULL)
> +			continue;
> +		printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
> +			task_ary[i]->comm, task_ary[i]->pid);
> +		force_sig(SIGKILL, task_ary[i]);
> +		break;
> +	}
> +	spin_unlock_irqrestore(&nbd->timer_lock, flags);
>  }
>
>  /*
> @@ -158,7 +171,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
>  	struct msghdr msg;
>  	struct kvec iov;
>  	sigset_t blocked, oldset;
> -	unsigned long pflags = current->flags;
> +	unsigned long flags, pflags = current->flags;

I would prefer variable declarations on separate lines, especially if there is an assignment involved, for readability.

>
>  	if (unlikely(!sock)) {
>  		dev_err(disk_to_dev(nbd->disk),
> @@ -183,23 +196,39 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
>  	msg.msg_controllen = 0;
>  	msg.msg_flags = msg_flags | MSG_NOSIGNAL;
>
> -	if (send) {
> -		struct
[...]
[Nbd] [PATCH] nbd: improve request timeouts handling
The main idea behind it is to be able to quickly detect a broken replica and switch over to another when used with any sort of mirror-type device built on top of any number of nbd devices.

Before this change a request would time out -- causing the socket to be shut down and the device to fail in the case of a dead server or a removed network cable -- only if:

a) either the timer around kernel_sendmsg() kicked in,
b) or the TCP failures on retransmission finally caused an error on the socket, likely blocked on kernel_recvmsg() at this time, waiting for replies from the server.

Case a) depends mostly on the size of the requests issued and on the maximum size of the socket buffer -- a lot of read request headers or small write requests could be sent without triggering the requested timeout.

Case b)'s timeout is independent of the nbd-client -t timeout option, as there is no TCP_USER_TIMEOUT set on the client socket by default. And even if such a timeout were set, it would not solve the problem of an nbd-client hung on receiving replies for much longer without also setting TCP keep-alives... and that would be the third, independent timeout setting required to make it work almost as expected...

So, instead, take the big hammer approach and:

*) track the number of outstanding requests sent to the server (nbd->inflight)
*) enable the timer (nbd->req_timer) before the first request is submitted and leave it enabled
*) when sending the next request, do not touch the timer (it is up to the receiving side to control it at this point)
*) on the receive side, update the timer every time a response is collected but there are more to read from the server
*) disable the timer whenever the inflight counter drops to zero or an error (leading to the socket shutdown) is returned

This patch does NOT prevent the server from processing a request for longer than the specified timeout, provided it replies to some other request submitted within the timeout (the server may still reply to a batch of requests in any order).
Only the nbd->xmit_timeout != 0 code path is changed, so the patch should not affect nbd connections running without an explicit timeout set on the nbd-client command line. There is also no way to enable or disable the timeout on an active (nbd->pid != 0) nbd device; it is, however, possible to change its value. Otherwise the inflight request counter would have to affect nbd devices enabled without nbd-client -t timeout.

Also move nbd->pid modifications behind nbd->tx_lock wherever possible to avoid races between concurrent nbd-client invocations.

Signed-off-by: Michal Belczyk <belc...@bsd.krakow.pl>
---
 drivers/block/nbd.c | 150 +---
 include/linux/nbd.h |   6 +++
 2 files changed, 125 insertions(+), 31 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 4bc2a5c..cc4a98a 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -140,11 +140,24 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 static void nbd_xmit_timeout(unsigned long arg)
 {
-	struct task_struct *task = (struct task_struct *)arg;
+	struct nbd_device *nbd = (struct nbd_device *)arg;
+	struct task_struct *task_ary[2];
+	unsigned long flags;
+	int i;

-	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
-		task->comm, task->pid);
-	force_sig(SIGKILL, task);
+	spin_lock_irqsave(&nbd->timer_lock, flags);
+	nbd->timedout = 1;
+	task_ary[0] = nbd->sender;
+	task_ary[1] = nbd->receiver;
+	for (i = 0; i < 2; i++) {
+		if (task_ary[i] == NULL)
+			continue;
+		printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
+			task_ary[i]->comm, task_ary[i]->pid);
+		force_sig(SIGKILL, task_ary[i]);
+		break;
+	}
+	spin_unlock_irqrestore(&nbd->timer_lock, flags);
 }

 /*
@@ -158,7 +171,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	struct msghdr msg;
 	struct kvec iov;
 	sigset_t blocked, oldset;
-	unsigned long pflags = current->flags;
+	unsigned long flags, pflags = current->flags;

 	if (unlikely(!sock)) {
 		dev_err(disk_to_dev(nbd->disk),
@@ -183,23 +196,39 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	msg.msg_controllen = 0;
 	msg.msg_flags = msg_flags | MSG_NOSIGNAL;

-	if (send) {
-		struct timer_list ti;
-
-		if (nbd->xmit_timeout) {
-			init_timer(&ti);
-			ti.function = nbd_xmit_timeout;
-			ti.data = (unsigned long)current;
-			ti.expires = jiffies + nbd->xmit_timeout;
-			add_timer(&ti);
+	if (nbd->xmit_timeout)
[...]
Re: [Nbd] [PATCH] nbd: improve request timeouts handling
Michal, thanks for this... a couple of preliminary questions...

On Mon, Jun 17, 2013 at 4:10 PM, Michal Belczyk <belc...@bsd.krakow.pl> wrote:
> The main idea behind it is to be able to quickly detect a broken replica
> and switch over to another when used with any sort of mirror-type device
> built on top of any number of nbd devices.

I'm wondering if a simpler change would have the same result. Namely, could you just apply the xmit_timeout, as currently implemented, to the recvmsg path (ignoring the timeout, of course, when no requests are queued)? Is there a case that this would not handle that the proposed patch handles? They seem to be equivalent.

> Also move nbd->pid modifications behind nbd->tx_lock wherever possible to
> avoid races between concurrent nbd-client invocations.

Is there a bug related to this that you've seen, or is this just a precaution?

Thanks,
Paul

--
Get your SQL database under version control now! Version control is standard for application code, but databases haven't caught up. So what steps can you take to put your SQL databases under version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
_______________________________________________
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general
Re: [Nbd] [PATCH] nbd: improve request timeouts handling
On Mon, Jul 29, 2013 at 12:45:33PM -0400, Paul Clements wrote:
> Michal, thanks for this... a couple of preliminary questions...
>
> On Mon, Jun 17, 2013 at 4:10 PM, Michal Belczyk <belc...@bsd.krakow.pl> wrote:
>> The main idea behind it is to be able to quickly detect a broken replica
>> and switch over to another when used with any sort of mirror-type device
>> built on top of any number of nbd devices.
>
> I'm wondering if a simpler change would have the same result. Namely,
> could you just apply the xmit_timeout, as currently implemented, to the
> recvmsg path (ignoring the timeout, of course, when no requests are
> queued)? Is there a case that this would not handle that the proposed
> patch handles? They seem to be equivalent.

Oh... no, they would not be equivalent. It is all my fault -- the commit message lacks very important information: timeouts implemented via timers directly on top of the kernel_sendmsg() or kernel_recvmsg() routines apply to a single bio_vec being sent/received at a time, while the proposed patch makes a block queue request the subject of the timeout -- timeout handling was moved from sock_xmit() up to nbd_send_req() and nbd_do_it()...

Currently, assuming for this example max_sectors_kb=512, a single direct read request from an nbd device:

# dd if=/dev/nbd0 of=/dev/null bs=512K iflag=direct count=1

generates:

[1301353.424080] nbd0: request 8802253d22f0: got reply
[1301353.424089] nbd0: request 8802253d22f0: got 4096 bytes data
[1301353.424092] nbd0: request 8802253d22f0: got 4096 bytes data
[ ... 128 4KB bio_vec-s / 128 sock_xmit() calls? ]

So, assuming a timeout set to 30s, a single 512KB read request could take up to 128 times 30s, which makes... 64 minutes, if the timeouts were implemented the way you ask... Well, they actually _are_ implemented this way in the upstream code, so a single direct write request like this will generate 128 timer arms/disarms AND a 30s timeout actually means that such a request can take up to 64 minutes...
I think it is wrong, and I believe that if a maximum-sized request (subject to the now-configurable max_sectors_kb tunable) cannot be fulfilled within the specified timeout, then this NBD device queue should die, hopefully being handled by a DM layer on top... and such a request could be happily requeued to a well-behaving replica, possibly another NBD device...

>> Also move nbd->pid modifications behind nbd->tx_lock wherever possible
>> to avoid races between concurrent nbd-client invocations.
>
> Is there a bug related to this that you've seen, or is this just a
> precaution?

Precaution, but 1) I don't like that the current code checks for nbd->pid with nbd->tx_lock held and then modifies it later with the same lock released in nbd_do_it(), and 2) the proposed patch uses nbd->pid to determine if the device is in use (nbd-client alive) -- should it do it some other way?

Thank you for the feedback!

--
Michal Belczyk Sr.
Re: [Nbd] [PATCH] nbd: improve request timeouts handling (rebased)
On Sat, Jul 13, 2013 at 10:01:20PM +0200, Michal Belczyk wrote:
> On Thu, Jul 11, 2013 at 03:23:46PM +0200, Goswin von Brederlow wrote:
>> On Wed, Jul 10, 2013 at 09:40:42PM +0200, Michal Belczyk wrote:
>>> [rebased against the master, no other changes]
>>>
>>> The main idea behind it is to be able to quickly detect a broken
>>> replica and switch over to another when used with any sort of
>>> mirror-type device built on top of any number of nbd devices.
>>
>> It is good to see someone checking and fixing timeout issues. But I
>> would also like to have the opposite behaviour and delay detecting
>> failures. Insane? Read on...
>>
>> I have a small server that runs 24/7 with minimal power consumption. I
>> also have a storage node that exports a number of disks via NBD. This
>> should be powered down when not in use. The problem is that "not in use"
>> is incompatible with NBD as is. The storage node must be running and
>> must have an active connection to the server.
>>
>> What I'm suggesting is that the nbd-client should be able to create a
>> network block device in disconnected state (without an attached socket)
>> and the kernel would return on the next access. The nbd client can then
>> actually connect to the NBD server and restart the device with a socket
>> to fulfill the request. In my case that would involve a wake-on-lan call
>> to first bring up the storage node.
>>
>> Secondly it would be nice to have an idle timeout in the kernel. If no
>> access happened to the NBD for some time the kernel should return so the
>> client can switch to disconnected mode. In my case that would involve
>> sending the storage node to sleep.
>>
>> Interested in trying to implement this?
>
> I'm currently busy trying to push bnbd to its first release, which is not
> easy being distracted by fixing bugs in other but related components on
> my way... So far the biggest winners of my project are the nbd driver and
> the nbd client, which already have some of the bugfixes upstreamed ;-)
>
> I just wonder -- have you tried autofs plus some scripting over it, on
> mount and unmount upon timeout? I think I have used it once in my life
> long ago, so I'm not sure if it would work at all, but it seems possible?

I don't think autofs can umount a filesystem with open files, like a shell having its CWD in the FS. Right? The FS would then easily be blocked accidentally without actually being in real use.

MfG
	Goswin
Re: [Nbd] [PATCH] nbd: improve request timeouts handling (rebased)
On Mon, Jul 15, 2013 at 10:38:35AM +0200, Goswin von Brederlow wrote:
> I don't think autofs can umount a filesystem with open files, like a
> shell having its CWD in the FS. Right? The FS would then easily be
> blocked accidentally without actually being in real use.

Yeah, I guess you're right... then it probably should go into the driver. As I said before -- I'm way too busy currently to work on it now...

--
Michal Belczyk Sr.
Re: [Nbd] [PATCH] nbd: improve request timeouts handling (rebased)
On Thu, Jul 11, 2013 at 03:23:46PM +0200, Goswin von Brederlow wrote:
> On Wed, Jul 10, 2013 at 09:40:42PM +0200, Michal Belczyk wrote:
>> [rebased against the master, no other changes]
>>
>> The main idea behind it is to be able to quickly detect a broken replica
>> and switch over to another when used with any sort of mirror-type device
>> built on top of any number of nbd devices.
>
> It is good to see someone checking and fixing timeout issues. But I would
> also like to have the opposite behaviour and delay detecting failures.
> Insane? Read on...
>
> I have a small server that runs 24/7 with minimal power consumption. I
> also have a storage node that exports a number of disks via NBD. This
> should be powered down when not in use. The problem is that "not in use"
> is incompatible with NBD as is. The storage node must be running and must
> have an active connection to the server.
>
> What I'm suggesting is that the nbd-client should be able to create a
> network block device in disconnected state (without an attached socket)
> and the kernel would return on the next access. The nbd client can then
> actually connect to the NBD server and restart the device with a socket
> to fulfill the request. In my case that would involve a wake-on-lan call
> to first bring up the storage node.
>
> Secondly it would be nice to have an idle timeout in the kernel. If no
> access happened to the NBD for some time the kernel should return so the
> client can switch to disconnected mode. In my case that would involve
> sending the storage node to sleep.
>
> Interested in trying to implement this?

I'm currently busy trying to push bnbd to its first release, which is not easy being distracted by fixing bugs in other but related components on my way... So far the biggest winners of my project are the nbd driver and the nbd client, which already have some of the bugfixes upstreamed ;-)

I just wonder -- have you tried autofs plus some scripting over it, on mount and unmount upon timeout? I think I have used it once in my life long ago, so I'm not sure if it would work at all, but it seems possible?

--
Michal Belczyk Sr.
Re: [Nbd] [PATCH] nbd: improve request timeouts handling (rebased)
On Wed, Jul 10, 2013 at 09:40:42PM +0200, Michal Belczyk wrote:
> [rebased against the master, no other changes]
>
> The main idea behind it is to be able to quickly detect a broken replica
> and switch over to another when used with any sort of mirror-type device
> built on top of any number of nbd devices.

It is good to see someone checking and fixing timeout issues. But I would also like to have the opposite behaviour and delay detecting failures. Insane? Read on...

I have a small server that runs 24/7 with minimal power consumption. I also have a storage node that exports a number of disks via NBD. This should be powered down when not in use. The problem is that "not in use" is incompatible with NBD as is. The storage node must be running and must have an active connection to the server.

What I'm suggesting is that the nbd-client should be able to create a network block device in disconnected state (without an attached socket) and the kernel would return on the next access. The nbd client can then actually connect to the NBD server and restart the device with a socket to fulfill the request. In my case that would involve a wake-on-lan call to first bring up the storage node.

Secondly it would be nice to have an idle timeout in the kernel. If no access happened to the NBD for some time the kernel should return so the client can switch to disconnected mode. In my case that would involve sending the storage node to sleep.

Interested in trying to implement this?

MfG
	Goswin
[Nbd] [PATCH] nbd: improve request timeouts handling (rebased)
[rebased against the master, no other changes]

The main idea behind it is to be able to quickly detect a broken replica and switch over to another when used with any sort of mirror-type device built on top of any number of nbd devices.

Before this change a request would time out -- causing the socket to be shut down and the device to fail in the case of a dead server or a removed network cable -- only if:

a) either the timer around kernel_sendmsg() kicked in,
b) or the TCP failures on retransmission finally caused an error on the socket, likely blocked on kernel_recvmsg() at this time, waiting for replies from the server.

Case a) depends mostly on the size of the requests issued and on the maximum size of the socket buffer -- a lot of read request headers or small write requests could be sent without triggering the requested timeout.

Case b)'s timeout is independent of the nbd-client -t timeout option, as there is no TCP_USER_TIMEOUT set on the client socket by default. And even if such a timeout were set, it would not solve the problem of an nbd-client hung on receiving replies for much longer without also setting TCP keep-alives... and that would be the third, independent timeout setting required to make it work almost as expected...
So, instead, take the big hammer approach and:

*) track the number of outstanding requests sent to the server (nbd->inflight)
*) enable the timer (nbd->req_timer) before the first request is submitted and leave it enabled
*) when sending the next request, do not touch the timer (it is up to the receiving side to control it at this point)
*) on the receive side, update the timer every time a response is collected but there are more to read from the server
*) disable the timer whenever the inflight counter drops to zero or an error (leading to the socket shutdown) is returned

This patch does NOT prevent the server from processing a request for longer than the specified timeout, provided it replies to some other request submitted within the timeout (the server may still reply to a batch of requests in any order).

Only the nbd->xmit_timeout != 0 code path is changed, so the patch should not affect nbd connections running without an explicit timeout set on the nbd-client command line. There is also no way to enable or disable the timeout on an active (nbd->pid != 0) nbd device; it is, however, possible to change its value. Otherwise the inflight request counter would have to affect nbd devices enabled without nbd-client -t timeout.

Also move nbd->pid modifications behind nbd->tx_lock wherever possible to avoid races between concurrent nbd-client invocations.
Signed-off-by: Michal Belczyk <belc...@bsd.krakow.pl>
---
 drivers/block/nbd.c | 150 +---
 include/linux/nbd.h |   6 +++
 2 files changed, 125 insertions(+), 31 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 2dc3b51..a20d5eb 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -140,11 +140,24 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 static void nbd_xmit_timeout(unsigned long arg)
 {
-	struct task_struct *task = (struct task_struct *)arg;
+	struct nbd_device *nbd = (struct nbd_device *)arg;
+	struct task_struct *task_ary[2];
+	unsigned long flags;
+	int i;

-	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
-		task->comm, task->pid);
-	force_sig(SIGKILL, task);
+	spin_lock_irqsave(&nbd->timer_lock, flags);
+	nbd->timedout = 1;
+	task_ary[0] = nbd->sender;
+	task_ary[1] = nbd->receiver;
+	for (i = 0; i < 2; i++) {
+		if (task_ary[i] == NULL)
+			continue;
+		printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
+			task_ary[i]->comm, task_ary[i]->pid);
+		force_sig(SIGKILL, task_ary[i]);
+		break;
+	}
+	spin_unlock_irqrestore(&nbd->timer_lock, flags);
 }

 /*
@@ -158,7 +171,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	struct msghdr msg;
 	struct kvec iov;
 	sigset_t blocked, oldset;
-	unsigned long pflags = current->flags;
+	unsigned long flags, pflags = current->flags;

 	if (unlikely(!sock)) {
 		dev_err(disk_to_dev(nbd->disk),
@@ -183,23 +196,39 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	msg.msg_controllen = 0;
 	msg.msg_flags = msg_flags | MSG_NOSIGNAL;

-	if (send) {
-		struct timer_list ti;
-
-		if (nbd->xmit_timeout) {
-			init_timer(&ti);
-			ti.function = nbd_xmit_timeout;
-			ti.data = (unsigned long)current;
-			ti.expires = jiffies + nbd->xmit_timeout;
[...]
[Nbd] [PATCH] nbd: improve request timeouts handling
The main idea behind it is to be able to quickly detect a broken replica and switch over to another when used with any sort of mirror-type device built on top of any number of nbd devices.

Before this change a request would time out -- causing the socket to be shut down and the device to fail in the case of a dead server or a removed network cable -- only if:

a) either the timer around kernel_sendmsg() kicked in,
b) or the TCP failures on retransmission finally caused an error on the socket, likely blocked on kernel_recvmsg() at this time, waiting for replies from the server.

Case a) depends mostly on the size of the requests issued and on the maximum size of the socket buffer -- a lot of read request headers or small write requests could be sent without triggering the requested timeout.

Case b)'s timeout is independent of the nbd-client -t timeout option, as there is no TCP_USER_TIMEOUT set on the client socket by default. And even if such a timeout were set, it would not solve the problem of an nbd-client hung on receiving replies for much longer without also setting TCP keep-alives... and that would be the third, independent timeout setting required to make it work almost as expected...

So, instead, take the big hammer approach and:

*) track the number of outstanding requests sent to the server (nbd->inflight)
*) enable the timer (nbd->req_timer) before the first request is submitted and leave it enabled
*) when sending the next request, do not touch the timer (it is up to the receiving side to control it at this point)
*) on the receive side, update the timer every time a response is collected but there are more to read from the server
*) disable the timer whenever the inflight counter drops to zero or an error (leading to the socket shutdown) is returned

This patch does NOT prevent the server from processing a request for longer than the specified timeout, provided it replies to some other request submitted within the timeout (the server may still reply to a batch of requests in any order).
Only the nbd->xmit_timeout != 0 code path is changed, so the patch should not affect nbd connections running without an explicit timeout set on the nbd-client command line. There is also no way to enable or disable the timeout on an active (nbd->pid != 0) nbd device; it is, however, possible to change its value. Otherwise the inflight request counter would have to affect nbd devices enabled without nbd-client -t timeout.

Also move nbd->pid modifications behind nbd->tx_lock wherever possible to avoid races between concurrent nbd-client invocations.

Signed-off-by: Michal Belczyk <belc...@bsd.krakow.pl>
---
 drivers/block/nbd.c | 150 +---
 include/linux/nbd.h |   6 +++
 2 files changed, 125 insertions(+), 31 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index c09640a..a61f1d3 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -140,11 +140,24 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 static void nbd_xmit_timeout(unsigned long arg)
 {
-	struct task_struct *task = (struct task_struct *)arg;
+	struct nbd_device *nbd = (struct nbd_device *)arg;
+	struct task_struct *task_ary[2];
+	unsigned long flags;
+	int i;

-	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
-		task->comm, task->pid);
-	force_sig(SIGKILL, task);
+	spin_lock_irqsave(&nbd->timer_lock, flags);
+	nbd->timedout = 1;
+	task_ary[0] = nbd->sender;
+	task_ary[1] = nbd->receiver;
+	for (i = 0; i < 2; i++) {
+		if (task_ary[i] == NULL)
+			continue;
+		printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
+			task_ary[i]->comm, task_ary[i]->pid);
+		force_sig(SIGKILL, task_ary[i]);
+		break;
+	}
+	spin_unlock_irqrestore(&nbd->timer_lock, flags);
 }

 /*
@@ -158,7 +171,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	struct msghdr msg;
 	struct kvec iov;
 	sigset_t blocked, oldset;
-	unsigned long pflags = current->flags;
+	unsigned long flags, pflags = current->flags;

 	if (unlikely(!sock)) {
 		dev_err(disk_to_dev(nbd->disk),
@@ -183,23 +196,39 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	msg.msg_controllen = 0;
 	msg.msg_flags = msg_flags | MSG_NOSIGNAL;

-	if (send) {
-		struct timer_list ti;
-
-		if (nbd->xmit_timeout) {
-			init_timer(&ti);
-			ti.function = nbd_xmit_timeout;
-			ti.data = (unsigned long)current;
-			ti.expires = jiffies + nbd->xmit_timeout;
-			add_timer(&ti);
+	if (nbd->xmit_timeout)
[...]