Re: long cifs timeout when share becomes unavailable
On Thu, 24 May 2012 16:44:58 +0400
Sergey Urushkin urush...@telros.ru wrote:

> Jeff Layton wrote on 24.05.2012 14:31:
>> On Thu, 24 May 2012 10:13:29 +0400
>> Sergey Urushkin urush...@telros.ru wrote:
>>> And I cannot see any timeout options for mount.cifs (except the acl timeout). So, the actual questions are: 1) is there a way to avoid these hangs (an analog of 'intr')? and 2) how can I reduce this unreachable-host timeout (an analog of 'timeo')? Maybe there are some variables in the sources?
>>
>> You can try setting the echo_retries kernel module parameter to 1, which should cut down the wait time to 60s. In 3.4, we've removed that parm and it's now set to 1 always. The timeout between echo requests (which is how we detect whether the server is still responding) is not currently tunable.
>
> That works like a charm; 60s is much better than 5m, thanks a lot. From the documentation (for kernels before 3.4) it isn't clear what behavior this parm changes; maybe the docs should be fixed somehow?

Well, this parm has gone away in recent kernels, so I'm not inclined to bother documenting it. If you wish to do so, maybe write up a section for the manpage on the socket and reconnect behavior?

> Could you explain to me why SMB_ECHO_INTERVAL is so big by default?

We don't really want to spam the server with a ton of these requests. The idea is that we wait a while for a response and then check with the server to see if it's still alive before timing out the packet. Most servers will respond to most calls within this time period, and it's on par with the default timeo= value for NFS over TCP.

Some calls, however, can take a very long time (minutes): writes long past the end of the file, for instance. NTFS doesn't do sparse files, so it has to zero-fill them, and that can take a long time on slow storage. The echo is primarily there to let us distinguish between servers that are just slow to respond to certain calls and those that are truly unreachable.

> Is there any side effect of changing it to, for example, 30? Does this change mean that every 30 secs (twice as often) the client will send special packets to the server (even when the mounted share isn't being used by any application at the moment)?

Yes, reducing that interval will make it send SMB echoes to the server more frequently. Will that have any side effects? Probably not, but I've not experimented with it. I don't see a lot of value in making this tunable, since this is an error condition.

> What about my first question? Could I make ls-like applications interruptible somehow?

Most of these sleeps are TASK_KILLABLE, so they should respond to fatal signals (SIGKILL).

-- 
Jeff Layton jlay...@samba.org
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
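Jeff's answer suggests that a SIGKILL should get a hung 'ls' out of a killable sleep even where ^C did not. A minimal sketch of bounding such a call, assuming GNU coreutils timeout(1) is available; run_killable is a hypothetical helper name, not part of cifs-utils. If the process is stuck in a truly uninterruptible (D-state) sleep rather than a killable one, the signal only takes effect once that sleep ends.

```shell
#!/bin/sh
# Hypothetical helper: run a command that may block on an unreachable CIFS
# mount and deliver SIGKILL if it has not finished within LIMIT seconds.
# Relies on GNU coreutils timeout(1); "-s KILL" makes it send SIGKILL
# instead of its default SIGTERM.
# Usage: run_killable LIMIT COMMAND [ARGS...]
run_killable() {
    limit=$1
    shift
    timeout -s KILL "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 137 ]; then
        # 137 = 128 + 9: the command was terminated by SIGKILL.
        echo "killed after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

For example, `run_killable 30 ls /mnt` (mount point taken from the report). This only bounds how long the caller waits; it does not shorten the kernel's own reconnect timeout.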
long cifs timeout when share becomes unavailable
Hi,

there are issues with a cifs share mounted via mount.cifs (with recent kernels): after the server becomes unavailable, the first 'ls' on the directory where the share is mounted 1) hangs (it can't be interrupted with ^C) and 2) lasts about 5 minutes.

The first problem appears everywhere I tested (Ubuntu 10.04 with any of the distribution's kernels, Ubuntu 12.04, Fedora 17), but with old kernels (tested with Ubuntu 10.04's 2.6.32 and 2.6.35) 'ls' is uninterruptible yet hangs for only about 25 seconds, which makes the problem much less severe on old kernels. With new kernels (Ubuntu 10.04 with 3.0, Ubuntu 12.04 with 3.2, Fedora 17 with 3.3) I'm facing very long hangs of 'ls' (the second problem). Many GUI applications (e.g. nautilus, firefox, gnome-panel, mc) that query that directory for some reason appear to act the same way as 'ls', so the system becomes unusable for 5(!) minutes. When the server (tested with Samba 3.6 and Windows 2003) becomes unavailable, nothing is being written to the mounted directory, so I can't understand why this timeout is so big.

Here is how I tested this:

# mount.cifs //fsrv/home /mnt -ouser=test,dom=wg,soft
Password:
# time ls /mnt
Desktop  Documents  Program Files  WINDOWS

real    0m0.019s
user    0m0.004s
sys     0m0.012s
# iptables -I OUTPUT -d 172.17.0.65 -j DROP
# time ls /mnt    # This 'ls' cannot be interrupted
ls: cannot access /mnt: Host is down

real    4m51.668s
user    0m0.004s
sys     0m0.016s
# time ls /mnt    # This 'ls' and all others after it can be interrupted
ls: cannot access /mnt: Host is down

real    0m10.014s
user    0m0.008s
sys     0m0.004s

I see these messages in syslog:

kernel: [ 1625.552044] CIFS VFS: Server fsrv has not responded in 300 seconds. Reconnecting...
kernel: [ 1655.509422] CIFS VFS: Unexpected lookup error -112
...

And I cannot see any timeout options for mount.cifs (except the acl timeout). So, the actual questions are:
1) is there a way to avoid these hangs (an analog of 'intr')?
2) how can I reduce this unreachable-host timeout (an analog of 'timeo')?
Maybe there are some variables in the sources?

Thanks.

-- 
Best regards,
Sergey Urushkin
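The "300 seconds" in the log above and the 60s Jeff quotes for echo_retries=1 fit a simple model: the client waits roughly echo_retries × SMB_ECHO_INTERVAL without hearing from the server before it reconnects, with a 60-second interval and, judging by the log, a default echo_retries of 5. A sketch of that arithmetic under those assumptions; the function names are hypothetical and this is an approximation, not the kernel code.

```shell
#!/bin/sh
# Rough model (an assumption, not the cifs source): the server is treated as
# unresponsive once nothing has been received from it for
# echo_retries * SMB_ECHO_INTERVAL seconds.
SMB_ECHO_INTERVAL=60   # seconds; inferred from the 60s figure for echo_retries=1

# Print the wait (in seconds) before "Reconnecting..." for a given echo_retries.
cifs_unresponsive_after() {
    echo $(( $1 * SMB_ECHO_INTERVAL ))
}

# Succeed (exit 0) if a server last heard from at time LAST should be
# considered unresponsive at time NOW for the given echo_retries.
# Usage: cifs_should_reconnect NOW LAST ECHO_RETRIES   (all in seconds)
cifs_should_reconnect() {
    now=$1 last=$2 retries=$3
    [ $(( now - last )) -gt $(( retries * SMB_ECHO_INTERVAL )) ]
}

# cifs_unresponsive_after 5   prints 300, as in the syslog line
# cifs_unresponsive_after 1   prints 60, the figure quoted for echo_retries=1
```

Under this model, the 4m51s 'ls' in the report is presumably a little under 300 s because the clock runs from the server's last reply, not from the moment 'ls' started.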
Re: long cifs timeout when share becomes unavailable
Jeff Layton wrote on 24.05.2012 14:31:
> On Thu, 24 May 2012 10:13:29 +0400
> Sergey Urushkin urush...@telros.ru wrote:
>> And I cannot see any timeout options for mount.cifs (except the acl timeout). So, the actual questions are: 1) is there a way to avoid these hangs (an analog of 'intr')? and 2) how can I reduce this unreachable-host timeout (an analog of 'timeo')? Maybe there are some variables in the sources?
>
> You can try setting the echo_retries kernel module parameter to 1, which should cut down the wait time to 60s. In 3.4, we've removed that parm and it's now set to 1 always. The timeout between echo requests (which is how we detect whether the server is still responding) is not currently tunable.

That works like a charm; 60s is much better than 5m, thanks a lot. From the documentation (for kernels before 3.4) it isn't clear what behavior this parm changes; maybe the docs should be fixed somehow?

Could you explain to me why SMB_ECHO_INTERVAL is so big by default? Is there any side effect of changing it to, for example, 30? Does this change mean that every 30 secs (twice as often) the client will send special packets to the server (even when the mounted share isn't being used by any application at the moment)?

What about my first question? Could I make ls-like applications interruptible somehow?

Thanks a lot again!

-- 
Best regards,
Sergey Urushkin
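For kernels before 3.4, where Jeff's echo_retries suggestion applies, a minimal sketch of changing the parameter at runtime. It assumes the parameter is exported writable under /sys/module/cifs/parameters (root required), which we believe but have not verified for every pre-3.4 kernel. set_cifs_echo_retries is a hypothetical helper; the optional second argument exists only so it can be exercised against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: write VALUE to the cifs echo_retries module parameter
# and print the value read back. Assumes a writable sysfs entry (needs root).
# Usage: set_cifs_echo_retries VALUE [PARAM_DIR]
set_cifs_echo_retries() {
    value=$1
    param_dir=${2:-/sys/module/cifs/parameters}
    param="$param_dir/echo_retries"
    if [ ! -e "$param" ]; then
        # Missing if cifs is not loaded, or on 3.4+ where, per the thread,
        # the parameter was removed and the behaviour is fixed at 1.
        echo "no echo_retries parameter at $param" >&2
        return 1
    fi
    echo "$value" > "$param" || return 1
    cat "$param"
}

# Runtime change (as root):
#   set_cifs_echo_retries 1
# To apply it whenever the module is loaded, a modprobe.d entry such as
#   options cifs echo_retries=1
# in /etc/modprobe.d/cifs.conf is the usual mechanism.
```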