Re: long cifs timeout when share becomes unavailable

2012-05-25 Thread Jeff Layton
On Thu, 24 May 2012 16:44:58 +0400
Sergey Urushkin urush...@telros.ru wrote:

 Jeff Layton wrote on 24.05.2012 14:31:
  On Thu, 24 May 2012 10:13:29 +0400
  Sergey Urushkin urush...@telros.ru wrote:
 
   And I cannot see any timeout options for mount.cifs (except acl
   timeout).
   So, the actual questions are: 1) is there a way to avoid these hangs
   (analog of 'intr'?) and 2) how can I reduce this unreachable-host
   timeout (analog of 'timeo'?)? Maybe there are some variables in the
   sources?
 
  You can try setting the echo_retries kernel module parameter to 1,
  which should cut down the wait time to 60s. In 3.4, we've removed
  that parm and it's now set to 1 always. The timeout between echo
  requests (which is how we detect whether the server is still
  responding) is not currently tunable.
 
 That works like a charm, 60s is much better than 5m, thanks a lot. From 
 documentation (for kernels before 3.4) it isn't clear what behavior this 
 parm changes, maybe docs should be fixed some way?

Well, this parm has gone away now in recent kernels so I'm not inclined
to bother with documenting it. If you'd like to do so, maybe write up a
section for the manpage on the socket and reconnect behavior?
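
For anyone on a pre-3.4 kernel who wants the setting to persist, a minimal
sketch follows. It assumes the standard modprobe.d "options" mechanism;
write_echo_retries_conf and the file name cifs-echo.conf are hypothetical
names chosen for illustration, not anything from the cifs documentation.

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe "options" line for the cifs module.
# Only meaningful on kernels before 3.4, where echo_retries still exists.
write_echo_retries_conf() {
    conf_file=$1        # e.g. /etc/modprobe.d/cifs-echo.conf (name is an assumption)
    retries=${2:-1}     # defaults to 1, the value suggested in this thread
    printf 'options cifs echo_retries=%s\n' "$retries" > "$conf_file"
}

# Usage (as root); the option takes effect the next time cifs is loaded:
#   write_echo_retries_conf /etc/modprobe.d/cifs-echo.conf 1
#   umount -a -t cifs && modprobe -r cifs && modprobe cifs
# To check the value in effect, assuming the parameter is exported via sysfs:
#   cat /sys/module/cifs/parameters/echo_retries
```

The module has to be reloaded (or the machine rebooted) before the new
value is picked up this way.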

 Could you explain to me why SMB_ECHO_INTERVAL is so big by default?

We don't really want to spam the server with a ton of these requests.
The idea is that we wait a while for a response and then check with the
server to see if it's still alive before timing out the packet. Most
servers will respond to most calls within this time period and it's on
par with the default timeo= value for NFS over TCP.

Some calls, however, can take a very long time (minutes): writes long
past the end of the file, for instance. NTFS doesn't do sparse files so
it has to zero-fill them and that can take a long time on slow storage.

The echo is primarily to allow us to distinguish between servers that
are just slow to respond to certain calls, and those that are truly
unreachable.
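
To make the numbers in this thread concrete, here is a rough sketch of the
arithmetic, assuming the give-up time is simply the retry count multiplied
by the echo interval. The retry count of 5 is inferred from the "300
seconds" syslog message and the 60s interval, it is not stated above, and
unresponsive_timeout is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: seconds before the client declares the server unresponsive,
# ASSUMING timeout = echo_retries * echo interval (an inference from the
# figures quoted in this thread, not taken from the kernel source).
unresponsive_timeout() {
    retries=$1
    interval=${2:-60}   # seconds between echo requests (60s per this thread)
    echo $((retries * interval))
}

unresponsive_timeout 5 60   # prints 300, matching "has not responded in 300 seconds"
unresponsive_timeout 1 60   # prints 60, the wait with echo_retries=1
```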

 Is there any side effect of changing it to, for example, 30? Does this
 change mean that every 30 secs (twice as often) the client will send
 special packets to the server (even when the mounted share isn't used
 by any application at the moment)?

Yes, reducing that interval will make it send SMB echoes to the server
more frequently.

Will that have any side effects? Probably not, but I've not
experimented with it. I don't see a lot of value in making this tunable
since this is an error condition.

 What about my first question? Could I make ls-like applications
 interruptible somehow?
 

Most of these sleeps are TASK_KILLABLE, so they should respond to fatal
signals (SIGKILL).
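
If the sleep really is TASK_KILLABLE, one way a script could take
advantage of that is to run the command under coreutils timeout with
SIGKILL as the signal. This is only a sketch: run_killable is a
hypothetical wrapper name, and it has not been tested against a hung cifs
mount.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and send it SIGKILL if it is still
# running after the given number of seconds. Uses coreutils `timeout`.
run_killable() {
    deadline=$1
    shift
    timeout -s KILL "$deadline" "$@"
}

# Usage: give 'ls' ten seconds on the share, then kill it:
#   run_killable 10 ls /mnt
# A non-zero exit status means the command failed or was killed.
```

SIGKILL is used rather than the default SIGTERM because only fatal
signals are expected to wake a task from a killable sleep.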

-- 
Jeff Layton jlay...@samba.org
--
To unsubscribe from this list: send the line unsubscribe linux-cifs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


long cifs timeout when share becomes unavailable

2012-05-24 Thread Sergey Urushkin
Hi,

there are issues with a cifs share mounted via mount.cifs (with recent
kernels): the first 'ls' on the directory where the share is mounted after
the server becomes unavailable 1) hangs (can't be interrupted with ^C) and
2) lasts about 5 minutes. The first problem appears everywhere I tested
(ubuntu 10.04 with any distributed kernel, ubuntu 12.04, fedora 17), but
with old kernels (tested with ubuntu 10.04 2.6.32 and 2.6.35) 'ls' is
uninterruptible but hangs only for about 25 seconds (which makes this
problem much less severe on old kernels). With new kernels (ubuntu
10.04 3.0, ubuntu 12.04 3.2, fedora 17 3.3) I'm facing very long hangs of
'ls' (the second problem). Many GUI applications (e.g. nautilus,
firefox, gnome-panel, mc) that query that directory for some reason appear
to act the same way as 'ls', so the system becomes unusable for 5(!)
minutes. When the server (tested with samba 3.6, win2003) becomes
unavailable, nothing is being written to the mounted directory, so I can't
understand why this timeout is so big. Here is how I tested this:

 # mount.cifs //fsrv/home /mnt -ouser=test,dom=wg,soft
 Password:
 # time ls /mnt
 Desktop Documents Program Files WINDOWS

 real 0m0.019s
 user 0m0.004s
 sys 0m0.012s
 # iptables -I OUTPUT -d 172.17.0.65 -j DROP
 # time ls /mnt # This 'ls' cannot be interrupted
 ls: cannot access /mnt: Host is down

 real 4m51.668s
 user 0m0.004s
 sys 0m0.016s
 # time ls /mnt # This 'ls' and all others after can be interrupted
 ls: cannot access /mnt: Host is down

 real 0m10.014s
 user 0m0.008s
 sys 0m0.004s

I see these messages in syslog:

 kernel: [ 1625.552044] CIFS VFS: Server fsrv has not responded in 300 seconds. Reconnecting...
 kernel: [ 1655.509422] CIFS VFS: Unexpected lookup error -112

 ...

And I cannot see any timeout options for mount.cifs (except acl timeout).
So, the actual questions are: 1) is there a way to avoid these hangs
(analog of 'intr'?) and 2) how can I reduce this unreachable-host timeout
(analog of 'timeo'?)? Maybe there are some variables in the sources?

Thanks.

-- 
Best regards,
Sergey Urushkin


Re: long cifs timeout when share becomes unavailable

2012-05-24 Thread Sergey Urushkin

Jeff Layton wrote on 24.05.2012 14:31:

On Thu, 24 May 2012 10:13:29 +0400
Sergey Urushkin urush...@telros.ru wrote:


And I cannot see any timeout options for mount.cifs (except acl
timeout).
So, the actual questions are: 1) is there a way to avoid these hangs
(analog of 'intr'?) and 2) how can I reduce this unreachable-host
timeout (analog of 'timeo'?)? Maybe there are some variables in the
sources?


You can try setting the echo_retries kernel module parameter to 1,
which should cut down the wait time to 60s. In 3.4, we've removed
that parm and it's now set to 1 always. The timeout between echo
requests (which is how we detect whether the server is still
responding) is not currently tunable.


That works like a charm, 60s is much better than 5m, thanks a lot. From 
documentation (for kernels before 3.4) it isn't clear what behavior this 
parm changes, maybe docs should be fixed some way?
Could you explain to me why SMB_ECHO_INTERVAL is so big by default? Is
there any side effect of changing it to, for example, 30? Does this
change mean that every 30 secs (twice as often) the client will send
special packets to the server (even when the mounted share isn't used
by any application at the moment)?
What about my first question? Could I make ls-like applications
interruptible somehow?


Thanks a lot again!

--
Best regards,
Sergey Urushkin