Package: openafs-client
Version: 1.6.9-2+deb8u4
Severity: normal

Dear Maintainer,

Our Debian Jessie machines running OpenAFS occasionally drop lines of the
following form into dmesg:

[Sat Apr  9 16:08:06 2016] afs: Lost contact with file server 128.220.251.36 in
cell acm.jhu.edu (code -1) (multi-homed address; other same-host interfaces
maybe up)
[Sat Apr  9 16:08:06 2016] afs: Lost contact with file server 128.220.251.36 in
cell acm.jhu.edu (code -1) (multi-homed address; other same-host interfaces
maybe up)
[Sat Apr  9 16:08:13 2016] afs: file server 128.220.251.36 in cell acm.jhu.edu
is back up (code 105) (multi-homed address; other same-host interfaces may
still be down)
[Sat Apr  9 16:08:13 2016] afs: file server 128.220.251.36 in cell acm.jhu.edu
is back up (code 105) (multi-homed address; other same-host interfaces may
still be down)

Usually, despite this "spam", AFS seemingly continues to work, probably because
the server in question is multi-homed.

However, yesterday I was logged into one of our Debian boxes where my home
directory is stored in AFS, and suddenly the system began to lock up (new bash
sessions were refusing to spawn, for example). I ran "dmesg -T" and got the
following output (I grepped it to a single second, because there were a *lot*
of messages):
http://paste.debian.net/427707/
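
A filter along the following lines narrows "dmesg -T" output to one second.
This is only a sketch and not necessarily the exact command that was run; the
helper name filter_second is made up for illustration, and the timestamp in the
usage comment is taken from the example messages above, not from the lock-up.

```shell
# Hypothetical helper: keep only the log lines stamped with one exact second.
# $1 is the timestamp as printed by "dmesg -T", without the square brackets.
filter_second() {
    # -F matches the bracketed timestamp literally instead of as a regex,
    # so the "[" and "]" characters need no escaping.
    grep -F "[$1]"
}

# Usage (reads the kernel log on stdin):
#   dmesg -T | filter_second 'Sat Apr  9 16:08:06 2016'
```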

I had to reboot the machine to recover from the lock-up.

I asked the #openafs IRC channel for their opinion and got this back:

<secureendpoints> TC01: the cache manager is performing an operation that is
being interrupted.  Hence the ERESTARTSYS error.
<secureendpoints> Since the openafs cache manager doesn't know how to deal with
interrupted operations it is marking the server down.
<TC01> Oh, this is related to the ERESTARTSYS issues that caused the 4.4+
breakage?
<secureendpoints> yes
<TC01> I see. :(
<CybrFyre> TC01 - there are patches you can apply to work around the issue
<CybrFyre> unless you've found new brokenness :P
<kaduk> If you file a debian bug, it is more likely that a fix could make it
into jessie-updates.

Unfortunately, the mentioned patches bring a performance hit with them, but
that might be preferable if it means stopping problems like this?

Anyway, I am filing a Debian bug, as requested. :)



-- System Information:
Debian Release: 8.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages openafs-client depends on:
ii  debconf [debconf-2.0]  1.5.56
ii  libc6                  2.19-18+deb8u4
ii  libcomerr2             1.42.12-1.1
ii  libk5crypto3           1.12.1+dfsg-19+deb8u2
ii  libkrb5-3              1.12.1+dfsg-19+deb8u2
ii  libncurses5            5.9+20140913-1+b1
ii  libtinfo5              5.9+20140913-1+b1

Versions of packages openafs-client recommends:
ii  lsof                  4.86+dfsg-1
ii  openafs-modules-dkms  1.6.9-2+deb8u4

Versions of packages openafs-client suggests:
pn  openafs-doc   <none>
ii  openafs-krb5  1.6.9-2+deb8u4

-- Configuration Files:
/etc/openafs/afs.conf changed:
test -f /etc/openafs/afs.conf.client && . /etc/openafs/afs.conf.client
VERBOSE=
OPTIONS=AUTOMATIC
afs_post_init_hook() {
        # AFS hard-mount semantics, on both RO and RW volumes, retries
        # every 10 seconds.
        sysctl afs.hm_retry_RO=1
        sysctl afs.hm_retry_RW=1
        sysctl afs.hm_retry_int=10
}
AFS_POST_INIT=afs_post_init_hook
AFS_PRE_SHUTDOWN=


-- debconf information:
* openafs-client/thiscell: acm.jhu.edu
  openafs-client/cell-info:
  openafs-client/fakestat: true
  openafs-client/run-client: true
  openafs-client/dynroot: Yes
  openafs-client/afsdb: true
* openafs-client/cachesize: 50000
  openafs-client/crypt: true
