On 7/14/2016 6:18 PM, Chad William Seys wrote:
> Hi Ben,
>
> The Scientific Linux clients are using patched (by Redhat) 2.6.32 and
> the Debian clients are using patched (by Debian) 3.2.78 and 3.16.7.
>
> Do you suspect that a recent security patch, applied to all three
> kernels, could have broken the older AFS clients?
>
> I could certainly test this idea if it appears promising. I guess I'd
> start with the server's kernel though: One data point that argues
> against it being the client's kernel is that for the Scientific Linux
> box I booted up a machine which had not been updated for a long time
> (kernel dated Mar 22, 2016) and compiled openafs 1.6.15 (not functional)
> and 1.6.16 (functional).
>
> Chad.
I am dismissive of the notion that the server's kernel version matters, since all of the fileserver code runs in userland.

I believe the Debian and Scientific Linux issues are unrelated because the symptoms are so different.

If you had said that 1.6.18 was the first version of OpenAFS to work on Debian, I would correlate that with the Linux kernel changes to support interrupting splice operations. The OpenAFS client used splice operations for StoreData RPCs to avoid an extra memory copy of every page written to the fileserver, and the 1.6.18 release removed that use of splice. One of the symptoms of the splice change on OpenAFS clients was "git" operations failing in such a way that the OpenAFS client marked the fileserver as "down". When that happens, the "Connection timed out" error is logged regardless of the actual cause. Since you indicate that 1.6.16 is the first version to work, something else must be to blame on Debian.

For the Scientific Linux issue, you should obtain a stack trace for the hung "ls" process and collect cmdebug output for the affected cache manager.

Jeffrey Altman