Re: [OpenAFS] some older openafs-client versions have started failing
On Sat, 16 Jul 2016, Benjamin Kaduk wrote:

> On Fri, 15 Jul 2016, Jonathan A. Kollasch wrote:
>
> > Jessie machine:
> >
> > # uname -a
> > Linux eternium 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
> >
> > # dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
> > ii  linux-image-3.16.0-4-amd64  3.16.7-ckt25-2  amd64  Linux 3.16 for 64-bit PCs
>
> (Note that dpkg -l truncates the version field if it's too long;
> dpkg-query -W gives the full version. But this is probably enough for
> now.)
> 3.16.7-ckt25-1 pulled in the "vfs: Make sendfile(2) killable even better"
> change that triggered us to remove the use of splice in openafs. I guess
> I should figure out how to do an upload to -backports so there's something
> usable for jessie, then.

To loop back on this, openafs 1.6.18.2-1~bpo8+1 should be appearing
shortly in jessie-backports. Something for wheezy will end up being in
wheezy-backports-sloppy, which will be a bit more work to backport and
require a trip through NEW.

-Ben
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] some older openafs-client versions have started failing
Hi Mark,

> Ahh. But what about 1.6.16?

Sorry, in my table under "working" I had confusingly listed 1.6.16 with
"no debian package". This was a remnant from when I was testing
Scientific Linux and was tracking the two distros' versions together. I
haven't tested 1.6.16 in Debian.

Chad.
Re: [OpenAFS] some older openafs-client versions have started failing
On Fri, 15 Jul 2016, Jonathan A. Kollasch wrote:

> On Fri, Jul 15, 2016 at 01:34:25AM -0400, Benjamin Kaduk wrote:
> > On Thu, 14 Jul 2016, Jonathan A. Kollasch wrote:
> >
> > > I currently see similar issues with Debian Wheezy and Debian Jessie.
> >
> > Can you please provide actual exact (debian) versions, for both the
> > openafs-client and kernel? Attempting to say anything without them would
> > require some level of speculation.
>
> Wheezy machine:
>
> # uname -a
> Linux tazenda 3.2.0-4-amd64 #1 SMP Debian 3.2.81-1 x86_64 GNU/Linux
>
> # dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
> ii  linux-image-3.2.0-4-amd64  3.2.81-1  amd64  Linux 3.2 for 64-bit PCs
>
> # dpkg -l |grep openafs
> ii  openafs-client        1.6.1-3+deb7u6  amd64  AFS distributed filesystem client support
> ii  openafs-krb5          1.6.1-3+deb7u6  amd64  AFS distributed filesystem Kerberos 5 integration
> ii  openafs-modules-dkms  1.6.1-3+deb7u6  all    AFS distributed filesystem kernel module DKMS source

Thanks. Normally I would suggest taking openafs from wheezy-backports,
since that 1.6.1 version has a bunch of issues that weren't quite severe
enough for me to ask for an SRU. But since the -backports version is
basically the same as jessie, that won't really help with your current
troubles...

> # find /lib/modules/3.2.0-4-amd64 -name openafs\* -ls
> 213952 1120 -rw-r--r-- 1 root root 1141072 Jul 10 17:29 /lib/modules/3.2.0-4-amd64/updates/dkms/openafs.ko
>
> # uptime
> 12:06:22 up 4 days, 18:31, 2 users, load average: 0.13, 0.10, 0.12
>
> Jessie machine:
>
> # uname -a
> Linux eternium 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
>
> # dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
> ii  linux-image-3.16.0-4-amd64  3.16.7-ckt25-2  amd64  Linux 3.16 for 64-bit PCs

(Note that dpkg -l truncates the version field if it's too long;
dpkg-query -W gives the full version. But this is probably enough for
now.)

3.16.7-ckt25-1 pulled in the "vfs: Make sendfile(2) killable even better"
change that triggered us to remove the use of splice in openafs. I guess
I should figure out how to do an upload to -backports so there's
something usable for jessie, then.

(Also, everyone using Debian should feel free to report Debian bugs
against OpenAFS; that's a good way to let us maintainers know when
issues appear.)

Thanks,
Ben
Re: [OpenAFS] some older openafs-client versions have started failing
On Jul 16, 2016, at 12:43 AM, Benjamin Kaduk wrote:

> On Sat, 16 Jul 2016, Mark Vitale wrote:
>
>> On Jul 15, 2016, at 10:39 PM, Chad William Seys wrote:
>>> I found the break point where openafs starts having problems with git
>>> checkout on my test repo:
>>>
>>> First broken: 3.16.7-ckt25-1 (compiled 2016-03-06)
>>> Last working: 3.16.7-ckt20-1+deb8u4
>>>
>>> Here is a changelog in case someone knows what to hunt for in:
>>> http://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_3.16.7-ckt25-2_changelog
>>
>> Thank you, this was very helpful. It's almost certainly:
>>
>>  - vfs: Make sendfile(2) killable even better
>>  - vfs: Avoid softlockups with sendfile(2)
>>
>> which are backports from Linux 4.4. OpenAFS had to disable splice() support
>> to be able to tolerate these changes. You need OpenAFS 1.6.18 or higher to
>> obtain relief for this, and indeed you did report that 1.6.18 is working
>> fine for you at this kernel level.
>>
>> However, this does NOT explain your report of no problems with OpenAFS
>> 1.6.17 and 1.6.16.
>
> Actually, it does -- debian's 1.6.17-2 contains:
> debian/patches/Linux-4.4-Do-not-use-splice.patch
> which is what made it into upstream openafs 1.6.18.

Ahh. But what about 1.6.16?

--mark
Re: [OpenAFS] some older openafs-client versions have started failing
On Sat, 16 Jul 2016, Mark Vitale wrote:

> On Jul 15, 2016, at 10:39 PM, Chad William Seys wrote:
> > I found the break point where openafs starts having problems with git
> > checkout on my test repo:
> >
> > First broken: 3.16.7-ckt25-1 (compiled 2016-03-06)
> > Last working: 3.16.7-ckt20-1+deb8u4
> >
> > Here is a changelog in case someone knows what to hunt for in:
> > http://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_3.16.7-ckt25-2_changelog
>
> Thank you, this was very helpful. It's almost certainly:
>
>  - vfs: Make sendfile(2) killable even better
>  - vfs: Avoid softlockups with sendfile(2)
>
> which are backports from Linux 4.4. OpenAFS had to disable splice() support
> to be able to tolerate these changes. You need OpenAFS 1.6.18 or higher to
> obtain relief for this, and indeed you did report that 1.6.18 is working fine
> for you at this kernel level.
>
> However, this does NOT explain your report of no problems with OpenAFS 1.6.17
> and 1.6.16.

Actually, it does -- debian's 1.6.17-2 contains:
debian/patches/Linux-4.4-Do-not-use-splice.patch
which is what made it into upstream openafs 1.6.18.

-Ben
Re: [OpenAFS] some older openafs-client versions have started failing
On Jul 15, 2016, at 10:39 PM, Chad William Seys wrote:

> I found the break point where openafs starts having problems with git
> checkout on my test repo:
>
> First broken: 3.16.7-ckt25-1 (compiled 2016-03-06)
> Last working: 3.16.7-ckt20-1+deb8u4
>
> Here is a changelog in case someone knows what to hunt for in:
> http://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_3.16.7-ckt25-2_changelog

Thank you, this was very helpful. It's almost certainly:

 - vfs: Make sendfile(2) killable even better
 - vfs: Avoid softlockups with sendfile(2)

which are backports from Linux 4.4. OpenAFS had to disable splice() support
to be able to tolerate these changes. You need OpenAFS 1.6.18 or higher to
obtain relief for this, and indeed you did report that 1.6.18 is working fine
for you at this kernel level.

However, this does NOT explain your report of no problems with OpenAFS 1.6.17
and 1.6.16. Could you please confirm that they are working fine?

Regards,
--
Mark Vitale
Sine Nomine Associates
Re: [OpenAFS] some older openafs-client versions have started failing
Hi all,

I found the break point where openafs starts having problems with git
checkout on my test repo:

First broken: 3.16.7-ckt25-1 (compiled 2016-03-06)
Last working: 3.16.7-ckt20-1+deb8u4

Here is a changelog in case someone knows what to hunt for in:
http://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_3.16.7-ckt25-2_changelog

Thanks!
Chad.
Re: [OpenAFS] some older openafs-client versions have started failing
Hi all,

Thanks to your help, I reverted to a previous version of the Debian kernel
and was able to successfully git clone the troublesome repository. A
working version of the Jessie kernel is:

Linux mcd-db 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) x86_64 GNU/Linux

> I am dismissive of the notion that the server's kernel version
> matters since all of the fileserver code is in userland.

I was vaguely hypothesizing that some networking code changed and goofed
up networking in some way that AFS depends on, but not much else. (Not
much of a hope?)

> Can you please provide actual exact (debian) versions, for both the
> openafs-client and kernel? Attempting to say anything without them
> would require some level of speculation.

See the table below:

openafs-client   release  kernel
--- NOT WORKING ---
1.6.1-3+deb7u6   wheezy   3.2.81-1
1.6.9-2+deb8u4   jessie   3.16.7-ckt25-2
1.6.9-2+deb8u5   jessie   3.16.7-ckt25-2
1.6.14-1         jessie   3.16.7-ckt25-2
1.6.15-1         jessie   3.16.7-ckt25-2
--- WORKING ---
1.6.9-2+deb8u5   jessie   3.16.7-ckt11-1+deb8u5
1.6.16           (no debian package)
1.6.17-2         jessie   3.16.7-ckt25-2

So what next? Should I narrow down which Debian kernel update broke AFS
and report it against their kernel package?

> I believe the Debian and Scientific Linux issues are unrelated because
> the symptoms are so different.

You're most likely right! I tried Scientific Linux again today, and 1.6.9
and 1.6.15 built from .src.rpm were fine. :/

Thanks again for your help!
Chad.
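Chad's jessie rows line up with the explanation given elsewhere in this thread: kernels from 3.16.7-ckt25-1 on carry the backported sendfile(2) change, and Debian's openafs 1.6.17-2 already carries Linux-4.4-Do-not-use-splice.patch. The sketch below checks that hypothesis mechanically. It is only an illustration under stated assumptions: the helper functions are hypothetical and re-implement dpkg's version ordering in plain Python rather than calling dpkg, and the wheezy row is left out because the 3.16.7-ckt25-1 threshold only applies to the 3.16 kernel series.

```python
# Hypothetical helpers: a plain-Python re-implementation of dpkg's version
# ordering (epoch:upstream-revision), written for illustration only.


def _isdigit(c):
    return len(c) == 1 and "0" <= c <= "9"


def _order(c):
    """Weight of one character in a non-digit run; "" means end of string."""
    if c == "" or _isdigit(c):
        return 0
    if c == "~":
        return -1  # "~" sorts before everything, even the end of the string
    if c.isalpha():
        return ord(c)
    return ord(c) + 256  # other punctuation sorts after letters


def _verrevcmp(a, b):
    """Compare two upstream-version (or revision) strings the way dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights one at a time.
        while (i < len(a) and not _isdigit(a[i])) or (j < len(b) and not _isdigit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: skip leading zeros, then compare numerically.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and j < len(b) and _isdigit(a[i]) and _isdigit(b[j]):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and _isdigit(a[i]):
            return 1  # a's number has more digits
        if j < len(b) and _isdigit(b[j]):
            return -1  # b's number has more digits
        if first_diff:
            return first_diff
    return 0


def compare_versions(a, b):
    """Return <0, 0 or >0; meant to order well-formed versions like dpkg."""

    def split(v):
        epoch, colon, rest = v.partition(":")
        if not colon:
            epoch, rest = "0", v
        upstream, dash, revision = rest.rpartition("-")
        if not dash:
            upstream, revision = rest, ""
        return int(epoch), upstream, revision

    ea, ua, ra = split(a)
    eb, ub, rb = split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


# Thresholds taken from the thread.
FIRST_BROKEN_KERNEL = "3.16.7-ckt25-1"
FIRST_NO_SPLICE_OPENAFS = "1.6.17-2"

# Chad's jessie rows: (openafs-client, kernel, observed_working).
JESSIE_RESULTS = [
    ("1.6.9-2+deb8u4", "3.16.7-ckt25-2", False),
    ("1.6.9-2+deb8u5", "3.16.7-ckt25-2", False),
    ("1.6.14-1", "3.16.7-ckt25-2", False),
    ("1.6.15-1", "3.16.7-ckt25-2", False),
    ("1.6.9-2+deb8u5", "3.16.7-ckt11-1+deb8u5", True),
    ("1.6.17-2", "3.16.7-ckt25-2", True),
]


def predicted_working(openafs, kernel):
    """Hypothesis: broken only if the kernel has the change AND openafs still splices."""
    kernel_has_change = compare_versions(kernel, FIRST_BROKEN_KERNEL) >= 0
    openafs_still_splices = compare_versions(openafs, FIRST_NO_SPLICE_OPENAFS) < 0
    return not (kernel_has_change and openafs_still_splices)


for openafs, kernel, observed in JESSIE_RESULTS:
    status = "ok" if predicted_working(openafs, kernel) == observed else "MISMATCH"
    print(f"{openafs:16} {kernel:24} observed_working={observed!s:5} {status}")
```

Every jessie row prints "ok". On a real system, `dpkg --compare-versions A lt B` remains the authoritative check; the Python version just makes the ordering rules visible, for example why a backport such as 1.6.18.2-1~bpo8+1 sorts just before 1.6.18.2-1.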
Re: [OpenAFS] some older openafs-client versions have started failing
On Fri, Jul 15, 2016 at 01:34:25AM -0400, Benjamin Kaduk wrote:
> On Thu, 14 Jul 2016, Jonathan A. Kollasch wrote:
>
> > I currently see similar issues with Debian Wheezy and Debian Jessie.
>
> Can you please provide actual exact (debian) versions, for both the
> openafs-client and kernel? Attempting to say anything without them would
> require some level of speculation.

Wheezy machine:

# uname -a
Linux tazenda 3.2.0-4-amd64 #1 SMP Debian 3.2.81-1 x86_64 GNU/Linux

# dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
ii  linux-image-3.2.0-4-amd64  3.2.81-1  amd64  Linux 3.2 for 64-bit PCs

# dpkg -l |grep openafs
ii  openafs-client        1.6.1-3+deb7u6  amd64  AFS distributed filesystem client support
ii  openafs-krb5          1.6.1-3+deb7u6  amd64  AFS distributed filesystem Kerberos 5 integration
ii  openafs-modules-dkms  1.6.1-3+deb7u6  all    AFS distributed filesystem kernel module DKMS source

# find /lib/modules/3.2.0-4-amd64 -name openafs\* -ls
213952 1120 -rw-r--r-- 1 root root 1141072 Jul 10 17:29 /lib/modules/3.2.0-4-amd64/updates/dkms/openafs.ko

# uptime
12:06:22 up 4 days, 18:31, 2 users, load average: 0.13, 0.10, 0.12

Jessie machine:

# uname -a
Linux eternium 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

# dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
ii  linux-image-3.16.0-4-amd64  3.16.7-ckt25-2  amd64  Linux 3.16 for 64-bit PCs

# dpkg -l |grep openafs
ii  openafs-client        1.6.9-2+deb8u5  amd64  AFS distributed filesystem client support
ii  openafs-krb5          1.6.9-2+deb8u5  amd64  AFS distributed filesystem Kerberos 5 integration
ii  openafs-modules-dkms  1.6.9-2+deb8u5  all    AFS distributed filesystem kernel module DKMS source

# find /lib/modules/3.16.0-4-amd64 -name openafs\* -ls
130407 1352 -rw-r--r-- 1 root root 1383176 May 13 15:33 /lib/modules/3.16.0-4-amd64/updates/dkms/openafs.ko

# uptime
12:13:25 up 4 days, 19:20, 2 users, load average: 0.00, 0.04, 0.08

> > git gc consistently fails with ETIMEDOUT for the same path on both
> > machines. My fileservers have not changed recently.
> >
> > When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> > think this problem exists in the client/cache manager.
>
> There are new issues in recent versions of the openafs client that can
> manifest like this ... but that would not explain anything if you are
> using the versions from wheezy or even jessie.
>
> -Ben
Re: [OpenAFS] some older openafs-client versions have started failing
On 2016-07-15 07:34, Benjamin Kaduk wrote:
> On Thu, 14 Jul 2016, Jonathan A. Kollasch wrote:
>
>> I currently see similar issues with Debian Wheezy and Debian Jessie.
>
> Can you please provide actual exact (debian) versions, for both the
> openafs-client and kernel? Attempting to say anything without them would
> require some level of speculation.
>
>> git gc consistently fails with ETIMEDOUT for the same path on both
>> machines. My fileservers have not changed recently.
>>
>> When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
>> think this problem exists in the client/cache manager.
>
> There are new issues in recent versions of the openafs client that can
> manifest like this ... but that would not explain anything if you are
> using the versions from wheezy or even jessie.

I would like to add that one of our Debian users here has a Debian jessie
system (OpenAFS 1.6.9) and had the timeout issues with git this week, too.
We upgraded to OpenAFS from Debian sid (1.6.18) and it works again. All
versions came from the official Debian repos.

I do not know whether the timeouts on git operations started last week or
not; this was the first time I was notified about it. And as we have only
very few Linux users, I do not have a control group to check against.

> -Ben

Best regards,
Lars Schimmer
--
TU Graz, Institut für ComputerGraphik & WissensVisualisierung
Tel: +43 316 873-5405       E-Mail: l.schim...@cgv.tugraz.at
Fax: +43 316 873-5402       PGP-Key-ID: 0x4A9B1723
Re: [OpenAFS] some older openafs-client versions have started failing
On Thu, 14 Jul 2016, Chad William Seys wrote:

> Hi Ben,
>
> The Scientific Linux clients are using patched (by Red Hat) 2.6.32 and the
> Debian clients are using patched (by Debian) 3.2.78 and 3.16.7.
>
> Do you suspect that a recent security patch, applied to all three kernels,
> could have broken the older AFS clients?

It has been known to happen. (In particular, the "Linux kernel changes to
support interrupting splice operations" that Jeffrey mentions have been
heavily backported, since they are supposed to make some "hung" processes
more interruptible. Not quite a security issue, but they made it into a
lot of distro kernels.)

> I could certainly test this idea if it appears promising. I guess I'd start
> with the server's kernel though: One data point that argues against it being

I share Jeffrey's skepticism that the server's kernel version is relevant.

> the client's kernel is that for the Scientific Linux box I booted up a
> machine which had not been updated for a long time (kernel dated Mar 22, 2016)
> and compiled openafs 1.6.15 (not functional) and 1.6.16 (functional).

Just to be clear: those 1.6.15 and 1.6.16 were from-source builds of the
stock OpenAFS releases?

-Ben
Re: [OpenAFS] some older openafs-client versions have started failing
On Thu, 14 Jul 2016, Jonathan A. Kollasch wrote:

> I currently see similar issues with Debian Wheezy and Debian Jessie.

Can you please provide actual exact (debian) versions, for both the
openafs-client and kernel? Attempting to say anything without them would
require some level of speculation.

> git gc consistently fails with ETIMEDOUT for the same path on both
> machines. My fileservers have not changed recently.
>
> When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> think this problem exists in the client/cache manager.

There are new issues in recent versions of the openafs client that can
manifest like this ... but that would not explain anything if you are
using the versions from wheezy or even jessie.

-Ben
Re: [OpenAFS] some older openafs-client versions have started failing
On 7/14/2016 6:18 PM, Chad William Seys wrote:
> Hi Ben,
>
> The Scientific Linux clients are using patched (by Red Hat) 2.6.32 and
> the Debian clients are using patched (by Debian) 3.2.78 and 3.16.7.
>
> Do you suspect that a recent security patch, applied to all three
> kernels, could have broken the older AFS clients?
>
> I could certainly test this idea if it appears promising. I guess I'd
> start with the server's kernel though: One data point that argues
> against it being the client's kernel is that for the Scientific Linux
> box I booted up a machine which had not been updated for a long time
> (kernel dated Mar 22, 2016) and compiled openafs 1.6.15 (not functional)
> and 1.6.16 (functional).
>
> Chad.

I am dismissive of the notion that the server's kernel version matters
since all of the fileserver code is in userland.

I believe the Debian and Scientific Linux issues are unrelated because
the symptoms are so different.

If you said that 1.6.18 was the first version of OpenAFS to work on
Debian, I would correlate that with the Linux kernel changes to support
interrupting splice operations. The OpenAFS client used splice operations
for StoreData RPCs to avoid an extra memory copy of every page that is
written to the fileserver. The 1.6.18 release removed that use of splice.

One of the symptoms of the splice change on OpenAFS clients was "git"
operations failing in such a fashion that the OpenAFS client marked the
fileserver state as "down". When that happens, the "Connection timed out"
error is logged regardless of the actual cause.

Since you indicate that 1.6.16 is the first version to work, something
else must be to blame on Debian.

For the Scientific Linux issue, you should obtain a stack trace for the
hung "ls" process and collect cmdebug output for the affected cache
manager.

Jeffrey Altman
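Jeffrey's note about the "Connection timed out" message also ties together the two failure reports in this thread: Jonathan's git gc failing with ETIMEDOUT and Chad's git checkout failing with "Connection timed out" are most likely the same errno, once by its symbolic name and once as the C library's message text. A small Python illustration, assuming a glibc-based Linux client like the Debian machines discussed here; other C libraries or platforms may use a different number or wording:

```python
import errno
import os
import platform

# Symbolic name (as in "fails with ETIMEDOUT") vs. the C library's message
# text for the same errno (as in "fails with a 'Connection timed out'").
code = errno.ETIMEDOUT
name = errno.errorcode[code]
message = os.strerror(code)
print(f"{name} ({code}): {message}")

# Only meaningful on glibc-based Linux, where the text is "Connection timed out".
on_glibc_linux = platform.system() == "Linux" and platform.libc_ver()[0] == "glibc"
```

On a Debian client this prints `ETIMEDOUT (110): Connection timed out`.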
Re: [OpenAFS] some older openafs-client versions have started failing
Hi Ben,

The Scientific Linux clients are using patched (by Red Hat) 2.6.32 and the
Debian clients are using patched (by Debian) 3.2.78 and 3.16.7.

Do you suspect that a recent security patch, applied to all three kernels,
could have broken the older AFS clients?

I could certainly test this idea if it appears promising. I guess I'd start
with the server's kernel though: one data point that argues against it
being the client's kernel is that, for the Scientific Linux box, I booted
up a machine which had not been updated for a long time (kernel dated
Mar 22, 2016) and compiled openafs 1.6.15 (not functional) and 1.6.16
(functional).

Chad.
Re: [OpenAFS] some older openafs-client versions have started failing
On Thu, 14 Jul 2016, Chad William Seys wrote:

> Hi Jonathan,
>
> Well it is good to hear that someone else is having a similar problem!
>
> > When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> > think this problem exists in the client/cache manager.
>
> So a bug in the client/cache manager?
> Why would it be triggered now?
>
> It seems as though the server or the network must be involved somehow.
> Scientific Linux 6 openafs-client versions have stepped through 1.6.2 to
> 1.6.17 while the server was running Wheezy with no problems. Now suddenly
> the older versions are not reliable. I can't explain it. :(

The other highly important factor which you did not mention was whether
the Linux kernel version on the clients has changed.

-Ben
Re: [OpenAFS] some older openafs-client versions have started failing
Hi Jonathan,

Well it is good to hear that someone else is having a similar problem!

> When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> think this problem exists in the client/cache manager.

So a bug in the client/cache manager?
Why would it be triggered now?

It seems as though the server or the network must be involved somehow.
Scientific Linux 6 openafs-client versions have stepped through 1.6.2 to
1.6.17 while the server was running Wheezy with no problems. Now suddenly
the older versions are not reliable. I can't explain it. :(

Chad.
Re: [OpenAFS] some older openafs-client versions have started failing
On Thu, Jul 14, 2016 at 02:55:49PM -0500, Chad William Seys wrote:
> Hi all,
> We have suddenly begun experiencing client failures and are trying
> to determine what is going on.
>
> openafs-client versions 1.6.9, 1.6.14, 1.6.15 fail in various ways*. On
> Debian we can reproduce the problem by doing a 'git checkout' of a
> particular repo. It fails with a "Connection timed out". On Scientific
> Linux the problem manifests sooner: 'ls /afs/ANYCELL' hangs.
>
> openafs-client 1.6.16, 1.6.17, 1.6.18.1 seem to work normally.
>
> I've tried changing the server's fileserver version but that has no effect.
> (Tried Debian packages with versions 1.6.1-3+deb7u6, 1.6.9+deb8u5, and
> 1.6.18.1-1.)
>
> We started noticing this problem after a power failure. We think what
> happened was that new fileserver code started being used after the servers
> rebooted. Probably the fileserver code changed from Debian 1.6.1-3+deb7u5
> to 1.6.1-3+deb7u6. Strangely though, reverting back to what we think were
> the working versions also does not work.
>
> Does anyone have an idea of what might be going on?
>
> Thanks!
> Chad.

I currently see similar issues with Debian Wheezy and Debian Jessie.

git gc consistently fails with ETIMEDOUT for the same path on both
machines. My fileservers have not changed recently.

When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
think this problem exists in the client/cache manager.

Jonathan Kollasch