Re: [OpenAFS] some older openafs-client versions have started failing

2016-08-12 Thread Benjamin Kaduk
On Sat, 16 Jul 2016, Benjamin Kaduk wrote:

> On Fri, 15 Jul 2016, Jonathan A. Kollasch wrote:
>
> > Jessie machine:
> >
> > # uname -a
> > Linux eternium 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
> >
> > # dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
> > ii  linux-image-3.16.0-4-amd64  3.16.7-ckt25-2  amd64  Linux 3.16 for 64-bit PCs
>
> (Note that dpkg -l truncates the version field if it's too long;
> dpkg-query -W gives the full version.  But this is probably enough for
> now.)
> 3.16.7-ckt25-1 pulled in the "vfs: Make sendfile(2) killable even better"
> change that triggered us to remove the use of splice in openafs.  I guess
> I should figure out how to do an upload to -backports so there's something
> usable for jessie, then.

To loop back on this, openafs 1.6.18.2-1~bpo8+1 should be appearing
shortly in jessie-backports.

Something for wheezy will end up being in wheezy-backports-sloppy, which
will be a bit more work to backport and require a trip through NEW.

-Ben
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-16 Thread Chad William Seys

Hi Mark,

> Ahh.  But what about 1.6.16?

Sorry, in my table under "working" I had confusingly listed 1.6.16 with
"no debian package".  This was a remnant from when I was testing
Scientific Linux and was tracking the two distros' versions together.


I haven't tested 1.6.16 in Debian.

Chad.


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-16 Thread Benjamin Kaduk
On Fri, 15 Jul 2016, Jonathan A. Kollasch wrote:

> On Fri, Jul 15, 2016 at 01:34:25AM -0400, Benjamin Kaduk wrote:
> > On Thu, 14 Jul 2016, Jonathan A. Kollasch wrote:
> >
> > > I currently see similar issues with Debian Wheezy and Debian Jessie.
> >
> > Can you please provide actual exact (debian) versions, for both the
> > openafs-client and kernel?  Attempting to say anything without them would
> > require some level of speculation.
>
> Wheezy machine:
>
> # uname -a
> Linux tazenda 3.2.0-4-amd64 #1 SMP Debian 3.2.81-1 x86_64 GNU/Linux
>
> # dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
> ii  linux-image-3.2.0-4-amd64  3.2.81-1  amd64  Linux 3.2 for 64-bit PCs
>
> # dpkg -l |grep openafs
> ii  openafs-client        1.6.1-3+deb7u6  amd64  AFS distributed filesystem client support
> ii  openafs-krb5          1.6.1-3+deb7u6  amd64  AFS distributed filesystem Kerberos 5 integration
> ii  openafs-modules-dkms  1.6.1-3+deb7u6  all    AFS distributed filesystem kernel module DKMS source

Thanks.  Normally I would suggest taking openafs from wheezy-backports,
since that 1.6.1 version has a bunch of issues that weren't quite severe
enough for me to ask for a SRU.  But since the -backports version is
basically the same as jessie, that won't really help with your current
troubles...

> # find /lib/modules/3.2.0-4-amd64 -name openafs\* -ls
> 213952 1120 -rw-r--r--   1 root root  1141072 Jul 10 17:29 /lib/modules/3.2.0-4-amd64/updates/dkms/openafs.ko
>
> # uptime
>  12:06:22 up 4 days, 18:31,  2 users,  load average: 0.13, 0.10, 0.12
>
>
> Jessie machine:
>
> # uname -a
> Linux eternium 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
>
> # dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
> ii  linux-image-3.16.0-4-amd64  3.16.7-ckt25-2  amd64  Linux 3.16 for 64-bit PCs

(Note that dpkg -l truncates the version field if it's too long;
dpkg-query -W gives the full version.  But this is probably enough for
now.)
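
For anyone collecting versions for a report, here is a minimal sketch of
the dpkg-query approach mentioned above.  The helper name
show_full_versions and the package patterns in the usage line are
illustrative assumptions, not something taken from this thread.

```shell
# Print package names with their full, untruncated versions.
# show_full_versions is a hypothetical helper name; the patterns in the
# usage line below are only examples.
show_full_versions() {
    if ! command -v dpkg-query >/dev/null 2>&1; then
        echo "dpkg-query not found (not a Debian-style system?)" >&2
        return 1
    fi
    # -W (--show) with -f prints only the requested fields, one package per line.
    dpkg-query -W -f='${Package}\t${Version}\n' "$@"
}

# Usage: show_full_versions 'openafs*' 'linux-image-*'
```

Quoting the patterns keeps the shell from expanding them against files in
the current directory before dpkg-query sees them.
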
3.16.7-ckt25-1 pulled in the "vfs: Make sendfile(2) killable even better"
change that triggered us to remove the use of splice in openafs.  I guess
I should figure out how to do an upload to -backports so there's something
usable for jessie, then.

(Also, everyone using Debian should feel free to report debian bugs
against OpenAFS; that's a good way to let us maintainers know when issues
appear.)

Thanks,

Ben

>
> # dpkg -l |grep openafs
> ii  openafs-client        1.6.9-2+deb8u5  amd64  AFS distributed filesystem client support
> ii  openafs-krb5          1.6.9-2+deb8u5  amd64  AFS distributed filesystem Kerberos 5 integration
> ii  openafs-modules-dkms  1.6.9-2+deb8u5  all    AFS distributed filesystem kernel module DKMS source
>
> # find /lib/modules/3.16.0-4-amd64 -name openafs\* -ls
> 130407 1352 -rw-r--r--   1 root root  1383176 May 13 15:33 /lib/modules/3.16.0-4-amd64/updates/dkms/openafs.ko
>
> # uptime
>  12:13:25 up 4 days, 19:20,  2 users,  load average: 0.00, 0.04, 0.08
>
>
> > > git gc consistently fails with ETIMEDOUT for the same path on both
> > > machines.  My fileservers have not changed recently.
> > >
> > > When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> > > think this problem exists in the client/cache manager.
> >
> > There are new issues in recent versions of the openafs client that can
> > manifest like this ... but that would not explain anything if you are
> > using the versions from wheezy or even jessie.
> >
> > -Ben


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-15 Thread Mark Vitale

On Jul 16, 2016, at 12:43 AM, Benjamin Kaduk wrote:

> On Sat, 16 Jul 2016, Mark Vitale wrote:
> 
>> 
>> On Jul 15, 2016, at 10:39 PM, Chad William Seys wrote:
>>> I found the break point at which openafs starts having problems with git
>>> checkout on my test repo:
>>>
>>> First broken: 3.16.7-ckt25-1 (compiled 2016-03-06)
>>> Last working: 3.16.7-ckt20-1+deb8u4
>>>
>>> Here is a changelog in case someone knows what to hunt for:
>>> http://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_3.16.7-ckt25-2_changelog
>>
>> Thank you, this was very helpful.  It's almost certainly:
>> 
>>- vfs: Make sendfile(2) killable even better
>>- vfs: Avoid softlockups with sendfile(2)
>> 
>> which are backports from Linux 4.4.  OpenAFS had to disable splice() support 
>> to be able to tolerate these changes.  You need OpenAFS 1.6.18 or higher to 
>> obtain relief for this, and indeed you did report that 1.6.18 is working 
>> fine for you at this kernel level.
>> 
>> However, this does NOT explain your report of no problems with OpenAFS 
>> 1.6.17 and 1.6.16.
> 
> Actually, it does -- debian's 1.6.17-2 contains:
> debian/patches/Linux-4.4-Do-not-use-splice.patch
> which is what made it into upstream openafs 1.6.18.

Ahh.  But what about 1.6.16?

--mark



Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-15 Thread Benjamin Kaduk
On Sat, 16 Jul 2016, Mark Vitale wrote:

>
> On Jul 15, 2016, at 10:39 PM, Chad William Seys wrote:
> > I found the break point at which openafs starts having problems with git
> > checkout on my test repo:
> >
> > First broken: 3.16.7-ckt25-1 (compiled 2016-03-06)
> > Last working: 3.16.7-ckt20-1+deb8u4
> >
> > Here is a changelog in case someone knows what to hunt for:
> > http://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_3.16.7-ckt25-2_changelog
>
> Thank you, this was very helpful.  It's almost certainly:
>
> - vfs: Make sendfile(2) killable even better
> - vfs: Avoid softlockups with sendfile(2)
>
> which are backports from Linux 4.4.  OpenAFS had to disable splice() support 
> to be able to tolerate these changes.  You need OpenAFS 1.6.18 or higher to 
> obtain relief for this, and indeed you did report that 1.6.18 is working fine 
> for you at this kernel level.
>
> However, this does NOT explain your report of no problems with OpenAFS 1.6.17 
> and 1.6.16.

Actually, it does -- debian's 1.6.17-2 contains:
debian/patches/Linux-4.4-Do-not-use-splice.patch
which is what made it into upstream openafs 1.6.18.
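
A rough sketch of that rule as a check, for anyone auditing several
Debian clients.  It assumes GNU `sort -V` as a stand-in for real Debian
version ordering (`dpkg --compare-versions` is the authoritative tool on
a Debian host), and the function names are made up for illustration.
The threshold 1.6.17-2 is the Debian package version named above.

```shell
# version_ge A B: succeed when A sorts at or after B under `sort -V`.
# This only approximates Debian ordering; it is adequate for the
# versions seen in this thread.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: does this Debian openafs version carry the
# "do not use splice" patch (Debian 1.6.17-2 or later)?
debian_openafs_avoids_splice() {
    version_ge "$1" "1.6.17-2"
}

for v in 1.6.9-2+deb8u5 1.6.15-1 1.6.17-2 1.6.18.2-1~bpo8+1; do
    if debian_openafs_avoids_splice "$v"; then
        echo "$v: has the no-splice patch"
    else
        echo "$v: predates the no-splice patch"
    fi
done
```

This covers Debian package versions only; a from-source build of upstream
1.6.17 would not include the Debian patch.
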

-Ben


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-15 Thread Mark Vitale

On Jul 15, 2016, at 10:39 PM, Chad William Seys wrote:
> I found the break point at which openafs starts having problems with git
> checkout on my test repo:
>
> First broken: 3.16.7-ckt25-1 (compiled 2016-03-06)
> Last working: 3.16.7-ckt20-1+deb8u4
>
> Here is a changelog in case someone knows what to hunt for:
> http://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_3.16.7-ckt25-2_changelog

Thank you, this was very helpful.  It's almost certainly:

- vfs: Make sendfile(2) killable even better
- vfs: Avoid softlockups with sendfile(2)

which are backports from Linux 4.4.  OpenAFS had to disable splice() support to 
be able to tolerate these changes.  You need OpenAFS 1.6.18 or higher to obtain 
relief for this, and indeed you did report that 1.6.18 is working fine for you 
at this kernel level.

However, this does NOT explain your report of no problems with OpenAFS 1.6.17 
and 1.6.16.
Could you please confirm that they are working fine?   

Regards,
--
Mark Vitale
Sine Nomine Associates


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-15 Thread Chad William Seys

Hi all,
  I found the break point at which openafs starts having problems with
git checkout on my test repo:

First broken: 3.16.7-ckt25-1 (compiled 2016-03-06)
Last working: 3.16.7-ckt20-1+deb8u4

Here is a changelog in case someone knows what to hunt for:
http://metadata.ftp-master.debian.org/changelogs//main/l/linux/linux_3.16.7-ckt25-2_changelog
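
To check a batch of jessie machines against this break point, something
like the following could work.  It is only a sketch: the function names
are invented here, it understands only Debian's 3.16.7-cktN version
strings, and it treats ckt25 as the threshold because 3.16.7-ckt25-1 is
the first version observed broken above (nothing between ckt20 and ckt25
was tested).

```shell
# Print the ckt level of a Debian 3.16.7 kernel package version
# (e.g. "3.16.7-ckt25-2" -> 25); print nothing for other formats.
ckt_level() {
    printf '%s\n' "$1" | sed -n 's/^3\.16\.7-ckt\([0-9][0-9]*\)-.*$/\1/p'
}

# Hypothetical check: succeed when the version is at or past the first
# version observed broken in this thread (3.16.7-ckt25-1).
at_or_past_break_point() {
    level=$(ckt_level "$1")
    [ -n "$level" ] && [ "$level" -ge 25 ]
}

for k in 3.16.7-ckt11-1+deb8u5 3.16.7-ckt20-1+deb8u4 3.16.7-ckt25-1 3.16.7-ckt25-2; do
    if at_or_past_break_point "$k"; then
        echo "$k: at or past the break point"
    else
        echo "$k: before the break point"
    fi
done
```

On a live machine the version string could come from
`dpkg-query -W -f='${Version}' linux-image-3.16.0-4-amd64`.
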

Thanks!
Chad.


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-15 Thread Chad William Seys

Hi all,

Thanks to your help, I reverted to a previous version of the Debian 
kernel and was able to successfully git clone the troublesome repository.


A working version of the Jessie kernel is:

Linux mcd-db 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5 
(2015-10-09) x86_64 GNU/Linux



> I am dismissive of the notion that the server's kernel version
> matters since all of the fileserver code is in userland.

I was vaguely hypothesizing that some networking code changed and goofed 
networking up in some way that AFS depends on, but not much else.  (Not 
much of a hope?)


> Can you please provide actual exact (debian) versions, for both the
> openafs-client and kernel?  Attempting to say anything without them
> would require some level of speculation.

See table below

openafs-client  release  kernel
--- NOT WORKING ---
1.6.1-3+deb7u6  wheezy   3.2.81-1
1.6.9-2+deb8u4  jessie   3.16.7-ckt25-2
1.6.9-2+deb8u5  jessie   3.16.7-ckt25-2
1.6.14-1        jessie   3.16.7-ckt25-2
1.6.15-1        jessie   3.16.7-ckt25-2
--- WORKING ---
1.6.9-2+deb8u5  jessie   3.16.7-ckt11-1+deb8u5
1.6.16          no debian package
1.6.17-2        jessie   3.16.7-ckt25-2
So what next?
Should I be narrowing down which Debian kernel update broke AFS and 
report it against their kernel package?


> I believe the Debian and Scientific Linux issues are unrelated because
> the symptoms are so different.

You're most likely right!  I tried Scientific Linux again today and 
1.6.9 and 1.6.15 from .src.rpm were fine. :/


Thanks again for your help!
Chad.


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-15 Thread Jonathan A. Kollasch
On Fri, Jul 15, 2016 at 01:34:25AM -0400, Benjamin Kaduk wrote:
> On Thu, 14 Jul 2016, Jonathan A. Kollasch wrote:
> 
> > I currently see similar issues with Debian Wheezy and Debian Jessie.
> 
> Can you please provide actual exact (debian) versions, for both the
> openafs-client and kernel?  Attempting to say anything without them would
> require some level of speculation.

Wheezy machine:

# uname -a
Linux tazenda 3.2.0-4-amd64 #1 SMP Debian 3.2.81-1 x86_64 GNU/Linux

# dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
ii  linux-image-3.2.0-4-amd64  3.2.81-1  amd64  Linux 3.2 for 64-bit PCs

# dpkg -l |grep openafs
ii  openafs-client        1.6.1-3+deb7u6  amd64  AFS distributed filesystem client support
ii  openafs-krb5          1.6.1-3+deb7u6  amd64  AFS distributed filesystem Kerberos 5 integration
ii  openafs-modules-dkms  1.6.1-3+deb7u6  all    AFS distributed filesystem kernel module DKMS source

# find /lib/modules/3.2.0-4-amd64 -name openafs\* -ls
213952 1120 -rw-r--r--   1 root root  1141072 Jul 10 17:29 /lib/modules/3.2.0-4-amd64/updates/dkms/openafs.ko

# uptime
 12:06:22 up 4 days, 18:31,  2 users,  load average: 0.13, 0.10, 0.12


Jessie machine:

# uname -a
Linux eternium 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

# dpkg -l |grep linux-image-|grep -v dummy|grep -v meta
ii  linux-image-3.16.0-4-amd64  3.16.7-ckt25-2  amd64  Linux 3.16 for 64-bit PCs

# dpkg -l |grep openafs
ii  openafs-client        1.6.9-2+deb8u5  amd64  AFS distributed filesystem client support
ii  openafs-krb5          1.6.9-2+deb8u5  amd64  AFS distributed filesystem Kerberos 5 integration
ii  openafs-modules-dkms  1.6.9-2+deb8u5  all    AFS distributed filesystem kernel module DKMS source

# find /lib/modules/3.16.0-4-amd64 -name openafs\* -ls
130407 1352 -rw-r--r--   1 root root  1383176 May 13 15:33 /lib/modules/3.16.0-4-amd64/updates/dkms/openafs.ko

# uptime
 12:13:25 up 4 days, 19:20,  2 users,  load average: 0.00, 0.04, 0.08


> > git gc consistently fails with ETIMEDOUT for the same path on both
> > machines.  My fileservers have not changed recently.
> >
> > When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> > think this problem exists in the client/cache manager.
> 
> There are new issues in recent versions of the openafs client that can
> manifest like this ... but that would not explain anything if you are
> using the versions from wheezy or even jessie.
> 
> -Ben


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-15 Thread Lars Schimmer
On 2016-07-15 07:34, Benjamin Kaduk wrote:
> On Thu, 14 Jul 2016, Jonathan A. Kollasch wrote:
> 
>> I currently see similar issues with Debian Wheezy and Debian Jessie.
> 
> Can you please provide actual exact (debian) versions, for both the
> openafs-client and kernel?  Attempting to say anything without them would
> require some level of speculation.
> 
>> git gc consistently fails with ETIMEDOUT for the same path on both
>> machines.  My fileservers have not changed recently.
>>
>> When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
>> think this problem exists in the client/cache manager.
> 
> There are new issues in recent versions of the openafs client that can
> manifest like this ... but that would not explain anything if you are
> using the versions from wheezy or even jessie.

I would like to add that one of our Debian users here has a Debian jessie
system (OpenAFS 1.6.9) and also had the timeout issues with git this week.
We upgraded to OpenAFS from Debian sid (1.6.18) and it works again.
All versions are from the official Debian repos.

I do not know whether the timeouts on git operations started last week or
not; this was the first time I was notified about them.
But as we have only very few Linux users, I do not have a control group
to check against.

> -Ben

MfG,
Lars Schimmer
-- 
-
TU Graz, Institut für ComputerGraphik & WissensVisualisierung
Tel: +43 316 873-5405   E-Mail: l.schim...@cgv.tugraz.at
Fax: +43 316 873-5402   PGP-Key-ID: 0x4A9B1723







Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-14 Thread Benjamin Kaduk
On Thu, 14 Jul 2016, Chad William Seys wrote:

> Hi Ben,
>
> The Scientific Linux clients are using patched (by Redhat) 2.6.32 and the
> Debian clients are using patched (by Debian) 3.2.78 and 3.16.7 .
>
> Do you suspect that a recent security patch, applied to all three kernels,
> could have broken the older AFS clients?

It has been known to happen.  (In particular, the "Linux kernel changes to
support interrupting splice operations." that Jeffrey mentions has been
heavily backported, since it is supposed to make some "hung" processes
more interruptible.  Not quite a security issue, but it made it into a lot
of distro kernels.)

> I could certainly test this idea if it appears promising.  I guess I'd start
> with the server's kernel though: One data point that argues against it being

I share Jeffrey's skepticism that the server's kernel version is relevant.

> the client's kernel is that for the Scientific Linux box I booted up a
> machine which had not been updated for a long time (kernel dated Mar 22, 2016)
> and compiled openafs 1.6.15 (not functional) and 1.6.16 (functional).

Just to be clear: those 1.6.15 and 1.6.16 were from-source builds of the
stock OpenAFS releases?

-Ben


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-14 Thread Benjamin Kaduk
On Thu, 14 Jul 2016, Jonathan A. Kollasch wrote:

> I currently see similar issues with Debian Wheezy and Debian Jessie.

Can you please provide actual exact (debian) versions, for both the
openafs-client and kernel?  Attempting to say anything without them would
require some level of speculation.

> git gc consistently fails with ETIMEDOUT for the same path on both
> machines.  My fileservers have not changed recently.
>
> When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> think this problem exists in the client/cache manager.

There are new issues in recent versions of the openafs client that can
manifest like this ... but that would not explain anything if you are
using the versions from wheezy or even jessie.

-Ben


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-14 Thread Jeffrey Altman
On 7/14/2016 6:18 PM, Chad William Seys wrote:
> Hi Ben,
> 
> The Scientific Linux clients are using patched (by Redhat) 2.6.32 and
> the Debian clients are using patched (by Debian) 3.2.78 and 3.16.7 .
> 
> Do you suspect that a recent security patch, applied to all three
> kernels, could have broken the older AFS clients?
> 
> I could certainly test this idea if it appears promising.  I guess I'd
> start with the server's kernel though: One data point that argues
> against it being the client's kernel is that for the Scientific Linux
> box I booted up a machine which had not been updated for a long time
> (kernel dated Mar 22, 2016) and compiled openafs 1.6.15 (not functional)
> and 1.6.16 (functional).
> 
> Chad.

I am dismissive of the notion that the server's kernel version matters
since all of the fileserver code is in userland.

I believe the Debian and Scientific Linux issues are unrelated because
the symptoms are so different.

If you said that 1.6.18 was the first version of OpenAFS to work on
Debian I would correlate that with the Linux kernel changes to support
interrupting splice operations.  The splice operations were used by the
OpenAFS client for StoreData RPCs to avoid an extra memory copy of every
page that is written to the fileserver.  The 1.6.18 release removed it.

One of the symptoms of the splice change on OpenAFS clients was "git"
operations failing in such a fashion that the OpenAFS client marked the
fileserver state as "down".  When that happens the

  "Connection timed out"

error is logged regardless of the actual cause.

Since you indicate that 1.6.16 is the first version to work, something
else must be to blame on Debian.

For the Scientific Linux issue you should obtain a stack trace for the
hung "ls" process and collect cmdebug output for the affected cache manager.
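
For reference, a hedged sketch of how that data might be gathered on a
Linux client.  It assumes root access, a kernel that exposes
/proc/<pid>/stack, and that pgrep and cmdebug are on the PATH; the helper
names and the choice of localhost are illustrative, not part of the
advice above.

```shell
# Path of the kernel stack trace file for a process ID (Linux procfs).
kernel_stack_path() {
    printf '/proc/%s/stack\n' "$1"
}

# Hypothetical helper: dump kernel stacks of processes named "ls", then
# query the local cache manager.  Run as root on the affected client.
collect_hang_data() {
    for pid in $(pgrep -x ls); do
        echo "== kernel stack for pid $pid =="
        cat "$(kernel_stack_path "$pid")"
    done
    # -long reports all cache entries and lock statuses rather than only
    # those currently locked (see the cmdebug man page).
    cmdebug -servers localhost -long
}
```

Redirecting the output of collect_hang_data to a file makes it easier to
attach to a report.
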

Jeffrey Altman


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-14 Thread Chad William Seys

Hi Ben,

The Scientific Linux clients are using patched (by Redhat) 2.6.32 and 
the Debian clients are using patched (by Debian) 3.2.78 and 3.16.7 .


Do you suspect that a recent security patch, applied to all three 
kernels, could have broken the older AFS clients?


I could certainly test this idea if it appears promising.  I guess I'd 
start with the server's kernel though: One data point that argues 
against it being the client's kernel is that for the Scientific Linux 
box I booted up a machine which had not been updated for a long time
(kernel dated Mar 22, 2016) and compiled openafs 1.6.15 (not functional) 
and 1.6.16 (functional).


Chad.


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-14 Thread Benjamin Kaduk
On Thu, 14 Jul 2016, Chad William Seys wrote:

> Hi Jonathan,
>
> Well it is good to hear that someone else is having a similar problem!
>
> > When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> > think this problem exists in the client/cache manager.
>
> So a bug in client/cache manager?
> Why would it be triggered now?
>
> It seems as though the server or the network must be involved somehow.
> Scientific Linux 6 openafs-client versions have made steps through 1.6.2 -
> 1.6.17 while the server was running Wheezy with no problems.  Now suddenly the
> older versions are not reliable.  I can't explain it. :(

The other highly important factor which you did not mention was whether
the linux kernel version on the clients has changed.

-Ben


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-14 Thread Chad William Seys

Hi Jonathan,

Well it is good to hear that someone else is having a similar problem!


> When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
> think this problem exists in the client/cache manager.


So a bug in client/cache manager?
Why would it be triggered now?

It seems as though the server or the network must be involved somehow. 
Scientific Linux 6 openafs-client versions have made steps through 1.6.2 
- 1.6.17 while the server was running Wheezy with no problems.  Now 
suddenly the older versions are not reliable.  I can't explain it. :(


Chad.


Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-14 Thread Jonathan A. Kollasch
On Thu, Jul 14, 2016 at 02:55:49PM -0500, Chad William Seys wrote:
> Hi all,
>   We have suddenly begun experiencing client failures and are trying
> to determine what is going on.
> 
> openafs-client versions 1.6.9, 1.6.14, 1.6.15 fail in various ways*.  On 
> Debian we can reproduce the problem by 'git checkout' a particular repo. It 
> fails with a "Connection timed out".  On Scientific Linux the problem 
> manifests sooner: 'ls /afs/ANYCELL' hangs.  
> 
> openafs-client 1.6.16, 1.6.17, 1.6.18.1 seem to work normally.
> 
> I've tried changing the server's fileserver version but that has no effect.  
> (Tried Debian packages with versions 1.6.1-3+deb7u6, 1.6.9+deb8u5, and 
> 1.6.18.1-1 .)
> 
> We started noticing this problem after a power failure.  We think what 
> happened was that new fileserver code started being used after the servers 
> rebooted.  Probably fileserver code changed from Debian 1.6.1-3+deb7u5 to 
> 1.6.1-3+deb7u6 .  Strangely though reverting back to what we think were the 
> working versions also does not work.
> 
> Anyone have an idea of what might be going on ?
> 
> Thanks!
> Chad. 

I currently see similar issues with Debian Wheezy and Debian Jessie.
git gc consistently fails with ETIMEDOUT for the same path on both
machines.  My fileservers have not changed recently.

When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
think this problem exists in the client/cache manager.

Jonathan Kollasch