Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-03 Thread Jeffrey Altman
On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> I'm relatively new to handling OpenAFS.  Are these problems part of a
> normal "kernel release; openafs update" cycle and perhaps I'm getting
> snagged just by being too early of an adopter?  I wanted to raise the
> alarm on this and see if anything

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Benjamin Kaduk
On Fri, Feb 02, 2018 at 04:20:59PM -0500, Kodiak Firesmith wrote:
> Not much else to report today other than expanding my test base out to a
> few more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM again, and am
> still seeing the same results universally. Every host fails to boot due to
> a

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Kodiak Firesmith
Thanks Stephan, I'm relatively new to handling OpenAFS. Are these problems part of a normal "kernel release; openafs update" cycle and perhaps I'm getting snagged just by being too early of an adopter? I wanted to raise the alarm on this and see if anything else was needed from me as the

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Stephan Wiesand
While additional data points are obviously most welcome, there is no expectation that this issue is fixed with 1.6.22.x or 1.8.x right now. Some serious work will be required to adapt OpenAFS to the changes in this kernel (series), though there's some hope that it won't be quite as hard to fix

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Kodiak Firesmith
Not much else to report today other than expanding my test base out to a few more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM again, and am still seeing the same results universally. Every host fails to boot due to a kernel panic when it tries to load the openafs DKMS kernel module. My

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Matt Vander Werf
Just for the sake of testing, I also installed 1.8.0pre4 RPMs on a RHEL 7.5 beta system and still had the same issue when using ls with directories under /afs/... Also (maybe this was already mentioned), it seems to be only directories as well. I can do an ls of a known file in my AFS home

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Stephan Wiesand
> On 2. Feb 2018, at 09:55, Stephan Wiesand wrote:
>
>> On 2. Feb 2018, at 02:14, Benjamin Kaduk wrote:
>>
>> On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
>>> Comparing the 1.6.22.2 module builds from the SL packaging, where the

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Stephan Wiesand
> On 2. Feb 2018, at 02:14, Benjamin Kaduk wrote:
>
> On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
>> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI
>> hashes of the used symbols are stored as a requirement, it seems none of
>>

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Benjamin Kaduk
On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI
> hashes of the used symbols are stored as a requirement, it seems none of
> those hashes changed between -693 and -830.
>
> There are two differences in

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Gary Gatling
I tried testing a work in progress 1.6.22.2 on rhel 7.5 beta by doing

git clone git://git.openafs.org/openafs.git
cd openafs
git checkout remotes/origin/openafs-stable-1_6_x
HEAD is now at d25c8e8... Make OpenAFS 1.6.22.2

But it seems to have the same problems with directories so I guess

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Kodiak Firesmith
I just rebuilt off-the-shelf RPMs based off of http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm thinking maybe we had some historical patch in our build area that might be causing the problem, but alas, even the off-the-shelf RPMs cause a full wedge and reboot when

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Kodiak Firesmith
Thanks for the replies! We're using DKMS and expected the dynamic re-roll of the kmods to work like any other kernel upgrade but that doesn't seem to be the case. I need to dig deeper, especially now that there is evidence that it's just our site. Thanks a bunch everyone. - Kodiak On Thu,

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Matt Vander Werf
I'm also seeing the same issue as Gary on some RHEL 7.5 beta boxes running OpenAFS 1.6.22.1. Can't run ls under any /afs/.../.../etc directory, including in my AFS home directory when logged in as myself.

[mvanderw@ ~]$ ls
ls: reading directory .: Not a directory
[mvanderw@ ~]$ ls ~
ls: reading

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Stephan Wiesand
Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI hashes of the used symbols are stored as a requirement, it seems none of those hashes changed between -693 and -830. There are two differences in the configure results:

-ac_cv_linux_header_sched_signal_h=no

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Gary Gatling
Ok. This gets weirder. Any directory under /afs says Not a directory. But I can read files like /afs/eos.ncsu.edu/software/inventory/software_inventory just fine.

On Thu, Feb 1, 2018 at 10:55 AM, Gary Gatling wrote:
> I don't get a kernel panic but instead I get:
>
>

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Gary Gatling
I don't get a kernel panic but instead I get:

[gsgatlin@localhost ~]$ ls /afs/
ls: reading directory /afs/: Not a directory
[gsgatlin@localhost ~]$

which is pretty weird. I don't see anything in the syslog about problems with openafs

Feb 1 10:44:24 localhost systemd: Starting OpenAFS Client
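
The symptom reported in this thread (listing a directory under /afs fails with "Not a directory" while regular files stay readable) can be checked from a script as well as from ls. The following is only a minimal sketch, assuming a Linux client with Python 3; probe_directory is a hypothetical helper name and is not part of OpenAFS.

```python
import errno
import os


def probe_directory(path):
    """Try to list `path` and report how the attempt ended.

    Returns "ok" if the listing succeeds, "ENOTDIR" if the kernel
    answers "Not a directory" (the error string seen in this thread),
    or "other error: <NAME>" for any other OSError.
    """
    try:
        os.listdir(path)
    except OSError as exc:
        if exc.errno == errno.ENOTDIR:
            return "ENOTDIR"
        # Map the numeric errno to its symbolic name where one is known.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return "other error: " + name
    return "ok"


if __name__ == "__main__":
    # /afs/ is the path used in the report above; adjust for your cell.
    print(probe_directory("/afs/"))
```

On an unaffected client the script should print "ok" for a real directory. A result of "ENOTDIR" for a path that is known to be a directory would match what Gary and Matt describe.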