Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-03 Thread Jeffrey Altman
On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> I'm relatively new to handling OpenAFS.  Are these problems part of a
> normal "kernel release; openafs update" cycle and perhaps I'm getting
> snagged just by being too early of an adopter?  I wanted to raise the
> alarm on this and see if anything else was needed from me as the
> reporter of the issue, but perhaps that's an overreaction to what is
> just part of a normal process I just haven't been tuned into in prior
> RHEL release cycles?


Kodiak,

On RHEL, DKMS is safe to use for kernel modules that restrict themselves
to using the restricted set of kernel interfaces (the RHEL KABI) that
Red Hat has designated will be supported across the lifespan of the RHEL
major version number.  OpenAFS is not such a kernel module.  As a result
it is vulnerable to breakage each and every time a new kernel is shipped.
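As a rough illustration of the KABI distinction: a module stays inside the stable interface only if every kernel symbol it imports is on the vendor's list. The minimal C sketch below performs that check on two in-memory symbol lists. The helper names and the sample symbol names are placeholders, not part of any RHEL or DKMS tool, and obtaining the real lists is left out. A symbol-level check like this cannot catch the second kind of failure described below, where the module still builds and loads but behaves differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `sym` appears in the NULL-terminated list `allowed`. */
static int is_listed(const char *sym, const char *const *allowed)
{
    for (size_t i = 0; allowed[i] != NULL; i++)
        if (strcmp(sym, allowed[i]) == 0)
            return 1;
    return 0;
}

/* Collects the symbols in `used` (NULL-terminated) that are NOT in
 * `allowed`. Up to `cap` of them are stored in `out`; the return value is
 * the total number found, so 0 means every import is on the stable list. */
static size_t non_kabi_symbols(const char *const *used,
                               const char *const *allowed,
                               const char **out, size_t cap)
{
    assert(used != NULL && allowed != NULL);
    size_t n = 0;
    for (size_t i = 0; used[i] != NULL; i++) {
        if (is_listed(used[i], allowed))
            continue;
        if (n < cap)
            out[n] = used[i];
        n++;
    }
    return n;
}
```

A non-zero count marks a module that, by the reasoning above, is exposed to breakage whenever a new kernel ships.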

There are two types of failures that can occur:

 1. a change results in failure to build the OpenAFS kernel module
    for the new kernel

 2. a change results in the OpenAFS kernel module building and
    successfully loading but failing to operate correctly

It is the second of these possibilities that has taken place with the
release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 beta.

Are you an early adopter of RHEL 7.5 beta?  Absolutely: it's a beta
release, and as such you should expect bugs, and you should expect that
third-party kernel modules that do not restrict themselves to the KABI
might have compatibility issues.

There was a compatibility issue with the RHEL 7.4 kernel
(3.10.0-693.1.1.el7) as well; it was only fixed in the OpenAFS 1.6
release series this past week, as part of 1.6.22.2:

  http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2

Jeffrey Altman
AuriStor, Inc.

P.S. - Welcome to the community.



Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Benjamin Kaduk
On Fri, Feb 02, 2018 at 04:20:59PM -0500, Kodiak Firesmith wrote:
> Not much else to report today other than expanding my test base out to a
> few more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM again, and am
> still seeing the same results universally.  Every host fails to boot due to
> a kernel panic when it tries to load the openafs DKMS kernel module.

The screen picture you posted earlier had two entries for attempting
to start the openafs client (both failed).  The client is known to
panic if afsd is run a second time without an unload/load of the
kernel module in between.  Is it possible that this is happening in
your setup?
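If you want to rule this out, one option is a guard in your own start-up wrapper that refuses to run afsd when libafs is already loaded, forcing an unload/reload first. The C sketch below only answers the "is it loaded?" part, by reading /proc/modules, whose lines begin with the module name. `module_listed()` and `module_loaded()` are hypothetical helpers, and whether such a guard suits your init setup is an assumption; nothing in this thread says the OpenAFS packaging does this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if a module called `name` appears in `text`, which is assumed
 * to have the layout of /proc/modules: one module per line, name first,
 * followed by a space. */
static int module_listed(const char *text, const char *name)
{
    assert(text != NULL && name != NULL);
    size_t len = strlen(name);
    const char *line = text;
    while (*line != '\0') {
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return 1;
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return 0;
}

/* Reads /proc/modules (Linux only) and reports whether `name` is loaded:
 * 1 if loaded, 0 if not, -1 if the file cannot be read. Output beyond the
 * buffer size is ignored in this sketch. */
static int module_loaded(const char *name)
{
    static char buf[1 << 16];
    FILE *f = fopen("/proc/modules", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return module_listed(buf, name);
}
```

A wrapper would call `module_loaded("libafs")` before loading the module and starting afsd, and stop if it returns 1.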

-Ben
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Kodiak Firesmith
Thanks Stephan,
I'm relatively new to handling OpenAFS.  Are these problems part of a
normal "kernel release; openafs update" cycle and perhaps I'm getting
snagged just by being too early of an adopter?  I wanted to raise the alarm
on this and see if anything else was needed from me as the reporter of the
issue, but perhaps that's an overreaction to what is just part of a normal
process I just haven't been tuned into in prior RHEL release cycles?

Should I try to get an account set up at http://rt.central.org and file a
bug?

Thanks!
 - Kodiak

On Fri, Feb 2, 2018 at 4:36 PM, Stephan Wiesand 
wrote:

> While additional data points are obviously most welcome, there is no
> expectation that this issue is fixed with 1.6.22.x or 1.8.x right now. Some
> serious work will be required to adapt OpenAFS to the changes in this
> kernel (series), though there's some hope that it won't be quite as hard to
> fix as the 7.4 getcwd issue.
>
> - Stephan


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Stephan Wiesand
While additional data points are obviously most welcome, there is no 
expectation that this issue is fixed with 1.6.22.x or 1.8.x right now. Some 
serious work will be required to adapt OpenAFS to the changes in this kernel 
(series), though there's some hope that it won't be quite as hard to fix as the 
7.4 getcwd issue.

- Stephan

> On 02.Feb 2018, at 22:20, Kodiak Firesmith  wrote:
> 
> Not much else to report today other than expanding my test base out to a few 
> more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM again, and am still 
> seeing the same results universally.  Every host fails to boot due to a 
> kernel panic when it tries to load the openafs DKMS kernel module.
> 
> My next move on Monday will be to try an actual kernel-specific kmod instead 
> of DKMS.  If that works I'll be kind of sad since we've had great luck with 
> DKMS until now.
> 
>  - Kodiak


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Kodiak Firesmith
Not much else to report today other than expanding my test base out to a
few more RHEL 7.5b hosts and re-rolling the 1.6.22.1-1 SRPM again; I am
still seeing the same results universally.  Every host fails to boot due to
a kernel panic when it tries to load the openafs DKMS kernel module.

My next move on Monday will be to try an actual kernel-specific kmod
instead of DKMS.  If that works I'll be kind of sad since we've had great
luck with DKMS until now.

 - Kodiak

On Thu, Feb 1, 2018 at 3:26 PM, Kodiak Firesmith 
wrote:

> I just rebuilt off-the-shelf RPMs based off of
> http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm
> thinking maybe we had some historical patch in our build area that might
> be causing the problem, but alas, even the off-the-shelf RPMs cause a full
> wedge and reboot when openafs-client.service starts up.
>
>  - Kodiak


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Matt Vander Werf
Just for the sake of testing, I also installed 1.8.0pre4 RPMs on a RHEL 7.5
beta system and still had the same issue when using ls with directories
under /afs/...

Also (maybe this was already mentioned), it seems to be only directories as
well. I can do an ls of a known file in my AFS home directory just fine:

[mvanderw@ ~]$ echo testing > /afs/crc.nd.edu/user/m/mvanderw/testing
[mvanderw@ ~]$ cat /afs/crc.nd.edu/user/m/mvanderw/testing
testing
[mvanderw@ ~]$ ls -al /afs/crc.nd.edu/user/m/mvanderw/testing
-rw-r--r-- 1 mvanderw campus 8 Feb  2 11:20 /afs/crc.nd.edu/user/m/mvanderw/testing

vs

[mvanderw@ ~]$ ls -al /afs/crc.nd.edu/user/m/mvanderw
ls: reading directory /afs/crc.nd.edu/user/m/mvanderw: Not a directory
total 0

Any ideas? Or anything we can test/do that would help?

Thanks!

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692

On Fri, Feb 2, 2018 at 4:05 AM, Stephan Wiesand 
wrote:

> er, nonsense, that's RH_KABI_EXTEND, sorry


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Stephan Wiesand

> On 2. Feb 2018, at 09:55, Stephan Wiesand  wrote:
> 
> 
>> On 2. Feb 2018, at 02:14, Benjamin Kaduk  wrote:
>> 
>> On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
>>> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI
>>> hashes of the used symbols are stored as a requirement, it seems none of
>>> those hashes changed between -693 and -830.
>>> 
>>> There are two differences in the configure results:
>>> 
>>> -ac_cv_linux_header_sched_signal_h=no
>>> +ac_cv_linux_header_sched_signal_h=yes
>>> 
>>> -ac_cv_linux_struct_file_operations_has_iterate=no
>>> +ac_cv_linux_struct_file_operations_has_iterate=yes
>> 
>> That's very helpful to know.
>> 
>> Does the new tree actually have a sched/signal.h header?
> 
> Yes it does. The only content is a guarded include of 
> 
>> Does the new struct file_operations have an 'iterate' member
>> function?
> 
> Yes it does, wrapped in a RH_KABI_ITERATE macro.

er, nonsense, that's RH_KABI_EXTEND, sorry

> 
>> (The idea being to tell whether they changed something in new and
>> interesting ways or our configure test(s) are broken.)
> 
> It's the former :-(



Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-02 Thread Stephan Wiesand

> On 2. Feb 2018, at 02:14, Benjamin Kaduk  wrote:
> 
> On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
>> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI
>> hashes of the used symbols are stored as a requirement, it seems none of
>> those hashes changed between -693 and -830.
>> 
>> There are two differences in the configure results:
>> 
>> -ac_cv_linux_header_sched_signal_h=no
>> +ac_cv_linux_header_sched_signal_h=yes
>> 
>> -ac_cv_linux_struct_file_operations_has_iterate=no
>> +ac_cv_linux_struct_file_operations_has_iterate=yes
> 
> That's very helpful to know.
> 
> Does the new tree actually have a sched/signal.h header?

Yes it does. The only content is a guarded include of 

> Does the new struct file_operations have an 'iterate' member
> function?

Yes it does, wrapped in a RH_KABI_ITERATE macro.

> (The idea being to tell whether they changed something in new and
> interesting ways or our configure test(s) are broken.)

It's the former :-(
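Stephan's answers mean the configure test now finds an `iterate` member, so the module is presumably built to supply that method instead of the older `readdir` one. The thread does not yet say why that breaks directory reads. The toy model below shows only one way a "Not a directory" error can arise: if the kernel consults a different directory-reading method than the one the module filled in, it has nothing to call. Upstream Linux 3.10's `vfs_readdir()` returns `-ENOTDIR` in exactly that situation (no `f_op->readdir`). Everything else here, including `struct fake_fops` and the `use_iterate` switch, is invented for illustration and is not kernel or OpenAFS code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT kernel types: a file-operations table that
 * may provide an old-style and/or a new-style directory reader. */
typedef int (*dir_reader)(void *ctx);

struct fake_fops {
    dir_reader readdir; /* old-style method */
    dir_reader iterate; /* new-style method */
};

/* Models a VFS that uses `iterate` only when `use_iterate` is set (a
 * hypothetical switch), otherwise `readdir`, and returns -ENOTDIR when the
 * method it wants is missing. */
static int fake_vfs_read_dir(const struct fake_fops *fops, int use_iterate, void *ctx)
{
    assert(fops != NULL);
    dir_reader fn = use_iterate ? fops->iterate : fops->readdir;
    if (fn == NULL)
        return -ENOTDIR;
    return fn(ctx);
}

/* Pretends the directory entries were emitted successfully. */
static int demo_reader(void *ctx)
{
    (void)ctx;
    return 0;
}
```

A table with only `iterate` filled in succeeds when the model takes the `iterate` path and fails with `-ENOTDIR` when it takes the `readdir` path; regular-file reads never go through this code, which is consistent with files still working.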

-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany





Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Benjamin Kaduk
On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI
> hashes of the used symbols are stored as a requirement, it seems none of
> those hashes changed between -693 and -830.
> 
> There are two differences in the configure results:
> 
> -ac_cv_linux_header_sched_signal_h=no
> +ac_cv_linux_header_sched_signal_h=yes
> 
> -ac_cv_linux_struct_file_operations_has_iterate=no
> +ac_cv_linux_struct_file_operations_has_iterate=yes

That's very helpful to know.

Does the new tree actually have a sched/signal.h header?
Does the new struct file_operations have an 'iterate' member
function?
(The idea being to tell whether they changed something in new and
interesting ways or our configure test(s) are broken.)

-Ben

> And there's quite a bit of churn in include/linux/fs.h (and some in key.h).


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Gary Gatling
I tried testing a work-in-progress 1.6.22.2 on RHEL 7.5 beta by doing:

git clone git://git.openafs.org/openafs.git
cd openafs
git checkout remotes/origin/openafs-stable-1_6_x
HEAD is now at d25c8e8... Make OpenAFS 1.6.22.2


But it seems to have the same problems with directories, so I guess further
changes will need to be made to get it to work on the RHEL 7.5 kernel. I'm not
a kernel hacker, so I'll wait to see what you guys come up with. :)

Thanks,

On Thu, Feb 1, 2018 at 11:11 AM, Stephan Wiesand 
wrote:

> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI
> hashes of the used symbols are stored as a requirement, it seems none of
> those hashes changed between -693 and -830.
>
> There are two differences in the configure results:
>
> -ac_cv_linux_header_sched_signal_h=no
> +ac_cv_linux_header_sched_signal_h=yes
>
> -ac_cv_linux_struct_file_operations_has_iterate=no
> +ac_cv_linux_struct_file_operations_has_iterate=yes
>
> And there's quite a bit of churn in include/linux/fs.h (and some in key.h).


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Kodiak Firesmith
I just rebuilt off-the-shelf RPMs based off of
http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm
thinking maybe we had some historical patch in our build area that might be
causing the problem, but alas, even the off-the-shelf RPMs cause a full
wedge and reboot when openafs-client.service starts up.

 - Kodiak

On Thu, Feb 1, 2018 at 1:23 PM, Kodiak Firesmith 
wrote:

> Hello Rich!
> It's a Dell Optiplex 7020 with an Intel i7-4790.
>
> Thanks!
>  - Kodiak
>
> On Thu, Feb 1, 2018 at 1:20 PM, Rich Sudlow  wrote:
>
>> On 01/31/2018 09:43 AM, Kodiak Firesmith wrote:
>>
>>> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>>>
>>
>> Greetings
>>
>> What processor..etc is this machine?
>>
>> Rich
>>
>>
>>
>>>
>>> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith >> > wrote:
>>>
>>> Folks, re-sending this because the first try never hit the list -
>>> perhaps
>>> mail with attachments are silently dropped or held for manual
>>> moderation? I'd originally attached an image of the stack trace.  I'll
>>> host it and reply
>>> to this with a  URL link in case that would also result in a drop or
>>> moderation.
>>>
>>>
>>>
>>> Anyhow:
>>>
>>> In testing the new RHEL 7.5 beta, we've discovered that hosts using
>>> AFS fail
>>> to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>>
>>> We are wondering if some of the non-guaranteed kernel ABIs that
>>> OpenAFS uses
>>> might have changed with the latest kernel provided in RHEL 7.
>>>
>>> I've attached a picture of the trace.
>>>
>>> Anyone else kicking the tires on the new RHEL yet?
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>> --
>> Rich Sudlow
>> University of Notre Dame
>> Center for Research Computing - Union Station
>> 506 W. South St
>> South Bend, In 46601
>>
>> (574) 631-7258 (office)
>> (574) 807-1046 (cell)
>>
>
>


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Kodiak Firesmith
Thanks for the replies!

We're using DKMS and expected the dynamic re-roll of the kmods to work like
any other kernel upgrade but that doesn't seem to be the case.  I need to
dig deeper, especially now that there is evidence that it's just our site.

Thanks a bunch everyone.

 - Kodiak

On Thu, Feb 1, 2018 at 11:13 AM, Matt Vander Werf  wrote:

> I'm also seeing the same issue as Gary on some RHEL 7.5 beta boxes running
> OpenAFS 1.6.22.1. Can't run ls under any /afs/.../.../etc directory,
> including in my AFS home directory when logged in as myself.
>
> [mvanderw@ ~]$ ls
> ls: reading directory .: Not a directory
> [mvanderw@ ~]$ ls ~
> ls: reading directory /afs/crc.nd.edu/user/m/mvanderw: Not a directory
>
> [mvanderw@ ~]$ ls /afs/
> ls: reading directory /afs/: Not a directory
> [mvanderw@ ~]$ ls /afs/crc.nd.edu
> ls: reading directory /afs/crc.nd.edu: Not a directory
>
> But no kernel panics here either.
>
> @Kodiak: Is it possible you were running a kmod-openafs from an older
> kernel? I compiled a new kmod-openafs RPM on a RHEL 7.5 beta system and it
> works well.
>
> I compiled all the OpenAFS packages from the source RPM on the RHEL 7.5
> beta system itself and didn't run into any issues with the compile.
>
> Besides this, AFS seems to be running correctly with nothing in the logs
> indicating any problems (like Gary mentioned).
>
> Any idea what might be causing this? Some semantic changes like with the
> getcwd issue in RHEL 7.4?
>
> Thanks.
>
> --
> Matt Vander Werf
> HPC System Administrator
> University of Notre Dame
> Center for Research Computing - Union Station
> 506 W. South Street
> South Bend, IN 46601
> Phone: (574) 631-0692

Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Matt Vander Werf
I'm also seeing the same issue as Gary on some RHEL 7.5 beta boxes running
OpenAFS 1.6.22.1. Can't run ls under any /afs/.../.../etc directory,
including in my AFS home directory when logged in as myself.

[mvanderw@ ~]$ ls
ls: reading directory .: Not a directory
[mvanderw@ ~]$ ls ~
ls: reading directory /afs/crc.nd.edu/user/m/mvanderw: Not a directory

[mvanderw@ ~]$ ls /afs/
ls: reading directory /afs/: Not a directory
[mvanderw@ ~]$ ls /afs/crc.nd.edu
ls: reading directory /afs/crc.nd.edu: Not a directory

But no kernel panics here either.

@Kodiak: Is it possible you were running a kmod-openafs from an older
kernel? I compiled a new kmod-openafs RPM on a RHEL 7.5 beta system and it
works well.

I compiled all the OpenAFS packages from the source RPM on the RHEL 7.5
beta system itself and didn't run into any issues with the compile.
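One way to answer the kmod question on a given host is to compare the kernel release a module was built for (the first word of its vermagic string, which `modinfo -F vermagic libafs` prints) with the running kernel's release from uname(). The C sketch below does that comparison; the helper names are hypothetical. On RHEL, kmod packages can also be made available to newer kernels through the weak-updates mechanism, so a mismatch is a clue worth checking rather than proof of a problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Returns 1 if the first space-separated word of `vermagic` (the kernel
 * release a module was built for, e.g. "3.10.0-830.el7.x86_64 SMP ...")
 * equals `release`, else 0. */
static int vermagic_matches(const char *vermagic, const char *release)
{
    assert(vermagic != NULL && release != NULL);
    size_t len = strcspn(vermagic, " ");
    return strlen(release) == len && strncmp(vermagic, release, len) == 0;
}

/* Compares a module's vermagic string with the running kernel.
 * Returns 1 on match, 0 on mismatch, -1 if uname() fails. */
static int built_for_running_kernel(const char *vermagic)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return vermagic_matches(vermagic, u.release);
}
```

For instance, a module whose vermagic starts with 3.10.0-693.1.1.el7.x86_64 does not match a host running 3.10.0-830.el7.x86_64.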

Besides this, AFS seems to be running correctly with nothing in the logs
indicating any problems (like Gary mentioned).

Any idea what might be causing this? Some semantic changes like with the
getcwd issue in RHEL 7.4?

Thanks.

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692

On Thu, Feb 1, 2018 at 10:58 AM, Gary Gatling  wrote:

> Ok. This gets weirder. Any directory under /afs says Not a directory. But
> I can read files like
>
> /afs/eos.ncsu.edu/software/inventory/software_inventory
>
> just fine.
>
> On Thu, Feb 1, 2018 at 10:55 AM, Gary Gatling  wrote:
>
>> I don't get a kernel panic but instead I get:
>>
>> [gsgatlin@localhost ~]$ ls /afs/
>> ls: reading directory /afs/: Not a directory
>> [gsgatlin@localhost ~]$
>>
>>
>> which is pretty weird. I don't see anything in the syslog about problems
>> with openafs
>>
>> Feb  1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
>> Feb  1 10:44:24 localhost kernel: libafs: loading out-of-tree module
>> taints kernel.
>> Feb  1 10:44:24 localhost kernel: libafs: module license '
>> http://www.openafs.org/dl/license10.html' taints kernel.
>> Feb  1 10:44:24 localhost kernel: Disabling lock debugging due to kernel
>> taint
>> Feb  1 10:44:24 localhost kernel: libafs: module verification failed:
>> signature and/or required key missing - tainting kernel
>> Feb  1 10:44:24 localhost kernel: Key type afs_pag registered
>> Feb  1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
>> Feb  1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache:
>> Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
>> Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
>> Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
>> Feb  1 10:44:24 localhost systemd: Started OpenAFS Client Service.
>>
>> I am using openafs-1.6.22
>>
>>
>> with
>>
>> correct-m4-conditionals-in-curses.m4.patch
>> linux-test-for-vfswrite-rather-than-vfsread.patch
>> linux-use-kernelread-kernelwrite-when-vfs-varian.patch
>>
>> from the arch linux distro in my rpm packages.
>>
>> Anyone know what
>>
>> ls: reading directory /afs/: Not a directory
>>
>> means and is there some way around it?
>>
>> Also, is 1.6.22.2 coming out soon?
>>
>> Thanks so much,
>>
>> On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith 
>> wrote:
>>
>>> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>>>
>>>
>>> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith 
>>> wrote:
>>>
>>>> Folks, re-sending this because the first try never hit the list -
>>>> perhaps mail with attachments are silently dropped or held for manual
>>>> moderation?  I'd originally attached an image of the stack trace.  I'll
>>>> host it and reply to this with a  URL link in case that would also result
>>>> in a drop or moderation.
>>>>
>>>> Anyhow:
>>>>
>>>> In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS
>>>> fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>>>
>>>> We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS
>>>> uses might have changed with the latest kernel provided in RHEL 7.
>>>>
>>>> I've attached a picture of the trace.
>>>>
>>>> Anyone else kicking the tires on the new RHEL yet?
>>>>
>>>> Thanks!
>>>>
>>>
>>
>


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Stephan Wiesand
Comparing the 1.6.22.2 module builds from the SL (Scientific Linux) packaging,
where the kABI hashes of the symbols used are stored as requirements, it seems
that none of those hashes changed between -693 and -830.
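The hash comparison above can be sketched as follows. This is a minimal illustration, assuming each build's kABI requirements have been dumped as lines of the form `kernel(symbol) = 0xhash` (for example from the module package's RPM requirements); that line format and the helper names are assumptions for illustration, not the SL packaging tools themselves.

```python
import re

# Assumed requirement line format: "kernel(<symbol>) = 0x<hex crc>"
REQ_LINE = re.compile(r"^kernel\((?P<sym>[^)]+)\)\s*=\s*(?P<crc>0x[0-9a-fA-F]+)\s*$")


def parse_kabi_requirements(text):
    """Return a {symbol: crc} dict from kABI requirement lines; other lines are ignored."""
    hashes = {}
    for line in text.splitlines():
        m = REQ_LINE.match(line.strip())
        if m:
            hashes[m.group("sym")] = m.group("crc").lower()
    return hashes


def changed_symbols(old, new):
    """Return (symbols whose CRC changed, symbols only in old, symbols only in new)."""
    changed = sorted(s for s in old.keys() & new.keys() if old[s] != new[s])
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    return changed, only_old, only_new


if __name__ == "__main__":
    # Made-up CRC values, purely to show the shape of the result.
    build_693 = "kernel(vfs_read) = 0x1111\nkernel(d_alloc) = 0x2222\n"
    build_830 = "kernel(vfs_read) = 0x1111\nkernel(d_alloc) = 0x2222\n"
    print(changed_symbols(parse_kabi_requirements(build_693),
                          parse_kabi_requirements(build_830)))
```

An empty result only means the signatures of the exported symbols the module uses are unchanged; it says nothing about configure-detected features, struct layouts, or inline code in the kernel headers.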

There are two differences in the configure results:

-ac_cv_linux_header_sched_signal_h=no
+ac_cv_linux_header_sched_signal_h=yes

-ac_cv_linux_struct_file_operations_has_iterate=no
+ac_cv_linux_struct_file_operations_has_iterate=yes
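Differences like these can be pulled out mechanically. A minimal sketch, assuming the configure results of each build have been saved as plain `ac_cv_name=value` lines (for instance copied out of each build's config.log); the helper names are hypothetical.

```python
def parse_cache_vars(text):
    """Parse 'ac_cv_name=value' lines into a dict; all other lines are skipped."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("ac_cv_") and "=" in line:
            name, value = line.split("=", 1)
            result[name] = value.strip("'\"")  # drop shell quoting if present
    return result


def diff_cache_vars(old, new):
    """Return '-'/'+' lines (diff style) for variables whose values differ."""
    out = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue
        if before is not None:
            out.append(f"-{name}={before}")
        if after is not None:
            out.append(f"+{name}={after}")
    return out


if __name__ == "__main__":
    # Values taken from the two configure differences listed in this message.
    k693 = parse_cache_vars("ac_cv_linux_header_sched_signal_h=no\n"
                            "ac_cv_linux_struct_file_operations_has_iterate=no\n")
    k830 = parse_cache_vars("ac_cv_linux_header_sched_signal_h=yes\n"
                            "ac_cv_linux_struct_file_operations_has_iterate=yes\n")
    print("\n".join(diff_cache_vars(k693, k830)))
```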

And there's quite a bit of churn in include/linux/fs.h (and some in key.h).
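One hypothesis consistent with the has_iterate flip and the "Not a directory" symptom reported in this thread (not a confirmed diagnosis): in mainline Linux, directory reads fail with ENOTDIR when the filesystem's file_operations lack the hook the VFS looks for (`readdir` in 3.10's `vfs_readdir`, `iterate` in the later `iterate_dir`). If the module is now built to fill one hook while this kernel still dispatches AFS directory reads through the other, every listing would fail while plain file reads keep working. The toy model below is a hypothetical Python illustration of that dispatch, not kernel or OpenAFS code; all names are made up.

```python
import errno


def build_file_operations(has_iterate):
    """Hypothetical stand-in for a module's directory file_operations.

    Mimics the effect of ac_cv_linux_struct_file_operations_has_iterate:
    when configure detects 'iterate', only that hook is populated.
    """
    def list_entries():
        return ["cell-a", "cell-b"]  # made-up directory contents

    if has_iterate:
        return {"iterate": list_entries}
    return {"readdir": list_entries}


def vfs_read_directory(f_op, hook_used_by_kernel):
    """Toy VFS: return -ENOTDIR if the hook the kernel expects is missing."""
    handler = f_op.get(hook_used_by_kernel)
    if handler is None:
        return -errno.ENOTDIR
    return handler()


# Configure found 'iterate', but suppose the kernel still calls 'readdir'.
f_op = build_file_operations(has_iterate=True)
print(vfs_read_directory(f_op, "readdir"))  # negative ENOTDIR: the "Not a directory" case
print(vfs_read_directory(f_op, "iterate"))  # the entry list
```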

> On 1. Feb 2018, at 16:58, Gary Gatling  wrote:
> 
> Ok. This gets weirder. Any directory under /afs says Not a directory. But I 
> can read files like
> 
> /afs/eos.ncsu.edu/software/inventory/software_inventory
> 
> just fine. 

-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany



___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Gary Gatling
OK, this gets weirder. Every directory under /afs reports "Not a directory",
but I can read files such as

/afs/eos.ncsu.edu/software/inventory/software_inventory

just fine.
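To check another client for the same pattern (directories failing with ENOTDIR while files beneath them read fine), a small probe like the following may help. It is a hypothetical helper, not part of OpenAFS; it only uses Python's standard `os.listdir` and `open`, and the paths in the usage lines are the ones quoted in this thread.

```python
import os


def probe(path):
    """Classify a path: 'listable', 'ENOTDIR', 'readable-file', or 'error: ...'."""
    try:
        os.listdir(path)
        return "listable"
    except NotADirectoryError:
        # Either a regular file (expected) or a directory that cannot be listed.
        if os.path.isdir(path):
            return "ENOTDIR"  # stat says directory, yet listing fails: the symptom
    except OSError as exc:
        return f"error: {exc.strerror}"
    # Not a directory: see whether it reads as an ordinary file.
    try:
        with open(path, "rb") as fh:
            fh.read(1)
        return "readable-file"
    except OSError as exc:
        return f"error: {exc.strerror}"


if __name__ == "__main__":
    for p in ["/afs", "/afs/eos.ncsu.edu/software/inventory/software_inventory"]:
        print(p, probe(p))
```

On an affected client one would expect "ENOTDIR" for /afs and "readable-file" for the inventory file; a healthy client should report "listable" for /afs.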



Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-01 Thread Gary Gatling
I don't get a kernel panic; instead I get:

[gsgatlin@localhost ~]$ ls /afs/
ls: reading directory /afs/: Not a directory
[gsgatlin@localhost ~]$


which is pretty weird. I don't see anything in syslog about problems
with OpenAFS:

Feb  1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
Feb  1 10:44:24 localhost kernel: libafs: loading out-of-tree module taints kernel.
Feb  1 10:44:24 localhost kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
Feb  1 10:44:24 localhost kernel: Disabling lock debugging due to kernel taint
Feb  1 10:44:24 localhost kernel: libafs: module verification failed: signature and/or required key missing - tainting kernel
Feb  1 10:44:24 localhost kernel: Key type afs_pag registered
Feb  1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
Feb  1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache: Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
Feb  1 10:44:24 localhost systemd: Started OpenAFS Client Service.

I am using openafs-1.6.22 with the following patches, taken from the Arch
Linux distribution, in my RPM packages:

correct-m4-conditionals-in-curses.m4.patch
linux-test-for-vfswrite-rather-than-vfsread.patch
linux-use-kernelread-kernelwrite-when-vfs-varian.patch

Does anyone know what

ls: reading directory /afs/: Not a directory

means, and is there some way around it?

Also, is 1.6.22.2 coming out soon?

Thanks so much,

On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith 
wrote:

> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>
>
> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith 
> wrote:
>
>> Folks, re-sending this because the first try never hit the list - perhaps
>> mail with attachments are silently dropped or held for manual moderation?
>> I'd originally attached an image of the stack trace.  I'll host it and
>> reply to this with a  URL link in case that would also result in a drop or
>> moderation.
>>
>>
>>
>> Anyhow:
>>
>> In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS
>> fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>
>> We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS
>> uses might have changed with the latest kernel provided in RHEL 7.
>>
>> I've attached a picture of the trace.
>>
>> Anyone else kicking the tires on the new RHEL yet?
>>
>> Thanks!
>>
>>
>