Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-04-10 Thread Matt Vander Werf
RHEL 7.5 GA was released earlier this morning.

Just to confirm: I retested Gary's 1.6.x patch against the new kernel for
RHEL 7.5 GA (3.10.0-862.el7.x86_64). It still fixes the ENOTDIR issue, and
so far I don't see any other problems with OpenAFS on this new RHEL
release.
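
For anyone scripting the same check across hosts, the kernel builds named in
this thread can be told apart from the release string. This is only a minimal
sketch in Python: the function names and the lookup table are illustrative
assumptions built from the versions mentioned in this thread, not an official
Red Hat mapping.

```python
import re

# Kernel build numbers mentioned in this thread (illustrative, not exhaustive).
KNOWN_BUILDS = {
    693: "RHEL 7.4",
    830: "RHEL 7.5 beta",
    862: "RHEL 7.5 GA",
}


def el7_kernel_build(release):
    """Return the build number from a string like '3.10.0-862.el7.x86_64'.

    Returns None when the string is not an EL7 3.10.0 kernel release.
    """
    m = re.match(r"^3\.10\.0-(\d+)(?:\.\d+)*\.el7", release)
    return int(m.group(1)) if m else None


def describe(release):
    """Map a kernel release string to the RHEL release named in this thread."""
    return KNOWN_BUILDS.get(el7_kernel_build(release), "unknown")
```

On a live host the input could come from `platform.release()`; any build not
listed in the table is simply reported as "unknown".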

Looking forward to a 1.6.22.3 release soon!

Thanks.

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692


Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-03-23 Thread Stephan Wiesand

> On 23. Mar 2018, at 12:27, Kodiak Firesmith  wrote:
> 
> I've also tested gsgatlin's 7.5beta RPMs and they work great.  Any chance 
> we'll see the rh75enotdir patch integrated into a release of 1.6.22.3 soon?  
> I'm wondering if it'll be worth it to manually apply that patch to a rebuild 
> of the official OpenAFS RPMs if this isn't on the block for being merged and 
> released soon - but I don't want to blow the time applying that patch to a 
> re-roll if a fixed official release is forthcoming.

We are planning to release a 1.6.22.3 addressing the ENOTDIR issue with the 
EL7.5 kernel soon after the EL7.5 GA release.

- Stephan


Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-03-23 Thread Kodiak Firesmith
I've also tested gsgatlin's 7.5beta RPMs and they work great.  Any chance
we'll see the rh75enotdir patch integrated into a release of 1.6.22.3
soon?  I'm wondering if it'll be worth it to manually apply that patch to a
rebuild of the official OpenAFS RPMs if this isn't on the block for being
merged and released soon - but I don't want to blow the time applying that
patch to a re-roll if a fixed official release is forthcoming.

Thanks!
 - Kodiak



Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-03-03 Thread Matt Vander Werf
Yes, that patch was added to the master branch. They usually have to
backport patches into the 1.6.x branch before they will work in that
codebase as well.

But...I was able to apply that patch from Gerrit to the 1.8.0pre5 release
and build RPMs off of that. From my testing, that fix appears to work great
for 1.8.x on RHEL 7.5 beta! I am able to ls in any /afs directory
successfully now!

Using Gary's (unofficial) 1.6.x patch, I was also able to replicate Gary's
success on RHEL 7.5 beta when applying the patch to the latest 1.6.x
release! If the "official" 1.6.x backport fix differs from Gary's (for
whatever reason...not saying it will), I'd be happy to test out that
backported patch as well.

Thanks for all your great work! Looking forward to a new 1.8.x and 1.6.x
release with this fix in place!

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692



Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-03-02 Thread Gary Gatling
I tried to copy/paste the patch at:


http://git.openafs.org/?p=openafs.git;a=blobdiff;f=src/afs/LINUX/osi_vnodeops.c;h=969a27b271ed3b809f1ddaa462099a5cc09d7886;hp=c1acca962337dff1cf66916c1e3e876bd8468e54;hb=a72dafafddaaa5bfe86c067a605aeffa16572c51;hpb=6d74e3d6a1becf86cec30efc2d01a5692167afe1

But it failed for me with openafs-1.6.22.2-src.tar.bz2

patching file src/afs/LINUX/osi_vnodeops.c
Hunk #1 FAILED at 53.
Hunk #2 succeeded at 296 (offset -6 lines).
Hunk #3 succeeded at 378 (offset -7 lines).
Hunk #4 FAILED at 455.
Hunk #5 FAILED at 475.
Hunk #6 FAILED at 798.
4 out of 6 hunks FAILED -- saving rejects to file
src/afs/LINUX/osi_vnodeops.c.rej
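
For reference, output like the above can be summarized mechanically before
deciding whether a hand-made backport is needed. The following is a rough
Python sketch, not part of any OpenAFS tooling; the function name
`summarize_patch_output` is made up for illustration, and it only understands
the "Hunk #N FAILED/succeeded at LINE" lines shown above.

```python
import re


def summarize_patch_output(text):
    """Split hunk numbers from patch output into (failed, applied) lists."""
    failed, applied = [], []
    for line in text.splitlines():
        m = re.match(r"Hunk #(\d+) (FAILED|succeeded) at (\d+)", line.strip())
        if not m:
            continue  # skip "patching file ..." and the summary lines
        target = failed if m.group(2) == "FAILED" else applied
        target.append(int(m.group(1)))
    return failed, applied


sample = """patching file src/afs/LINUX/osi_vnodeops.c
Hunk #1 FAILED at 53.
Hunk #2 succeeded at 296 (offset -6 lines).
Hunk #3 succeeded at 378 (offset -7 lines).
Hunk #4 FAILED at 455.
Hunk #5 FAILED at 475.
Hunk #6 FAILED at 798.
4 out of 6 hunks FAILED -- saving rejects to file
src/afs/LINUX/osi_vnodeops.c.rej
"""

failed, applied = summarize_patch_output(sample)
print("failed hunks:", failed)          # failed hunks: [1, 4, 5, 6]
print("applied with offset:", applied)  # applied with offset: [2, 3]
```

The failed hunks are the ones that have to be ported by hand from the .rej
file.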


So I made my own patch based on that one.

https://pastebin.com/NZsUz9Jg

In a RHEL 7.5 beta VM on kernel 3.10.0-830.el7.x86_64, OpenAFS works like a
champ. :) I can list directories again and can also edit files in AFS.
Whew.

To be paranoid, I did further testing: listing directories and editing a
file under an AFS path.

The patch was applied on all of the distros below:

centos 6 32 bit kernel 2.6.32-696.20.1.el6.i686: works
centos 6 64 bit kernel 2.6.32-696.20.1.el6.x86_64: works
centos 7.4 64 bit kernel 3.10.0-693.17.1.el7.x86_64: works
fedora 26 64 bit kernel 4.15.6-200.fc26.x86_64: works
fedora 27 64 bit kernel 4.15.6-300.fc27.x86_64: works
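
The two checks used for this matrix (list a directory, edit a file) could be
scripted roughly as follows. This is a hedged sketch in Python rather than the
procedure actually used for the results above; `smoke_test` is a hypothetical
helper, and the directory passed in is assumed to be a writable path under
/afs chosen by the tester.

```python
import errno
import os
import tempfile


def smoke_test(directory):
    """Return a dict of check name -> bool for listing and editing in a directory."""
    results = {}
    try:
        os.listdir(directory)
        results["list"] = True
    except OSError as exc:
        results["list"] = False
        # ENOTDIR is the "Not a directory" error that 'ls' reported in this thread.
        results["enotdir"] = exc.errno == errno.ENOTDIR
    try:
        # Write a scratch file and read it back; it is deleted on close.
        with tempfile.NamedTemporaryFile("w+", dir=directory) as fh:
            fh.write("openafs smoke test\n")
            fh.flush()
            fh.seek(0)
            results["edit"] = fh.read() == "openafs smoke test\n"
    except OSError:
        results["edit"] = False
    return results
```

On an unpatched 7.5 beta client one would expect the "list" check to fail with
"enotdir" set while "edit" still passes, which matches the symptom reported
earlier in the thread; on the systems listed above both checks should pass.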

Since all tests succeeded I went ahead and committed and pushed to
github.com for my packages.

https://github.com/gsgatlin/openafs-rpms/commit/fd61c9ff2c21404fba5276d7f3919ef1e6ab545d

Thank you very much!







Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-03-02 Thread Stephan Wiesand

> On 02.Mar 2018, at 12:40, Gary Gatling  wrote:
> 
>> On Fri, Mar 2, 2018 at 4:14 AM, Stephan Wiesand  
>> wrote:
>> 
>> 
>> Once we have a change confirmed to fix the EL7.5 issue and not break other 
>> platforms, yes. Whether it will be available quite in time for 7.5 GA is 
>> hard to say. You can help...
>> 
>> 
>> I will test this patch out later today and let you guys know what I find 
>> out. Thanks a lot.

Make sure you grab the patch from set 3 (the latest revision). It might be the 
final solution.

-- 
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-03-02 Thread Gary Gatling
On Fri, Mar 2, 2018 at 4:14 AM, Stephan Wiesand 
wrote:

>
>
> Once we have a change confirmed to fix the EL7.5 issue and not break other
> platforms, yes. Whether it will be available quite in time for 7.5 GA is
> hard to say. You can help...
>
>
I will test this patch out later today and let you guys know what I find
out. Thanks a lot.


Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-03-02 Thread Stephan Wiesand
Hello,

> On 2. Mar 2018, at 09:47, Anders Nordin  wrote:
> 
> Hello,
> 
> Is there any progress on this issue?

Incidentally, Mark uploaded https://gerrit.openafs.org/12935 a couple of hours
ago. It's probably not final since it seems to cause build failures on some 
older platforms. But it's certainly worth a try on EL7.5 beta systems. It would 
also be interesting to know on which other platforms it fails to build (or 
work).

> Can we expect a stable release for RHEL 7.5?

Once we have a change confirmed to fix the EL7.5 issue and not break other 
platforms, yes. Whether it will be available quite in time for 7.5 GA is hard 
to say. You can help...

Best regards,

Stephan



RE: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-03-02 Thread Anders Nordin
Hello,

Is there any progress on this issue? Can we expect a stable release for RHEL 
7.5?

MVH
Anders


Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-09 Thread Benjamin Kaduk
On Wed, Feb 07, 2018 at 11:46:28AM -0500, Kodiak Firesmith wrote:
> Hello again All,
> 
> As part of continued testing, I've been able to confirm that the SystemD
> double-service startup thing only happens to my hosts when going from RHEL
> 7.4 to RHEL 7.5beta.  On a test host installed directly as RHEL 7.5beta, I
> get a bit farther with 1.6.18.22, in that I get to the point where OpenAFS
> "kind of" works.

Thanks for tracking this down.  The rpm packaging maintainers may
want to try to track down why the double-start happens in the
upgrade scenario, as that's pretty nasty behavior.

> What I'm observing is that the openafs client Kernel module (built by DKMS)
> loads fine, and just so long as you know where you need to go in /afs, you
> can get there, and you can read and write files and the OpenAFS 'fs'
> command works.  But doing an 'ls' of /afs or any path underneath results in
> "ls: reading directory /afs/: Not a directory".
> 
> I ran an strace of a good RHEL 7.4 host running ls on /afs, and a RHEL
> 7.5beta host running ls on /afs and have created pastebins of both, as well
> as an inline diff.
> 
> All can be seen at the following locations:
> 
> works
> https://paste.fedoraproject.org/paste/Hiojt2~Be3wgez47bKNucQ
> 
> fails
> https://paste.fedoraproject.org/paste/13ZXBfJIOMsuEJFwFShBfg
> 
> 
> diff
> https://paste.fedoraproject.org/paste/FJKRwep1fWJogIDbLnkn8A
> 
> Hopefully this might help the OpenAFS devs, or someone might know what
> might be borking on every RHEL 7.5 beta host.  It does fit with what other
> 7.5 beta users have observed OpenAFS doing.

Yes, now it seems like all our reports are consistent, and we just
have to wait for a developer to get a better look at what Red Hat
changed in the kernel that we need to adapt to.

-Ben

> Thanks!
>  - Kodiak
> 
> On Mon, Feb 5, 2018 at 12:31 PM, Stephan Wiesand 
> wrote:
> 
> >
> > > On 04.Feb 2018, at 02:11, Jeffrey Altman  wrote:
> > >
> > > On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> > >> I'm relatively new to handling OpenAFS.  Are these problems part of a
> > >> normal "kernel release; openafs update" cycle and perhaps I'm getting
> > >> snagged just by being too early of an adopter?  I wanted to raise the
> > >> alarm on this and see if anything else was needed from me as the
> > >> reporter of the issue, but perhaps that's an overreaction to what is
> > >> just part of a normal process I just haven't been tuned into in prior
> > >> RHEL release cycles?
> > >
> > >
> > > Kodiak,
> > >
> > > On RHEL, DKMS is safe to use for kernel modules that restrict themselves
> > > to using the restricted set of kernel interfaces (the RHEL KABI) that
> > > Red Hat has designated will be supported across the lifespan of the RHEL
> > > major version number.  OpenAFS is not such a kernel module.  As a result
> > > it is vulnerable to breakage each and every time a new kernel is shipped.
> >
> > Jeffrey,
> >
> > the usual way to use DKMS is to either have it build a module for a newly
> > installed kernel or install a prebuilt module for that kernel. It may be
> > possible to abuse it for providing a module built for another kernel, but
> > I think that won't happen accidentally.
> >
> > You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
> > be safe to use within a RHEL minor release (and the SL packaging has been
> > using them like this since EL6.4), but aren't across minor releases (and
> > that's why the SL packaging modifies the kmod handling to require a build
> > for the minor release in question).
> >
> > > There are two types of failures that can occur:
> > >
> > > > 1. a change results in failure to build the OpenAFS kernel module
> > > >    for the new kernel
> > > >
> > > > 2. a change results in the OpenAFS kernel module building and
> > > >    successfully loading but failing to operate correctly
> >
> > The latter shouldn't happen within a minor release, but can across
> > minor releases.
> >
> > > It is the second of these possibilities that has taken place with the
> > > > release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 beta.
> > >
> > > > Are you an early adopter of RHEL 7.5 beta?  Absolutely, it's a beta
> > > release and as such you should expect that there will be bugs and that
> > > third party kernel modules that do not adhere to the KABI functionality
> > > might have compatibility issues.
> >
> > The -830 kernel can break 3rd-party modules using non-whitelisted ABIs,
> > whether or not they adhere to the "KABI functionality".
> >
> > > There was a compatibility issue with RHEL 7.4 kernel
> > > > (3.10.0-693.1.1.el7) as well that was only fixed in the OpenAFS 1.6
> > > release series this past week as part of 1.6.22.2:
> > >
> > >  http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2
> >
> > Yes, and this one was hard to fix. Thanks are due to Mark Vitale for
> > developing the fix and all those who reviewed and tested it.
> >
> > > Jeffrey Altman
> > > AuriStor, Inc.
> > >
> > > P.S. - 

Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-07 Thread Kodiak Firesmith
Hello again All,

As part of continued testing, I've been able to confirm that the systemd
double-service startup thing only happens to my hosts when going from RHEL
7.4 to RHEL 7.5beta.  On a test host installed directly as RHEL 7.5beta, I
get a bit farther with 1.6.18.22, in that I get to the point where OpenAFS
"kind of" works.

What I'm observing is that the OpenAFS client kernel module (built by DKMS)
loads fine, and just so long as you know where you need to go in /afs, you
can get there, and you can read and write files and the OpenAFS 'fs'
command works.  But doing an 'ls' of /afs or any path underneath results in
"ls: reading directory /afs/: Not a directory".
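
The symptom above (a path can be reached, but listing it fails) can be
checked with a short script.  This is only a sketch, not an OpenAFS tool:
the function name probe_directory is made up for illustration, and it uses
nothing beyond Python's standard os and errno modules.

```python
import errno
import os


def probe_directory(path):
    """Report whether `path` can be stat'ed and whether it can be listed.

    Hypothetical helper for illustration only; not part of OpenAFS.
    """
    result = {"stat_ok": False, "list_ok": False, "list_errno": None}

    # Step 1: can the path be reached at all?
    try:
        os.stat(path)
        result["stat_ok"] = True
    except OSError:
        return result

    # Step 2: can its entries be enumerated?  This is what 'ls' needs.
    try:
        os.listdir(path)
        result["list_ok"] = True
    except OSError as exc:
        # Record the symbolic errno name, e.g. "ENOTDIR".
        result["list_errno"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return result
```

On an affected host I would expect probe_directory("/afs") to report
stat_ok as True and list_errno as "ENOTDIR", matching the ls error above,
but I have not run this script on one.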

I ran an strace of a good RHEL 7.4 host running ls on /afs, and a RHEL
7.5beta host running ls on /afs and have created pastebins of both, as well
as an inline diff.

All can be seen at the following locations:

works
https://paste.fedoraproject.org/paste/Hiojt2~Be3wgez47bKNucQ

fails
https://paste.fedoraproject.org/paste/13ZXBfJIOMsuEJFwFShBfg


diff
https://paste.fedoraproject.org/paste/FJKRwep1fWJogIDbLnkn8A

Hopefully this might help the OpenAFS devs, or someone might know what
might be borking on every RHEL 7.5 beta host.  It does fit with what other
7.5 beta users have observed OpenAFS doing.

Thanks!
 - Kodiak

On Mon, Feb 5, 2018 at 12:31 PM, Stephan Wiesand 
wrote:

>
> > On 04.Feb 2018, at 02:11, Jeffrey Altman  wrote:
> >
> > On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> >> I'm relatively new to handling OpenAFS.  Are these problems part of a
> >> normal "kernel release; openafs update" cycle and perhaps I'm getting
> >> snagged just by being too early of an adopter?  I wanted to raise the
> >> alarm on this and see if anything else was needed from me as the
> >> reporter of the issue, but perhaps that's an overreaction to what is
> >> just part of a normal process I just haven't been tuned into in prior
> >> RHEL release cycles?
> >
> >
> > Kodiak,
> >
> > On RHEL, DKMS is safe to use for kernel modules that restrict themselves
> > to using the restricted set of kernel interfaces (the RHEL KABI) that
> > Red Hat has designated will be supported across the lifespan of the RHEL
> > major version number.  OpenAFS is not such a kernel module.  As a result
> > it is vulnerable to breakage each and every time a new kernel is shipped.
>
> Jeffrey,
>
> the usual way to use DKMS is to either have it build a module for a newly
> installed kernel or install a prebuilt module for that kernel. It may be
> possible to abuse it for providing a module built for another kernel, but
> I think that won't happen accidentally.
>
> You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
> be safe to use within a RHEL minor release (and the SL packaging has been
> using them like this since EL6.4), but aren't across minor releases (and
> that's why the SL packaging modifies the kmod handling to require a build
> for the minor release in question).
>
> > There are two types of failures that can occur:
> >
> > 1. a change results in failure to build the OpenAFS kernel module
> >    for the new kernel
> >
> > 2. a change results in the OpenAFS kernel module building and
> >    successfully loading but failing to operate correctly
>
> The latter shouldn't happen within a minor release, but can across
> minor releases.
>
> > It is the second of these possibilities that has taken place with the
> > release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 beta.
> >
> > Are you an early adopter of RHEL 7.5 beta?  Absolutely, it's a beta
> > release and as such you should expect that there will be bugs and that
> > third party kernel modules that do not adhere to the KABI functionality
> > might have compatibility issues.
>
> The -830 kernel can break 3rd-party modules using non-whitelisted ABIs,
> whether or not they adhere to the "KABI functionality".
>
> > There was a compatibility issue with RHEL 7.4 kernel
> > (3.10.0-693.1.1.el7) as well that was only fixed in the OpenAFS 1.6
> > release series this past week as part of 1.6.22.2:
> >
> >  http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2
>
> Yes, and this one was hard to fix. Thanks are due to Mark Vitale for
> developing the fix and all those who reviewed and tested it.
>
> > Jeffrey Altman
> > AuriStor, Inc.
> >
> > P.S. - Welcome to the community.
>
> Seconded. In particular, the problem report regarding the EL7.5beta
> kernel was absolutely appropriate.
>
> --
> Stephan Wiesand
> DESY - DV -
> Platanenallee 6
> 15738 Zeuthen, Germany
>
>
>


Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-05 Thread Jeffrey Altman
On 2/5/2018 12:31 PM, Stephan Wiesand wrote:
> the usual way to use DKMS is to either have it build a module for a newly
> installed kernel or install a prebuilt module for that kernel. It may be
> possible to abuse it for providing a module built for another kernel, but
> I think that won't happen accidentally.
> 
> You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
> be safe to use within a RHEL minor release (and the SL packaging has been
> using them like this since EL6.4), but aren't across minor releases (and
> that's why the SL packaging modifies the kmod handling to require a build
> for the minor release in question).

On RHEL, DKMS and KABI are tightly related because of the way in which
Red Hat engineers backport feature and functionality changes.  During
mainline kernel development a change is likely to break an existing
interface.  Doing so is encouraged so that compilation errors will
identify where code modifications are required.

On RHEL there is a strong desire to maintain KABI compatibility.
Whenever possible, backports are altered to preserve the existing binary
interfaces at the risk of changing the interface semantics.  As a
result, compilation failures do not occur but semantic differences can
result in breakage for third party kernel modules that have not been
modified at the source level to be aware of the change.

The breakages of OpenAFS by RHEL 7.4 and 7.5 (minor releases) were both
due to backporting functionality in this manner.  Such
incompatibilities can result in system panics or silent data corruption
depending upon the change.
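
To make the failure mode concrete, here is a toy model in Python, not
kernel code.  Every name in it (list_dir_old, list_dir_new, legacy_emit)
is invented, and it does not describe the actual change Red Hat made.  It
only shows how a callback whose signature is untouched can silently
misbehave once the caller interprets its return value differently.

```python
def list_dir_old(entries, emit):
    """Original caller: emit() returns 0 on success, nonzero to stop."""
    seen = []
    for name in entries:
        if emit(name, seen) != 0:
            break
    return seen


def list_dir_new(entries, emit):
    """'Backported' caller with the same signature, but emit() must now
    return a true value to continue and a false value to stop."""
    seen = []
    for name in entries:
        if not emit(name, seen):
            break
    return seen


def legacy_emit(name, seen):
    """Third-party callback written against the old contract (hypothetical)."""
    seen.append(name)
    return 0  # "success" under the old rules, "stop" under the new ones


if __name__ == "__main__":
    names = ["a", "b", "c"]
    print("old caller:", list_dir_old(names, legacy_emit))
    print("new caller:", list_dir_new(names, legacy_emit))
```

With the old caller all three names come back.  With the new caller the
listing stops after the first entry, although nothing failed to build or
load.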

Jeffrey Altman
AuriStor, Inc.


Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-05 Thread Stephan Wiesand

> On 04.Feb 2018, at 02:11, Jeffrey Altman  wrote:
> 
> On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
>> I'm relatively new to handling OpenAFS.  Are these problems part of a
>> normal "kernel release; openafs update" cycle and perhaps I'm getting
>> snagged just by being too early of an adopter?  I wanted to raise the
>> alarm on this and see if anything else was needed from me as the
>> reporter of the issue, but perhaps that's an overreaction to what is
>> just part of a normal process I just haven't been tuned into in prior
>> RHEL release cycles?
> 
> 
> Kodiak,
> 
> On RHEL, DKMS is safe to use for kernel modules that restrict themselves
> to using the restricted set of kernel interfaces (the RHEL KABI) that
> Red Hat has designated will be supported across the lifespan of the RHEL
> major version number.  OpenAFS is not such a kernel module.  As a result
> it is vulnerable to breakage each and every time a new kernel is shipped.

Jeffrey,

the usual way to use DKMS is to either have it build a module for a newly
installed kernel or install a prebuilt module for that kernel. It may be
possible to abuse it for providing a module built for another kernel, but
I think that won't happen accidentally.

You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
be safe to use within a RHEL minor release (and the SL packaging has been
using them like this since EL6.4), but aren't across minor releases (and
that's why the SL packaging modifies the kmod handling to require a build
for the minor release in question).

> There are two types of failures that can occur:
> 
> 1. a change results in failure to build the OpenAFS kernel module
>    for the new kernel
> 
> 2. a change results in the OpenAFS kernel module building and
>    successfully loading but failing to operate correctly

The latter shouldn't happen within a minor release, but can across
minor releases.
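
A rough way to apply this rule when deciding whether a module build can
be reused is to compare the base build number in the two kernel release
strings.  The sketch below assumes that all kernels of one EL7 minor
release share that number (693 for the 7.4 kernel mentioned in this
thread, 830 for the 7.5 beta).  The helper names are made up, and this is
not how the SL packaging implements its check.

```python
import re

# Matches EL kernel releases such as "3.10.0-693.1.1.el7.x86_64".
_RELEASE_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)((?:\.\d+)*)\.el(\d+)")


def base_build(release):
    """Return (upstream_version, base_build_number, el_major).

    Raises ValueError if the string does not look like an EL kernel release.
    """
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return match.group(1), int(match.group(2)), int(match.group(4))


def same_minor_release(built_for, running):
    """True if both releases share upstream version, base build and EL major.

    Assumption: a shared base build number means the same minor release.
    """
    return base_build(built_for) == base_build(running)


if __name__ == "__main__":
    print(same_minor_release("3.10.0-693.1.1.el7", "3.10.0-693.11.6.el7.x86_64"))
    print(same_minor_release("3.10.0-693.1.1.el7", "3.10.0-830.el7.x86_64"))
```

The first comparison is within the 693 series and is treated as the same
minor release; the second crosses from 693 to 830 and is not.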

> It is the second of these possibilities that has taken place with the
> release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 beta.
> 
> Are you an early adopter of RHEL 7.5 beta?  Absolutely, it's a beta
> release and as such you should expect that there will be bugs and that
> third party kernel modules that do not adhere to the KABI functionality
> might have compatibility issues.

The -830 kernel can break 3rd-party modules using non-whitelisted ABIs,
whether or not they adhere to the "KABI functionality".

> There was a compatibility issue with RHEL 7.4 kernel
> (3.10.0-693.1.1.el7) as well that was only fixed in the OpenAFS 1.6
> release series this past week as part of 1.6.22.2:
> 
>  http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2

Yes, and this one was hard to fix. Thanks are due to Mark Vitale for
developing the fix and all those who reviewed and tested it.

> Jeffrey Altman
> AuriStor, Inc.
> 
> P.S. - Welcome to the community.

Seconded. In particular, the problem report regarding the EL7.5beta
kernel was absolutely appropriate.

-- 
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany

