Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests

2020-01-17 Thread Stephen Ulmer
Having a sanctioned way to compile targeting a version of the kernel that is 
installed — but not running — would be helpful in many circumstances.

— 
Stephen

> On Jan 17, 2020, at 11:58 AM, Ryan Novosielski  wrote:
> 
> Yeah, support got back to me with a similar response earlier today that I’d 
> not seen yet that made it a lot clearer what I “did wrong". This would appear 
> to be the cause in my case:
> 
> [root@master config]# diff env.mcr env.mcr-1062.9.1 
> 4,5c4,5
> < #define LINUX_KERNEL_VERSION 31000999
> < #define LINUX_KERNEL_VERSION_VERBOSE 310001062009001
> ---
> > #define LINUX_KERNEL_VERSION 31001062
> > #define LINUX_KERNEL_VERSION_VERBOSE 31001062009001
> 
> 
> …the former having been generated by “make Autoconfig” and the latter 
> generated by my brain. I’m surprised at the first line — I’d have caught 
> myself that something different might have been needed if 3.10.0-1062 didn’t 
> already fit in the number of digits.
> 
> Anyway, I explained to support that the reason I do this is that I maintain a 
> couple of copies of env.mcr because occasionally there will be reasons to 
> need gpfs.gplbin for a few different kernel versions (other software that 
> doesn't want to be upgraded, etc.). I see I originally got this practice from 
> the README (or possibly our original installer consultants).
> 
> Basically what’s missing here, so far as I can see, is a way to use 
> mmbuildgpl/make Autoconfig but specify a target kernel version (and I guess 
> an update to the docs or at least /usr/lpp/mmfs/src/README) that doesn’t 
> suggest manually editing. Is there a way to at least find out what "make 
> Autoconfig” would use for a target LINUX_KERNEL_VERSION_VERBOSE? From what I 
> can see of makefile and config/configure, there’s no option for specifying 
> anything.
> 
> --
> 
> || \\UTGERS,   
> |---*O*---
> ||_// the State| Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ| Office of Advanced Research Computing - MSB C630, 
> Newark
> `'
> 
>> On Jan 17, 2020, at 11:36 AM, Felipe Knop  wrote:
>> 
>> Hi Ryan,
>> 
>> My interpretation of the analysis so far is that the content of 
>> LINUX_KERNEL_VERSION_VERBOSE in 'env.mcr' became incorrect. That is, it 
>> used to work well in a prior release of Scale, but not with 5.0.4.1. This 
>> is because of a code change that added another digit to the version in 
>> LINUX_KERNEL_VERSION_VERBOSE to account for the 4-digit "fix level" 
>> (3.10.0-1000+). Then, when the GPL layer was built, its sources saw the 
>> content of LINUX_KERNEL_VERSION_VERBOSE with the missing extra digit and 
>> compiled the 'wrong' pieces in -- in particular the incorrect value of 
>> SECURITY_INODE_INIT_SECURITY(). And that led to the crash.
>> 
>> The problem did not happen when mmbuildgpl was used since the correct value 
>> of LINUX_KERNEL_VERSION_VERBOSE was then set up.
>> 
>>  Felipe
>> 
>> 
>> Felipe Knop k...@us.ibm.com
>> GPFS Development and Security
>> IBM Systems
>> IBM Building 008
>> 2455 South Rd, Poughkeepsie, NY 12601
>> (845) 433-9314 T/L 293-9314
>> 
>> 
>> 
>> - Original message -
>> From: Ryan Novosielski 
>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>> To: gpfsug main discussion list 
>> Cc:
>> Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 
>> on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM 
>> guests
>> Date: Fri, Jan 17, 2020 10:56 AM
>> 
>> That /is/ interesting. 
>> 
>> I’m a little confused about how that could be playing out in a case where 
>> I’m building on -1062.9.1, building for -1062.9.1, and running on -1062.9.1. 
>> Is there something inherent in the RPM building process that hasn’t caught 
>> up, or am I misunderstanding that change’s impact on it?
>> 
>> --
>> 
>> || \\UTGERS,   |---*O*---
>> ||_// the State | Ryan Novosielski - novos...@rutgers.edu
>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
>> ||  \\of NJ | Office of Advanced Research Computing - MSB C630, 
>> Newark
>>`'
>> 
>>> On Jan 17, 2020, at 10:35, Felipe Knop  wrote:
>>> 
>>> 
>>> Hi Ryan,
>>> 
>>> Some interesting IBM-internal communication overnight. The problem seems 
>>> related to a change made to LINUX_KERNEL_VERSION_VERBOSE to handle the 
>>> additional digit in the kernel numbering (3.10.0-1000+). The GPL layer 
>>> expected LINUX_KERNEL_VERSION_VERBOSE to have that extra digit, and its 
>>> absence resulted in an incorrect function being compiled in, which led to 
>>> the crash.
>>> 
>>> This, at least, seems to make sense, in terms of matching to the symptoms 
>>> of the problem.
>>> 
>>> We are still in internal debates on whether/how to update our guidelines for 
>>> gplbin generation ...
>>> 
>>> 

Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests

2020-01-17 Thread Ryan Novosielski
Yeah, support got back to me with a similar response earlier today that I’d not 
seen yet that made it a lot clearer what I “did wrong". This would appear to be 
the cause in my case:

[root@master config]# diff env.mcr env.mcr-1062.9.1 
4,5c4,5
< #define LINUX_KERNEL_VERSION 31000999
< #define LINUX_KERNEL_VERSION_VERBOSE 310001062009001
---
> #define LINUX_KERNEL_VERSION 31001062
> #define LINUX_KERNEL_VERSION_VERBOSE 31001062009001


…the former having been generated by “make Autoconfig” and the latter generated 
by my brain. I’m surprised at the first line — I’d have caught myself that 
something different might have been needed if 3.10.0-1062 didn’t already fit in 
the number of digits.
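
For anyone else keeping hand-edited copies of env.mcr, here is a minimal Python sketch of how the two generated values above appear to be laid out. The field widths (and the cap of the short form's fix level at 999) are assumptions inferred only from the two "make Autoconfig" lines in the diff, not from any IBM documentation, and encode_kernel_version is a hypothetical helper name.

```python
import re

def encode_kernel_version(release: str) -> tuple[int, int]:
    """Return (LINUX_KERNEL_VERSION, LINUX_KERNEL_VERSION_VERBOSE) for a release.

    Assumed layout, inferred only from the generated values in the diff:
      short:   major, minor (2 digits), patch (2), fix level (3, capped at 999)
      verbose: major, minor (2), patch (2), fix level (4), two 3-digit sub-levels
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Missing sub-levels (e.g. "3.10.0-1062") are treated as zero.
    major, minor, patch, fix, sub1, sub2 = (int(g) if g else 0 for g in m.groups())
    short = f"{major}{minor:02d}{patch:02d}{min(fix, 999):03d}"
    verbose = f"{major}{minor:02d}{patch:02d}{fix:04d}{sub1:03d}{sub2:03d}"
    return int(short), int(verbose)

# Reproduces the two values that "make Autoconfig" wrote for 3.10.0-1062.9.1.
print(encode_kernel_version("3.10.0-1062.9.1.el7.x86_64"))
# (31000999, 310001062009001)
```

Under this reading, the hand-written 31001062009001 is one digit short in the fix-level field, which is why it no longer lines up with what the build expects.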

Anyway, I explained to support that the reason I do this is that I maintain a 
couple of copies of env.mcr because occasionally there will be reasons to need 
gpfs.gplbin for a few different kernel versions (other software that doesn't 
want to be upgraded, etc.). I see I originally got this practice from the 
README (or possibly our original installer consultants).

Basically what’s missing here, so far as I can see, is a way to use 
mmbuildgpl/make Autoconfig but specify a target kernel version (and I guess an 
update to the docs or at least /usr/lpp/mmfs/src/README) that doesn’t suggest 
manually editing. Is there a way to at least find out what "make Autoconfig” 
would use for a target LINUX_KERNEL_VERSION_VERBOSE? From what I can see of 
makefile and config/configure, there’s no option for specifying anything.
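
Until there is such an option, one stopgap is to compare a hand-maintained copy against a freshly generated env.mcr before building. The following is only a sketch: it assumes the two defines appear as plain "#define NAME digits" lines, as they do in the diff above, and the function names are hypothetical.

```python
import re

# Matches only the two kernel-version defines shown in the diff above.
DEFINE_RE = re.compile(
    r"^\s*#define\s+(LINUX_KERNEL_VERSION(?:_VERBOSE)?)\s+(\d+)\s*$"
)

def read_kernel_defines(text: str) -> dict[str, int]:
    """Collect the LINUX_KERNEL_VERSION* defines from the text of an env.mcr."""
    found = {}
    for line in text.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            found[m.group(1)] = int(m.group(2))
    return found

def differing_defines(generated: str, hand_edited: str) -> dict[str, tuple]:
    """Return {name: (generated_value, hand_edited_value)} for mismatches."""
    a, b = read_kernel_defines(generated), read_kernel_defines(hand_edited)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }

generated = """#define LINUX_KERNEL_VERSION 31000999
#define LINUX_KERNEL_VERSION_VERBOSE 310001062009001
"""
hand_edited = """#define LINUX_KERNEL_VERSION 31001062
#define LINUX_KERNEL_VERSION_VERBOSE 31001062009001
"""
for name, (gen, hand) in differing_defines(generated, hand_edited).items():
    print(f"{name}: generated={gen} hand-edited={hand}")
```

This only helps when the generated file can be produced for the target kernel in the first place, so it does not replace a proper target-version option.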

--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
 `'

> On Jan 17, 2020, at 11:36 AM, Felipe Knop  wrote:
> 
> Hi Ryan,
>  
> My interpretation of the analysis so far is that the content of 
> LINUX_KERNEL_VERSION_VERBOSE in 'env.mcr' became incorrect. That is, it used 
> to work well in a prior release of Scale, but not with 5.0.4.1. This is 
> because of a code change that added another digit to the version in 
> LINUX_KERNEL_VERSION_VERBOSE to account for the 4-digit "fix level" 
> (3.10.0-1000+). Then, when the GPL layer was built, its sources saw the 
> content of LINUX_KERNEL_VERSION_VERBOSE with the missing extra digit and 
> compiled the 'wrong' pieces in -- in particular the incorrect value of 
> SECURITY_INODE_INIT_SECURITY(). And that led to the crash.
>  
> The problem did not happen when mmbuildgpl was used since the correct value 
> of LINUX_KERNEL_VERSION_VERBOSE was then set up.
>  
>   Felipe
>  
> 
> Felipe Knop k...@us.ibm.com
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314 T/L 293-9314
>  
>  
>  
> - Original message -
> From: Ryan Novosielski 
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: gpfsug main discussion list 
> Cc:
> Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 
> on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM 
> guests
> Date: Fri, Jan 17, 2020 10:56 AM
>  
> That /is/ interesting. 
>  
> I’m a little confused about how that could be playing out in a case where I’m 
> building on -1062.9.1, building for -1062.9.1, and running on -1062.9.1. Is 
> there something inherent in the RPM building process that hasn’t caught up, 
> or am I misunderstanding that change’s impact on it?
>  
> --
> 
> || \\UTGERS,   |---*O*---
> ||_// the State | Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ | Office of Advanced Research Computing - MSB C630, Newark
> `'
>  
>> On Jan 17, 2020, at 10:35, Felipe Knop  wrote:
>>  
>> 
>> Hi Ryan,
>>  
>> Some interesting IBM-internal communication overnight. The problem seems 
>> related to a change made to LINUX_KERNEL_VERSION_VERBOSE to handle the 
>> additional digit in the kernel numbering (3.10.0-1000+). The GPL layer 
>> expected LINUX_KERNEL_VERSION_VERBOSE to have that extra digit, and its 
>> absence resulted in an incorrect function being compiled in, which led to 
>> the crash.
>>  
>> This, at least, seems to make sense, in terms of matching to the symptoms of 
>> the problem.
>>  
>> We are still in internal debates on whether/how to update our guidelines for 
>> gplbin generation ...
>>  
>> Regards,
>>  
>>   Felipe
>>  
>> 
>> Felipe Knop k...@us.ibm.com
>> GPFS Development and Security
>> IBM Systems
>> IBM Building 008
>> 2455 South Rd, Poughkeepsie, NY 12601
>> (845) 433-9314 T/L 293-9314
>>  
>>  
>>  
>> - Original message -
>> From: Ryan Novosielski 
>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>> To: "gpfsug-discuss@spectrumscale.org" 
>> 

Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests

2020-01-17 Thread Felipe Knop
Hi Ryan,
 
My interpretation of the analysis so far is that the content of LINUX_KERNEL_VERSION_VERBOSE in 'env.mcr' became incorrect. That is, it used to work well in a prior release of Scale, but not with 5.0.4.1. This is because of a code change that added another digit to the version in LINUX_KERNEL_VERSION_VERBOSE to account for the 4-digit "fix level" (3.10.0-1000+). Then, when the GPL layer was built, its sources saw the content of LINUX_KERNEL_VERSION_VERBOSE with the missing extra digit and compiled the 'wrong' pieces in -- in particular the incorrect value of SECURITY_INODE_INIT_SECURITY(). And that led to the crash.
 
The problem did not happen when mmbuildgpl was used since the correct value of LINUX_KERNEL_VERSION_VERBOSE was then set up.
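
To make the failure mode concrete, here is a small Python sketch of why a value that is one digit short can select the wrong code at build time. The two version values are the ones from Ryan's diff; the threshold and the function name are purely hypothetical, since the actual condition in the GPL layer sources is not shown in this thread.

```python
GENERATED = 310001062009001   # written by "make Autoconfig" (15 digits)
HAND_EDITED = 31001062009001  # hand-maintained copy (14 digits)

# Hypothetical threshold in the 15-digit layout, standing in for a check
# along the lines of "3.10.0-957 or later". Not taken from the real sources.
HYPOTHETICAL_THRESHOLD = 310000957000000

def takes_newer_code_path(verbose_value: int) -> bool:
    """Mimic a preprocessor-style numeric comparison on the verbose version."""
    return verbose_value >= HYPOTHETICAL_THRESHOLD

print(takes_newer_code_path(GENERATED))    # True
print(takes_newer_code_path(HAND_EDITED))  # False
```

A plain numeric comparison has no way to notice the missing digit: the shorter value is simply ten times smaller, so the build would quietly pick the variant meant for an older kernel.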
 
  Felipe
 
Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314
 
 
- Original message -
From: Ryan Novosielski 
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug main discussion list 
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests
Date: Fri, Jan 17, 2020 10:56 AM

That /is/ interesting.
 
I’m a little confused about how that could be playing out in a case where I’m building on -1062.9.1, building for -1062.9.1, and running on -1062.9.1. Is there something inherent in the RPM building process that hasn’t caught up, or am I misunderstanding that change’s impact on it? 
--

|| \\UTGERS,     |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ      | Office of Advanced Research Computing - MSB C630, Newark
 `'
 
On Jan 17, 2020, at 10:35, Felipe Knop  wrote: 

Hi Ryan,
 
Some interesting IBM-internal communication overnight. The problem seems related to a change made to LINUX_KERNEL_VERSION_VERBOSE to handle the additional digit in the kernel numbering (3.10.0-1000+). The GPL layer expected LINUX_KERNEL_VERSION_VERBOSE to have that extra digit, and its absence resulted in an incorrect function being compiled in, which led to the crash.
 
This, at least, seems to make sense, in terms of matching to the symptoms of the problem.
 
We are still in internal debates on whether/how to update our guidelines for gplbin generation ...
 
Regards,
 
  Felipe
 
Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314
 
 
- Original message -
From: Ryan Novosielski 
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: "gpfsug-discuss@spectrumscale.org" 
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests
Date: Thu, Jan 16, 2020 4:33 PM

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Felipe,

I either misunderstood support or convinced them to take further
action. It at first looked like they were suggesting "mmbuildgpl fixed
it: case closed" (I know they wanted to close the SalesForce case
anyway, which would prevent communication on the issue). At this
point, they've asked for a bunch more information.

Support is asking similar questions re: the speculations, and I'll
provide them with the relevant output ASAP, but I did confirm all of
that, including that there were no stray mmfs26/tracedev kernel
modules anywhere else in the relevant /lib/modules PATHs. In the
original case, I built on a machine running 3.10.0-957.27.2, but
pointed to the 3.10.0-1062.9.1 source code/defined the relevant
portions of usr/lpp/mmfs/src/config/env.mcr. That's always worked
before, and rebuilding once the build system was running
3.10.0-1062.9.1 as well did not change anything either. In all cases,
the GPFS version was Spectrum Scale Data Access Edition 5.0.4-1. If
you build against either the wrong kernel version or the wrong GPFS
version, both will appear right in the filename of the gpfs.gplbin RPM
you build. Mine is called:

gpfs.gplbin-3.10.0-1062.9.1.el7.x86_64-5.0.4-1.x86_64.rpm

Anyway, thanks for your response; I know you might not be
following/working on this directly, but I figured the extra info might
be of interest.

On 1/16/20 8:41 AM, Felipe Knop wrote:
> Hi Ryan,
>
> I'm aware of this ticket, and I understand that there has been
> active communication with the service team on this problem.
>
> The crash itself, as you indicate, looks like a problem that has
> been fixed:
>
> https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0
>
> The fact that the problem goes away when *mmbuildgpl* is issued
> appears to point to some incompatibility with kernel levels and/or
> Scale version levels. Just speculating, some possible areas

Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests

2020-01-17 Thread Ryan Novosielski
That /is/ interesting.

I’m a little confused about how that could be playing out in a case where I’m 
building on -1062.9.1, building for -1062.9.1, and running on -1062.9.1. Is 
there something inherent in the RPM building process that hasn’t caught up, or 
am I misunderstanding that change’s impact on it?

--

|| \\UTGERS,   |---*O*---
||_// the State | Ryan Novosielski - 
novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'

On Jan 17, 2020, at 10:35, Felipe Knop  wrote:


Hi Ryan,

Some interesting IBM-internal communication overnight. The problem seems 
related to a change made to LINUX_KERNEL_VERSION_VERBOSE to handle the 
additional digit in the kernel numbering (3.10.0-1000+). The GPL layer 
expected LINUX_KERNEL_VERSION_VERBOSE to have that extra digit, and its absence 
resulted in an incorrect function being compiled in, which led to the crash.

This, at least, seems to make sense, in terms of matching to the symptoms of 
the problem.

We are still in internal debates on whether/how to update our guidelines for 
gplbin generation ...

Regards,

  Felipe


Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314



- Original message -
From: Ryan Novosielski 
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: "gpfsug-discuss@spectrumscale.org" 
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on 
Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests
Date: Thu, Jan 16, 2020 4:33 PM

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Felipe,

I either misunderstood support or convinced them to take further
action. It at first looked like they were suggesting "mmbuildgpl fixed
it: case closed" (I know they wanted to close the SalesForce case
anyway, which would prevent communication on the issue). At this
point, they've asked for a bunch more information.

Support is asking similar questions re: the speculations, and I'll
provide them with the relevant output ASAP, but I did confirm all of
that, including that there were no stray mmfs26/tracedev kernel
modules anywhere else in the relevant /lib/modules PATHs. In the
original case, I built on a machine running 3.10.0-957.27.2, but
pointed to the 3.10.0-1062.9.1 source code/defined the relevant
portions of usr/lpp/mmfs/src/config/env.mcr. That's always worked
before, and rebuilding once the build system was running
3.10.0-1062.9.1 as well did not change anything either. In all cases,
the GPFS version was Spectrum Scale Data Access Edition 5.0.4-1. If
you build against either the wrong kernel version or the wrong GPFS
version, both will appear right in the filename of the gpfs.gplbin RPM
you build. Mine is called:

gpfs.gplbin-3.10.0-1062.9.1.el7.x86_64-5.0.4-1.x86_64.rpm
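
Since the kernel and GPFS levels are both encoded in the package name, a quick sanity check can be scripted. This is a sketch only: the filename pattern is assumed from the single example above (gpfs.gplbin-KERNEL-GPFSVERSION.ARCH.rpm), and parse_gplbin_filename is a hypothetical helper, not part of any IBM tooling.

```python
import re

# Pattern assumed from the one filename in this message.
GPLBIN_RE = re.compile(
    r"gpfs\.gplbin-(?P<kernel>.+)-(?P<gpfs>\d+\.\d+\.\d+-\d+)\.(?P<arch>[^.]+)\.rpm"
)

def parse_gplbin_filename(filename: str) -> dict[str, str]:
    """Split a gpfs.gplbin RPM filename into kernel, GPFS version and arch."""
    m = GPLBIN_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"not a recognized gpfs.gplbin filename: {filename!r}")
    return m.groupdict()

info = parse_gplbin_filename(
    "gpfs.gplbin-3.10.0-1062.9.1.el7.x86_64-5.0.4-1.x86_64.rpm"
)
print(info["kernel"])  # 3.10.0-1062.9.1.el7.x86_64
print(info["gpfs"])    # 5.0.4-1
```

The kernel field can be compared directly with the output of "uname -r" on the target node. As this thread shows, a matching filename does not prove that the defines in env.mcr were right for that kernel.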

Anyway, thanks for your response; I know you might not be
following/working on this directly, but I figured the extra info might
be of interest.

On 1/16/20 8:41 AM, Felipe Knop wrote:
> Hi Ryan,
>
> I'm aware of this ticket, and I understand that there has been
> active communication with the service team on this problem.
>
> The crash itself, as you indicate, looks like a problem that has
> been fixed:
>
> https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0
>
>  The fact that the problem goes away when *mmbuildgpl* is issued
> appears to point to some incompatibility with kernel levels and/or
> Scale version levels. Just speculating, some possible areas may
> be:
>
>
> * The RPM might have been built on a version of Scale without the
>   fix
> * The RPM might have been built on a different (minor) version
>   of the kernel
> * Somehow the VM picked a "leftover" GPFS kernel module, as opposed
>   to the one included in gpfs.gplbin -- given that mmfsd never
>   complained about a missing GPL kernel module
>
>
> Felipe
>
> Felipe Knop k...@us.ibm.com
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314 T/L 293-9314
>
>
>
>
> - Original message -
> From: Ryan Novosielski
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: gpfsug main discussion list
> Cc:
> Subject: [EXTERNAL] [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772
> on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on
> KVM guests
> Date: Wed, Jan 15, 2020 4:11 PM
>
> Hi there,
>
> I know some of the Spectrum Scale developers look at this list.
> I’m having a little trouble with support on this problem.
>
> We are seeing crashes with GPFS 5.0.4-1 Data Access Edition on KVM
> guests with a portability layer that has been installed via
> gpfs.gplbin RPMs that we 

Re: [gpfsug-discuss] How to install efix with yum ?

2020-01-17 Thread Skylar Thompson
Thanks for the pointer! We're in the process of upgrading from 4.2.3-6 to
4.2.3-19 so I'll make a note that we should start setting that environment
variable when we build gplbin.
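
For configuration-management use, the install step quoted below could be wrapped roughly as follows. This is a sketch, not a tested deployment recipe: MM_INSTALL_ONLY and "rpm -i" come from the quoted IBM message, while the function name and the example RPM path are hypothetical. Note that, per that message, the variable is set when installing the package rather than when building it.

```python
import os

def gplbin_install_invocation(rpm_path, base_env=None):
    """Return (argv, env) for installing a gpfs.gplbin RPM with MM_INSTALL_ONLY=1.

    Nothing is executed here; the caller decides when to run the command.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MM_INSTALL_ONLY"] = "1"  # per the quoted message: set before "rpm -i"
    return ["rpm", "-i", rpm_path], env

# Hypothetical path, for illustration only.
argv, env = gplbin_install_invocation("/tmp/gpfs.gplbin-example.rpm")
print(argv, env["MM_INSTALL_ONLY"])
# To actually install (requires root and a real package):
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```

Keeping the command construction separate from execution makes it easy to log or dry-run the step from a configuration-management tool.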

On Thu, Jan 16, 2020 at 05:59:14PM -0500, IBM Spectrum Scale wrote:
> On Spectrum Scale 4.2.3.15 or later and 5.0.2.2 or later, you can install 
> gplbin without stopping GPFS by using the following steps:
> 
> 1. Build gpfs.gplbin using mmbuildgpl --build-package
> 2. Set the environment variable MM_INSTALL_ONLY to 1 before installing the 
>    gpfs.gplbin package with rpm -i gpfs.gplbin*.rpm 
>  
> Regards, The Spectrum Scale (GPFS) team
> 
> --
> If you feel that your question can benefit other users of  Spectrum Scale 
> (GPFS), then please post it to the public IBM developerWorks Forum at 
> https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479
> . 
> 
> If your query concerns a potential software error in Spectrum Scale (GPFS) 
> and you have an IBM software maintenance contract please contact 
> 1-800-237-5511 in the United States or your local IBM Service Center in 
> other countries. 
> 
> The forum is informally monitored as time permits and should not be used 
> for priority messages to the Spectrum Scale (GPFS) team.
> 
> gpfsug-discuss-boun...@spectrumscale.org wrote on 01/16/2020 10:32:27 AM:
> 
> > From: Skylar Thompson 
> > To: gpfsug-discuss@spectrumscale.org
> > Date: 01/16/2020 10:35 AM
> > Subject: [EXTERNAL] Re: [gpfsug-discuss] How to install efix with yum ?
> > Sent by: gpfsug-discuss-boun...@spectrumscale.org
> > 
> > Another problem we've run into with automating GPFS installs/upgrades is
> > that the gplbin (kernel module) packages have a post-install script that
> > will unmount the filesystem *even if the package isn't for the running
> > kernel*. We needed to write some custom reporting in our configuration
> > management system to only install gplbin if GPFS was already stopped on 
> the
> > node.
> > 
> > On Wed, Jan 15, 2020 at 10:35:23PM +, Sanchez, Paul wrote:
> > > This reminds me that there is one more thing which drives the 
> > convoluted process I described earlier...
> > > 
> > > Automation.  Deployment solutions which use yum to build new hosts
> > are often the place where one notices the problem.  They would need 
> > to determine that they should install both the base-version and efix
> > RPMS and in that order.  IIRC, there were no RPM dependencies 
> > connecting the  efix RPMs to their base-version equivalents, so 
> > there was nothing to signal YUM that installing the efix requires 
> > that the base-version be installed first.
> > > 
> > > (Our particular case is worse than just this though, since we 
> > prohibit installing two versions/releases for the same (non-kernel) 
> > package name.  But that's not the case for everyone.)
> > > 
> > > -Paul
> > > 
> > > From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of IBM Spectrum Scale
> > > Sent: Wednesday, January 15, 2020 16:00
> > > To: gpfsug main discussion list 
> > > Cc: gpfsug-discuss-boun...@spectrumscale.org
> > > Subject: Re: [gpfsug-discuss] How to install efix with yum ?
> > > 
> > > 
> > > This message was sent by an external party.
> > > 
> > > 
> > > >> I don't see any yum options which match rpm's '--force' option.
> > > Actually, you do not need to use --force option since efix RPMs 
> > have incremental efix number in rpm name.
> > > 
> > > Efix package provides update RPMs to be installed on top of 
> > corresponding PTF GA version. When you install 5.0.4.1 efix9, if 
> > 5.0.4.1 is already installed on your system, "yum update" should work.
> > > 
> > > Regards, The Spectrum Scale (GPFS) team
> > > 
> > > 
> > 
> --
> > > If you feel that your question can benefit other users of Spectrum
> > Scale (GPFS), then please post it to the public IBM developerWorks Forum 
> at 
> > https://www.ibm.com/developerworks/community/forums/html/forum?
> > id=----0479.
> > > 
> > > If your query concerns a potential software error in Spectrum 
> > Scale (GPFS) and you have an IBM software maintenance contract 
> > please contact 1-800-237-5511 in the United States or your local IBM
> > Service Center in other countries.
> > > 
> > > The forum is informally monitored as time permits and should not 
> > be used for priority messages to the Spectrum Scale (GPFS) team.
> > > 
> > > 
> > > From: Jonathan Buzzard  > 

Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests

2020-01-17 Thread Felipe Knop
Hi Ryan,
 
Some interesting IBM-internal communication overnight. The problem seems related to a change made to LINUX_KERNEL_VERSION_VERBOSE to handle the additional digit in the kernel numbering (3.10.0-1000+). The GPL layer expected LINUX_KERNEL_VERSION_VERBOSE to have that extra digit, and its absence resulted in an incorrect function being compiled in, which led to the crash.
 
This, at least, seems to make sense, in terms of matching to the symptoms of the problem.
 
We are still in internal debates on whether/how to update our guidelines for gplbin generation ...
 
Regards,
 
  Felipe
 
Felipe Knop k...@us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314
 
 
- Original message -
From: Ryan Novosielski 
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: "gpfsug-discuss@spectrumscale.org" 
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772 on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on KVM guests
Date: Thu, Jan 16, 2020 4:33 PM

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Felipe,

I either misunderstood support or convinced them to take further
action. It at first looked like they were suggesting "mmbuildgpl fixed
it: case closed" (I know they wanted to close the SalesForce case
anyway, which would prevent communication on the issue). At this
point, they've asked for a bunch more information.

Support is asking similar questions re: the speculations, and I'll
provide them with the relevant output ASAP, but I did confirm all of
that, including that there were no stray mmfs26/tracedev kernel
modules anywhere else in the relevant /lib/modules PATHs. In the
original case, I built on a machine running 3.10.0-957.27.2, but
pointed to the 3.10.0-1062.9.1 source code/defined the relevant
portions of usr/lpp/mmfs/src/config/env.mcr. That's always worked
before, and rebuilding once the build system was running
3.10.0-1062.9.1 as well did not change anything either. In all cases,
the GPFS version was Spectrum Scale Data Access Edition 5.0.4-1. If
you build against either the wrong kernel version or the wrong GPFS
version, both will appear right in the filename of the gpfs.gplbin RPM
you build. Mine is called:

gpfs.gplbin-3.10.0-1062.9.1.el7.x86_64-5.0.4-1.x86_64.rpm

Anyway, thanks for your response; I know you might not be
following/working on this directly, but I figured the extra info might
be of interest.

On 1/16/20 8:41 AM, Felipe Knop wrote:
> Hi Ryan,
>
> I'm aware of this ticket, and I understand that there has been
> active communication with the service team on this problem.
>
> The crash itself, as you indicate, looks like a problem that has
> been fixed:
>
> https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0
>
> The fact that the problem goes away when *mmbuildgpl* is issued
> appears to point to some incompatibility with kernel levels and/or
> Scale version levels. Just speculating, some possible areas may
> be:
>
> * The RPM might have been built on a version of Scale without the
>   fix
> * The RPM might have been built on a different (minor) version
>   of the kernel
> * Somehow the VM picked a "leftover" GPFS kernel module, as opposed
>   to the one included in gpfs.gplbin -- given that mmfsd never
>   complained about a missing GPL kernel module
>
> Felipe
>
> Felipe Knop k...@us.ibm.com
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314 T/L 293-9314
>
> - Original message -
> From: Ryan Novosielski
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: gpfsug main discussion list
> Cc:
> Subject: [EXTERNAL] [gpfsug-discuss] Kernel BUG/panic in mm/slub.c:3772
> on Spectrum Scale Data Access Edition installed via gpfs.gplbin RPM on
> KVM guests
> Date: Wed, Jan 15, 2020 4:11 PM
>
> Hi there,
>
> I know some of the Spectrum Scale developers look at this list.
> I’m having a little trouble with support on this problem.
>
> We are seeing crashes with GPFS 5.0.4-1 Data Access Edition on KVM
> guests with a portability layer that has been installed via
> gpfs.gplbin RPMs that we built at our site and have used to
> install GPFS all over our environment. We’ve not seen this problem
> so far on any physical hosts, but have now experienced it on guests
> running on number of our KVM hypervisors, across vendors and
> firmware versions, etc. At one time I thought it was all happening
> on systems using Mellanox virtual functions for Infiniband, but
> we’ve now seen it on VMs without VFs. There may be an SELinux
> interaction, but some of our hosts have it disabled outright, some
> are Permissive, and some were working successfully with 5.0.2.x
> GPFS.
>
> What I’ve been instructed to try to solve this problem has been to
> run “mmbuildgpl”, and it has solved the problem. I don’t consider
> running "mmbuildgpl" a real solution, however. If RPMs are a
> supported means of