Re: [gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Sanchez, Paul
Another possibility is to try increasing the timeouts. We used to have problems with this all of the time on clusters with thousands of nodes, but now we run with the following settings increased from their [defaults]…
  sqtBusyThreadTimeout [10] = 120
  sqtCommandRetryDelay [60] = 120

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Valdis Klētnieks
On Thu, 20 Feb 2020 23:38:15 +, Jonathan Buzzard said: > For us, it is a Scottish government mandate that all public funded > bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days > from a critical vulnerability till you're patched. No ifs, no buts, just > do it. Is that 10

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Jonathan Buzzard
On 20/02/2020 16:59, Skylar Thompson wrote: [SNIP] > > We have this problem too, but at the same time the same people require us > to run supported software and remove software versions with known > vulnerabilities. For us, it is a Scottish government mandate that all public funded bodies in

Re: [gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Sven Oehme
"Filesystem quiesce failed" has nothing to do with open files. What it means is that the filesystem couldn’t flush dirty data and metadata within a defined time to take a snapshot. This can be caused by too high maxfilestocache or pagepool settings. To give you a simplified example (it's more
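The point about flush time can be illustrated with a back-of-envelope sketch. This is not the example from the original message (which is cut off above); the cache size, flush rate, and time limit below are made-up numbers, chosen only to show why a larger cache of dirty data is harder to flush inside a fixed window.

```python
GIB = 1024 ** 3

def flush_seconds(dirty_bytes, flush_bytes_per_second):
    """Time needed to write out all dirty data at a steady flush rate."""
    return dirty_bytes / flush_bytes_per_second

# Hypothetical values, not taken from the thread or from any GPFS default.
dirty = 64 * GIB          # dirty data held in cache on one node
rate = 1 * GIB            # sustained flush rate in bytes per second
time_limit = 60           # seconds allowed for the quiesce

needed = flush_seconds(dirty, rate)
fits = needed <= time_limit
print(f"needs {needed:.0f}s, limit {time_limit}s, fits: {fits}")
```

With these numbers the flush would take longer than the window, so halving the cached dirty data (or doubling the flush rate) would bring it back under the limit.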

Re: [gpfsug-discuss] Policy REGEX question

2020-02-20 Thread Peter Serocka
Sorry, I believe you had nailed it already -- I didn't read carefully to the end. > On Feb 20, 2020, at 23:17, Peter Serocka wrote: > > Looking at the example '*/xy_survey_*/name/*.tif': > that's not a "real" (POSIX) regular expression but a use of > a much simpler "wildcard pattern" as

Re: [gpfsug-discuss] Policy REGEX question

2020-02-20 Thread Peter Serocka
Looking at the example '*/xy_survey_*/name/*.tif': that's not a "real" (POSIX) regular expression but a use of a much simpler "wildcard pattern" as commonly used in the UNIX shell when matching filenames. So I would assume that the 'f' parameter just mandates that REGEX() must apply "filename
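The difference between a wildcard pattern and a regular expression can be sketched in a few lines. This is only an illustration in Python, not how the policy engine implements REGEX(); the helper name `wildcard_to_regex` and the sample paths are hypothetical, and the sketch assumes `*` may match across `/` characters.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a simple wildcard pattern (* and ?) into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")      # '*' matches any run of characters
        elif ch == "?":
            parts.append(".")       # '?' matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "^" + "".join(parts) + "$"

WILDCARD = "*/xy_survey_*/name/*.tif"
regex = wildcard_to_regex(WILDCARD)

# Hypothetical path names, used only to exercise the pattern.
hit = re.match(regex, "/gpfs/fs1/xy_survey_2020/name/tile01.tif")
miss = re.match(regex, "/gpfs/fs1/other/name/tile01.tif")
print(bool(hit), bool(miss))
```

Read as a regular expression, the original string would mean something else entirely: the `.` before `tif` would match any character, and each `*` would repeat the preceding element instead of standing for "anything".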

Re: [gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Nathan Falk
Good point, Simon. Yes, it is a "file system quiesce" not a "fileset quiesce" so it is certainly possible that mmfsd is unable to quiesce because there are processes keeping files open in another fileset. Nate Falk IBM Spectrum Scale Level 2 Support Software Defined Infrastructure, IBM

Re: [gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Simon Thompson
Hi Nate, So we're trying to clean up snapshots from the GUI ... we've found that if it fails to delete one night for whatever reason, it then doesn't go back another day and clean up. But yes, essentially running this by hand to clean up. What I have found is that lsof hangs on some of

Re: [gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Nathan Falk
Hello Simon, Sadly, that "1036" is not a node ID, but just a counter. These are tricky to troubleshoot. Usually, by the time you realize it's happening and try to collect some data, things have already timed out. Since this mmdelsnapshot isn't something that's on a schedule from cron or the

[gpfsug-discuss] Policy REGEX question

2020-02-20 Thread Todd Ruston
Greetings, I've been working on creating some new policy rules that will require regular expression matching on path names. As a crutch to help me along, I've used the mmfind command to do some searches and used its policy output as a model. Interestingly, it creates REGEX() functions with an

Re: [gpfsug-discuss] Encryption - checking key server health (SKLM)

2020-02-20 Thread Stephen Ulmer
It seems like this belongs in mmhealth if it were to be bundled. If you need to use a third party tool, maybe fetch a particular key that is only used for fetching, so its compromise would represent no risk. -- Stephen Ulmer Sent from a mobile device; please excuse auto-correct silliness. >

Re: [gpfsug-discuss] Encryption - checking key server health (SKLM)

2020-02-20 Thread Valdis Klētnieks
On Wed, 19 Feb 2020 22:07:50 +, "Felipe Knop" said: > Having a tool that can retrieve keys independently from mmfsd would be useful > capability to have. Could you submit an RFE to request such function? Note that care needs to be taken to do this in a secure manner.

Re: [gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Simon Thompson
Hmm ... mmdiag --tokenmgr shows:
  Server stats:
    requests 195417431
    ServerSideRevokes 120140
    nTokens 2146923
    nranges 4124507
    designated mnode appointed 55481
    mnode thrashing detected 1036
So how do I convert "1036" to a node? Simon
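The stats quoted in this message are a run of label/value pairs, which is why "1036" reads naturally as a count next to the other counts (a reply in this thread confirms it is a counter, not a node ID). A minimal sketch for splitting such a run, assuming the flattened one-line layout as it appears in this archive; real `mmdiag --tokenmgr` output may be laid out differently, and `parse_counters` is a hypothetical helper:

```python
import re

# The counters as quoted in the message, flattened onto one line.
SAMPLE = (
    "requests 195417431 ServerSideRevokes 120140 nTokens 2146923 "
    "nranges 4124507 designated mnode appointed 55481 "
    "mnode thrashing detected 1036"
)

def parse_counters(text):
    """Split 'label value label value ...' into a dict of integer counters.

    A label is one or more words; it ends at the first whitespace that is
    followed by a run of digits.
    """
    pairs = re.findall(r"([A-Za-z][A-Za-z ]*?)\s+(\d+)", text)
    return {label.strip(): int(value) for label, value in pairs}

counters = parse_counters(SAMPLE)
for label, value in counters.items():
    print(f"{label}: {value}")
```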

Re: [gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Luke Raimbach
Move the file system manager :) On Thu, 20 Feb 2020, 19:45 Simon Thompson, wrote: > Hi, > > > We have a snapshot which is stuck in the state "DeleteRequired". When > deleting, it goes through the motions but eventually gives up with: > > Unable to quiesce all nodes; some processes are busy or

[gpfsug-discuss] Unkillable snapshots

2020-02-20 Thread Simon Thompson
Hi, We have a snapshot which is stuck in the state "DeleteRequired". When deleting, it goes through the motions but eventually gives up with: Unable to quiesce all nodes; some processes are busy or holding required resources. mmdelsnapshot: Command failed. Examine previous error messages to

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Skylar Thompson
On Thu, Feb 20, 2020 at 12:14:40PM -0500, David Johnson wrote: > Instead of keeping whole legacy systems around, could they achieve the same > with a container built from the legacy software? That is our hope, at least once we can get off CentOS 6 and run containers. :) Though containers aren't

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread David Johnson
Instead of keeping whole legacy systems around, could they achieve the same with a container built from the legacy software? > On Feb 20, 2020, at 11:59 AM, Skylar Thompson wrote: > > On Thu, Feb 20, 2020 at 04:29:40PM +, Ken Atkinson wrote: >> Fred, >> It may be that some HPC users "have

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson)

2020-02-20 Thread Maloney, J.D.
I assisted in a migration a couple years ago when we pushed teams to RHEL 7 and the science pipeline folks weren’t really concerned with the version of Scale we were using, but more what the new OS did to their code stack with the newer version of things like gcc and other libraries. They

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Skylar Thompson
On Thu, Feb 20, 2020 at 04:29:40PM +, Ken Atkinson wrote: > Fred, > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Renata Maria Dart
Hi Frederick, ours is a physics research lab with a mix of new experiments and ongoing research. While some users embrace and desire the latest that tech has to offer and are actively writing code to take advantage of it, we also have users running older code on data from older experiments which

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS (Ken Atkinson)

2020-02-20 Thread Carl Zetie - ca...@us.ibm.com
Ken wrote: > It may be that some HPC users "have to" > reverify the results of their computations as being exactly the same as a > previous software stack and that is not a minor task. Any change may > require this verification process. How deep does “any change” go? Mod level? PTF? Efix? OS

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Ken Atkinson
Fred, It may be that some HPC users "have to" reverify the results of their computations as being exactly the same as a previous software stack and that is not a minor task. Any change may require this verification process. Ken Atkinson On Thu, 20 Feb 2020, 14:35 Frederick Stock, wrote: >

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Renata Maria Dart
Thanks very much for your response Carl, this is the information I was looking for. Renata On Thu, 20 Feb 2020, Carl Zetie - ca...@us.ibm.com wrote: >To reiterate what's been said on this thread, and to reaffirm the official IBM >position: > > > * Scale 4.2 reaches EOS in September 2020,

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Skylar Thompson
On Thu, Feb 20, 2020 at 11:23:52AM +, Jonathan Buzzard wrote: > On 20/02/2020 10:41, Simon Thompson wrote: > > Well, if you were buying some form of extended Life Support for > > Scale, then you might also be expecting to buy extended life for > > RedHat. RHEL6 has extended life support until

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Frederick Stock
This is a bit off the point of this discussion but it seemed like an appropriate context for me to post this question. IMHO the state of software is such that it is expected to change rather frequently, for example the OS on your laptop/tablet/smartphone and your web browser. It is correct to

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Carl Zetie - ca...@us.ibm.com
To reiterate what’s been said on this thread, and to reaffirm the official IBM position: * Scale 4.2 reaches EOS in September 2020, and RHEL6 not long after. In fact, the reason we have postponed 4.2 EOS for so long is precisely because it is the last Scale release to support RHEL6, and

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Felipe Knop
All, (not an official IBM answer yet) > 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients? I believe extended support for 4.2.3 will be available, but no PTFs or efixes are provided for a release after end-of-service. > 2. Is gpfs 5.0 unsupported for

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Jonathan Buzzard
On 20/02/2020 10:41, Simon Thompson wrote: > Well, if you were buying some form of extended Life Support for > Scale, then you might also be expecting to buy extended life for > RedHat. RHEL6 has extended life support until June 2024. Sure, it's an > add-on subscription cost, but some people might

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Simon Thompson
Well, if you were buying some form of extended Life Support for Scale, then you might also be expecting to buy extended life for RedHat. RHEL6 has extended life support until June 2024. Sure, it's an add-on subscription cost, but some people might be prepared to do that over OS upgrades. Simon

Re: [gpfsug-discuss] GPFS 5 and supported rhel OS

2020-02-20 Thread Jonathan Buzzard
On 19/02/2020 23:34, Renata Maria Dart wrote: > Hi, I understand gpfs 4.2.3 is end of support this coming September. The > support page > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable > > indicates that gpfs version 5.0 will not run on rhel6