Another possibility is to try increasing the timeouts. We used to have
problems with this all of the time on clusters with thousands of nodes, but now
we run with the following settings increased from their [defaults]…
sqtBusyThreadTimeout [10] = 120
sqtCommandRetryDelay [60] = 120
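As a sketch, the values above could be applied with mmchconfig. Note these are not documented tuning parameters (the names here are taken from this thread as-is), so confirm them with IBM support for your Scale level before changing anything:

```shell
# Hedged sketch: apply the quoted timeout values cluster-wide.
# -i makes the change take effect immediately and persist across restarts.
mmchconfig sqtBusyThreadTimeout=120,sqtCommandRetryDelay=120 -i
```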
On Thu, 20 Feb 2020 23:38:15 +, Jonathan Buzzard said:
> For us, it is a Scottish government mandate that all publicly funded
> bodies in Scotland are Cyber Essentials Plus compliant. That's 10 days
> from a critical vulnerability till you're patched. No ifs, no buts, just
> do it.
Is that 10
On 20/02/2020 16:59, Skylar Thompson wrote:
[SNIP]
>
> We have this problem too, but at the same time the same people require us
> to run supported software and remove software versions with known
> vulnerabilities.
For us, it is a Scottish government mandate that all publicly funded
bodies in
Filesystem quiesce failed has nothing to do with open files.
What it means is that the filesystem couldn't flush dirty data and metadata
within a defined time to take a snapshot. This can be caused by too high
maxfilestocache or pagepool settings.
To give you a simplified example (it's more
Sorry, I believe you had nailed it already -- I didn't
read carefully to the end.
> On Feb 20, 2020, at 23:17, Peter Serocka wrote:
>
> Looking at the example '*/xy_survey_*/name/*.tif':
> that's not a "real" (POSIX) regular expression but a use of
> a much simpler "wildcard pattern" as
Looking at the example '*/xy_survey_*/name/*.tif':
that's not a "real" (POSIX) regular expression but a use of
a much simpler "wildcard pattern" as commonly used in the UNIX shell
when matching filenames.
So I would assume that the 'f' parameter just mandates that
REGEX() must apply "filename
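The wildcard-versus-regex distinction can be illustrated with Python's standard fnmatch module, which translates a shell-style pattern into the equivalent regular expression. This is only an illustration of the general idea; Scale's REGEX() semantics may differ, so check against the policy output mmfind actually generates:

```python
# Sketch: shell-style wildcard vs. POSIX-style regular expression.
# fnmatch.translate() shows what the wildcard means when written as a regex.
import fnmatch
import re

pattern = '*/xy_survey_*/name/*.tif'
regex = fnmatch.translate(pattern)
# e.g. '(?s:.*/xy_survey_.*/name/.*\\.tif)\\Z' -- each wildcard '*' became '.*'
# (a bare '*' in a real regex would be a quantifier, not "match anything").

# Note: unlike shell pathname globbing, fnmatch's '*' also matches '/'.
print(re.match(regex, 'proj/xy_survey_2020/name/scan01.tif') is not None)  # True
print(re.match(regex, 'proj/xy_survey_2020/name/scan01.jpg') is not None)  # False
```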
Good point, Simon. Yes, it is a "file system quiesce" not a "fileset
quiesce" so it is certainly possible that mmfsd is unable to quiesce
because there are processes keeping files open in another fileset.
Nate Falk
IBM Spectrum Scale Level 2 Support
Software Defined Infrastructure, IBM
Hi Nate,
So we're trying to clean up snapshots from the GUI ... we've found that if it
fails to delete one night for whatever reason, it then doesn't go back another
day and clean up
But yes, essentially running this by hand to clean up.
What I have found is that lsof hangs on some of
Hello Simon,
Sadly, that "1036" is not a node ID, but just a counter.
These are tricky to troubleshoot. Usually, by the time you realize it's
happening and try to collect some data, things have already timed out.
Since this mmdelsnapshot isn't something that's on a schedule from cron or
the
Greetings,
I've been working on creating some new policy rules that will require regular
expression matching on path names. As a crutch to help me along, I've used the
mmfind command to do some searches and used its policy output as a model.
Interestingly, it creates REGEX() functions with an
It seems like this belongs in mmhealth if it were to be bundled.
If you need to use a third party tool, maybe fetch a particular key that is
only used for fetching, so its compromise would represent no risk.
--
Stephen Ulmer
Sent from a mobile device; please excuse auto-correct silliness.
>
On Wed, 19 Feb 2020 22:07:50 +, "Felipe Knop" said:
> Having a tool that can retrieve keys independently from mmfsd would be a
> useful capability to have. Could you submit an RFE to request such a function?
Note that care needs to be taken to do this in a secure manner.
Hmm ... mmdiag --tokenmgr shows:
Server stats: requests 195417431 ServerSideRevokes 120140
nTokens 2146923 nranges 4124507
designated mnode appointed 55481 mnode thrashing detected 1036
So how do I convert "1036" to a node?
Simon
Move the file system manager :)
On Thu, 20 Feb 2020, 19:45 Simon Thompson, wrote:
> Hi,
>
>
> We have a snapshot which is stuck in the state "DeleteRequired". When
> deleting, it goes through the motions but eventually gives up with:
>
> Unable to quiesce all nodes; some processes are busy or
Hi,
We have a snapshot which is stuck in the state "DeleteRequired". When deleting,
it goes through the motions but eventually gives up with:
Unable to quiesce all nodes; some processes are busy or holding required
resources.
mmdelsnapshot: Command failed. Examine previous error messages to
On Thu, Feb 20, 2020 at 12:14:40PM -0500, David Johnson wrote:
> Instead of keeping whole legacy systems around, could they achieve the same
> with a container built from the legacy software?
That is our hope, at least once we can get off CentOS 6 and run containers.
:)
Though containers aren't
Instead of keeping whole legacy systems around, could they achieve the same
with a container built from the legacy software?
> On Feb 20, 2020, at 11:59 AM, Skylar Thompson wrote:
>
> On Thu, Feb 20, 2020 at 04:29:40PM +, Ken Atkinson wrote:
>> Fred,
>> It may be that some HPC users "have
I assisted in a migration a couple years ago when we pushed teams to RHEL 7 and
the science pipeline folks weren’t really concerned with the version of Scale
we were using, but more what the new OS did to their code stack with the newer
version of things like gcc and other libraries. They
On Thu, Feb 20, 2020 at 04:29:40PM +, Ken Atkinson wrote:
> Fred,
> It may be that some HPC users "have to"
> reverify the results of their computations as being exactly the same as a
> previous software stack and that is not a minor task. Any change may
> require this verification
Hi Frederick, ours is a physics research lab with a mix of new
experiments and ongoing research. While some users embrace and desire
the latest that tech has to offer and are actively writing code to
take advantage of it, we also have users running older code on data
from older experiments which
Ken wrote:
> It may be that some HPC users "have to"
> reverify the results of their computations as being exactly the same as a
> previous software stack and that is not a minor task. Any change may
> require this verification process.
How deep does “any change” go? Mod level? PTF? Efix? OS
Fred,
It may be that some HPC users "have to"
reverify the results of their computations as being exactly the same as a
previous software stack and that is not a minor task. Any change may
require this verification process.
Ken Atkinson
On Thu, 20 Feb 2020, 14:35 Frederick Stock, wrote:
>
Thanks very much for your response Carl, this is the information I was looking
for.
Renata
On Thu, 20 Feb 2020, Carl Zetie - ca...@us.ibm.com wrote:
>To reiterate what's been said on this thread, and to reaffirm the official IBM
>position:
>
>
> * Scale 4.2 reaches EOS in September 2020,
On Thu, Feb 20, 2020 at 11:23:52AM +, Jonathan Buzzard wrote:
> On 20/02/2020 10:41, Simon Thompson wrote:
> > Well, if you were buying some form of extended Life Support for
> > Scale, then you might also be expecting to buy extended life for
> > RedHat. RHEL6 has extended life support until
This is a bit off the point of this discussion but it seemed like an appropriate context for me to post this question. IMHO the state of software is such that it is expected to change rather frequently, for example the OS on your laptop/tablet/smartphone and your web browser. It is correct to
To reiterate what’s been said on this thread, and to reaffirm the official IBM
position:
* Scale 4.2 reaches EOS in September 2020, and RHEL6 not long after. In
fact, the reason we have postponed 4.2 EOS for so long is precisely because it
is the last Scale release to support RHEL6, and
All,
(not an official IBM answer yet)
> 1. Is there extended support available for 4.2.3 on rhel6 for gpfs servers and clients?
I believe extended support for 4.2.3 will be available, but no PTFs or efixes are provided for a release after end-of-service.
> 2. Is gpfs 5.0 unsupported for
On 20/02/2020 10:41, Simon Thompson wrote:
> Well, if you were buying some form of extended Life Support for
> Scale, then you might also be expecting to buy extended life for
> RedHat. RHEL6 has extended life support until June 2024. Sure it's an
> add on subscription cost, but some people might
Well, if you were buying some form of extended Life Support for Scale, then you
might also be expecting to buy extended life for RedHat. RHEL6 has extended
life support until June 2024. Sure it's an add-on subscription cost, but some
people might be prepared to do that over OS upgrades.
Simon
On 19/02/2020 23:34, Renata Maria Dart wrote:
> Hi, I understand gpfs 4.2.3 is end of support this coming September. The
> support page
>
> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux__rhelkerntable
>
> indicates that gpfs version 5.0 will not run on rhel6