Re: [gpfsug-discuss] slow filesystem

2019-07-10 Thread Buterbaugh, Kevin L
Hi Damir,

Have you checked to see whether gssio4 might have a failing internal HD / SSD?  
Thanks…

Kevin
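As a hedged sketch, the kind of local-disk checks that suggestion implies, run on gssio4 itself (the device name /dev/sda and the scratch file path are assumptions, not from the thread):

# Find the device backing /var/mmfs, then look at its health and latency.
df /var/mmfs
lsblk
iostat -x 5 3                    # watch await/%util for the device found above
smartctl -a /dev/sda             # SMART status (assumed device; needs smartmontools)
# Time a small synchronous write next to the file GPFS complained about:
time dd if=/dev/zero of=/var/mmfs/gen/ddtest bs=4k count=256 oflag=sync
rm -f /var/mmfs/gen/ddtest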

On Jul 10, 2019, at 7:16 AM, Damir Krstic <damir.krs...@gmail.com> wrote:

Over the last couple of days, reads and writes on our compute cluster have been very 
slow on one of the filesystems in the cluster. We are working with IBM and have a 
Sev. 1 ticket open, but I figured I would ask here about a warning message we are 
seeing in the GPFS logs.

The cluster is configured as follows:

4 IO servers in the main gpfs cluster
700+ compute nodes in the gpfs cluster

The home filesystem is slow but the projects filesystem seems to be fast. There are 
not many waiters on the IO servers (almost none), but there are a lot of waiters on 
the remote cluster.

The message that is giving us pause is the following:
Jul 10 07:05:31 gssio4 mmfs: [N] Writing into file 
/var/mmfs/gen/LastLeaseRequestSent took 10.5 seconds

Why is it taking so long to write to this local file?

Thanks,
Damir


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Adding to an existing GPFS ACL

2019-03-27 Thread Buterbaugh, Kevin L
Hi Jonathan,

Thanks.  We have done a very similar thing when we’re dealing with a situation 
where:  1) all files and directories in the fileset are starting out with the 
same existing ACL, and 2) all need the same modification made to them.

Unfortunately, in this situation item 2 is true, but item 1 is _not_.  That’s 
what’s making this one a bit thorny…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Mar 27, 2019, at 11:33 AM, Fosburgh, Jonathan <jfosb...@mdanderson.org> wrote:

I misunderstood you.

Pretty much what we've been doing is maintaining "ACL template" files based on 
how our filesystem hierarchy is set up.  Basically, fileset foo has a foo.acl 
file that contains what the ACL is supposed to be.  If we need to change the 
ACL, we modify that file with the new ACL and then pass it through a simple 
(and expensive, I'm sure) script.  This wouldn't be necessary if inheritance 
flowed down on existing files and directories.  If you have CIFS access, you 
can also use Windows to do this, but it is MUCH slower.

--
Jonathan Fosburgh
Principal Application Systems Analyst
IT Operations Storage Team
The University of Texas MD Anderson Cancer Center
(713) 745-9346

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu>
Sent: Wednesday, March 27, 2019 11:19:03 AM
To: gpfsug main discussion list
Subject: [EXT] Re: [gpfsug-discuss] Adding to an existing GPFS ACL

Hi Jonathan,

Thanks for the response.  I did look at mmeditacl, but unless I’m missing 
something it’s interactive (kind of like mmedquota is by default).  If I had 
only a handful of files / directories to modify that would be fine, but in this 
case there are thousands of ACL’s that need modifying.

Am I missing something?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Mar 27, 2019, at 11:02 AM, Fosburgh, Jonathan <jfosb...@mdanderson.org> wrote:

Try mmeditacl.

--
Jonathan Fosburgh
Principal Application Systems Analyst
IT Operations Storage Team
The University of Texas MD Anderson Cancer Center
(713) 745-9346

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu>
Sent: Wednesday, March 27, 2019 10:59:17 AM
To: gpfsug main discussion list
Subject: [EXT] [gpfsug-discuss] Adding to an existing GPFS ACL

Hi All,

First off, I have very limited experience with GPFS ACL’s, so please forgive me 
if I’m missing something obvious here.  AFAIK, this is the first time we’ve hit 
something like this…

We have a fileset where all the files / directories have GPFS NFSv4 ACL’s set 
on them.  However, unlike most of our filesets where the same ACL is applied to 
every file / directory in the share, this one has different ACL’s on different 
files / directories.  Now we have the need to add to the existing ACL’s … 
another group needs access.  Unlike regular Unix / Linux ACLs, where setfacl 
can be used to just add to an ACL (e.g. setfacl -R -m g:group_name:rwx), I'm not 
seeing where GPFS has a similar command … i.e. mmputacl seems to expect the 
_entire_ new ACL to be supplied via either manual entry or an input file.  
That’s obviously problematic in this scenario.

So am I missing something?  Is there an easier solution than writing a script 
which recurses over the fileset, gets the existing ACL with mmgetacl and 
outputs that to a file, edits that file to add in the new group, and passes 
that as input to mmputacl?  That seems very cumbersome and error prone, 
especially if I’m the one writing the script!
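For reference, a minimal sketch of that cumbersome approach (the fileset path, group name, and especially the appended ACE syntax are assumptions; copy the exact NFSv4 entry format that mmgetacl emits on your system and test on a scratch directory first):

#!/bin/bash
# Recurse over a fileset, dump each NFSv4 ACL, append an entry for the new
# group, and push the edited ACL back.  Not production code.
FILESET_PATH=/gpfs/fs0/somefileset     # placeholder
NEWGROUP=newgroup                      # placeholder
TMP=$(mktemp)

find "$FILESET_PATH" -print0 | while IFS= read -r -d '' f; do
    mmgetacl -k nfs4 -o "$TMP" "$f" || continue
    # Skip objects that already carry an entry for the group.
    grep -q "^group:${NEWGROUP}:" "$TMP" && continue
    # The ACE appended below is an assumed format -- match your mmgetacl output.
    echo "group:${NEWGROUP}:rwxc:allow" >> "$TMP"
    mmputacl -i "$TMP" "$f"
done
rm -f "$TMP"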

Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633


Re: [gpfsug-discuss] Adding to an existing GPFS ACL

2019-03-27 Thread Buterbaugh, Kevin L
Hi Jonathan,

Thanks for the response.  I did look at mmeditacl, but unless I’m missing 
something it’s interactive (kind of like mmedquota is by default).  If I had 
only a handful of files / directories to modify that would be fine, but in this 
case there are thousands of ACL’s that need modifying.

Am I missing something?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Mar 27, 2019, at 11:02 AM, Fosburgh, Jonathan <jfosb...@mdanderson.org> wrote:

Try mmeditacl.

--
Jonathan Fosburgh
Principal Application Systems Analyst
IT Operations Storage Team
The University of Texas MD Anderson Cancer Center
(713) 745-9346

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu>
Sent: Wednesday, March 27, 2019 10:59:17 AM
To: gpfsug main discussion list
Subject: [EXT] [gpfsug-discuss] Adding to an existing GPFS ACL

Hi All,

First off, I have very limited experience with GPFS ACL’s, so please forgive me 
if I’m missing something obvious here.  AFAIK, this is the first time we’ve hit 
something like this…

We have a fileset where all the files / directories have GPFS NFSv4 ACL’s set 
on them.  However, unlike most of our filesets where the same ACL is applied to 
every file / directory in the share, this one has different ACL’s on different 
files / directories.  Now we have the need to add to the existing ACL’s … 
another group needs access.  Unlike regular Unix / Linux ACLs, where setfacl 
can be used to just add to an ACL (e.g. setfacl -R -m g:group_name:rwx), I'm not 
seeing where GPFS has a similar command … i.e. mmputacl seems to expect the 
_entire_ new ACL to be supplied via either manual entry or an input file.  
That’s obviously problematic in this scenario.

So am I missing something?  Is there an easier solution than writing a script 
which recurses over the fileset, gets the existing ACL with mmgetacl and 
outputs that to a file, edits that file to add in the new group, and passes 
that as input to mmputacl?  That seems very cumbersome and error prone, 
especially if I’m the one writing the script!

Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

The information contained in this e-mail message may be privileged, 
confidential, and/or protected from disclosure. This e-mail message may contain 
protected health information (PHI); dissemination of PHI should comply with 
applicable federal and state laws. If you are not the intended recipient, or an 
authorized representative of the intended recipient, any further review, 
disclosure, use, dissemination, distribution, or copying of this message or any 
attachment (or the information contained therein) is strictly prohibited. If 
you think that you have received this e-mail message in error, please notify 
the sender by return e-mail and delete all references to it and its contents 
from your systems.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Adding to an existing GPFS ACL

2019-03-27 Thread Buterbaugh, Kevin L
Hi All,

First off, I have very limited experience with GPFS ACL’s, so please forgive me 
if I’m missing something obvious here.  AFAIK, this is the first time we’ve hit 
something like this…

We have a fileset where all the files / directories have GPFS NFSv4 ACL’s set 
on them.  However, unlike most of our filesets where the same ACL is applied to 
every file / directory in the share, this one has different ACL’s on different 
files / directories.  Now we have the need to add to the existing ACL’s … 
another group needs access.  Unlike regular Unix / Linux ACLs, where setfacl 
can be used to just add to an ACL (e.g. setfacl -R -m g:group_name:rwx), I'm not 
seeing where GPFS has a similar command … i.e. mmputacl seems to expect the 
_entire_ new ACL to be supplied via either manual entry or an input file.  
That’s obviously problematic in this scenario.

So am I missing something?  Is there an easier solution than writing a script 
which recurses over the fileset, gets the existing ACL with mmgetacl and 
outputs that to a file, edits that file to add in the new group, and passes 
that as input to mmputacl?  That seems very cumbersome and error prone, 
especially if I’m the one writing the script!

Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS v5: Blocksizes and subblocks

2019-03-27 Thread Buterbaugh, Kevin L
Hi All,

So I was looking at the presentation referenced below and it states - on 
multiple slides - that there is one system storage pool per cluster.  Really?  
Shouldn’t that be one system storage pool per filesystem?!?  If not, please 
explain how in my GPFS cluster with two (local) filesystems I see two different 
system pools with two different sets of NSDs, two different capacities, and two 
different percentages full???

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Mar 26, 2019, at 11:27 AM, Dorigo Alvise (PSI) <alvise.dor...@psi.ch> wrote:

Hi Marc,
"Indirect block size" is well explained in this presentation:

http://files.gpfsug.org/presentations/2016/south-bank/D2_P2_A_spectrum_scale_metadata_dark_V2a.pdf

pages 37-41

Cheers,

   Alvise


From: gpfsug-discuss-boun...@spectrumscale.org [gpfsug-discuss-boun...@spectrumscale.org] on behalf of Caubet Serrabou Marc (PSI) [marc.cau...@psi.ch]
Sent: Tuesday, March 26, 2019 4:39 PM
To: gpfsug main discussion list
Subject: [gpfsug-discuss] GPFS v5: Blocksizes and subblocks

Hi all,

according to several GPFS presentations as well as according to the man pages:

 Table 1. Block sizes and subblock sizes

+---+---+
| Block size| Subblock size |
+---+---+
| 64 KiB| 2 KiB |
+---+---+
| 128 KiB   | 4 KiB |
+---+---+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+---+---+
| 8 MiB, 16 MiB | 16 KiB|
+---+---+

A block size of 8MiB or 16MiB should contain subblocks of 16KiB.

However, when creating a new filesystem with a 16 MiB block size, it looks like it is 
using 128 KiB subblocks:

[root@merlindssio01 ~]# mmlsfs merlin
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    131072                   Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 .
 .
 .
 -n                 128                      Estimated number of nodes that will mount file system
 -B                 1048576                  Block size (system pool)
                    16777216                 Block size (other pools)
 .
 .
 .

What am I missing?  According to the documentation I expected this to be a fixed 
value; is it not?

On the other hand, I don't really understand the concept of 'Indirect block size 
in bytes'; can somebody clarify or provide some details about this setting?

Thanks a lot and best regards,
Marc
_
Paul Scherrer Institut
High Performance Computing
Marc Caubet Serrabou
Building/Room: WHGA/019A
Forschungsstrasse, 111
5232 Villigen PSI
Switzerland

Telephone: +41 56 310 46 67
E-Mail: marc.cau...@psi.ch

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] SSDs for data - DWPD?

2019-03-18 Thread Buterbaugh, Kevin L
Thanks for the suggestion, Simon.  Yes, we've looked at that, but we think that 
we're going to potentially be in a situation where we're using fairly big SSDs 
already.  For example, if we bought 30 6.4 TB SSDs rated at 1 DWPD and 
configured them as 6 4+1P RAID 5 LUNs, then we'd end up with a usable capacity 
of 6 * 4 * 6 = ~144 TB in our “hot” pool (rounding the 6.4 TB drives down to 6 TB).  
That would satisfy our capacity needs and also not exceed the 1 DWPD rating of the drives.

BTW, we noticed with one particular vendor that their 3 DWPD drives were 
exactly 1/3rd the size of their 1 DWPD drives … which makes us wonder if that’s 
coincidence or not.  Anybody know for sure?

Thanks…

Kevin

> On Mar 18, 2019, at 4:13 PM, Simon Thompson  wrote:
> 
> Did you look at pricing larger SSDs than you need and only using partial 
> capacity to get more DWPD out of them?
> 
> I.e. a 1 TB drive at 3 DWPD = 3 TB of writes per day;
> a 2 TB drive at 3 DWPD (using only 1/2 its capacity) = 6 TB of writes per day.
> 
> Simon
> 
> From: gpfsug-discuss-boun...@spectrumscale.org 
> [gpfsug-discuss-boun...@spectrumscale.org] on behalf of Buterbaugh, Kevin L 
> [kevin.buterba...@vanderbilt.edu]
> Sent: 18 March 2019 19:09
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] SSDs for data - DWPD?
> 
> Hi All,
> 
> Just wanted to follow up with the results of my survey … I received a grand 
> total of two responses (Thanks Alex and John).  In their case, they’re using 
> SSDs with a 10 DWPD rating.
> 
> The motivation behind my asking this question was … money!  ;-).  Seriously, 
> 10 DWPD drives are still very expensive, while 3 DWPD drives are 
> significantly less expensive and 1 DWPD drives are even cheaper still.  While 
> we would NOT feel comfortable using anything less than 10 DWPD drives for 
> metadata, we’re wondering about using less expensive drives for data.
> 
> For example, let’s just say that you’re getting ready to set up a brand new 
> GPFS 5 formatted filesystem of 1-2 PB in size.  You’re considering having 3 
> pools:
> 
> 1) a metadata only system pool of 10 DWPD SSDs.  4K inodes, and a ton of 
> small files that’ll fit in the inode.
> 2) a data only “hot” pool (i.e. the default pool for writes) of SSDs.
> 3) a data only “capacity” pool of 12 TB spinning disks.
> 
> And let’s just say that you have looked back at the historical data you’ve 
> collected and you see that over the last 6 months or so you’ve been averaging 
> 10-12 TB of data being written into your existing filesystem per day.  You 
> want to do migrations between pools only on the weekends if at all possible.
> 
> 12 * 7 = 84 TB.  So if you had somewhere between 125 - 150 TB of SSDs ... 1 
> DWPD SSDs … then in theory you should easily be able to handle your 
> anticipated workload without coming close to exceeding the 1 DWPD rating of 
> the SSDs.
> 
> However, as the saying goes, while in theory there’s no difference between 
> theory and practice, in practice there is ... so am I overlooking anything 
> here from a GPFS perspective???
> 
> If anybody still wants to respond on the DWPD rating of the SSDs they use for 
> data, I’m still listening.
> 
> Thanks…
> 
> Kevin
> 
> P.S.  I still have a couple of “outstanding issues” to respond to that I’ve 
> posted to the list about previously:
> 
> 1) the long I/O’s we see occasionally in the output of “mmdiag —iohist” on 
> our NSD servers.  We’re still trying to track that down … it seems to happen 
> only with a subset of our hardware - most of the time at least - but we’re 
> still working to track down what triggers it … i.e. at this point I can’t say 
> whether it’s really the hardware or a user abusing the hardware.
> 
> 2) I promised to post benchmark results of 3 different metadata configs:  a) 
> RAID 1 mirrors, b) a RAID 5 stripe, c) no RAID, but GPFS metadata replication 
> of 3.  That benchmarking has been put on hold for reasons I can’t really 
> discuss on this mailing list at this time … but hopefully soon.
> 
> I haven’t forgotten the above and will respond back on the list when it’s 
> appropriate.  Thanks...
> 
> On Mar 8, 2019, at 10:24 AM, Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu> wrote:
> 
> Hi All,
> 
> This is kind of a survey if you will, so for this one it might be best if you 
> responded directly to me and I’ll summarize the results next week.
> 
> Question 1 - do you use SSDs for data?  If not - i.e. if you only use SSDs 
> for metadata (as we currently do) - thanks, that’s all!  If, however, you do 
> use SSDs for data, please see Question 2.
> 
> Question 2 - what is the DWPD (drive writes per day) of the SSDs that you use 
> for data?
> 
> Question 3 - is that different than the DWPD o

Re: [gpfsug-discuss] SSDs for data - DWPD?

2019-03-18 Thread Buterbaugh, Kevin L
Hi All,

Just wanted to follow up with the results of my survey … I received a grand 
total of two responses (Thanks Alex and John).  In their case, they’re using 
SSDs with a 10 DWPD rating.

The motivation behind my asking this question was … money!  ;-).  Seriously, 10 
DWPD drives are still very expensive, while 3 DWPD drives are significantly 
less expensive and 1 DWPD drives are even cheaper still.  While we would NOT 
feel comfortable using anything less than 10 DWPD drives for metadata, we’re 
wondering about using less expensive drives for data.

For example, let’s just say that you’re getting ready to set up a brand new 
GPFS 5 formatted filesystem of 1-2 PB in size.  You’re considering having 3 
pools:

1) a metadata only system pool of 10 DWPD SSDs.  4K inodes, and a ton of small 
files that’ll fit in the inode.
2) a data only “hot” pool (i.e. the default pool for writes) of SSDs.
3) a data only “capacity” pool of 12 TB spinning disks.

And let’s just say that you have looked back at the historical data you’ve 
collected and you see that over the last 6 months or so you’ve been averaging 
10-12 TB of data being written into your existing filesystem per day.  You want 
to do migrations between pools only on the weekends if at all possible.

12 * 7 = 84 TB.  So if you had somewhere between 125 - 150 TB of SSDs ... 1 
DWPD SSDs … then in theory you should easily be able to handle your anticipated 
workload without coming close to exceeding the 1 DWPD rating of the SSDs.
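A back-of-the-envelope check of that sizing, as a hedged sketch (the numbers are the ones above; substitute your own):

daily_writes_tb=12          # observed average ingest per day
days_between_migrations=7   # migrate to the capacity pool on weekends
hot_pool_tb=144             # proposed usable capacity of the 1 DWPD "hot" pool

awk -v d="$daily_writes_tb" -v n="$days_between_migrations" -v c="$hot_pool_tb" 'BEGIN {
    printf "capacity needed to hold a week of writes: %d TB\n", d * n
    printf "implied utilization of the hot pool: %.2f drive writes per day\n", d / c
}'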

However, as the saying goes, while in theory there’s no difference between 
theory and practice, in practice there is ... so am I overlooking anything here 
from a GPFS perspective???

If anybody still wants to respond on the DWPD rating of the SSDs they use for 
data, I’m still listening.

Thanks…

Kevin

P.S.  I still have a couple of “outstanding issues” to respond to that I’ve 
posted to the list about previously:

1) the long I/O’s we see occasionally in the output of “mmdiag —iohist” on our 
NSD servers.  We’re still trying to track that down … it seems to happen only 
with a subset of our hardware - most of the time at least - but we’re still 
working to track down what triggers it … i.e. at this point I can’t say whether 
it’s really the hardware or a user abusing the hardware.

2) I promised to post benchmark results of 3 different metadata configs:  a) 
RAID 1 mirrors, b) a RAID 5 stripe, c) no RAID, but GPFS metadata replication 
of 3.  That benchmarking has been put on hold for reasons I can’t really 
discuss on this mailing list at this time … but hopefully soon.

I haven’t forgotten the above and will respond back on the list when it’s 
appropriate.  Thanks...

On Mar 8, 2019, at 10:24 AM, Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu> wrote:

Hi All,

This is kind of a survey if you will, so for this one it might be best if you 
responded directly to me and I’ll summarize the results next week.

Question 1 - do you use SSDs for data?  If not - i.e. if you only use SSDs for 
metadata (as we currently do) - thanks, that’s all!  If, however, you do use 
SSDs for data, please see Question 2.

Question 2 - what is the DWPD (drive writes per day) of the SSDs that you use 
for data?

Question 3 - is that different than the DWPD of the SSDs for metadata?

Question 4 - any pertinent information in regards to your answers above (i.e. 
if you’ve got a filesystem that data is uploaded to only once and never 
modified after that then that’s useful to know!)?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] SSDs for data - DWPD?

2019-03-10 Thread Buterbaugh, Kevin L
Hi All,

This is kind of a survey if you will, so for this one it might be best if you 
responded directly to me and I’ll summarize the results next week.

Question 1 - do you use SSDs for data?  If not - i.e. if you only use SSDs for 
metadata (as we currently do) - thanks, that’s all!  If, however, you do use 
SSDs for data, please see Question 2.

Question 2 - what is the DWPD (drive writes per day) of the SSDs that you use 
for data?

Question 3 - is that different than the DWPD of the SSDs for metadata?

Question 4 - any pertinent information in regards to your answers above (i.e. 
if you’ve got a filesystem that data is uploaded to only once and never 
modified after that then that’s useful to know!)?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-21 Thread Buterbaugh, Kevin L
ectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact  1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.



From: "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 02/16/2019 08:18 PM
Subject: [gpfsug-discuss] Clarification of mmdiag --iohist output
Sent by: gpfsug-discuss-boun...@spectrumscale.org




Hi All,

Been reading man pages, docs, and Googling, and haven’t found a definitive 
answer to this question, so I knew exactly where to turn… ;-)

I’m dealing with some slow I/O’s to certain storage arrays in our environments 
… like really, really slow I/O’s … here’s just one example from one of my NSD 
servers of a 10 second I/O:

08:49:34.943186  Wdata   30:41615622144   2048 10115.192  srv   dm-92   
   

So here’s my question … when mmdiag —iohist tells me that that I/O took 
slightly over 10 seconds, is that:

1.  The time from when the NSD server received the I/O request from the client 
until it shipped the data back onto the wire towards the client?
2.  The time from when the client issued the I/O request until it received the 
data back from the NSD server?
3.  Something else?

I’m thinking it’s #1, but want to confirm.  Which one it is has very obvious 
implications for our troubleshooting steps.  Thanks in advance…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-20 Thread Buterbaugh, Kevin L
ing on fast condvar for signal 
0x7F6D7FA106F8 RdmaSend_NSD_SVC

0.000426716 16691   TRACE_RDMA: handleRecvComp: success 1 of 1 nWrSuccess 1 
index 11 cookie 12 wr_id 0xB0E6E bufferP 0x7F6D7EEE1700 byte_len 4144

0.000432140 37005   TRACE_NSD: nsdDoIO_ReadAndCheck: read complete, len 0 
status 6 err 0 bufP 0x180656C1058 dioIsOverRdma 1 ioDataP 0x20BC000 
ckSumType NsdCksum_None
0.000432163 37005   TRACE_NSD: nsdDoIO_ReadAndCheck: exit err 0
0.000433707 37005   TRACE_GCRYPTO: EncBufPool::releaseTmpBuf(): exit 
bsize=8192 err=0 inBufP=0x180656C1058 bufP=0x180656C1058 index=0
0.000433777 37005   TRACE_NSD: nsdDoIO exit: err 0 0

0.000433844 37005   TRACE_IO: FIO: read data tag 743942 108137 ioVecSize 1 
1st buf 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nSectors 8 err 0
0.000434236 37005   TRACE_DISK: postIO: qosid A00D91E read data disk 
 ioVecSize 1 1st buf 0x122F000 err 0 duration 0.000215000 by 
iocMBHandler (DioHandlerThread)

I'd suggest looking at "mmdiag --iohist" on the NSD server itself and see 
if/how that differs from the client. The other thing you could do is see if 
your NSD server queues are backing up (e.g. "mmfsadm saferdump nsd" and look 
for "requests pending" on queues where the "active" field is > 0). That doesn't 
necessarily mean you need to tune your queues but I'd suggest that if the disk 
I/O on your NSD server looks healthy (e.g. low latency, not overly-taxed) that 
you could benefit from queue tuning.

-Aaron
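A hedged one-liner version of that check (the literal strings come from Aaron's description above; verify them against the actual saferdump output on your release):

# On an NSD server, dump the NSD queue state and look for queues with work backing up.
mmfsadm saferdump nsd > /tmp/nsd.dump
grep -E 'requests pending|active' /tmp/nsd.dump | less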

On Sat, Feb 16, 2019 at 9:47 AM Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu> wrote:
Hi All,

Been reading man pages, docs, and Googling, and haven’t found a definitive 
answer to this question, so I knew exactly where to turn… ;-)

I’m dealing with some slow I/O’s to certain storage arrays in our environments 
… like really, really slow I/O’s … here’s just one example from one of my NSD 
servers of a 10 second I/O:

08:49:34.943186  Wdata   30:41615622144   2048 10115.192  srv   dm-92   
   

So here’s my question … when mmdiag —iohist tells me that that I/O took 
slightly over 10 seconds, is that:

1.  The time from when the NSD server received the I/O request from the client 
until it shipped the data back onto the wire towards the client?
2.  The time from when the client issued the I/O request until it received the 
data back from the NSD server?
3.  Something else?

I’m thinking it’s #1, but want to confirm.  Which one it is has very obvious 
implications for our troubleshooting steps.  Thanks in advance…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-16 Thread Buterbaugh, Kevin L
Hi All,

Been reading man pages, docs, and Googling, and haven’t found a definitive 
answer to this question, so I knew exactly where to turn… ;-)

I’m dealing with some slow I/O’s to certain storage arrays in our environments 
… like really, really slow I/O’s … here’s just one example from one of my NSD 
servers of a 10 second I/O:

08:49:34.943186  Wdata   30:41615622144   2048 10115.192  srv   dm-92   
   

So here’s my question … when mmdiag —iohist tells me that that I/O took 
slightly over 10 seconds, is that:

1.  The time from when the NSD server received the I/O request from the client 
until it shipped the data back onto the wire towards the client?
2.  The time from when the client issued the I/O request until it received the 
data back from the NSD server?
3.  Something else?

I’m thinking it’s #1, but want to confirm.  Which one it is has very obvious 
implications for our troubleshooting steps.  Thanks in advance…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Node ‘crash and restart’ event using GPFS callback?

2019-01-31 Thread Buterbaugh, Kevin L
Hi Bob,

We use the nodeLeave callback to detect node expels … for what you’re wanting 
to do I wonder if nodeJoin might work??  If a node joins the cluster and then 
has an uptime of a few minutes you could go looking in /tmp/mmfs.  HTH...
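A minimal sketch of that nodeJoin approach (the script path, the 10-minute threshold, and the use of ssh are assumptions; nodeJoin callbacks may execute on the cluster manager rather than on the joining node itself, so verify where the script actually runs in your environment):

# Register the callback once, as root; %eventNode expands to the joining node:
mmaddcallback checkRestart --command /usr/local/sbin/check_restart.sh \
    --event nodeJoin --parms "%eventNode"

#!/bin/bash
# /usr/local/sbin/check_restart.sh -- hedged sketch, not production code.
# If the host has been up for a while but mmfsd only just (re)joined, that
# smells like a daemon crash/restart rather than a reboot.
node="$1"
up=$(ssh -o BatchMode=yes "$node" 'cut -d. -f1 /proc/uptime') || exit 0
dumps=$(ssh -o BatchMode=yes "$node" 'ls -A /tmp/mmfs 2>/dev/null | wc -l')
if [ "${up:-0}" -gt 600 ] && [ "${dumps:-0}" -gt 0 ]; then
    logger -t gpfs-callback "$node rejoined with uptime ${up}s and ${dumps} item(s) in /tmp/mmfs (possible daemon crash/restart)"
fi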

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Jan 30, 2019, at 3:02 PM, Sanchez, Paul <paul.sanc...@deshaw.com> wrote:

There are some cases which I don’t believe can be caught with callbacks (e.g. 
DMS = Dead Man Switch).  But you could possibly use preStartup to check the 
host uptime to make an assumption if GPFS was restarted long after the host 
booted.  You could also peek in /tmp/mmfs and only report if you find something 
there.  That said, the docs say that preStartup fires after the node joins the 
cluster.  So if that means once the node is ‘active’ then you might miss out on 
nodes stuck in ‘arbitrating’ for a while due to a waiter problem.

We run a script with cron which monitors the myriad things which can go wrong 
and attempt to right those which are safe to fix, and raise alerts 
appropriately.  Something like that, outside the reach of GPFS, is often a good 
choice if you don’t need to know something the moment it happens.

Thx
Paul

From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of Oesterlin, Robert
Sent: Wednesday, January 30, 2019 3:52 PM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: [gpfsug-discuss] Node ‘crash and restart’ event using GPFS callback?

Anyone crafted a good way to detect a node ‘crash and restart’ event using GPFS 
callbacks? I’m thinking “preShutdown”, but I’m not sure that’s the best choice. What 
I’m really looking for is whether the node shut down (aborted) and created a dump in 
/tmp/mmfs.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-21 Thread Buterbaugh, Kevin L
Hi All,

I just wanted to follow up on this thread … the only way I have found to obtain 
a list of filesets and their associated junction paths as a non-root user is 
via the REST API (and thanks to those who suggested that).  However, AFAICT 
querying the REST API via a script would expose the username / password used to 
do so to anyone who bothered to look at the code, which would in turn allow a 
knowledgeable and curious user to query the REST API themselves for other 
information we do not necessarily want to expose to them.  Therefore, it is not 
an acceptable solution to us.

Therefore, unless someone responds with a way to allow a non-root user to 
obtain fileset junction paths that doesn’t involve the REST API, I’m afraid I’m 
at a dead end in terms of making our quota usage Python script something that I 
can share with the broader community.  It just has too much site-specific code 
in it.  Sorry…

Kevin

P.S.  In case you’re curious about how the quota script is obtaining those 
junction paths … we have a cron job that runs once per hour on the cluster 
manager that dumps the output of mmlsfileset to a text file, which the script 
then reads.  The cron job used to just run once per day and used to just run 
mmlsfileset.  I have modified it to be a shell script which checks for the load 
average on the cluster manager being less than 10 and that there are no waiters 
of more than 10 seconds duration.  If both of those conditions are true, it 
runs mmlsfileset.  If either are not, it simply exits … the idea being that one 
or both of those would likely be true if something were going on with the 
cluster manager that would cause the mmlsfileset to hang.
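As a hedged sketch, that wrapper might look something like this (thresholds from above; the filesystem name, the output path, and the parsing of "mmdiag --waiters" are assumptions to verify):

#!/bin/bash
# Hourly cron wrapper: refresh the fileset dump only when the cluster manager
# looks quiet.  Not production code.
OUT=/usr/local/etc/filesets.txt    # placeholder location for the dump
FS=gpfs0                           # placeholder filesystem name

load=$(awk '{print int($1)}' /proc/loadavg)
[ "$load" -ge 10 ] && exit 0

# Bail out if any waiter has been waiting 10 seconds or longer; the awk below
# assumes the seconds value is the second field of each "Waiting ..." line.
if mmdiag --waiters | awk '/Waiting/ { if ($2+0 >= 10) exit 1 }'; then
    mmlsfileset "$FS" > "${OUT}.tmp" && mv "${OUT}.tmp" "$OUT"
fi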

I have also modified the quota script itself so that it checks that the 
junction path for a fileset actually exists before attempting to stat it (duh - 
should’ve done that from the start), which handles the case where a user would 
run the quota script and it would bomb off with an exception because the 
fileset was deleted and the cron job hadn’t run yet.  If a new fileset is 
created, well, it just won’t get checked by the quota script until the cron job 
runs successfully.  We have decided that this is an acceptable compromise.

On Jan 15, 2019, at 8:46 AM, Marc A Kaplan <makap...@us.ibm.com> wrote:

Personally, I agree that there ought to be a way in the product.

In the meantime, you no doubt already have some ways to tell your users where 
to find their filesets as pathnames.
Otherwise, how are they accessing their files?

And to keep things somewhat sane, I'd bet the filesets are all linked under one or 
a small number of well-known paths in the filesystem, like /AGpfsFilesystem/filesets/...  
Plus you could add symlinks and/or, as has been suggested, post info extracted 
from mmlsfileset and/or mmlsquota.

So as a practical matter, is this an urgent problem...?  Why?  How?
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-15 Thread Buterbaugh, Kevin L
Hi Marc (All),

Yes, I can easily determine where filesets are linked here … it is, as you 
said, in just one or two paths.  The script as it stands now has been doing 
that for several years and only needs a couple of relatively minor tweaks to be 
even more useful to _us_ by whittling down a couple of edge cases relating to 
fileset creation / deletion.

However … there was a request to share the script with the broader community … 
something I’m willing to do if I can get it in a state where it would be useful 
to others with little or no modification.  Anybody who’s been on this list for 
any length of time knows how much help I’ve received from the community over 
the years.  I truly appreciate that and would like to give back, even in a 
minor way, if possible.

But in order to do that the script can’t be full of local assumptions … that’s 
it in a nutshell … that’s why I want to programmatically determine the junction 
path at run time as a non-root user.

I’ll also mention here that early on in this thread Simon Thompson suggested 
looking into the REST API.  Sure enough, you can get the information that way … 
but, AFAICT, that would require the script to contain a username / password 
combination that would allow anyone with access to the script to then use that 
authentication information to access other information within GPFS that we 
probably don’t want them to have access to.  If I’m mistaken about that, then 
please feel free to enlighten me.

Thanks again…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Jan 15, 2019, at 8:46 AM, Marc A Kaplan <makap...@us.ibm.com> wrote:

Personally, I agree that there ought to be a way in the product.

In the meantime, you no doubt already have some ways to tell your users where 
to find their filesets as pathnames.
Otherwise, how are they accessing their files?

And to keep things somewhat sane, I'd bet the filesets are all linked under one or 
a small number of well-known paths in the filesystem, like /AGpfsFilesystem/filesets/...  
Plus you could add symlinks and/or, as has been suggested, post info extracted 
from mmlsfileset and/or mmlsquota.

So as a practical matter, is this an urgent problem...?  Why?  How?

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-15 Thread Buterbaugh, Kevin L
Hi Scott and Valdis (and everyone else),

Thanks for your responses.

Yes, we _could_ easily build a local naming scheme … the name of the fileset 
matches the name of a folder in one of a couple of parent directories.  
However, an earlier response to my post asked if we’d be willing to share our 
script with the community and we would … _if_ we can make it generic enough to 
be useful.  Local naming schemes hardcoded in the script make it much less 
generically useful.

Plus, it just seems to me that there ought to be a way to do this … to get a 
list of fileset names from mmlsquota and then programmatically determine their 
junction path without having root privileges.  GPFS has got to be storing that 
information somewhere, and I’m frankly quite surprised that no IBMer has 
responded with an answer to that.  But I also know that when IBM is silent, 
there’s typically a reason.

And yes, we could regularly create a static file … in fact, that’s what we do 
now once per day (in the early morning hours).  While this is not a huge deal - 
we only create / delete filesets a handful of times per month - on the day we 
do the script won’t function properly unless we manually update the file.  I’m 
wanting to eliminate that, if possible … which as I stated in the preceding 
paragraph, I have a hard time believing is not possible.

I did look at the list of callbacks again (good thought!) and there’s not one 
specifically related to the creation / deletion of a fileset.  There was only 
one that I saw that I think could even possibly be of use … ccrFileChange.  Can 
anyone on the list confirm or deny that the creation / deletion of a fileset 
would cause that callback to be triggered??  If it is triggered, then we could 
use that to update the static filesets within a minute or two of the change 
being made, which would definitely be acceptable.  I realize that many things 
likely trigger a ccrFileChange, so I’m thinking of having a callback script 
that checks the current list of filesets against the static file and updates 
that appropriately.

Thanks again for the responses…

Kevin

> On Jan 13, 2019, at 10:09 PM, Scott Goldman  wrote:
> 
> Kevin,
> Something I've done in the past is to create a service that, once an 
> hour/day/week, builds a static file that consists of the needed 
> output.
> 
> As long as you can take the update delay (or perhaps trigger the update with 
> a callback), this should work and could actually be lighter on the system.
> 
> Sent from my BlackBerry - the most secure mobile device
> 
>   Original Message  
> From: valdis.kletni...@vt.edu
> Sent: January 12, 2019 4:07 PM
> To: gpfsug-discuss@spectrumscale.org
> Reply-to: gpfsug-discuss@spectrumscale.org
> Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running  
> mmlsfileset?
> 
> On Sat, 12 Jan 2019 03:07:29 +, "Buterbaugh, Kevin L" said:
>> But from there I need to then be able to find out where that fileset is
>> mounted in the directory tree so that I can see who the owner and group of 
>> that
>> directory are.
> 
> You're not able to leverage a local naming scheme? There's no connection 
> between
> the name of the fileset and where it is in the tree?  I would hope there is, 
> because
> otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user 
> will
> now be confused over what director(y/ies) need to be cleaned up.  If your tool
> says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at
> /gpfs/foo/bar/baz then it's actionable.
> 
> And if the user knows what the mapping is, your script can know it too
> 

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-12 Thread Buterbaugh, Kevin L
Hi All,

I appreciate the time several of you have taken to respond to my inquiry.  
However, unless I’m missing something - and my apologies if I am - none so far 
appear to allow me to obtain the list of junction paths as a non-root user.  
Yes, mmlsquota shows all the filesets.  But from there I need to then be able 
to find out where that fileset is mounted in the directory tree so that I can 
see who the owner and group of that directory are.  Only if the user running 
the script is either the owner or a member of the group do I want to display 
the fileset quota for that fileset to the user.

Thanks again…

Kevin

On Jan 11, 2019, at 10:24 AM, Jeffrey R. Lang <jrl...@uwyo.edu> wrote:

What we do is use “mmlsquota -Y ”, which will list out all the 
filesets in an easily parseable format.  And the command can be run by the 
user.
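For instance, a hedged sketch of pulling just the fileset names out of that output (the -Y format is colon-delimited with a HEADER record naming the fields; the labels "quotaType" and "name" are assumptions, so confirm them against the HEADER line your release prints):

mmlsquota -Y | awk -F: '
    /:HEADER:/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
    col["quotaType"] && $col["quotaType"] == "FILESET" { print $col["name"] }'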


From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of Peter Childs
Sent: Friday, January 11, 2019 6:50 AM
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running 
mmlsfileset?


We have a similar issue.  I'm wondering if getting mmlsfileset to work as a user 
is a reasonable "request for enhancement"; I suspect it would need better 
wording.


We too have a rather complex script to report on quotas that I suspect does a 
similar job.  It works by having all the filesets mounted in known locations with 
names matching the mount point names.  It then works out which ones are needed by 
looking at the group ownership.  It's very slow and a little cumbersome, not 
least because it was written ages ago in a mix of bash, sed, awk and find.

On Tue, 2019-01-08 at 22:12 +, Buterbaugh, Kevin L wrote:
Hi All,

Happy New Year to all!  Personally, I’ll gladly and gratefully settle for 2019 
not being a dumpster fire like 2018 was (those who attended my talk at the user 
group meeting at SC18 know what I’m referring to), but I certainly wish all of 
you the best!

Is there a way to get a list of the filesets in a filesystem without running 
mmlsfileset?  I was kind of expecting to find them in one of the config files 
somewhere under /var/mmfs but haven’t found them yet in the searching I’ve done.

The reason I’m asking is that we have a Python script that users can run that 
needs to get a list of all the filesets in a filesystem.  There are obviously 
multiple issues with that, so the workaround we’re using for now is to have a 
cron job which runs mmlsfileset once a day and dumps it out to a text file, 
which the script then reads.  That’s sub-optimal for any day on which a fileset 
gets created or deleted, so I’m looking for a better way … one which doesn’t 
require root privileges and preferably doesn’t involve running a GPFS command 
at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on 
that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks 
to the capacity pool and things are working fine.

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



--

Peter Childs
ITS Research Storage
Queen Mary, University of London


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-10 Thread Buterbaugh, Kevin L
Hi Andrew / All,

Well, it does _sound_ useful, but in its current state it’s really not for 
several reasons, mainly having to do with it being coded in a moderately 
site-specific way.  It needs an overhaul anyway, so I’m going to look at 
getting rid of as much of that as possible (there’s some definite low-hanging 
fruit there) and, for the site-specific things that can’t be gotten rid of, 
maybe consolidating them into one place in the code so that the script could be 
more generally useful if you just change those values.

If I can accomplish those things, then yes, we’d be glad to share the script.

But I’ve also realized that I didn’t _entirely_ answer my original question.  
Yes, mmlsquota will show me all the filesets … but I also need to know the 
junction path for each of those filesets.  One of the main reasons we wrote 
this script in the first place is that if you run mmlsquota you see that you 
have no limits on about 60 filesets (currently we use fileset quotas only on 
our filesets) … and that’s because there are no user (or group) quotas in those 
filesets.  The script, however, reads in the text file that is created nightly 
by root (nothing more than the output of “mmlsfileset ”), gets the junction path, looks up the GID of the junction path, and sees 
if you’re a member of that group.  If you’re not, well, no sense in showing you 
anything about that fileset.  But, of course, if you are a member of that 
group, then we do want to show you the fileset quota for that fileset.
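A hedged sketch of that membership test (the dump file location and its assumed two-column layout of fileset name and junction path are placeholders):

DUMP=/usr/local/etc/filesets.txt       # placeholder: the nightly mmlsfileset dump
while read -r fsname junction _; do
    [ -d "$junction" ] || continue     # fileset deleted since the dump was made
    gid=$(stat -c %g "$junction")
    if id -G | tr ' ' '\n' | grep -qx "$gid"; then
        echo "$fsname $junction"       # caller may see this fileset's quota
    fi
done < "$DUMP"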

So … my question now is: is there a way for a non-root user to get the 
junction path for the fileset(s)?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Jan 9, 2019, at 7:13 PM, Andrew Beattie <abeat...@au1.ibm.com> wrote:

Kevin,

That sounds like a useful script
would you care to share?

Thanks
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com


- Original message -
From: "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Cc:
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running 
mmlsfileset?
Date: Thu, Jan 10, 2019 9:22 AM

Hi All,

Let me answer Skylar’s questions in another e-mail, which may also tell whether 
the rest API is a possibility or not.

The Python script in question is to display quota information for a user.  The 
mmlsquota command has a couple of issues:  1) its output is confusing to some 
of our users, 2) more significantly, it displays a ton of information that 
doesn’t apply to the user running it.  For example, it will display all the 
filesets in a filesystem whether or not the user has access to them.  So the 
Python script figures out what group(s) the user is a member of and only 
displays information pertinent to them (i.e. the group of the fileset junction 
path is a group this user is a member of) … and in a simplified (and 
potentially colorized) output format.

And typing that preceding paragraph caused the lightbulb to go off … I know the 
answer to my own question … have the script run mmlsquota and get the full list 
of filesets from that, then parse that to determine which ones I actually need 
to display quota information for.  Thanks!

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Jan 9, 2019, at 4:42 PM, Simon Thompson <s.j.thomp...@bham.ac.uk> wrote:

Hi Kevin,

Have you looked at the rest API?

https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_listofapicommands.htm

I don't know how much access control there is available in the API so not sure 
if you could lock some sort of service user down to just the get filesets 
command?

Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-09 Thread Buterbaugh, Kevin L
Hi All,

Let me answer Skylar’s questions in another e-mail, which may also tell whether 
the rest API is a possibility or not.

The Python script in question is to display quota information for a user.  The 
mmlsquota command has a couple of issues:  1) its output is confusing to some 
of our users, 2) more significantly, it displays a ton of information that 
doesn’t apply to the user running it.  For example, it will display all the 
filesets in a filesystem whether or not the user has access to them.  So the 
Python script figures out what group(s) the user is a member of and only 
displays information pertinent to them (i.e. the group of the fileset junction 
path is a group this user is a member of) … and in a simplified (and 
potentially colorized) output format.

And typing that preceding paragraph caused the lightbulb to go off … I know the 
answer to my own question … have the script run mmlsquota and get the full list 
of filesets from that, then parse that to determine which ones I actually need 
to display quota information for.  Thanks!
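
For anyone wanting to do something similar, here is a rough sketch of that parsing step in shell (it assumes your release supports the -Y colon-delimited output that most mm commands offer; the HEADER line that -Y prints names the fields, so look up positions there rather than hard-coding any field number):

    # rough illustration only, not the actual script:
    # dump the calling user's quota report in machine-readable form and keep
    # just the per-fileset lines, which also yields the list of fileset names
    mmlsquota -Y | grep -v ':HEADER:' | awk -F: 'tolower($0) ~ /fileset/ {print}'

A Python script can do the same thing by splitting each line on ':' instead of scraping the human-readable columns.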

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Jan 9, 2019, at 4:42 PM, Simon Thompson 
mailto:s.j.thomp...@bham.ac.uk>> wrote:

Hi Kevin,

Have you looked at the rest API?

https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_listofapicommands.htm

I don't know how much access control there is available in the API so not sure 
if you could lock some sort of service user down to just the get filesets 
command?

Simon
___
From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Buterbaugh, Kevin L 
[kevin.buterba...@vanderbilt.edu]
Sent: 08 January 2019 22:12
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

Hi All,

Happy New Year to all!  Personally, I’ll gladly and gratefully settle for 2019 
not being a dumpster fire like 2018 was (those who attended my talk at the user 
group meeting at SC18 know what I’m referring to), but I certainly wish all of 
you the best!

Is there a way to get a list of the filesets in a filesystem without running 
mmlsfileset?  I was kind of expecting to find them in one of the config files 
somewhere under /var/mmfs but haven’t found them yet in the searching I’ve done.

The reason I’m asking is that we have a Python script that users can run that 
needs to get a list of all the filesets in a filesystem.  There are obviously 
multiple issues with that, so the workaround we’re using for now is to have a 
cron job which runs mmlsfileset once a day and dumps it out to a text file, 
which the script then reads.  That’s sub-optimal for any day on which a fileset 
gets created or deleted, so I’m looking for a better way … one which doesn’t 
require root privileges and preferably doesn’t involve running a GPFS command 
at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on 
that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks 
to the capacity pool and things are working fine.

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-09 Thread Buterbaugh, Kevin L
Hi All,

Happy New Year to all!  Personally, I’ll gladly and gratefully settle for 2019 
not being a dumpster fire like 2018 was (those who attended my talk at the user 
group meeting at SC18 know what I’m referring to), but I certainly wish all of 
you the best!

Is there a way to get a list of the filesets in a filesystem without running 
mmlsfileset?  I was kind of expecting to find them in one of the config files 
somewhere under /var/mmfs but haven’t found them yet in the searching I’ve done.

The reason I’m asking is that we have a Python script that users can run that 
needs to get a list of all the filesets in a filesystem.  There are obviously 
multiple issues with that, so the workaround we’re using for now is to have a 
cron job which runs mmlsfileset once a day and dumps it out to a text file, 
which the script then reads.  That’s sub-optimal for any day on which a fileset 
gets created or deleted, so I’m looking for a better way … one which doesn’t 
require root privileges and preferably doesn’t involve running a GPFS command 
at all.
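
For what it's worth, the cron workaround can be as small as the sketch below (the paths, filesystem name, and hourly schedule are made up for illustration; mmlsfileset still has to run as root, which is exactly the limitation being described):

    # /etc/cron.d/dump-filesets  -- illustrative only
    # refresh the fileset list hourly so a created/deleted fileset is stale
    # for at most an hour; write to a temp file and rename so readers never
    # see a half-written list
    0 * * * * root /usr/lpp/mmfs/bin/mmlsfileset gpfs5 > /gpfs5/.filesets.tmp && mv /gpfs5/.filesets.tmp /gpfs5/.filesets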

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on 
that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks 
to the capacity pool and things are working fine.

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Anybody running GPFS over iSCSI?

2018-12-15 Thread Buterbaugh, Kevin L
Hi All,

Googling “GPFS and iSCSI” doesn’t produce a ton of hits!  But we are interested 
to know if anyone is actually using GPFS over iSCSI?

The reason why I’m asking is that we currently use an 8 Gb FC SAN … QLogic 
SANbox 5800’s, QLogic HBA’s in our NSD servers … but we’re seeing signs that, 
especially when we start using beefier storage arrays with more disks behind 
the controllers, the 8 Gb FC could be a bottleneck.

As many / most of you are already aware, I’m sure, while 16 Gb FC exists, 
there’s basically only one vendor in that game.  And guess what happens to 
prices when there’s only one vendor???  We bought our 8 Gb FC switches for 
approximately $5K apiece.  List price on a  16 Gb FC switch - $40K. 
 Ouch.

So the idea of being able to use commodity 10 or 40 Gb Ethernet switches and 
HBA’s is very appealing … both from a cost and a performance perspective (last 
I checked 40 Gb was more than twice 16 Gb!).  Anybody doing this already?

As those of you who’ve been on this list for a while and don’t filter out 
e-mails from me () already know, we have a much beefier Infortrend 
storage array we’ve purchased that I’m currently using to test various metadata 
configurations (and I will report back results on that when done, I promise).  
That array also supports iSCSI, so I actually have our test cluster GPFS 
filesystem up and running over iSCSI.  It was surprisingly easy to set up.  But 
any tips, suggestions, warnings, etc. about running GPFS over iSCSI are 
appreciated!

Two things that I am already aware of are:  1) use jumbo frames, and 2) run 
iSCSI over its own private network.  Other things I should be aware of?!?
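
In case it helps anyone following along, the jumbo-frame piece is usually just a matter of raising the MTU end to end on the dedicated iSCSI interfaces (the interface name and address below are hypothetical, and the switch ports have to allow jumbo frames too, or this hurts rather than helps):

    # on each initiator and target, for the dedicated iSCSI NIC only
    ip link set dev eth2 mtu 9000
    # verify a full-size frame crosses without fragmenting (9000 - 28 = 8972)
    ping -M do -s 8972 10.10.10.20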

Thanks all…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Best way to migrate data

2018-10-18 Thread Buterbaugh, Kevin L
Hi Dwayne,

I’m assuming you can’t just let an rsync run, possibly throttled in some way?  
If not, and if you’re just tapping out your network, then would it be possible 
to go old school?  We have parts of the Medical Center here where their network 
connections are … um, less than robust.  So they tar stuff up to a portable HD, 
sneaker-net it to us, and we untar it from an NSD server.
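
Just to spell out the throttling option, rsync can be rate-limited directly; a sketch (the paths and the 200 MB/s figure are purely illustrative, so pick a limit that leaves the IB interface usable for everyone else):

    # copy one user's home at roughly 200 MB/s, preserving hard links, ACLs,
    # and xattrs, and restartable if interrupted
    rsync -aHAX --bwlimit=200m --partial /gpfs/home/user1/ /gpfs/research/home/user1/

Note that --bwlimit with a suffix like "200m" needs rsync 3.x; older versions take the value in KB/s.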

HTH, and I really hope that someone has a better idea than that!

Kevin

> On Oct 18, 2018, at 12:19 PM, dwayne.h...@med.mun.ca wrote:
> 
> Hi,
> 
> Just wondering what the best recipe for migrating a user’s home directory 
> content from one GFPS file system to another which hosts a larger research 
> GPFS file system? I’m currently using rsync and it has maxed out the client 
> system’s IB interface.
> 
> Best,
> Dwayne 
> —
> Dwayne Hart | Systems Administrator IV
> 
> CHIA, Faculty of Medicine 
> Memorial University of Newfoundland 
> 300 Prince Philip Drive
> St. John’s, Newfoundland | A1B 3V6
> Craig L Dobbin Building | 4M409
> T 709 864 6631
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Job vacancy @Birmingham

2018-10-18 Thread Buterbaugh, Kevin L
Hi Nathan,

Well, while I’m truly sorry for what you’re going thru, at least a majority of 
the voters in the UK did vote for it.  Keep in mind that things could be worse.

Some of us do happen to live in a country where a far worse thing has happened 
despite the fact that the majority of the voters were _against_ it…. ;-)

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Oct 18, 2018, at 4:23 AM, Nathan Harper 
mailto:nathan.har...@cfms.org.uk>> wrote:

Olaf - we don't need any reminders of Bre.. this 
morning

On Thu, 18 Oct 2018 at 10:15, Olaf Weiser 
mailto:olaf.wei...@de.ibm.com>> wrote:
Hi  Simon ..
well - I would love to .. .but .. ;-) hey - what do you think, how long a 
citizen from the EU can live (and work) in UK ;-)
don't take me too serious... see you soon, consider you invited for a coffee 
for my rude comment .. ;-)
olaf




From:    Simon Thompson mailto:s.j.thomp...@bham.ac.uk>>
To:      "gpfsug-discuss@spectrumscale.org" mailto:gpfsug-discuss@spectrumscale.org>>
Date:    10/17/2018 11:02 PM
Subject: [gpfsug-discuss] Job vacancy @Birmingham
Sent by: gpfsug-discuss-boun...@spectrumscale.org




We're looking for someone to join our systems team here at University of 
Birmingham. In case you didn't realise, we're pretty reliant on Spectrum Scale 
to deliver our storage systems.

https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3

Such a snappy URL :-)

Feel free to email me *OFFLIST* if you have informal enquiries!

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at 
spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at 
spectrumscale.org

Re: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously?

2018-10-15 Thread Buterbaugh, Kevin L
Marc,

Ugh - sorry, completely overlooked that…

Kevin

On Oct 15, 2018, at 1:44 PM, Marc A Kaplan 
mailto:makap...@us.ibm.com>> wrote:

How about using the -F option?
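
A hedged sketch of what that might look like (I have not verified the exact file format -F expects, so check the mmfileid man page before relying on this; the NSD and filesystem names are made up):

    # list both NSDs of interest in one file, one per line ...
    printf 'nsd21A3\nnsd21A4\n' > /tmp/nsd.list
    # ... and let a single mmfileid pass scan both
    mmfileid gpfs5 -F /tmp/nsd.list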

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] mmfileid on 2 NSDs simultaneously?

2018-10-15 Thread Buterbaugh, Kevin L
Hi All,

Is there a way to run mmfileid on two NSD’s simultaneously?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Long I/O's on client but not on NSD server(s)

2018-10-04 Thread Buterbaugh, Kevin L
Hi All,

What does it mean if I have a few dozen very long I/O’s (50 - 75 seconds) on a 
gateway as reported by “mmdiag --iohist” and they all reference two of my eight 
NSD servers…

… but then I go to those 2 NSD servers and I don’t see any long I/O’s at all?

In other words, if the problem (this time) were the backend storage, I should 
see long I/O’s on the NSD servers, right?

I’m thinking this indicates that there is some sort of problem with either the 
client gateway itself or the network in between the gateway and the NSD 
server(s) … thoughts???

Thanks in advance…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] What is this error message telling me?

2018-09-27 Thread Buterbaugh, Kevin L
Hi Aaron,

No … just plain old ethernet.  Thanks!

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Sep 27, 2018, at 11:03 AM, Aaron Knister 
mailto:aaron.s.knis...@nasa.gov>> wrote:

Kevin,

Is the communication in this case by chance using IPoIB in connected mode?

-Aaron

On 9/27/18 11:04 AM, Buterbaugh, Kevin L wrote:
Hi All,
2018-09-27_09:48:50.923-0500: [E] The TCP connection to IP address 1.2.3.4 some 
client  (socket 442) state is unexpected: ca_state=1 unacked=3 
rto=27008000
Seeing errors like the above and trying to track down the root cause.  I know 
that at last weeks’ GPFS User Group meeting at ORNL this very error message was 
discussed, but I don’t recall the details and the slides haven’t been posted to 
the website yet.  IIRC, the “rto” is significant …
I’ve Googled, but haven’t gotten any hits, nor have I found anything in the 
GPFS 4.2.2 Problem Determination Guide.
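
One thing that may help while waiting for the slides: those fields look like they come from the kernel's per-socket TCP state, and some of the same counters (rto, unacked, retransmits) can be eyeballed live with ss (the address below is obviously a placeholder):

    # show TCP internals for connections to the suspect client
    ss -tino dst 1.2.3.4
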
Thanks in advance…
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> 
<mailto:kevin.buterba...@vanderbilt.edu> - (615)875-9633
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] RAID type for system pool

2018-09-11 Thread Buterbaugh, Kevin L
Hi Marc,

Understood … I’m just trying to understand why some I/O’s are flagged as 
metadata, while others are flagged as inode?!?  Since this filesystem uses 512 
byte inodes, there is no data content from any files involved (for a metadata 
only disk), correct?  Thanks…

Kevin

On Sep 11, 2018, at 9:12 AM, Marc A Kaplan 
mailto:makap...@us.ibm.com>> wrote:

Metadata is anything besides the data contents of your files.
Inodes, directories, indirect blocks, allocation maps, log data ...  are the 
biggies.

Apparently, --iohist may sometimes distinguish some metadata as "inode", 
"logData", ...  that doesn't mean those aren't metadata also.




From:    "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To:      gpfsug main discussion list mailto:gpfsug-discuss@spectrumscale.org>>
Date:    09/10/2018 03:12 PM
Subject: [gpfsug-discuss] RAID type for system pool
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





From: 
gpfsug-discuss-ow...@spectrumscale.org<mailto:gpfsug-discuss-ow...@spectrumscale.org>
Subject: Re: [gpfsug-discuss] RAID type for system pool
Date: September 10, 2018 at 11:35:05 AM CDT
To: k...@accre.vanderbilt.edu<mailto:k...@accre.vanderbilt.edu>

Hi All,

So while I’m waiting for the purchase of new hardware to go thru, I’m trying to 
gather more data about the current workload.  One of the things I’m trying to 
do is get a handle on the ratio of reads versus writes for my metadata.

I’m using “mmdiag --iohist” … in this case “dm-12” is one of my metadataOnly 
disks and I’m running this on the primary NSD server for that NSD.  I’m seeing 
output like:

11:22:13.931117  W   inode     4:299844163   1   0.448  srv   dm-12
11:22:13.932344  R   metadata  4:36659676    4   0.307  srv   dm-12
11:22:13.932005  W   logData   4:49676176    1   0.726  srv   dm-12

And I’m confused as to the difference between “inode” and “metadata” (I at 
least _think_ I understand “logData”)?!?  The man page for mmdiag doesn’t help 
and I’ve not found anything useful yet in my Googling.

This is on a filesystem that currently uses 512 byte inodes, if that matters.  
Thanks…

Kevin
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] RAID type for system pool

2018-09-10 Thread Buterbaugh, Kevin L

From: 
gpfsug-discuss-ow...@spectrumscale.org
Subject: Re: [gpfsug-discuss] RAID type for system pool
Date: September 10, 2018 at 11:35:05 AM CDT
To: k...@accre.vanderbilt.edu

Hi All,

So while I’m waiting for the purchase of new hardware to go thru, I’m trying to 
gather more data about the current workload.  One of the things I’m trying to 
do is get a handle on the ratio of reads versus writes for my metadata.

I’m using “mmdiag --iohist” … in this case “dm-12” is one of my metadataOnly 
disks and I’m running this on the primary NSD server for that NSD.  I’m seeing 
output like:

11:22:13.931117  W   inode     4:299844163   1   0.448  srv   dm-12
11:22:13.932344  R   metadata  4:36659676    4   0.307  srv   dm-12
11:22:13.932005  W   logData   4:49676176    1   0.726  srv   dm-12

And I’m confused as to the difference between “inode” and “metadata” (I at 
least _think_ I understand “logData”)?!?  The man page for mmdiag doesn’t help 
and I’ve not found anything useful yet in my Googling.

This is on a filesystem that currently uses 512 byte inodes, if that matters.  
Thanks…

Kevin

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] RAID type for system pool

2018-09-10 Thread Buterbaugh, Kevin L
Hi All,

So while I’m waiting for the purchase of new hardware to go thru, I’m trying to 
gather more data about the current workload.  One of the things I’m trying to 
do is get a handle on the ratio of reads versus writes for my metadata.

I’m using “mmdiag --iohist” … in this case “dm-12” is one of my metadataOnly 
disks and I’m running this on the primary NSD server for that NSD.  I’m seeing 
output like:

11:22:13.931117  W   inode     4:299844163   1   0.448  srv   dm-12
11:22:13.932344  R   metadata  4:36659676    4   0.307  srv   dm-12
11:22:13.932005  W   logData   4:49676176    1   0.726  srv   dm-12
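
Incidentally, a rough way to turn that history into read/write counts per buffer type is a one-liner like the following (the column positions assume the layout shown above and may differ between releases):

    # tally e.g. "R metadata", "W inode", "W logData" occurrences
    mmdiag --iohist | awk '$2 == "R" || $2 == "W" {n[$2 " " $3]++} END {for (k in n) print k, n[k]}'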

And I’m confused as to the difference between “inode” and “metadata” (I at 
least _think_ I understand “logData”)?!?  The man page for mmdiag doesn’t help 
and I’ve not found anything useful yet in my Googling.

This is on a filesystem that currently uses 512 byte inodes, if that matters.  
Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] RAID type for system pool

2018-09-06 Thread Buterbaugh, Kevin L
Hi All,

Wow - my query got more responses than I expected and my sincere thanks to all 
who took the time to respond!

At this point in time we do have two GPFS filesystems … one which is basically 
“/home” and some software installations and the other which is “/scratch” and 
“/data” (former backed up, latter not).  Both of them have their metadata on 
SSDs set up as RAID 1 mirrors and replication set to two.  But at this point in 
time all of the SSDs are in a single storage array (albeit with dual redundant 
controllers) … so the storage array itself is my only SPOF.

As part of the hardware purchase we are in the process of making we will be 
buying a 2nd storage array that can house 2.5” SSDs.  Therefore, we will be 
splitting our SSDs between chassis and eliminating that last SPOF.  Of course, 
this includes the new SSDs we are getting for our new /home filesystem.

Our plan right now is to buy 10 SSDs, which will allow us to test 3 
configurations:

1) two 4+1P RAID 5 LUNs split up into a total of 8 LV’s (with each of my 8 NSD 
servers as primary for one of those LV’s and the other 7 as backups) and GPFS 
metadata replication set to 2.

2) four RAID 1 mirrors (which obviously leaves 2 SSDs unused) and GPFS metadata 
replication set to 2.  This would mean that only 4 of my 8 NSD servers would be 
a primary.

3) nine RAID 0 / bare drives with GPFS metadata replication set to 3 (which 
leaves 1 SSD unused).  All 8 NSD servers primary for one SSD and 1 serving up 
two.

The responses I received concerning RAID 5 and performance were not a surprise 
to me.  The main advantage that option gives is the most usable storage space 
for the money (in fact, it gives us way more storage space than we currently 
need) … but if it tanks performance, then that’s a deal breaker.

Personally, I like the four RAID 1 mirrors config like we’ve been using for 
years, but it has the disadvantage of giving us the least usable storage space 
… that config would give us the minimum we need for right now, but doesn’t 
really allow for much future growth.

I have no experience with metadata replication of 3 (but had actually thought 
of that option, so feel good that others suggested it) so option 3 will be a 
brand new experience for us.  It is the most optimal in terms of meeting 
current needs plus allowing for future growth without giving us way more space 
than we are likely to need).  I will be curious to see how long it takes GPFS 
to re-replicate the data when we simulate a drive failure as opposed to how 
long a RAID rebuild takes.

I am a big believer in Murphy’s Law (Sunday I paid off a bill, Wednesday my 
refrigerator died!) … and also believe that the definition of a pessimist is 
“someone with experience”  … so we will definitely not set GPFS metadata 
replication to less than two, nor will we use non-Enterprise class SSDs for 
metadata … but I do still appreciate the suggestions.

If there is interest, I will report back on our findings.  If anyone has any 
additional thoughts or suggestions, I’d also appreciate hearing them.  Again, 
thank you!

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] RAID type for system pool

2018-09-05 Thread Buterbaugh, Kevin L
Hi All,

We are in the process of finalizing the purchase of some new storage arrays (so 
no sales people who might be monitoring this list need contact me) to 
life-cycle some older hardware.  One of the things we are considering is the 
purchase of some new SSD’s for our “/home” filesystem and I have a question or 
two related to that.

Currently, the existing home filesystem has it’s metadata on SSD’s … two RAID 1 
mirrors and metadata replication set to two.  However, the filesystem itself is 
old enough that it uses 512 byte inodes.  We have analyzed our users files and 
know that if we create a new filesystem with 4K inodes that a very significant 
portion of the files would now have their _data_ stored in the inode as well 
due to the files being 3.5K or smaller (currently all data is on spinning HD 
RAID 1 mirrors).

Of course, if we increase the size of the inodes by a factor of 8 then we also 
need 8 times as much space to store those inodes.  Given that Enterprise class 
SSDs are still very expensive and our budget is not unlimited, we’re trying to 
get the best bang for the buck.

We have always - even back in the day when our metadata was on spinning disk 
and not SSD - used RAID 1 mirrors and metadata replication of two.  However, we 
are wondering if it might be possible to switch to RAID 5?  Specifically, what 
we are considering doing is buying 8 new SSDs and creating two 3+1P RAID 5 LUNs 
(metadata replication would stay at two).  That would give us 50% more usable 
space than if we configured those same 8 drives as four RAID 1 mirrors.

Unfortunately, unless I’m misunderstanding something, that would mean that the RAID stripe 
size and the GPFS block size could not match.  Therefore, even though we don’t 
need the space, would we be much better off to buy 10 SSDs and create two 4+1P 
RAID 5 LUNs?
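
To make the mismatch concrete (assuming, purely for illustration, a 1 MiB segment size per data drive):

    3+1P RAID 5:  3 data drives x 1 MiB = 3 MiB full stripe  ->  a 4 MiB GPFS block can never align with it
    4+1P RAID 5:  4 data drives x 1 MiB = 4 MiB full stripe  ->  each 4 MiB GPFS block maps to exactly one full stripe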

I’ve searched the mailing list archives and scanned the DeveloperWorks wiki and 
even glanced at the GPFS documentation and haven’t found anything that says 
“bad idea, Kevin”… ;-)

Expanding on this further … if we just present those two RAID 5 LUNs to GPFS as 
NSDs then we can only have two NSD servers as primary for them.  So another 
thing we’re considering is to take those RAID 5 LUNs and further sub-divide 
them into a total of 8 logical volumes, each of which could be a GPFS NSD and 
therefore would allow us to have each of our 8 NSD servers be primary for one 
of them.  Even worse idea?!?  Good idea?

Anybody have any better ideas???  ;-)

Oh, and currently we’re on GPFS 4.2.3-10, but are also planning on moving to 
GPFS 5.0.1-x before creating the new filesystem.

Thanks much…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue

2018-08-07 Thread Buterbaugh, Kevin L
Hi All,

I was able to navigate down thru IBM’s website and find the GPFS 5.0.1 manuals 
but they contain the same typo, which Pete has correctly identified … and I 
have confirmed that his solution works.

Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 7, 2018, at 6:35 AM, Chase, Peter 
mailto:peter.ch...@metoffice.gov.uk>> wrote:

Hi Kevin,

I'm running policy migrations on Spectrum Scale 4.2.3, but I use mmapplypolicy 
to kick off the policy runs, not mmstartpolicy. Docs here (which I admit are 
not for your version of Spectrum Scale) state that mmstartpolicy is for 
internal GPFS use only: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Using+Policies

So if the above link is correct, I'd recommend switching to using 
mmapplypolicy, which handily comes with a man page, whereas mmstartpolicy 
doesn't and might have you fumbling around in the dark.

As for the issue you're experiencing with adding a callback, it looks like the 
mmaddcallback command is catching the --single-instance flag as an argument for 
it, not as a parameter for the mmstartpolicy command. After looking at the 
documentation you've referenced, I suspect that there's a typo/omission in the 
command and it should have a trailing double quote (") on the end of the parms 
argument list, i.e.:

mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event 
lowDiskSpace --parms "%eventName %fsName --single-instance"

I'm not sure how we go about asking IBM to correct their documentation, but 
expect someone in the user group will have some idea.

Regards,

Pete Chase
peter.ch...@metoffice.gov.uk<mailto:peter.ch...@metoffice.gov.uk>


-Original Message-
From: gpfsug-discuss-boun...@spectrumscale.org 
 On Behalf Of 
gpfsug-discuss-requ...@spectrumscale.org
Sent: 06 August 2018 23:47
To: gpfsug-discuss@spectrumscale.org
Subject: gpfsug-discuss Digest, Vol 79, Issue 21

Send gpfsug-discuss mailing list submissions to
gpfsug-discuss@spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-requ...@spectrumscale.org

You can reach the person managing the list at
gpfsug-discuss-ow...@spectrumscale.org

When replying, please edit your Subject line so it is more specific than "Re: 
Contents of gpfsug-discuss digest..."


Today's Topics:

  1. mmaddcallback documentation issue (Buterbaugh, Kevin L)
  2. Re: mmaddcallback documentation issue (Eric Sperley)


--

Message: 1
Date: Mon, 6 Aug 2018 21:42:54 +
From: "Buterbaugh, Kevin L" 
To: gpfsug main discussion list 
Subject: [gpfsug-discuss] mmaddcallback documentation issue
Message-ID: <735f4275-191a-4363-b98c-1ea289292...@vanderbilt.edu>
Content-Type: text/plain; charset="utf-8"

Hi All,

So I’m _still_ reading about and testing various policies for file placement 
and migration on our test cluster (which is now running GPFS 5).

On page 392 of the GPFS 5.0.0 Administration Guide it says:


To add a callback, run this command. The following command is on one line:

mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event 
lowDiskSpace --parms "%eventName %fsName --single-instance


The --single-instance flag is required to avoid running multiple migrations on 
the file system at the same time.

However, trying to issue that command gives:

mmaddcallback: Incorrect option: --single-instance

And the man page for mmaddcallback doesn’t mention it or anything similar to 
it.  Now my test cluster is running GPFS 5.0.1.1, so is this something that was 
added in GPFS 5.0.0 and then subsequently removed?

I can’t find the GPFS 5.0.1 Administration Guide with a Google search.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633




[gpfsug-discuss] mmaddcallback documentation issue

2018-08-06 Thread Buterbaugh, Kevin L
Hi All,

So I’m _still_ reading about and testing various policies for file placement 
and migration on our test cluster (which is now running GPFS 5).

On page 392 of the GPFS 5.0.0 Administration Guide it says:


To add a callback, run this command. The following command is on one line:

mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event 
lowDiskSpace
--parms "%eventName %fsName --single-instance


The --single-instance flag is required to avoid running multiple migrations on 
the file system at the same time.

However, trying to issue that command gives:

mmaddcallback: Incorrect option: --single-instance

And the man page for mmaddcallback doesn’t mention it or anything similar to 
it.  Now my test cluster is running GPFS 5.0.1.1, so is this something that was 
added in GPFS 5.0.0 and then subsequently removed?

I can’t find the GPFS 5.0.1 Administration Guide with a Google search.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?

2018-08-06 Thread Buterbaugh, Kevin L
Hi All,

So I was just reading the GPFS 5.0.0 Administration Guide (yes, I actually do 
look at the documentation even if it seems sometimes that I don’t!) for some 
other information and happened to come across this at the bottom of page 358:


The --metadata-block-size flag on the mmcrfs command can be used to create a 
system pool with a different block size from the user pools. This can be 
especially beneficial if the default block size is larger than 1 MB. If data 
and metadata block sizes differ, the system pool must contain only metadataOnly 
disks.

Given that one of the responses I received during this e-mail thread was from 
an IBM engineer basically pointing out that there is no benefit in setting the 
metadata-block-size to less than 4 MB if that’s what I want for the filesystem 
block size, this might be a candidate for a documentation update.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?

2018-08-03 Thread Buterbaugh, Kevin L
Hi All,

Aargh - now I really do feel like an idiot!  I had set up the stanza file over 
a week ago … then had to work on production issues … and completely forgot 
about setting the block size in the pool stanzas there.  But at least we all 
now know that stanza files override command line arguments to mmcrfs.
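
For anyone else who trips over this, the overriding lines in question are the pool stanzas; a minimal illustration (the attribute list is abbreviated - see the mmcrfs documentation for the full stanza syntax; the names below are taken from the output in this thread) looks like:

    %pool: pool=system blockSize=1M
    %pool: pool=raid6  blockSize=4M
    %nsd:  nsd=test23Ansd servers=testnsd3 usage=dataOnly pool=raid6

If a %pool stanza carries a blockSize, it takes precedence over -B and --metadata-block-size on the mmcrfs command line, which appears to be exactly what happened here.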

My apologies…

Kevin

On Aug 3, 2018, at 1:01 AM, Olaf Weiser 
mailto:olaf.wei...@de.ibm.com>> wrote:

Can u share your stanza file ?

Von meinem iPhone gesendet

Am 02.08.2018 um 23:15 schrieb Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>>:

OK, so hold on … NOW what’s going on???  I deleted the filesystem … went to 
lunch … came back an hour later … recreated the filesystem with a metadata 
block size of 4 MB … and I STILL have a 1 MB block size in the system pool and 
the wrong fragment size in other pools…

Kevin

/root/gpfs
root@testnsd1# mmdelfs gpfs5
All data on the following disks of gpfs5 will be destroyed:
test21A3nsd
test21A4nsd
test21B3nsd
test21B4nsd
test23Ansd
test23Bnsd
test23Cnsd
test24Ansd
test24Bnsd
test24Cnsd
test25Ansd
test25Bnsd
test25Cnsd
Completed deletion of file system /dev/gpfs5.
mmdelfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
/root/gpfs
root@testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 
-j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v 
yes --nofilesetdf --metadata-block-size 4M

The following disks of gpfs5 will be formatted on node testnsd3:
test21A3nsd: size 953609 MB
test21A4nsd: size 953609 MB
test21B3nsd: size 953609 MB
test21B4nsd: size 953609 MB
test23Ansd: size 15259744 MB
test23Bnsd: size 15259744 MB
test23Cnsd: size 1907468 MB
test24Ansd: size 15259744 MB
test24Bnsd: size 15259744 MB
test24Cnsd: size 1907468 MB
test25Ansd: size 15259744 MB
test25Bnsd: size 15259744 MB
test25Cnsd: size 1907468 MB
Formatting file system ...
Disks up to size 8.29 TB can be added to storage pool system.
Disks up to size 16.60 TB can be added to storage pool raid1.
Disks up to size 132.62 TB can be added to storage pool raid6.
Creating Inode File
  12 % complete on Thu Aug  2 13:16:26 2018
  25 % complete on Thu Aug  2 13:16:31 2018
  38 % complete on Thu Aug  2 13:16:36 2018
  50 % complete on Thu Aug  2 13:16:41 2018
  62 % complete on Thu Aug  2 13:16:46 2018
  74 % complete on Thu Aug  2 13:16:52 2018
  85 % complete on Thu Aug  2 13:16:57 2018
  96 % complete on Thu Aug  2 13:17:02 2018
 100 % complete on Thu Aug  2 13:17:03 2018
Creating Allocation Maps
Creating Log Files
   3 % complete on Thu Aug  2 13:17:09 2018
  28 % complete on Thu Aug  2 13:17:15 2018
  53 % complete on Thu Aug  2 13:17:20 2018
  78 % complete on Thu Aug  2 13:17:26 2018
 100 % complete on Thu Aug  2 13:17:27 2018
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
  98 % complete on Thu Aug  2 13:17:34 2018
 100 % complete on Thu Aug  2 13:17:34 2018
Formatting Allocation Map for storage pool raid1
  52 % complete on Thu Aug  2 13:17:39 2018
 100 % complete on Thu Aug  2 13:17:43 2018
Formatting Allocation Map for storage pool raid6
  24 % complete on Thu Aug  2 13:17:48 2018
  50 % complete on Thu Aug  2 13:17:53 2018
  74 % complete on Thu Aug  2 13:17:58 2018
  99 % complete on Thu Aug  2 13:18:03 2018
 100 % complete on Thu Aug  2 13:18:03 2018
Completed creation of file system /dev/gpfs5.
mmcrfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
/root/gpfs
root@testnsd1# mmlsfs gpfs5
flagvaluedescription
---  ---
 -f 8192                 Minimum fragment (subblock) size in bytes (system pool)
    32768                Minimum fragment (subblock) size in bytes (other pools)
 -i 4096 Inode size in bytes
 -I 32768Indirect block size in bytes
 -m 2Default number of metadata replicas
 -M 3Maximum number of metadata replicas
 -r 1Default number of data replicas
 -R 3Maximum number of data replicas
 -j scatter  Block allocation type
 -D nfs4 File locking semantics in effect
 -k all  ACL semantics in effect
 -n 32   Estimated number of nodes that 
will mount file system
 -B 1048576  Block size (system pool)
4194304  Block size (other pools)
 -Q user;g

Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?

2018-08-02 Thread Buterbaugh, Kevin L
 Yes  Exact mtime mount option
 -S relatime Suppress atime mount option
 -K whenpossible Strict replica allocation option
 --fastea   Yes  Fast external attributes enabled?
 --encryption   No   Encryption enabled?
 --inode-limit  101095424Maximum number of inodes
 --log-replicas 0Number of log replicas
 --is4KAligned  Yes  is4KAligned?
 --rapid-repair Yes  rapidRepair enabled?
 --write-cache-threshold 0   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128  Number of subblocks per full block
 -P system;raid1;raid6   Disk storage pools in file system
 --file-audit-log   No   File Audit Logging enabled?
 --maintenance-mode No   Maintenance Mode enabled?
 -d 
test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd
  Disks in file system
 -A yes  Automatic mount option
 -o none Additional mount options
 -T /gpfs5   Default mount point
 --mount-priority   0Mount priority
/root/gpfs
root@testnsd1#


—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi All,

Thanks for all the responses on this, although I have the sneaking suspicion 
that the most significant thing that is going to come out of this thread is the 
knowledge that Sven has left IBM for DDN.  ;-) or :-( or :-O depending on your 
perspective.

Anyway … we have done some testing which has shown that a 4 MB block size is 
best for those workloads that use “normal” sized files.  However, we - like 
many similar institutions - support a mixed workload, so the 128K fragment size 
that comes with that is not optimal for the primarily biomedical type 
applications that literally create millions of very small files.  That’s why we 
settled on 1 MB as a compromise.

So we’re very eager to now test with GPFS 5, a 4 MB block size, and a 8K 
fragment size.  I’m recreating my test cluster filesystem now with that config 
… so 4 MB block size on the metadata only system pool, too.

Thanks to all who took the time to respond to this thread.  I hope it’s been 
beneficial to others as well…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Aug 1, 2018, at 7:11 PM, Andrew Beattie 
mailto:abeat...@au1.ibm.com>> wrote:

I too would second the comment about doing testing specific to your environment

We recently deployed a number of ESS building blocks into a customer site that 
was specifically being used for a mixed HPC workload.

We spent more than a week playing with different block sizes for both data and 
metadata trying to identify which variation would provide the best mix of both 
metadata performance and data performance.  one thing we noticed very early on 
is that MDtest and IOR both respond very differently as you play with both 
block size and subblock size.  What works for one use case may be a very poor 
option for another use case.

Interestingly enough it turned out that the best overall option for our 
particular use case was an 8MB block size with 32k sub blocks -- as that gave 
us good Metadata performance and good sequential data performance

which is probably why 32k sub block was the default for so many years 
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com<mailto:abeat...@au1.ibm.com>


- Original message -
From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Cc:
Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 
filesystem?
Date: Thu, Aug 2, 2018 10:01 AM

Firstly, I do suggest that you run some tests and see how much, if any, 
difference the settings that are available make in performance and/or storage 
utilization.

Secondly, as I and others have hinted at, deeper in the system, there may be 
additional parameters and settings.  Sometimes they are available via commands, 
and/or configuration settings, sometimes not.

Sometimes that's

Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?

2018-08-02 Thread Buterbaugh, Kevin L
Hi All,

Thanks for all the responses on this, although I have the sneaking suspicion 
that the most significant thing that is going to come out of this thread is the 
knowledge that Sven has left IBM for DDN.  ;-) or :-( or :-O depending on your 
perspective.

Anyway … we have done some testing which has shown that a 4 MB block size is 
best for those workloads that use “normal” sized files.  However, we - like 
many similar institutions - support a mixed workload, so the 128K fragment size 
that comes with that is not optimal for the primarily biomedical type 
applications that literally create millions of very small files.  That’s why we 
settled on 1 MB as a compromise.

So we’re very eager to now test with GPFS 5, a 4 MB block size, and a 8K 
fragment size.  I’m recreating my test cluster filesystem now with that config 
… so 4 MB block size on the metadata only system pool, too.

Thanks to all who took the time to respond to this thread.  I hope it’s been 
beneficial to others as well…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Aug 1, 2018, at 7:11 PM, Andrew Beattie 
mailto:abeat...@au1.ibm.com>> wrote:

I too would second the comment about doing testing specific to your environment

We recently deployed a number of ESS building blocks into a customer site that 
was specifically being used for a mixed HPC workload.

We spent more than a week playing with different block sizes for both data and 
metadata trying to identify which variation would provide the best mix of both 
metadata performance and data performance.  one thing we noticed very early on 
is that MDtest and IOR both respond very differently as you play with both 
block size and subblock size.  What works for one use case may be a very poor 
option for another use case.

Interestingly enough it turned out that the best overall option for our 
particular use case was an 8MB block size with 32k sub blocks -- as that gave 
us good Metadata performance and good sequential data performance

which is probably why 32k sub block was the default for so many years 
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com


- Original message -
From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Cc:
Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 
filesystem?
Date: Thu, Aug 2, 2018 10:01 AM

Firstly, I do suggest that you run some tests and see how much, if any, 
difference the settings that are available make in performance and/or storage 
utilization.

Secondly, as I and others have hinted at, deeper in the system, there may be 
additional parameters and settings.  Sometimes they are available via commands, 
and/or configuration settings, sometimes not.

Sometimes that's just because we didn't want to overwhelm you or ourselves with 
yet more "tuning knobs".

Sometimes it's because we made some component more tunable than we really 
needed, but did not make all the interconnected components equally or as widely 
tunable.
Sometimes it's because we want to save you from making ridiculous settings that 
would lead to problems...

OTOH, as I wrote before, if a burning requirement surfaces, things may change 
from release to release... Just as for so many years subblocks per block seemed 
forever frozen at the number 32.  Now it varies... and then the discussion 
shifts to why can't it be even more flexible?


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
Hi Sven (and Stephen and everyone else),

I know there are certainly things you know but can’t talk about, but I suspect 
that I am not the only one to wonder about the possible significance of “with 
the released code” in your response below?!?

I understand the technical point you’re making and maybe the solution for me is 
to just use a 4 MB block size for my metadata only system pool?  As Stephen 
Ulmer said in his response … ("Why the desire for a 1MB block size for 
metadata? It is RAID1 so no re-write penalty or need to hit a stripe size. Are 
you just trying to save the memory?  If you had a 4MB block size, an 8KB 
sub-block size and things were 4K-aligned, you would always read 2 4K inodes,”) 
… so if I’m using RAID 1 with 4K inodes then am I gaining anything by going 
with a smaller block size for metadata?

So why was I choosing 1 MB in the first place?  Well, I was planning on doing 
some experimenting with different block sizes for metadata to see if it made 
any difference.  Historically, we had used a metadata block size of 64K to 
match the hardware “stripe” size on the storage arrays (RAID 1 mirrors of hard 
drives back in the day).  Now our metadata is on SSDs so with our latest 
filesystem we used 1 MB for both data and metadata because of the 1/32nd 
sub-block thing in GPFS 4.x.  Since GPFS 5 removes that restriction, I was 
going to do some experimenting, but if the correct answer is just “if 4 MB is 
what’s best for your data, then use it for metadata too” then I don’t mind 
saving some time…. ;-)

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Aug 1, 2018, at 4:01 PM, Sven Oehme 
mailto:oeh...@gmail.com>> wrote:

the only way to get max number of subblocks for a 5.0.x filesystem with the 
released code is to have metadata and data use the same blocksize.

sven

On Wed, Aug 1, 2018 at 11:52 AM Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:
All,

Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB … so does this 
go back to what Marc is saying that there’s really only one sub blocks per 
block parameter?  If so, is there any way to get what I want as described below?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi Sven,

OK … but why?  I mean, that’s not what the man page says.  Where does that “4 
x” come from?

And, most importantly … that’s not what I want.  I want a smaller block size 
for the system pool since it’s metadata only and on RAID 1 mirrors (HD’s on the 
test cluster but SSD’s on the production cluster).  So … side question … is 1 
MB OK there?

But I want a 4 MB block size for data with an 8 KB sub block … I want good 
performance for the sane people using our cluster without unduly punishing the 
… ahem … fine folks whose apps want to create a bazillion tiny files!

So how do I do that?

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 1, 2018, at 1:41 PM, Sven Oehme 
mailto:oeh...@gmail.com>> wrote:

the number of subblocks is derived by the smallest blocksize in any pool of a 
given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in 
the metadata pool, but 4 x of that in the data pool if your data pool is 4M.
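
Spelled out with the numbers from this thread (the subblocks-per-full-block count is a single value for the whole filesystem, which is why the arithmetic works this way):

    smallest block size = 1 MiB (metadata pool):  1 MiB / 8 KiB = 128 subblocks per block
    data pool block size = 4 MiB:                 4 MiB / 128   = 32 KiB subblock

which matches the 8192 and 32768 fragment sizes that mmlsfs reported elsewhere in this thread.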

sven
On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop 
mailto:k...@us.ibm.com>> wrote:

Marc, Kevin,

We'll be looking into this issue, since at least at a first glance, it does 
look odd. A 4MB block size should have resulted in an 8KB subblock size. I 
suspect that, somehow, the --metadata-block-size 1M may have resulted in

32768 Minimum fragment (subblock) size in bytes (other pools)


but I do not yet understand how.

The subblocks-per-full-block parameter is not supported with mmcrfs .

Felipe


Felipe Knop k...@us.ibm.com<mailto:k...@us.ibm.com>
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314




"Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into 
all the details but here's a clue -- notice there is only one "subblocks-per-

From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>


To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>

Date: 08/01/2018 01:21 PM
Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Sent by: gpfsug-discuss-boun...@spectrumscale.org

Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
Hi Sven,

OK … but why?  I mean, that’s not what the man page says.  Where does that “4 
x” come from?

And, most importantly … that’s not what I want.  I want a smaller block size 
for the system pool since it’s metadata only and on RAID 1 mirrors (HD’s on the 
test cluster but SSD’s on the production cluster).  So … side question … is 1 
MB OK there?

But I want a 4 MB block size for data with an 8 KB sub block … I want good 
performance for the sane people using our cluster without unduly punishing the 
… ahem … fine folks whose apps want to create a bazillion tiny files!

So how do I do that?

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 1, 2018, at 1:41 PM, Sven Oehme 
mailto:oeh...@gmail.com>> wrote:

the number of subblocks is derived by the smallest blocksize in any pool of a 
given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in 
the metadata pool, but 4 x of that in the data pool if your data pool is 4M.

sven


On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop 
mailto:k...@us.ibm.com>> wrote:

Marc, Kevin,

We'll be looking into this issue, since at least at a first glance, it does 
look odd. A 4MB block size should have resulted in an 8KB subblock size. I 
suspect that, somehow, the --metadata-block-size 1M may have resulted in

32768 Minimum fragment (subblock) size in bytes (other pools)


but I do not yet understand how.

The subblocks-per-full-block parameter is not supported with mmcrfs .

Felipe


Felipe Knop k...@us.ibm.com<mailto:k...@us.ibm.com>
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314




"Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into 
all the details but here's a clue -- notice there is only one "subblocks-per-

From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>


To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>

Date: 08/01/2018 01:21 PM
Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





I haven't looked into all the details but here's a clue -- notice there is only 
one "subblocks-per-full-block" parameter.

And it is the same for both metadata blocks and datadata blocks.

So maybe (MAYBE) that is a constraint somewhere...

Certainly, in the currently supported code, that's what you get.




From: "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date: 08/01/2018 12:55 PM
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am 
setting up a new filesystem there using hardware that we recently life-cycled 
out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong. I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

Table 1. Block sizes and subblock sizes

+‐‐‐+‐‐‐+
| Block size| Subblock size |
+‐‐‐+‐‐‐+
| 64 KiB| 2 KiB |
+‐‐‐+‐‐‐+
| 128 KiB   | 4 KiB |
+‐‐‐+‐‐‐+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+‐‐‐+‐‐‐+
| 8 MiB, 16 MiB | 16 KiB|
+‐‐‐+‐‐‐+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flag value description
---  ---
-f 8192 Minimum fragment (subblock) size in bytes (system pool)
32768 Minimum fragment (subblock) size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 2 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 1048576 Block size (system pool)
4194304 Block size (other pools)
-Q user;group;fileset Qu

Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
All,

Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB … so does this 
go back to what Marc is saying that there’s really only one sub blocks per 
block parameter?  If so, is there any way to get what I want as described below?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633
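For reference, the arithmetic behind Sven's point can be checked from the numbers already in this thread (the 128 figure also shows up later as --subblocks-per-full-block in the mmlsfs output); a quick shell sketch:

# GPFS 5 derives subblocks-per-full-block from the SMALLEST block size in the
# filesystem: a 1 MiB metadata block gives an 8 KiB subblock, i.e. 128 subblocks.
echo $(( 1048576 / 8192 ))    # 128 subblocks per full block
# That same 128 then applies to every pool, so the 4 MiB data pool ends up with:
echo $(( 4194304 / 128 ))     # 32768 bytes = the 32 KiB subblock seen above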


On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi Sven,

OK … but why?  I mean, that’s not what the man page says.  Where does that “4 
x” come from?

And, most importantly … that’s not what I want.  I want a smaller block size 
for the system pool since it’s metadata only and on RAID 1 mirrors (HD’s on the 
test cluster but SSD’s on the production cluster).  So … side question … is 1 
MB OK there?

But I want a 4 MB block size for data with an 8 KB sub block … I want good 
performance for the sane people using our cluster without unduly punishing the 
… ahem … fine folks whose apps want to create a bazillion tiny files!

So how do I do that?

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 1, 2018, at 1:41 PM, Sven Oehme 
mailto:oeh...@gmail.com>> wrote:

the number of subblocks is derived by the smallest blocksize in any pool of a 
given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in 
the metadata pool, but 4 x of that in the data pool if your data pool is 4M.

sven


On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop 
mailto:k...@us.ibm.com>> wrote:

Marc, Kevin,

We'll be looking into this issue, since at least at a first glance, it does 
look odd. A 4MB block size should have resulted in an 8KB subblock size. I 
suspect that, somehow, the --metadata-block-size 1M may have resulted in

32768 Minimum fragment (subblock) size in bytes (other pools)


but I do not yet understand how.

The subblocks-per-full-block parameter is not supported with mmcrfs .

Felipe


Felipe Knop k...@us.ibm.com<mailto:k...@us.ibm.com>
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314




"Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into 
all the details but here's a clue -- notice there is only one "subblocks-per-

From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>


To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>

Date: 08/01/2018 01:21 PM
Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





I haven't looked into all the details but here's a clue -- notice there is only 
one "subblocks-per-full-block" parameter.

And it is the same for both metadata blocks and datadata blocks.

So maybe (MAYBE) that is a constraint somewhere...

Certainly, in the currently supported code, that's what you get.




From: "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date: 08/01/2018 12:55 PM
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am 
setting up a new filesystem there using hardware that we recently life-cycled 
out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong. I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

Table 1. Block sizes and subblock sizes

+‐‐‐+‐‐‐+
| Block size| Subblock size |
+‐‐‐+‐‐‐+
| 64 KiB| 2 KiB |
+‐‐‐+‐‐‐+
| 128 KiB   | 4 KiB |
+‐‐‐+‐‐‐+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+‐‐‐+‐‐‐+
| 8 MiB, 16 MiB | 16 KiB|
+‐‐‐+‐‐‐+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flag value description
---  ---
-f 8192 Minimum 

Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
Hi Marc,

Thanks for the response … I understand what you’re saying, but since I’m asking 
for a 1 MB block size for metadata and a 4 MB block size for data and according 
to the chart in the mmcrfs man page both result in an 8 KB sub block size I’m 
still confused as to why I’ve got a 32 KB sub block size for my non-system 
(i.e. data) pools?  Especially when you consider that 32 KB isn’t the default 
even if I had chosen an 8 or 16 MB block size!

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Aug 1, 2018, at 12:21 PM, Marc A Kaplan 
mailto:makap...@us.ibm.com>> wrote:

I haven't looked into all the details but here's a clue -- notice there is only 
one "subblocks-per-full-block" parameter.

And it is the same for both metadata blocks and datadata blocks.

So maybe (MAYBE) that is a constraint somewhere...

Certainly, in the currently supported code, that's what you get.

From:    "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To:gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date:08/01/2018 12:55 PM
Subject:[gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1.  I 
am setting up a new filesystem there using hardware that we recently 
life-cycled out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong.  I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

 Table 1. Block sizes and subblock sizes

+‐‐‐+‐‐‐+
| Block size| Subblock size |
+‐‐‐+‐‐‐+
| 64 KiB| 2 KiB |
+‐‐‐+‐‐‐+
| 128 KiB   | 4 KiB |
+‐‐‐+‐‐‐+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+‐‐‐+‐‐‐+
| 8 MiB, 16 MiB | 16 KiB|
+‐‐‐+‐‐‐+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flagvaluedescription
---  ---
 -f 8192 Minimum fragment (subblock) size 
in bytes (system pool)
32768Minimum fragment (subblock) size 
in bytes (other pools)
 -i 4096 Inode size in bytes
 -I 32768Indirect block size in bytes
 -m 2Default number of metadata replicas
 -M 3Maximum number of metadata replicas
 -r 1Default number of data replicas
 -R 3Maximum number of data replicas
 -j scatter  Block allocation type
 -D nfs4 File locking semantics in effect
 -k all  ACL semantics in effect
 -n 32   Estimated number of nodes that 
will mount file system
 -B 1048576  Block size (system pool)
4194304  Block size (other pools)
 -Q user;group;fileset   Quotas accounting enabled
user;group;fileset   Quotas enforced
none Default quotas enabled
 --perfileset-quota No   Per-fileset quota enforcement
 --filesetdfNo   Fileset df enabled?
 -V 19.01 (5.0.1.0)  File system version
 --create-time  Wed Aug  1 11:39:39 2018 File system creation time
 -z No   Is DMAPI enabled?
 -L 33554432 Logfile size
 -E Yes  Exact mtime mount option
 -S relatime Suppress atime mount option
 -K whenpossible Strict

[gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1.  I 
am setting up a new filesystem there using hardware that we recently 
life-cycled out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong.  I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

 Table 1. Block sizes and subblock sizes

+‐‐‐+‐‐‐+
| Block size| Subblock size |
+‐‐‐+‐‐‐+
| 64 KiB| 2 KiB |
+‐‐‐+‐‐‐+
| 128 KiB   | 4 KiB |
+‐‐‐+‐‐‐+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+‐‐‐+‐‐‐+
| 8 MiB, 16 MiB | 16 KiB|
+‐‐‐+‐‐‐+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flagvaluedescription
---  ---
 -f 8192 Minimum fragment (subblock) size 
in bytes (system pool)
32768Minimum fragment (subblock) size 
in bytes (other pools)
 -i 4096 Inode size in bytes
 -I 32768Indirect block size in bytes
 -m 2Default number of metadata replicas
 -M 3Maximum number of metadata replicas
 -r 1Default number of data replicas
 -R 3Maximum number of data replicas
 -j scatter  Block allocation type
 -D nfs4 File locking semantics in effect
 -k all  ACL semantics in effect
 -n 32   Estimated number of nodes that 
will mount file system
 -B 1048576  Block size (system pool)
4194304  Block size (other pools)
 -Q user;group;fileset   Quotas accounting enabled
user;group;fileset   Quotas enforced
none Default quotas enabled
 --perfileset-quota No   Per-fileset quota enforcement
 --filesetdfNo   Fileset df enabled?
 -V 19.01 (5.0.1.0)  File system version
 --create-time  Wed Aug  1 11:39:39 2018 File system creation time
 -z No   Is DMAPI enabled?
 -L 33554432 Logfile size
 -E Yes  Exact mtime mount option
 -S relatime Suppress atime mount option
 -K whenpossible Strict replica allocation option
 --fastea   Yes  Fast external attributes enabled?
 --encryption   No   Encryption enabled?
 --inode-limit  101095424Maximum number of inodes
 --log-replicas 0Number of log replicas
 --is4KAligned  Yes  is4KAligned?
 --rapid-repair Yes  rapidRepair enabled?
 --write-cache-threshold 0   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128  Number of subblocks per full block
 -P system;raid1;raid6   Disk storage pools in file system
 --file-audit-log   No   File Audit Logging enabled?
 --maintenance-mode No   Maintenance Mode enabled?
 -d 
test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd
  Disks in file system
 -A yes  Automatic mount option
 -o none Additional mount options
 -T /gpfs5   Default mount point
 --mount-priority   0Mount priority

Output of mmcrfs:

mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k 
all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes 
--nofilesetdf --metadata-block-size 1M

The following disks of gpfs5 will be formatted on node testnsd3:
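Following Sven's explanation earlier in the thread (the subblock size is derived from the smallest block size in the filesystem), one way to get an 8 KiB subblock in the 4 MiB data pool is to give the metadata pool the same 4 MiB block size, i.e. drop --metadata-block-size. Whether a 4 MiB metadata block is acceptable on RAID 1 metadata LUNs is a separate judgment call; this is only a sketch of the alternative create command:

# Same mmcrfs invocation as above, minus --metadata-block-size, so every pool
# uses a 4 MiB block and (per the mmcrfs table) an 8 KiB subblock.
mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter \
  -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes \
  --nofilesetdf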

Re: [gpfsug-discuss] Power9 / GPFS

2018-07-27 Thread Buterbaugh, Kevin L
Hi Simon,

Have you tried running it with the “—silent” flag, too?

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633
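A sketch of what that retry could look like; both flags appear elsewhere in this thread, and the manual unpack simply reuses the tail|tar pipeline the installer itself prints (it bypasses the interactive license acceptance, so treat it as a last resort):

# Retry the self-extractor non-interactively
./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --silent --dir 5.0.1.1

# Last-resort manual unpack, without the --exclude flags so the package
# repositories are actually extracted (offset 620 is what this build prints)
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | \
  tar -C 5.0.1.1 -xvz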

On Jul 27, 2018, at 10:18 AM, Simon Thompson 
mailto:s.j.thomp...@bham.ac.uk>> wrote:

I feel like I must be doing something stupid here but …

We’re trying to install GPFS onto some Power 9 AI systems we’ve just got…

So from Fix central, we download 
“Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install”, however we are 
failing to unpack the file:

./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 
--text-only

Extracting License Acceptance Process Tool to 5.0.1.1 ...
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | 
tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm  
--exclude=*tgz --exclude=*deb 1> /dev/null

Installing JRE ...

If directory 5.0.1.1 has been created or was previously created during another 
extraction,
.rpm, .deb, and repository related files in it (if there were) will be removed 
to avoid conflicts with the ones being extracted.

tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | 
tar -C 5.0.1.1 --wildcards -xvz  ibm-java*tgz 1> /dev/null
tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz

Invoking License Acceptance Process Tool ...
5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar 
com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1  -text_only
Unhandled exception
Type=Segmentation error vmState=0x
J9Generic_Signal_Number=0004 Signal_Number=000b Error_Value= 
Signal_Code=0001
Handler1=7FFFB194FC80 Handler2=7FFFB176EA40
R0=7FFFB176A0E8 R1=7FFFB23AC5D0 R2=7FFFB2737400 R3=
R4=7FFFB17D2AA4 R5=0006 R6= R7=7FFFAC12A3C0


This looks like the java runtime is failing during the license approval status.

First off, can someone confirm that 
“Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install” is indeed the 
correct package we are downloading for Power9, and then any tips on how to 
extract the packages.

These systems are running the IBM factory shipped install of RedHat 7.5.

Thanks

Simon


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discussdata=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9660d98faa7b4241b52508d5f3d44462%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636683015365941338sdata=8%2BKtcv8Tm3S5OS67xX5lOZatL%2B7mHZ71HXgm6dalEmg%3Dreserved=0

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmdiag --iohist question

2018-07-23 Thread Buterbaugh, Kevin L
Hi GPFS team,

Yes, that’s what we see, too … thanks.

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633
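Given the lcl/srv distinction explained in the quoted reply below, a minimal sketch of the kind of wrapper this thread describes: skip local I/Os, then map slow server-side I/Os back to jobs. The 300 ms threshold, the column positions (taken from the sample output below), and the squeue lookup are all illustrative assumptions:

# Fields per the sample output: $6 = time in ms, $7 = I/O type (srv/lcl), $9 = client IP
mmdiag --iohist | awk '$7 == "srv" && $6+0 > 300 { print $9 }' | sort -u |
while read ip; do
    host=$(getent hosts "$ip" | awk '{ print $2 }')
    [ -n "$host" ] || continue               # nothing to look up for unresolvable entries
    echo "== slow I/O from $host =="
    squeue -w "$host" -o "%i %u %j"          # SLURM jobs currently on that client
done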

On Jul 23, 2018, at 1:51 AM, IBM Spectrum Scale 
mailto:sc...@us.ibm.com>> wrote:


Hi

Please check the IO type before examining the IP address for the output of 
mmdiag --iohist. For the "lcl"(local) IO, the IP address is not necessary and 
we don't show it. Please check whether this is your case.

=== mmdiag: iohist ===

I/O history:

I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD 
node
--- -- --- - - ---  
-- ---
01:14:08.450177 R inode 6:189513568 8 4.920 srv dm-4 192.168.116.92
01:14:08.450448 R inode 6:189513664 8 4.968 srv dm-4 192.168.116.92
01:14:08.475689 R inode 6:189428264 8 0.230 srv dm-4 192.168.116.92
01:14:08.983587 W logData 4:30686784 8 0.216 lcl dm-0
01:14:08.983601 W logData 3:25468480 8 0.197 lcl dm-8
01:14:08.983961 W inode 2:188808504 8 0.142 lcl dm-11
01:14:08.984144 W inode 1:188808504 8 0.134 lcl dm-7



Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D----0479=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255263961402=adR3hLlARxW6mIqw%2Fw4e29V6QgBtkOvkAH8RgN2Tgeg%3D=0>.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.

"Buterbaugh, Kevin L" ---07/11/2018 10:34:32 PM---Hi All, Quick 
question about “mmdiag —iohist” that is not documented in the man page … what 
does it

From: "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date: 07/11/2018 10:34 PM
Subject: [gpfsug-discuss] mmdiag --iohist question
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





Hi All,

Quick question about “mmdiag —iohist” that is not documented in the man page … 
what does it mean if the client IP address field is blank? That the NSD server 
itself issued the I/O? Or ???

This only happens occasionally … and the way I discovered it was that our 
Python script that takes “mmdiag —iohist” output, looks up the client IP for 
any waits above the threshold, converts that to a hostname, and queries SLURM 
for whose jobs are on that client started occasionally throwing an exception … 
and when I started looking at the “mmdiag —iohist” output itself I do see times 
when there is no client IP address listed for an I/O wait.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss<https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255263971410=UjU%2BSAOBf4P9oCdFRxatJ58blR9YOgDKes3Y2%2FYRzV4%3D=0>



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discussdata=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255264001433sdata=uSiXYheeOw%2F4%2BSls8lP3XO9w7i7dFc3UWEYa%2F8aIn%2B0%3Dreserved=0

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmhealth - where is the info hiding?

2018-07-19 Thread Buterbaugh, Kevin L
Hi Valdis,

Is this what you’re looking for (from an IBMer in response to another question 
a few weeks back)?

assuming 4.2.3 code level this can be done by deleting and recreating the rule 
with changed settings:

# mmhealth thresholds list
### Threshold Rules ###
rule_namemetricerror  warn  
direction  filterBy  groupBy   
sensitivity

InodeCapUtil_RuleFileset_inode 90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name  300
MetaDataCapUtil_Rule MetaDataPool_capUtil  90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
DataCapUtil_Rule DataPool_capUtil  90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
MemFree_Rule mem_memfree   5  10low 
 node   300

# mmhealth thresholds delete MetaDataCapUtil_Rule
The rule(s) was(were) deleted successfully


# mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 
85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby 
gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name


#  mmhealth thresholds list
### Threshold Rules ###
rule_namemetricerror  warn  
direction  filterBy  groupBy 
sensitivity  

InodeCapUtil_RuleFileset_inode 90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name  300
MemFree_Rule mem_memfree   5  10low 
 node   300
DataCapUtil_Rule DataPool_capUtil  90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
MetaDataCapUtil_Rule MetaDataPool_capUtil  95.0   85.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633
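Before (or after) changing a rule like that, it can be worth confirming what the pool in question actually holds; a sketch against the filesystem/pool named in the events quoted below, assuming the event string is formatted as (filesystem/pool):

# Current fill level of the pool that keeps tripping the data-capacity threshold
mmdf archive -P system --block-size auto

# And the rule that is evaluating it
mmhealth thresholds list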



On Jul 19, 2018, at 4:25 PM, 
valdis.kletni...@vt.edu wrote:

So I'm trying to tidy up things like 'mmhealth' etc.  Got most of it fixed, but 
stuck on
one thing..

Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which
cleaned out a bunch of other long-past events that were "stuck" as failed /
degraded even though they were corrected days/weeks ago - keep this in mind as
you read on

# mmhealth cluster show

Component   Total Failed   DegradedHealthy  
Other
-
NODE   10  0  0 10  
0
GPFS   10  0  0 10  
0
NETWORK10  0  0 10  
0
FILESYSTEM  1  0  1  0  
0
DISK  102  0  0102  
0
CES 4  0  0  4  
0
GUI 1  0  0  1  
0
PERFMON10  0  0 10  
0
THRESHOLD  10  0  0 10  
0

Great.  One hit for 'degraded' filesystem.

# mmhealth node show --unhealthy -N all
(skipping all the nodes that show healthy)

Node name:  arnsd3-vtc.nis.internal
Node status:HEALTHY
Status Change:  21 hours ago

Component  StatusStatus Change Reasons
---
FILESYSTEM FAILED24 days ago   
pool-data_high_error(archive/system)
(...)
Node name:  arproto2-isb.nis.internal
Node status:HEALTHY
Status Change:  21 hours ago

Component  StatusStatus Change Reasons
--
FILESYSTEM DEGRADED  6 days ago
pool-data_high_warn(archive/system)

mmdf tells me:
nsd_isb_01131030056961 No   Yes  1747905536 ( 13%) 
111667200 ( 1%)
nsd_isb_02

Re: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace?

2018-07-15 Thread Buterbaugh, Kevin L
Hi All,

So I had noticed some waiters on my NSD servers that I thought were unrelated 
to the mmchdisk.  However, I decided to try rebooting my NSD servers one at a 
time (mmshutdown failed!) to clear that up … and evidently one of them had 
things hung up because the mmchdisk start completed.

Thanks…

Kevin

On Jul 15, 2018, at 12:34 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE 
CORP] mailto:aaron.s.knis...@nasa.gov>> wrote:

Hmm...have you dumped waiters across the entire cluster or just on the NSD 
servers/fs managers? Maybe there’s a slow node out there participating in the 
suspend effort? Might be worth running some quick tracing on the FS manager to 
see what it’s up to.





On July 15, 2018 at 13:27:54 EDT, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:
Hi All,

We are in a partial cluster downtime today to do firmware upgrades on our 
storage arrays.  It is a partial downtime because we have two GPFS filesystems:

1.  gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which 
I’ve unmounted across the cluster because it has data replication set to 1.

2.  gpfs22 - 42 TB and which corresponds to /home.  It has data replication set 
to two, so what we’re doing is “mmchdisk gpfs22 suspend -d <disks on that array>”, 
then doing the firmware upgrade, and once the array is back we’re doing a 
“mmchdisk gpfs22 resume -d <disks>”, followed by “mmchdisk gpfs22 start -d <disks>”.

On the 1st storage array this went very smoothly … the mmchdisk took about 5 
minutes, which is what I would expect.

But on the 2nd storage array the mmchdisk appears to either be hung or 
proceeding at a glacial pace.  For more than an hour it’s been stuck at:

mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 …

There are no waiters of any significance and “mmdiag —iohist” doesn’t show any 
issues either.

Any ideas, anyone?  Unless I can figure this out I’m hosed for this downtime, 
as I’ve got 7 more arrays to do after this one!

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discussdata=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd518db52846a4be34e2208d5ea7a00d7%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636672732087040757sdata=m77IpWNOlODc%2FzLiYI2qiPo9Azs8qsIdXSY8%2FoC6Nn0%3Dreserved=0

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace?

2018-07-15 Thread Buterbaugh, Kevin L
Hi All,

We are in a partial cluster downtime today to do firmware upgrades on our 
storage arrays.  It is a partial downtime because we have two GPFS filesystems:

1.  gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which 
I’ve unmounted across the cluster because it has data replication set to 1.

2.  gpfs22 - 42 TB and which corresponds to /home.  It has data replication set 
to two, so what we’re doing is “mmchdisk gpfs22 suspend -d <disks on that array>”, 
then doing the firmware upgrade, and once the array is back we’re doing a 
“mmchdisk gpfs22 resume -d <disks>”, followed by “mmchdisk gpfs22 start -d <disks>”.

On the 1st storage array this went very smoothly … the mmchdisk took about 5 
minutes, which is what I would expect.
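For anyone doing the same dance, a sketch of that per-array sequence as a script; the NSD names are placeholders and the final mmlsdisk check is just a convenient way to confirm nothing is left in a non-ready state:

DISKS="eonXXAnsd;eonXXBnsd"             # NSDs on the array being flashed (placeholder names)

mmchdisk gpfs22 suspend -d "$DISKS"     # stop new allocations on these disks
# ... perform the firmware upgrade on the array ...
mmchdisk gpfs22 resume -d "$DISKS"      # make the disks available for allocation again
mmchdisk gpfs22 start -d "$DISKS"       # bring them up and catch up the missed writes
mmlsdisk gpfs22 -e                      # lists any disks not up/ready; should be empty when done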

But on the 2nd storage array the mmchdisk appears to either be hung or 
proceeding at a glacial pace.  For more than an hour it’s been stuck at:

mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 …

There are no waiters of any significance and “mmdiag —iohist” doesn’t show any 
issues either.

Any ideas, anyone?  Unless I can figure this out I’m hosed for this downtime, 
as I’ve got 7 more arrays to do after this one!

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] mmdiag --iohist question

2018-07-11 Thread Buterbaugh, Kevin L
Hi All,

Quick question about “mmdiag —iohist” that is not documented in the man page … 
what does it mean if the client IP address field is blank?  That the NSD server 
itself issued the I/O?  Or ???

This only happens occasionally … and the way I discovered it was that our 
Python script that takes “mmdiag —iohist” output, looks up the client IP for 
any waits above the threshold, converts that to a hostname, and queries SLURM 
for whose jobs are on that client started occasionally throwing an exception … 
and when I started looking at the “mmdiag —iohist” output itself I do see times 
when there is no client IP address listed for an I/O wait.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] What NSDs does a file have blocks on?

2018-07-09 Thread Buterbaugh, Kevin L
Hi All,

I am still working on my issue of the occasional high I/O wait times and that 
has raised another question … I know that I can run mmfileid to see what files 
have a block on a given NSD, but is there a way to do the opposite?  I.e. I 
want to know what NSDs a single file has its blocks on?  The mmlsattr command 
does not appear to show this information unless it’s got an undocumented 
option.  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] High I/O wait times

2018-07-06 Thread Buterbaugh, Kevin L
Hi All,

Another update on this issue as we have made significant progress today … but 
first let me address the two responses I received.

Alex - this is a good idea and yes, we did this today.  We did see some higher 
latencies on one storage array as compared to the others.  10-20 ms on the 
“good” storage arrays … 50-60 ms on the one storage array.  It took us a while 
to be able to do this because while the vendor provides a web management 
interface, that didn’t show this information.  But they have an actual app that 
will … and the Mac and Linux versions don’t work.  So we had to go scrounge up 
this thing called a Windows PC and get the software installed there.  ;-)

Jonathan - also a good idea and yes, we also did this today.  I’ll explain as 
part of the rest of this update.

The main thing that we did today that has turned out to be most revealing is to 
take a list of all the NSDs in the impacted storage pool … 19 devices spread 
out over 7 storage arrays … and run read dd tests on all of them (the 
/dev/dm-XX multipath device).  15 of them showed rates of 33 - 100+ MB/sec and 
the variation is almost definitely explained by the fact that they’re in 
production use and getting hit by varying amounts of “real” work.  But 4 of 
them showed rates of 2-10 MB/sec and those 4 all happen to be on storage array 
eon34.
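A sketch of that read test for anyone wanting to repeat it; the device names, block size, and count are placeholders, and iflag=direct is there so the page cache doesn't flatter the numbers:

# Sequential read test against each multipath device backing the suspect NSDs
for dev in /dev/dm-10 /dev/dm-11 /dev/dm-12 /dev/dm-13; do
    echo "== $dev =="
    dd if="$dev" of=/dev/null bs=1M count=4096 iflag=direct 2>&1 | tail -1
done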

So, to try to rule out everything but the storage array we replaced the FC 
cables going from the SAN switches to the array, plugging the new cables into 
different ports on the SAN switches.  Then we repeated the dd tests from a 
different NSD server, which both eliminated the NSD server and its FC cables 
as a potential cause … and saw results virtually identical to the previous 
test.  Therefore, we feel pretty confident that it is the storage array and 
have let the vendor know all of this.

And there’s another piece of quite possibly relevant info … the last week in 
May one of the controllers in this array crashed and rebooted (it’s an 
active-active dual controller array) … when that happened the failover occurred 
… with a major glitch.  One of the LUNs essentially disappeared … more 
accurately, it was there, but had no size!  We’ve been using this particular 
vendor for 15 years now and I have seen more than a couple of their controllers 
go bad during that time and nothing like this had ever happened before.  They 
were never able to adequately explain what happened there.  So what I am 
personally suspecting has happened is that whatever caused that one LUN to go 
MIA has caused these issues with the other LUNs on the array.  As an aside, we 
ended up using mmfileid to identify the files that had blocks on the MIA LUN 
and restored those from tape backup.

I want to thank everyone who has offered their suggestions so far.  I will 
update the list again once we have a definitive problem determination.

I hope that everyone has a great weekend.  In the immortal words of the wisest 
man who ever lived, “I’m kinda tired … think I’ll go home now.”  ;-)

Kevin

On Jul 6, 2018, at 12:13 PM, Alex Chekholko 
mailto:a...@calicolabs.com>> wrote:

Hi Kevin,

This is a bit of a "cargo cult" suggestion but one issue that I have seen is if 
a disk starts misbehaving a bit but does not fail, it slows down the whole raid 
group that it is in.  And the only way to detect it is to examine the 
read/write latencies on the individual disks.  Does your SAN allow you to do 
that?

That happened to me at least twice in my life and replacing the offending 
individual disk solved the issue.  This was on DDN, so the relevant command 
were something like 'show pd * counters write_lat' or similar, which showed the 
latency for the I/Os for each disk.  If one disk in the group is an outlier 
(e.g. 1s write latencies), then the whole raid array (LUN) is just waiting for 
that one disk.

Another possibility for troubleshooting, if you have sufficient free resources: 
you can just suspend the problematic LUNs in GPFS, as that will remove the 
write load from them, while still having them service read requests and not 
affecting users.

Regards,
Alex

On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:
Hi Jim,

Thank you for your response.  We are taking a two-pronged approach at this 
point:

1.  While I don’t see anything wrong with our storage arrays, I have opened a 
ticket with the vendor (not IBM) to get them to look at things from that angle.

2.  Since the problem moves around from time to time, we are enhancing our 
monitoring script to see if we can basically go from “mmdiag —iohist” to 
“clients issuing those I/O requests” to “jobs running on those clients” to see 
if there is any commonality there.

Thanks again - much appreciated!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderb

Re: [gpfsug-discuss] High I/O wait times

2018-07-06 Thread Buterbaugh, Kevin L
Hi Jim,

Thank you for your response.  We are taking a two-pronged approach at this 
point:

1.  While I don’t see anything wrong with our storage arrays, I have opened a 
ticket with the vendor (not IBM) to get them to look at things from that angle.

2.  Since the problem moves around from time to time, we are enhancing our 
monitoring script to see if we can basically go from “mmdiag —iohist” to 
“clients issuing those I/O requests” to “jobs running on those clients” to see 
if there is any commonality there.

Thanks again - much appreciated!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Jul 6, 2018, at 8:02 AM, Jim Doherty 
mailto:jjdohe...@yahoo.com>> wrote:

You may want to get an mmtrace,  but I suspect that the disk IOs are slow. 
The iohist is showing the time from when the start IO was issued until it was 
finished. Of course if you have disk IOs taking 10x too long then other IOs 
are going to queue up behind it. If there are more IOs than there are NSD 
server threads then there are going to be IOs that are queued and waiting for a 
thread.

Jim


On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:


Hi All,

First off, my apologies for the delay in responding back to the list … we’ve 
actually been working our tails off on this one trying to collect as much data 
as we can on what is a very weird issue.  While I’m responding to Aaron’s 
e-mail, I’m going to try to address the questions raised in all the responses.

Steve - this all started last week.  You’re correct about our mixed workload.  
There have been no new workloads that I am aware of.

Stephen - no, this is not an ESS.  We are running GPFS 4.2.3-8.

Aaron - no, this is not on a DDN, either.

The hardware setup is a vanilla 8 GB FC SAN.  Commodity hardware for the 
servers and storage.  We have two SAN “stacks” and all NSD servers and storage 
are connected to both stacks.  Linux multipathing handles path failures.  10 
GbE out to the network.

We first were alerted to this problem by one of our monitoring scripts which 
was designed to alert us to abnormally high I/O times, which, as I mentioned 
previously, in our environment has usually been caused by cache battery backup 
failures in the storage array controllers (but _not_ this time).  So I’m 
getting e-mails that in part read:

Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms.
Disk eon34Ensd on nsd4 has a service time of 3146.715 ms.

The “34” tells me what storage array and the “C” or “E” tells me what LUN on 
that storage array.  As I’ve mentioned, those two LUNs are by far and away my 
most frequent problem children, but here’s another report from today as well:

Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms.
Disk eon28Ansd on nsd7 has a service time of 1154.002 ms.
Disk eon31Ansd on nsd3 has a service time of 1068.987 ms.
Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms.

NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8.

Based on Fred’s excellent advice, we took a closer look at the “mmfsadm dump 
nsd” output.  We wrote a Python script to pull out what we think is the most 
pertinent information:

nsd1
29 SMALL queues, 50 requests pending, 3741 was the highest number of requests 
pending.
348 threads started, 1 threads active, 348 was the highest number of 
threads active.
29 LARGE queues, 0 requests pending, 5694 was the highest number of requests 
pending.
348 threads started, 124 threads active, 348 was the highest number of 
threads active.
nsd2
29 SMALL queues, 0 requests pending, 1246 was the highest number of requests 
pending.
348 threads started, 13 threads active, 348 was the highest number of 
threads active.
29 LARGE queues, 470 requests pending, 2404 was the highest number of requests 
pending.
348 threads started, 340 threads active, 348 was the highest number of 
threads active.
nsd3
29 SMALL queues, 108 requests pending, 1796 was the highest number of requests 
pending.
348 threads started, 0 threads active, 348 was the highest number of 
threads active.
29 LARGE queues, 35 requests pending, 3331 was the highest number of requests 
pending.
348 threads started, 4 threads active, 348 was the highest number of 
threads active.
nsd4
42 SMALL queues, 0 requests pending, 1529 was the highest number of requests 
pending.
504 threads started, 8 threads active, 504 was the highest number of 
threads active.
42 LARGE queues, 0 requests pending, 637 was the highest number of requests 
pending.
504 threads started, 211 threads active, 504 was the highest number of 
threads active.
nsd5
42 SMALL queues, 182 requests pending, 2798 was the highest number of requests 
pending.
504 threads started, 6 threads active, 5

Re: [gpfsug-discuss] High I/O wait times

2018-07-05 Thread Buterbaugh, Kevin L
” on the LARGE queue side of things and that nsd2 
and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently 
in our alerts) are the heaviest loaded.

One other thing we have noted is that our home grown RRDtool monitoring plots 
that are based on netstat, iostat, vmstat, etc. also show an oddity.  Most of 
our LUNs show up as 33 - 68% utilized … but all the LUNs on eon34 (there are 4 
in total) show up as 93 - 97% utilized.  And another oddity there is that 
eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E 
show up wyyy more than anything else … the difference between them is 
that A and B are on the storage array itself and C and E are on JBOD’s 
SAS-attached to the storage array (and yes, we’ve actually checked and reseated 
those connections).

Another reason why I could not respond earlier today is that one of the things 
which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 
GB respectively to 64 GB each … and I then upped the pagepool on those two 
boxes to 40 GB.  That has not made a difference.  How can I determine how much 
of the pagepool is actually being used, BTW?  A quick Google search didn’t help 
me.
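On the pagepool question, I'm not aware of a single counter that answers it directly, but as a starting point the daemon's own memory statistics plus the configured limits are easy to pull; a sketch, nothing more:

# How big the pool is, and what share of it NSD buffers may use
mmlsconfig pagepool
mmlsconfig nsdBufSpace

# Daemon memory statistics (run on the NSD server itself)
mmdiag --memory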

So we’re trying to figure out if we have storage hardware issues causing GPFS 
issues or GPFS issues causing storage slowdowns.  The fact that I see slowdowns 
most often on one storage array points in one direction, while the fact that at 
times I see even worse slowdowns on multiple other arrays points the other way. 
 The fact that some NSD servers show better stats than others in the analysis 
of the “mmfsadm dump nsd” output tells me … well, I don’t know what it tells me.

I think that’s all for now.  If you have read this entire very long e-mail, 
first off, thank you!  If you’ve read it and have ideas for where I should go 
from here, T-H-A-N-K Y-O-U!

Kevin

> On Jul 4, 2018, at 7:34 AM, Aaron Knister  wrote:
> 
> Hi Kevin,
> 
> Just going out on a very weird limb here...but you're not by chance seeing 
> this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 
> 14K, etc.) We just started seeing some very weird and high latency on some of 
> our SFA12ks (that have otherwise been solid both in terms of stability and 
> performance) but only on certain volumes and the affected volumes change. 
> It's very bizzarre and we've been working closely with DDN to track down the 
> root cause but we've not yet found a smoking gun. The timing and description 
> of your problem sounded eerily similar to what we're seeing so I'd thought 
> I'd ask.
> 
> -Aaron
> 
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> 
> 
> On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote:
> 
>> Hi all,
>> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of 
>> our NSDs as reported by “mmdiag —iohist" and are struggling to understand 
>> why.  One of the
>> confusing things is that, while certain NSDs tend to show the problem more 
>> than others, the problem is not consistent … i.e. the problem tends to move 
>> around from
>> NSD to NSD (and storage array to storage array) whenever we check … which is 
>> sometimes just a few minutes apart.
>> In the past when I have seen “mmdiag —iohist” report high wait times like 
>> this it has *always* been hardware related.  In our environment, the most 
>> common cause has
>> been a battery backup unit on a storage array controller going bad and the 
>> storage array switching to write straight to disk.  But that’s *not* 
>> happening this time.
>> Is there anything within GPFS / outside of a hardware issue that I should be 
>> looking for??  Thanks!
>> —
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and Education
>> kevin.buterba...@vanderbilt.edu - (615)875-9633
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D=0

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] High I/O wait times

2018-07-03 Thread Buterbaugh, Kevin L
Hi Fred,

I have a total of 48 NSDs served up by 8 NSD servers.  12 of those NSDs are in 
our small /home filesystem, which is performing just fine.  The other 36 are in 
our ~1 PB /scratch and /data filesystem, which is where the problem is.  Our 
max filesystem block size parameter is set to 16 MB, but the aforementioned 
filesystem uses a 1 MB block size.

nsdMaxWorkerThreads is set to 1024 as shown below.  Since each NSD server 
serves an average of 6 NSDs and 6 x 12 = 72 we’re OK if I’m understanding the 
calculation correctly.  Even multiplying 48 x 12 = 576, so we’re good?!?

Your help is much appreciated!  Thanks again…

Kevin

On Jul 3, 2018, at 4:53 PM, Frederick Stock 
mailto:sto...@us.ibm.com>> wrote:

How many NSDs are served by the NSD servers and what is your maximum file 
system block size?  Have you confirmed that you have sufficient NSD worker 
threads to handle the maximum number of IOs you are configured to have active?  
That would be the number of NSDs served times 12 (you have 12 threads per 
queue).

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com<mailto:sto...@us.ibm.com>



From:    "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To:gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date:07/03/2018 05:41 PM
Subject:Re: [gpfsug-discuss] High I/O wait times
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi Fred,

Thanks for the response.  I have been looking at the “mmfsadm dump nsd” data 
from the two NSD servers that serve up the two NSDs that most commonly 
experience high wait times (although, again, this varies from time to time).  
In addition, I have been reading:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fwikis%2Fhome%3Flang%3Den%23!%2Fwiki%2FGeneral+Parallel+File+System+(GPFS)%2Fpage%2FNSD+Server+Design+and+Tuning=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7658e1b458b147ad8a3908d5e12f6982%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662516110903567=cWw5UipcO7HgupLQTFgOWVwXF%2B9b8S%2Fw935%2FeqG6xIY%3D=0>

And:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fwikis%2Fhome%3Flang%3Den%23!%2Fwiki%2FGeneral%2520Parallel%2520File%2520System%2520(GPFS)%2Fpage%2FNSD%2520Server%2520Tuning=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7658e1b458b147ad8a3908d5e12f6982%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662516110903567=CAuOPOhC1MXdZW2e2HaVOY0PmySwP6FzlsvNNlteWZw%3D=0>

Which seem to be the most relevant documents on the Wiki.

I would like to do a more detailed analysis of the “mmfsadm dump nsd” output, 
but my preliminary looks at it seems to indicate that I see I/O’s queueing in 
the 50 - 100 range for the small queues and the 60 - 200 range on the large 
queues.

In addition, I am regularly seeing all 12 threads on the LARGE queues active, 
while it is much more rare that I see all - or even close to all - the threads 
on the SMALL queues active.

As far as the parameters Scott and Yuri mention, on our cluster they are set 
thusly:

[common]
nsdMaxWorkerThreads 640
[]
nsdMaxWorkerThreads 1024
[common]
nsdThreadsPerQueue 4
[]
nsdThreadsPerQueue 12
[common]
nsdSmallThreadRatio 3
[]
nsdSmallThreadRatio 1

So to me it sounds like I need more resources on the LARGE queue side of things 
… i.e. it sure doesn’t sound like I want to change my small thread ratio.  If I 
increase the amount of threads it sounds like that might help, but that also 
takes more pagepool, and I’ve got limited RAM in these (old) NSD servers.  I do 
have nsdbufspace set to 70, but I’ve only got 16-24 GB RAM each in these NSD 
servers.  And a while back I did try increase the page pool on them (very 
slightly) and ended up causing problems because then they ran out of physical 
RAM.

Thoughts?  Followup questions?  Thanks!

Kevin

On Jul 3, 2018, at 3:11 PM, Frederick Stock 
mailto:sto...@us.ibm.com>> wrote:

Are you seeing similar values for all the nodes or just some of them?  One 
possible issue is how the NSD queues are configured on the NSD servers.  You 
can see this with the output of "mmfsadm dump nsd".  There are queues for LARGE 
IOs (greater than 64K) and queues for SMALL IOs (64K or less).  Check the 
highest pending values to see if many IOs are queueing.  There are a couple of 
options to fix this but rather than explain them I suggest you look for 
information about N

Re: [gpfsug-discuss] High I/O wait times

2018-07-03 Thread Buterbaugh, Kevin L
Hi Fred,

Thanks for the response.  I have been looking at the “mmfsadm dump nsd” data 
from the two NSD servers that serve up the two NSDs that most commonly 
experience high wait times (although, again, this varies from time to time).  
In addition, I have been reading:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning

And:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning

Which seem to be the most relevant documents on the Wiki.

I would like to do a more detailed analysis of the “mmfsadm dump nsd” output, 
but my preliminary looks at it seems to indicate that I see I/O’s queueing in 
the 50 - 100 range for the small queues and the 60 - 200 range on the large 
queues.

In addition, I am regularly seeing all 12 threads on the LARGE queues active, 
while it is much more rare that I see all - or even close to all - the threads 
on the SMALL queues active.

As far as the parameters Scott and Yuri mention, on our cluster they are set 
thusly:

[common]
nsdMaxWorkerThreads 640
[]
nsdMaxWorkerThreads 1024
[common]
nsdThreadsPerQueue 4
[]
nsdThreadsPerQueue 12
[common]
nsdSmallThreadRatio 3
[]
nsdSmallThreadRatio 1

So to me it sounds like I need more resources on the LARGE queue side of things 
… i.e. it sure doesn’t sound like I want to change my small thread ratio.  If I 
increase the amount of threads it sounds like that might help, but that also 
takes more pagepool, and I’ve got limited RAM in these (old) NSD servers.  I do 
have nsdbufspace set to 70, but I’ve only got 16-24 GB RAM each in these NSD 
servers.  And a while back I did try increase the page pool on them (very 
slightly) and ended up causing problems because then they ran out of physical 
RAM.
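If the conclusion were to give the LARGE queues more threads, the change itself is only configuration; a sketch with example values and an assumed node class name ("nsdnodes"). These NSD settings take effect when mmfsd restarts on those servers, and more threads/buffers means more pagepool, which is exactly the constraint described above:

# Raise the per-queue thread count on the NSD servers only (values are examples)
mmchconfig nsdThreadsPerQueue=16,nsdMaxWorkerThreads=1280 -N nsdnodes

# Pick up the change one NSD server at a time
mmshutdown -N nsd2 && mmstartup -N nsd2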

Thoughts?  Followup questions?  Thanks!

Kevin

On Jul 3, 2018, at 3:11 PM, Frederick Stock  wrote:

Are you seeing similar values for all the nodes or just some of them?  One 
possible issue is how the NSD queues are configured on the NSD servers.  You 
can see this with the output of "mmfsadm dump nsd".  There are queues for LARGE 
IOs (greater than 64K) and queues for SMALL IOs (64K or less).  Check the 
highest pending values to see if many IOs are queueing.  There are a couple of 
options to fix this but rather than explain them I suggest you look for 
information about NSD queueing on the developerWorks site.  There has been 
information posted there that should prove helpful.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com



From:"Buterbaugh, Kevin L" 
To:gpfsug main discussion list 
Date:07/03/2018 03:49 PM
Subject:[gpfsug-discuss] High I/O wait times
Sent by:gpfsug-discuss-boun...@spectrumscale.org




Hi all,

We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our 
NSDs as reported by “mmdiag —iohist" and are struggling to understand why.  One 
of the confusing things is that, while certain NSDs tend to show the problem 
more than others, the problem is not consistent … i.e. the problem tends to 
move around from NSD to NSD (and storage array to storage array) whenever we 
check … which is sometimes just a few minutes apart.

In the past when I have seen “mmdiag —iohist” report high wait times like this 
it has *always* been hardware related.  In our environment, the most common 
cause has been a battery backup unit on a storage array controller going bad 
and the storage array switching to write straight to disk.  But that’s *not* 
happening this time.

Is there anything within GPFS / outside of a hardware issue that I should be 
looking for??  Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>- 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss<https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938066010=jL0pB5MEaWtJZjMbS8JzhsKGvwmYB6qV%2FVyosdUKcSU%3D=0>



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014=wIyB66HoqvL13

[gpfsug-discuss] High I/O wait times

2018-07-03 Thread Buterbaugh, Kevin L
Hi all,

We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our 
NSDs as reported by “mmdiag —iohist" and are struggling to understand why.  One 
of the confusing things is that, while certain NSDs tend to show the problem 
more than others, the problem is not consistent … i.e. the problem tends to 
move around from NSD to NSD (and storage array to storage array) whenever we 
check … which is sometimes just a few minutes apart.

In the past when I have seen “mmdiag —iohist” report high wait times like this 
it has *always* been hardware related.  In our environment, the most common 
cause has been a battery backup unit on a storage array controller going bad 
and the storage array switching to write straight to disk.  But that’s *not* 
happening this time.

Is there anything within GPFS / outside of a hardware issue that I should be 
looking for??  Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] File system manager - won't change to new node

2018-06-22 Thread Buterbaugh, Kevin L
Hi Bob,

Have you tried explicitly moving it to a specific manager node?  That’s what I 
always do … I personally never let GPFS pick when I’m moving the management 
functions for some reason.  Thanks…

Kevin
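In other words, name the target rather than letting GPFS choose; the node name here is only an example:

mmchmgr dataeng nrg1-gpfs05     # move the manager for dataeng to a specific node (example name)
mmlsmgr dataeng                 # confirm where it landed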

On Jun 22, 2018, at 8:13 AM, Oesterlin, Robert 
mailto:robert.oester...@nuance.com>> wrote:

Any idea why I can’t force the file system manager off this node? I turned off 
the manager on the node (mmchnode --client) and used mmchmgr to move the other 
file systems off, but I can’t move this one. There are 6 other good choices for 
file system managers. I’ve never seen this message before.

[root@nrg1-gpfs01 ~]# mmchmgr dataeng
The best choice node 10.30.43.136 (nrg1-gpfs13) is already the manager for 
dataeng.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Capacity pool filling

2018-06-07 Thread Buterbaugh, Kevin L
Hi Uwe,

Thanks for your response.

So our restore software lays down the metadata first, then the data.  While it 
has no specific knowledge of the extended attributes, it does back them up and 
restore them.  So the only explanation that makes sense to me is that since the 
inode for the file says that the file should be in the gpfs23capacity pool, the 
data gets written there.

Right now I don’t have time to do an analysis of the “live” version of a 
fileset and the “restored” version of that same fileset to see if the placement 
of the files matches up.  My quick and dirty checks seem to show files getting 
written to all 3 pools.  Unfortunately, we have no way to tell our tape 
software to ignore files from the gpfs23capacity pool (and we’re aware that we 
won’t need those files).  We’ve also determined that it is actually quicker to 
tell our tape system to restore all files from a fileset than to take the time 
to tell it to selectively restore only certain files … and the same amount of 
tape would have to be read in either case.

Our SysAdmin who is primary on tape backup and restore was going on vacation 
the latter part of the week, so he decided to be helpful and just queue up all 
the restores to run one right after the other.  We didn’t realize that, so we 
are solving our disk space issues by slowing down the restores until we can run 
more instances of the script that replaces the corrupted files and deletes the 
unneeded restored files.
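
If the data pool ends up with enough free space, Uwe's suggestion below of a backward migration could be expressed as a small policy file along these lines (a sketch only; the pool names and the RESTORE path are the ones from this thread, while the policy file name and temp directory are made up):

    /* unmigrate.pol - push restored files back out of the capacity pool */
    RULE 'unmigrateRestores' MIGRATE
         FROM POOL 'gpfs23capacity'
         TO POOL 'gpfs23data'
         WHERE PATH_NAME LIKE '/gpfs23/RESTORE/%'

run via something like "mmapplypolicy /gpfs23 -P unmigrate.pol -I yes -N nsdNodes -g /gpfs23/tmp".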

Thanks again…

Kevin

> On Jun 7, 2018, at 1:34 PM, Uwe Falke  wrote:
> 
>> However, I took a look in one of the restore directories under 
>> /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! 
> 
> 
>> So ? I don?t think GPFS is doing this but the next thing I am 
>> going to do is follow up with our tape software vendor ? I bet 
>> they preserve the pool attribute on files and - like Jaime said - 
>> old stuff is therefore hitting the gpfs23capacity pool.
> 
> Hm, then the backup/restore must be doing very funny things. Usually, GPFS 
> should rule the 
> placement of new files, and I assume that a restore of a file, in 
> particular under a different name, 
> creates a new file. So, if your backup tool does override that GPFS 
> placement, it must be very 
> intimate with Scale :-). 
> I'd do some list scans of the capacity pool just to see what the files 
> appearing there from tape have in common. 
> If it's really that these files' data were on the capacity pool at the 
> last backup, they should not be affected by your dead NSD and a restore is 
> in vain anyway.
> 
> If that doesn't help or give no clue, then, if the data pool has some more 
> free  space, you might try to run an upward/backward migration from 
> capacity to data . 
> 
> And, yeah, as GPFS tends to stripe over all NSDs, all files in data large 
> enough plus some smaller ones would have data on your broken NSD. That's 
> the drawback of parallelization.
> Maybe you'd ask the storage vendor whether they supply some more storage 
> for the fault of their (redundant?) device to alleviate your current 
> storage shortage ?
> 
> Mit freundlichen Grüßen / Kind regards
> 
> 
> Dr. Uwe Falke
> 
> IT Specialist
> High Performance Computing Services / Integrated Technology Services / 
> Data Center Services
> ---
> IBM Deutschland
> Rathausstr. 7
> 09111 Chemnitz
> Phone: +49 371 6978 2165
> Mobile: +49 175 575 2877
> E-Mail: uwefa...@de.ibm.com
> ---
> IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: 
> Thomas Wolter, Sven Schooß
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
> HRB 17122 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Capacity pool filling

2018-06-07 Thread Buterbaugh, Kevin L
Hi again all,

I received a direct response and am not sure whether that means the sender did 
not want to be identified, but they asked good questions that I wanted to 
answer on list…

No, we do not use snapshots on this filesystem.

No, we’re not using HSM … our tape backup system is a traditional backup system 
not named TSM.  We’ve created a top level directory in the filesystem called 
“RESTORE” and are restoring everything under that … then doing our moves / 
deletes of what we’ve restored … so I *think* that means all of that should be 
written to the gpfs23data pool?!?

On the “plus” side, I may figure this out myself soon when someone / something 
starts getting I/O errors!  :-O

In the meantime, other ideas are much appreciated!

Kevin


Do you have a job that’s creating snapshots?  That’s an easy one to overlook.

Not sure if you are using an HSM. Any new file that gets generated should 
follow the default rule in ILM unless it meets a placement condition. It would 
only be if you're using an HSM that files would land in a pool other than the 
placement pool, and that is purely because the file's location has already been 
updated to the capacity pool.




On Thu, Jun 7, 2018 at 8:17 AM -0600, "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi All,

First off, I’m on day 8 of dealing with two different mini-catastrophes at work 
and am therefore very sleep deprived and possibly missing something obvious … 
with that disclaimer out of the way…

We have a filesystem with 3 pools:  1) system (metadata only), 2) gpfs23data 
(the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with 
an atime - yes, atime - of more than 90 days get migrated to by a script that 
runs out of cron each weekend).

However … this morning the free space in the gpfs23capacity pool is dropping … 
I’m down to 0.5 TB free in a 582 TB pool … and I cannot figure out why.  The 
migration script is NOT running … in fact, it’s currently disabled.  So I can 
only think of two possible explanations for this:

1.  There are one or more files already in the gpfs23capacity pool that someone 
has started updating.  Is there a way to check for that … i.e. a way to run 
something like “find /gpfs23 -mtime -7 -ls” but restricted to only files in the 
gpfs23capacity pool.  Marc Kaplan - can mmfind do that??  ;-)

2.  We are doing a large volume of restores right now because one of the 
mini-catastrophes I’m dealing with is one NSD (gpfs23data pool) down due to an 
issue with the storage array.  We’re working with the vendor to try to resolve 
that but are not optimistic, so we have started doing restores in case they come 
back and tell us it’s not recoverable.  We did run “mmfileid” to identify the 
files that have one or more blocks on the down NSD, but there are so many that 
what we’re doing is actually restoring all the files to an alternate path 
(easier for our tape system), then replacing the corrupted files, then deleting 
any restores we don’t need.  But shouldn’t all of that be going to the 
gpfs23data pool?  I.e. even if we’re restoring files that are in the 
gpfs23capacity pool shouldn’t the fact that we’re restoring to an alternate 
path (i.e. not overwriting files with the tape restores) and the default pool 
is the gpfs23data pool mean that nothing is being restored to the 
gpfs23capacity pool???

Is there a third explanation I’m not thinking of?

Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Capacity pool filling

2018-06-07 Thread Buterbaugh, Kevin L
Hi All,

So in trying to prove Jaime wrong I proved him half right … the cron job is 
stopped:

#13 22 * * 5 /root/bin/gpfs_migration.sh

However, I took a look in one of the restore directories under /gpfs23/RESTORE 
using mmlsattr and I see files in all 3 pools!  So that explains why the 
capacity pool is filling, but mmlspolicy says:

Policy for file system '/dev/gpfs23':
   Installed by root@gpfsmgr on Wed Jan 25 10:17:01 2017.
   First line of policy 'gpfs23.policy' is:
RULE 'DEFAULT' SET POOL 'gpfs23data'

So … I don’t think GPFS is doing this but the next thing I am going to do is 
follow up with our tape software vendor … I bet they preserve the pool 
attribute on files and - like Jaime said - old stuff is therefore hitting the 
gpfs23capacity pool.
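
As an aside, the check asked about in the original note (recently changed files, but only in one pool) can be done with a LIST rule rather than find.  A rough sketch, using the pool name from this thread and made-up file and directory names:

    /* recent-capacity.pol - the empty EXEC makes mmapplypolicy just write the list */
    RULE EXTERNAL LIST 'recentCap' EXEC ''
    RULE 'findRecent' LIST 'recentCap'
         FROM POOL 'gpfs23capacity'
         WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '7' DAYS

Running "mmapplypolicy /gpfs23 -P recent-capacity.pol -I defer -f /gpfs23/tmp/recent -g /gpfs23/tmp" should leave the matching path names in /gpfs23/tmp/recent.list.recentCap.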

Thanks Jaime and everyone else who has responded so far…

Kevin

> On Jun 7, 2018, at 9:53 AM, Jaime Pinto  wrote:
> 
> I think the restore is bringing back a lot of material with atime > 90, so 
> it is passing through gpfs23data and going directly to gpfs23capacity.
> 
> I also think you may not have stopped the crontab script as you believe you 
> did.
> 
> Jaime
> 
> Quoting "Buterbaugh, Kevin L" :
> 
>> Hi All,
>> 
>> First off, I’m on day 8 of dealing with two different mini-catastrophes at 
>> work and am therefore very sleep deprived and possibly missing something 
>> obvious … with that disclaimer out of the way…
>> 
>> We have a filesystem with 3 pools:  1) system (metadata only), 2) 
>> gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity 
>> (where files with an atime - yes atime - of more than 90 days get migrated 
>> to by a script that runs out of cron each weekend.
>> 
>> However … this morning the free space in the gpfs23capacity pool is 
>> dropping … I’m down to 0.5 TB free in a 582 TB pool … and I cannot figure 
>> out why.  The migration script is NOT running … in fact, it’s currently 
>> disabled.  So I can only think of two possible explanations for this:
>> 
>> 1.  There are one or more files already in the gpfs23capacity pool that 
>> someone has started updating.  Is there a way to check for that … i.e. a 
>> way to run something like “find /gpfs23 -mtime -7 -ls” but restricted to 
>> only files in the gpfs23capacity pool.  Marc Kaplan - can mmfind do that??  
>> ;-)
>> 
>> 2.  We are doing a large volume of restores right now because one of the 
>> mini-catastrophes I’m dealing with is one NSD (gpfs23data pool) down due to 
>> a issue with the storage array.  We’re working with the vendor to try to 
>> resolve that but are not optimistic so we have started doing restores in 
>> case they come back and tell us it’s not recoverable.  We did run 
>> “mmfileid” to identify the files that have one or more blocks on the down 
>> NSD, but there are so many that what we’re doing is actually restoring all 
>> the files to an alternate path (easier for out tape system), then replacing 
>> the corrupted files, then deleting any restores we don’t need.  But 
>> shouldn’t all of that be going to the gpfs23data pool?  I.e. even if we’re 
>> restoring files that are in the gpfs23capacity pool shouldn’t the fact that 
>> we’re restoring to an alternate path (i.e. not overwriting files with the 
>> tape restores) and the default pool is the gpfs23data pool mean that 
>> nothing is being restored to the gpfs23capacity pool???
>> 
>> Is there a third explanation I’m not thinking of?
>> 
>> Thanks...
>> 
>> —
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and Education
>> kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> -  
>> (615)875-9633
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 
> 
>  TELL US ABOUT YOUR SUCCESS STORIES
> 
> http://www.scinethpc.ca/testimonials
> 
> ---
> Jaime Pinto - Storage Analyst
> SciNet HPC Consortium - Compute/Calcul Canada
> www.scinet.utoronto.ca

[gpfsug-discuss] Capacity pool filling

2018-06-07 Thread Buterbaugh, Kevin L
Hi All,

First off, I’m on day 8 of dealing with two different mini-catastrophes at work 
and am therefore very sleep deprived and possibly missing something obvious … 
with that disclaimer out of the way…

We have a filesystem with 3 pools:  1) system (metadata only), 2) gpfs23data 
(the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with 
an atime - yes, atime - of more than 90 days get migrated to by a script that 
runs out of cron each weekend).

However … this morning the free space in the gpfs23capacity pool is dropping … 
I’m down to 0.5 TB free in a 582 TB pool … and I cannot figure out why.  The 
migration script is NOT running … in fact, it’s currently disabled.  So I can 
only think of two possible explanations for this:

1.  There are one or more files already in the gpfs23capacity pool that someone 
has started updating.  Is there a way to check for that … i.e. a way to run 
something like “find /gpfs23 -mtime -7 -ls” but restricted to only files in the 
gpfs23capacity pool.  Marc Kaplan - can mmfind do that??  ;-)

2.  We are doing a large volume of restores right now because one of the 
mini-catastrophes I’m dealing with is one NSD (gpfs23data pool) down due to an 
issue with the storage array.  We’re working with the vendor to try to resolve 
that but are not optimistic, so we have started doing restores in case they come 
back and tell us it’s not recoverable.  We did run “mmfileid” to identify the 
files that have one or more blocks on the down NSD, but there are so many that 
what we’re doing is actually restoring all the files to an alternate path 
(easier for our tape system), then replacing the corrupted files, then deleting 
any restores we don’t need.  But shouldn’t all of that be going to the 
gpfs23data pool?  I.e. even if we’re restoring files that are in the 
gpfs23capacity pool shouldn’t the fact that we’re restoring to an alternate 
path (i.e. not overwriting files with the tape restores) and the default pool 
is the gpfs23data pool mean that nothing is being restored to the 
gpfs23capacity pool???

Is there a third explanation I’m not thinking of?

Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working withkernel 3.10.0-862.2.3.el7

2018-05-15 Thread Buterbaugh, Kevin L
All,

I have to kind of agree with Andrew … it seems that there is a broad range of 
takes on kernel upgrades … everything from “install the latest kernel the day 
it comes out” to “stick with this kernel, we know it works.”

Related to that, let me throw out this question … what about those who haven’t 
upgraded their kernel in a while, at least in part because they’re concerned about 
the negative performance impacts of the meltdown / spectre patches???  So let’s 
just say a customer has upgraded the non-GPFS servers in their cluster, but 
they’ve left their NSD servers unpatched (I’m talking about the kernel only 
here; all other updates are applied) due to the aforementioned performance 
concerns … as long as they restrict access (i.e. who can log in) and use 
appropriate host-based firewall rules, is there some risk that they should be 
aware of?

Discuss.  Thanks!

Kevin

On May 15, 2018, at 4:45 PM, Andrew Beattie 
> wrote:

this thread is mildly amusing, given we regularly get customers asking why we 
are dropping support for versions of linux
that they "just can't move off"


Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com


- Original message -
From: Stijn De Weirdt >
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug-discuss@spectrumscale.org
Cc:
Subject: Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working withkernel 
3.10.0-862.2.3.el7
Date: Wed, May 16, 2018 5:35 AM

so this means running out-of-date kernels for at least another month? oh
boy...

i hope this is not some new trend in gpfs support. otherwise all RHEL
based sites will have to start adding EUS as default cost to run gpfs
with basic security compliance.

stijn


On 05/15/2018 09:02 PM, Felipe Knop wrote:
> All,
>
> Validation of RHEL 7.5 on Scale is currently under way, and we are
> currently targeting mid June to release the PTFs on 4.2.3 and 5.0 which
> will include the corresponding fix.
>
> Regards,
>
>   Felipe
>
> 
> Felipe Knop 
> k...@us.ibm.com
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314  T/L 293-9314
>
>
>
>
>
> From: Ryan Novosielski >
> To: gpfsug main discussion list 
> >
> Date: 05/15/2018 12:56 PM
> Subject: Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working withkernel
> 3.10.0-862.2.3.el7
> Sent by: 
> gpfsug-discuss-boun...@spectrumscale.org
>
>
>
> I know these dates can move, but any vague idea of a timeframe target for
> release (this quarter, next quarter, etc.)?
>
> Thanks!
>
> --
> 
> || \\UTGERS,
> |---*O*---
> ||_// the State  | Ryan Novosielski - 
> novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ  | Office of Advanced Research Computing - MSB
> C630, Newark
>  `'
>
>> On May 14, 2018, at 9:30 AM, Felipe Knop 
>> > wrote:
>>
>> All,
>>
>> Support for RHEL 7.5 and kernel level 3.10.0-862 in Spectrum Scale is
> planned for upcoming PTFs on 4.2.3 and 5.0. Since code changes are needed
> in Scale to support this kernel level, upgrading to one of those upcoming
> PTFs will be required in order to run with that kernel.
>>
>> Regards,
>>
>> Felipe
>>
>> 
>> Felipe Knop  k...@us.ibm.com
>> GPFS Development and Security
>> IBM Systems
>> IBM Building 008
>> 2455 South Rd, Poughkeepsie, NY 12601
>> (845) 433-9314 T/L 293-9314
>>
>>
>>
>> Andi Rhod Christiansen ---05/14/2018 08:15:25 AM---You are
> welcome. I see your concern but as long as IBM has not released spectrum
> scale for 7.5 that
>>
>> From:  Andi Rhod Christiansen >
>> To:  gpfsug main discussion list 
>> >
>> Date:  05/14/2018 08:15 AM
>> Subject:  Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working with kernel
> 3.10.0-862.2.3.el7
>> Sent by:  
>> gpfsug-discuss-boun...@spectrumscale.org
>>
>>
>>
>>
>> You are welcome.
>>
>> I see your concern but as long as IBM has not released spectrum scale for
> 7.5 that is their only solution, in regards to them caring about security I
> would say yes they do care, but from their point of view either they tell
> the customer to upgrade as soon as red hat releases new versions and
> forcing the customer to be down until they 

Re: [gpfsug-discuss] FYI, Spectrum Scale 5.0.1 is out

2018-05-11 Thread Buterbaugh, Kevin L
On the other hand, we are very excited by this (from the README):

File systems: Traditional NSD nodes and servers can use checksums
NSD clients and servers that are configured with IBM Spectrum Scale 
can use checksums
to verify data integrity and detect network corruption of file data 
that the client
reads from or writes to the NSD server. For more information, see 
the
nsdCksumTraditional and nsdDumpBuffersOnCksumError attributes in 
the topic mmchconfig command.


Finally!  Thanks, IBM (seriously)…
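
For reference, enabling it should be a one-line mmchconfig once you are on 5.0.1 (a sketch; check the 5.0.1 documentation for whether the setting takes effect immediately or needs a daemon restart, and for the node scope that makes sense in your cluster):

    mmchconfig nsdCksumTraditional=yes
    mmlsconfig nsdCksumTraditional    # verify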

Kevin

On May 11, 2018, at 12:11 PM, Sanchez, Paul 
> wrote:

I’d normally be excited by this, since we do aggressively apply GPFS upgrades.  
But it’s worth noting that no released version of Scale works with the latest 
RHEL7 kernel yet (anything >= 3.10.0-780). So if you’re also in the habit of 
aggressively upgrading RedHat then you’re going to have to wait for 5.0.1-1 
before you can resume that practice.

From: 
gpfsug-discuss-boun...@spectrumscale.org
 
>
 On Behalf Of Bryan Banister
Sent: Friday, May 11, 2018 12:25 PM
To: gpfsug main discussion list 
(gpfsug-discuss@spectrumscale.org) 
>
Subject: [gpfsug-discuss] FYI, Spectrum Scale 5.0.1 is out

It’s on fix central, 
https://www-945.ibm.com/support/fixcentral

Cheers,
-Bryan



Note: This email is for the confidential use of the named addressee(s) only and 
may contain proprietary, confidential or privileged information. If you are not 
the intended recipient, you are hereby notified that any review, dissemination 
or copying of this email is strictly prohibited, and to please notify the 
sender immediately and destroy this email and any attachments. Email 
transmission cannot be guaranteed to be secure or error-free. The Company, 
therefore, does not make any guarantees as to the completeness or accuracy of 
this email or any attachments. This email is for informational purposes only 
and does not constitute a recommendation, offer, request or solicitation of any 
kind to buy, sell, subscribe, redeem or perform any type of transaction of a 
financial product.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Node list error

2018-05-10 Thread Buterbaugh, Kevin L
Hi Yaron,

Thanks for the response … no firewalld nor SELinux.  I went ahead and opened up 
a PMR and it turns out this is a known defect (at least in GPFS 5, I may have 
been the first to report it in GPFS 4.2.3.x) and IBM is working on a fix.  
Thanks…

Kevin

On May 10, 2018, at 7:51 AM, Yaron Daniel 
<y...@il.ibm.com<mailto:y...@il.ibm.com>> wrote:

Hi

Just to verify - there is no Firewalld running or Selinux ?



Regards





Yaron Daniel 94 Em Ha'Moshavot Rd


Storage ArchitectPetach Tiqva, 49527
IBM Global Markets, Systems HW Sales Israel

Phone:  +972-3-916-5672
Fax:+972-3-916-5672
Mobile: +972-52-8395593
e-mail: y...@il.ibm.com<mailto:y...@il.ibm.com>
IBM 
Israel<http://www.ibm.com/il/he/>



   



From:Bryan Banister 
<bbanis...@jumptrading.com<mailto:bbanis...@jumptrading.com>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:05/08/2018 11:51 PM
Subject:Re: [gpfsug-discuss] Node list error
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>



What does `mmlsnodeclass -N ` give you?
-B



From:gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Buterbaugh, 
Kevin L
Sent: Tuesday, May 08, 2018 1:24 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: [gpfsug-discuss] Node list error



Note: External Email

Hi All,



I can open a PMR for this if necessary, but does anyone know offhand what the 
following messages mean:



2018-05-08_12:16:39.567-0500: [I] Calling user exit script mmNodeRoleChange: 
event ccrFileChange, Async command /usr/lpp/mmfs/bin/mmsysmonc.
2018-05-08_12:16:39.719-0500: [I] Calling user exit script GUI_CCR_CHANGE: 
event ccrFileChange, Async command 
/usr/lpp/mmfs/gui/callbacks/global/ccrChangedCallback_421.sh.
2018-05-08_12:16:46.325-0500: [E] Node list error. Can not find all nodes in 
list 
1,1415,1515,1517,1519,1569,1571,1572,1573,1574,1575,1576,1577,1578,1579,1580,1581,1582,1583,1584,1585,1586,1587,1588,1589,1590,1591,1592,1783,1784,1786,1787,1788,1789,1793,1794,1795,1796,1797,1798,1799,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1812,1813,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,1839,1840,1841,1842,1843,1844,1888,1889,1908,1909,1910,1911,1912,1913,1914,1915,1916,1917,1918,1919,1920,1921,1922,1923,1924,1925,1926,1927,1928,1929,1930,1931,1932,1933,1934,1935,1936,1937,1938,1939,1940,1941,1942,1943,1966,2,2223,2235,2399,2400,2401,2402,2403,2404,2405,2407,2408,2409,2410,2411,2413,2414,2415,2416,2418,2419,2420,2421,2423,2424,2425,2426,2427,2428,2429,2430,2432,2436,2437,2438,2439,2440,2441,2442,2443,2444,2445,2446,2447,2448,2449,2450,2451,2452,2453,2454,2455,2456,2457,2458,2459,2460,2461,2462,2463,2464,2465,2466,2467,2468,2469,2470,2471,2472,2473,2474,2475,2476,2477,2478,2479,2480,2481,2482,2483,2520,2521,2522,2523,2524,2525,2526,2527,2528,2529,2530,2531,2532,2533,2534,2535,2536,2537,2538,2539,2540,2541,2542,2543,2544,2545,2546,2547,2548,2549,2550,2551,2552,2553,2554,2555,2556,2557,2558,2559,2560,2561,2562,2563,2564,2565,2566,2567,2568,2569,2570,2571,2572,2573,2574,2575,2604,2605,2607,2608,2609,2611,2612,2613,2614,2615,2616,2617,2618,2619,2620,2621,2622,2623,2624,2625,2626,2627,2628,2629,2630,2631,2632,2634,2635,2636,2637,2638,2640,2641,2642,2643,2650,2651,2652,2653,2654,2656,2657,2658,2660,2661,2662,2663,2664,2665,2666,2667,2668,2669,2670,2671,2672,2673,2674,2675,2676,2677,2679,2680,2681,2682,2683,2684,2685,2686,2687,2688,2689,2690,2691,2692,2693,2694,2695,2696,2697,2698,2699,2700,2702,2703,2704,2705,2706,2707,2708,2709,2710,2711,2712,2713,2714,2715,2716,2717,2718,2719,2720,2721,2722,2723,2724,2725,2726,2727,2728,2729,2730,2740,2741,2742,2743,2744,2745,2746,2754,2796,2797,2799,2800,2801,2802,2804,2805,2807,2808,2809,2812,2814,2815,2816,2817,2818,2819,2820,2823,
2018-05-08_12:16:46.340-0500: [E] Read Callback err 2. No user exit event is 
registered



This is GPFS 4.2.3-8.  We have not done any addition or deletion of nodes and 
have not had a bunch of nodes go offline, either.  Thanks…



Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>- 
(615)875-9633









[gpfsug-discuss] Node list error

2018-05-08 Thread Buterbaugh, Kevin L
Hi All,

I can open a PMR for this if necessary, but does anyone know offhand what the 
following messages mean:

2018-05-08_12:16:39.567-0500: [I] Calling user exit script mmNodeRoleChange: 
event ccrFileChange, Async command /usr/lpp/mmfs/bin/mmsysmonc.
2018-05-08_12:16:39.719-0500: [I] Calling user exit script GUI_CCR_CHANGE: 
event ccrFileChange, Async command 
/usr/lpp/mmfs/gui/callbacks/global/ccrChangedCallback_421.sh.
2018-05-08_12:16:46.325-0500: [E] Node list error. Can not find all nodes in 
list 
1,1415,1515,1517,1519,1569,1571,1572,1573,1574,1575,1576,1577,1578,1579,1580,1581,1582,1583,1584,1585,1586,1587,1588,1589,1590,1591,1592,1783,1784,1786,1787,1788,1789,1793,1794,1795,1796,1797,1798,1799,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1812,1813,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,1839,1840,1841,1842,1843,1844,1888,1889,1908,1909,1910,1911,1912,1913,1914,1915,1916,1917,1918,1919,1920,1921,1922,1923,1924,1925,1926,1927,1928,1929,1930,1931,1932,1933,1934,1935,1936,1937,1938,1939,1940,1941,1942,1943,1966,2,2223,2235,2399,2400,2401,2402,2403,2404,2405,2407,2408,2409,2410,2411,2413,2414,2415,2416,2418,2419,2420,2421,2423,2424,2425,2426,2427,2428,2429,2430,2432,2436,2437,2438,2439,2440,2441,2442,2443,2444,2445,2446,2447,2448,2449,2450,2451,2452,2453,2454,2455,2456,2457,2458,2459,2460,2461,2462,2463,2464,2465,2466,2467,2468,2469,2470,2471,2472,2473,2474,2475,2476,2477,2478,2479,2480,2481,2482,2483,2520,2521,2522,2523,2524,2525,2526,2527,2528,2529,2530,2531,2532,2533,2534,2535,2536,2537,2538,2539,2540,2541,2542,2543,2544,2545,2546,2547,2548,2549,2550,2551,2552,2553,2554,2555,2556,2557,2558,2559,2560,2561,2562,2563,2564,2565,2566,2567,2568,2569,2570,2571,2572,2573,2574,2575,2604,2605,2607,2608,2609,2611,2612,2613,2614,2615,2616,2617,2618,2619,2620,2621,2622,2623,2624,2625,2626,2627,2628,2629,2630,2631,2632,2634,2635,2636,2637,2638,2640,2641,2642,2643,2650,2651,2652,2653,2654,2656,2657,2658,2660,2661,2662,2663,2664,2665,2666,2667,2668,2669,2670,2671,2672,2673,2674,2675,2676,2677,2679,2680,2681,2682,2683,2684,2685,2686,2687,2688,2689,2690,2691,2692,2693,2694,2695,2696,2697,2698,2699,2700,2702,2703,2704,2705,2706,2707,2708,2709,2710,2711,2712,2713,2714,2715,2716,2717,2718,2719,2720,2721,2722,2723,2724,2725,2726,2727,2728,2729,2730,2740,2741,2742,2743,2744,2745,2746,2754,2796,2797,2799,2800,2801,2802,2804,2805,2807,2808,2809,2812,2814,2815,2816,2817,2818,2819,2820,2823,
2018-05-08_12:16:46.340-0500: [E] Read Callback err 2. No user exit event is 
registered

This is GPFS 4.2.3-8.  We have not done any addition or deletion of nodes and 
have not had a bunch of nodes go offline, either.  Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Not recommended, but why not?

2018-05-07 Thread Buterbaugh, Kevin L
Hi All,

I want to thank all of you who took the time to respond to this question … your 
thoughts / suggestions are much appreciated.

What I’m taking away from all of this is that it is OK to run CES on NSD 
servers as long as you are very careful in how you set things up.  This would 
include:

1.  Making sure you have enough CPU horsepower and using cgroups to limit how 
much CPU SMB and NFS can utilize (see the sketch after this list).
2.  Making sure you have enough RAM … 256 GB sounds like it should be “enough” 
when using SMB.
3.  Making sure you have your network config properly set up.  We would be able 
to provide three separate, dedicated 10 GbE links for GPFS daemon 
communication, GPFS multi-cluster link to our HPC cluster, and SMB / NFS 
communication.
4.  Making sure you have good monitoring of all of the above in place.
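
Regarding item 1, on a systemd-based distro the simplest way to "jail" the protocol daemons is a CPU quota on their units.  This is only a sketch: the unit names below (smbd.service and nfs-ganesha.service) are assumptions that may not match how CES names its services, and 800% (roughly 8 cores) is an arbitrary cap:

    # cap Samba and Ganesha at ~8 cores each; drop --runtime to make it persistent
    systemctl set-property --runtime smbd.service CPUQuota=800%
    systemctl set-property --runtime nfs-ganesha.service CPUQuota=800%
    systemctl show smbd.service -p CPUQuotaPerSecUSec    # verify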

Have I missed anything or does anyone have any additional thoughts?  Thanks…

Kevin

On May 4, 2018, at 11:26 AM, Sven Oehme 
<oeh...@gmail.com<mailto:oeh...@gmail.com>> wrote:

there is nothing wrong with running CES on NSD Servers, in fact if all CES 
nodes have access to all LUN's of the filesystem thats the fastest possible 
configuration as you eliminate 1 network hop.
the challenge is always to do the proper sizing, so you don't run out of CPU 
and memory on the nodes as you overlay functions. as long as you have good 
monitoring in place you are good. if you want to do the extra precaution, you 
could 'jail' the SMB and NFS daemons into a c-group on the node, i probably 
wouldn't limit memory but CPU as this is the more critical resource  to prevent 
expels and other time sensitive issues.

sven

On Fri, May 4, 2018 at 8:39 AM Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>> wrote:
Hi All,

In doing some research, I have come across numerous places (IBM docs, 
DeveloperWorks posts, etc.) where it is stated that it is not recommended to 
run CES on NSD servers … but I’ve not found any detailed explanation of why not.

I understand that CES, especially if you enable SMB, can be a resource hog.  
But if I size the servers appropriately … say, late model boxes with 2 x 8 core 
CPU’s, 256 GB RAM, 10 GbE networking … is there any reason why I still should 
not combine the two?

To answer the question of why I would want to … simple, server licenses.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633<tel:(615)%20875-9633>



___
gpfsug-discuss mailing list
gpfsug-discuss at 
spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Not recommended, but why not?

2018-05-04 Thread Buterbaugh, Kevin L
Hi Anderson,

Thanks for the response … however, the scenario you describe below wouldn’t 
impact us.  We have 8 NSD servers and they can easily provide the needed 
performance to native GPFS clients.  We could also take a downtime if we ever 
did need to expand in the manner described below.

In fact, one of the things that’s kinda surprising to me is that upgrading the 
SMB portion of CES requires a downtime.  Let’s just say that I know for a fact 
that sernet-samba can be done rolling / live.

Kevin

On May 4, 2018, at 10:52 AM, Anderson Ferreira Nobre 
<ano...@br.ibm.com<mailto:ano...@br.ibm.com>> wrote:

Hi Kevin,

I think one of the reasons is that if you need to add or remove nodes from the 
cluster you will start to face the constraints of this kind of solution. Let's say 
you have a cluster with two nodes that share the same set of LUNs through the SAN, 
and for some reason you need to add two more nodes that are NSD Servers and 
Protocol nodes. For the new nodes to become NSD Servers, you will have to 
redistribute the NSD disks among the four nodes. But to do that you will have 
to unmount the filesystems, and to unmount the filesystems you would need to 
stop protocol services. In the end you will realize that a simple task like 
that is disruptive. You won't be able to do it online.


Abraços / Regards / Saludos,

Anderson Nobre
AIX & Power Consultant
Master Certified IT Specialist
IBM Systems Hardware Client Technical Team – IBM Systems Lab Services

[community_general_lab_services]



Phone: 55-19-2132-4317
E-mail: ano...@br.ibm.com<mailto:ano...@br.ibm.com> [IBM]


- Original message -----
From: "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Cc:
Subject: [gpfsug-discuss] Not recommended, but why not?
Date: Fri, May 4, 2018 12:39 PM

Hi All,

In doing some research, I have come across numerous places (IBM docs, 
DeveloperWorks posts, etc.) where it is stated that it is not recommended to 
run CES on NSD servers … but I’ve not found any detailed explanation of why not.

I understand that CES, especially if you enable SMB, can be a resource hog.  
But if I size the servers appropriately … say, late model boxes with 2 x 8 core 
CPU’s, 256 GB RAM, 10 GbE networking … is there any reason why I still should 
not combine the two?

To answer the question of why I would want to … simple, server licenses.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Not recommended, but why not?

2018-05-04 Thread Buterbaugh, Kevin L
Hi All,

In doing some research, I have come across numerous places (IBM docs, 
DeveloperWorks posts, etc.) where it is stated that it is not recommended to 
run CES on NSD servers … but I’ve not found any detailed explanation of why not.

I understand that CES, especially if you enable SMB, can be a resource hog.  
But if I size the servers appropriately … say, late model boxes with 2 x 8 core 
CPU’s, 256 GB RAM, 10 GbE networking … is there any reason why I still should 
not combine the two?

To answer the question of why I would want to … simple, server licenses.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] GPFS GUI - DataPool_capUtil error

2018-04-09 Thread Buterbaugh, Kevin L
Hi All,

I’m pretty new to using the GPFS GUI for health and performance monitoring, but 
am finding it very useful.  I’ve got an issue that I can’t figure out.  In my 
events I see:

Event name: pool-data_high_error
Component: File System
Entity type: Pool
Entity name: 
Event time: 3/26/18 4:44:10 PM
Message: The pool  of file system  reached a nearly exhausted data level. DataPool_capUtil
Description: The pool reached a nearly exhausted level.
Cause: The pool reached a nearly exhausted level.
User action: Add more capacity to pool or move data to different pool or delete data and/or snapshots.
Reporting node: 
Event type: Active health state of an entity which is monitored by the system.

Now this is for a “capacity” pool … i.e. one that mmapplypolicy is going to 
fill up to 97% full.  Therefore, I’ve modified the thresholds:

### Threshold Rules ###
rule_name             metric                error  warn  direction  filterBy  groupBy                                            sensitivity
---------------------------------------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name      300
MemFree_Rule          mem_memfree           5      10    low                  node                                               300
MetaDataCapUtil_Rule  MetaDataPool_capUtil  90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
DataCapUtil_Rule      DataPool_capUtil      99.0   90.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300

But it’s still in an “Error” state.  I see that the time of the event is March 
26th at 4:44 PM, so I’m thinking this is something that’s just stale, but I 
can’t figure out how to clear it.  The mmhealth command shows the error, too, 
and from that message it appears as if the event was triggered prior to my 
adjusting the thresholds:

Event                 Parameter  Severity  Active Since         Event Message
------------------------------------------------------------------------------------------------------------------
pool-data_high_error  redacted   ERROR     2018-03-26 16:44:10  The pool redacted of file system redacted reached a nearly exhausted data level. 90.0

What do I need to do to get the GUI / mmhealth to recognize the new thresholds 
and clear this error?  I’ve searched and searched in the GUI for a way to clear 
it.  I’ve read the “Monitoring and Managing IBM Spectrum Scale Using the GUI” 
Redbook pretty much cover to cover and haven’t found anything there about how to 
clear this.  Thanks...
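
One thing that might be worth trying, if you have not already, is removing and re-adding the data-pool rule so the monitor re-evaluates against the new limits.  This is only a sketch; the exact mmhealth thresholds options vary by release, so treat the flag names below as assumptions to check against the 4.2.3 man page:

    mmhealth thresholds list
    mmhealth thresholds delete DataCapUtil_Rule
    mmhealth thresholds add DataPool_capUtil --errorlevel 99.0 --warnlevel 90.0 \
        --name DataCapUtil_Rule \
        --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name
    mmhealth node show filesystem    # on the reporting node, to see whether the event clears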

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Dual server NSDs

2018-04-04 Thread Buterbaugh, Kevin L
Hi John,

Yes, you can remove one of the servers and yes, we’ve done it and yes, the 
documentation is clear and correct.  ;-)

Last time I did this we were in a full cluster downtime, so unmounting wasn’t 
an issue.  We were changing our network architecture and so the IP addresses of 
all NSD servers save one were changing.  It was a bit … uncomfortable … for the 
brief period of time I had to make the one NSD server the one and only NSD 
server for ~1 PB of storage!  But it worked just fine…
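
For the archives, the actual change is small once the file system is unmounted everywhere.  A sketch using John's nsd1 example (the file system name here is made up):

    mmumount gpfs0 -a                # unmount on all nodes first
    mmchnsd "nsd1:sn008"             # nsd1 is now served by sn008 only
    mmlsnsd -d nsd1                  # verify the new server list
    mmmount gpfs0 -a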

HTHAL…

Kevin

On Apr 4, 2018, at 4:11 AM, John Hearns 
> wrote:

I should say I already have a support ticket open for advice on this issue.
We have a filesystem which has NSDs which have two servers defined, for 
instance:
nsd:
  device=/dev/sdb
  servers=sn007,sn008
  nsd=nsd1
  usage=dataOnly

Can I remove one of these servers?  The object is to upgrade this server and 
change its hostname, the physical server will stay in place.
Has anyone carried out an operation similar to this?

I guess the documentation here is quite clear:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20server%20balance
“If you want to change configuration for a NSD which is already belongs to a 
file system, you need to unmount the file system before running mmchnsd 
command.”
-- The information contained in this communication and any attachments is 
confidential and may be privileged, and is for the sole use of the intended 
recipient(s). Any unauthorized review, use, disclosure or distribution is 
prohibited. Unless explicitly stated otherwise in the body of this 
communication or the attachment thereto (if any), the information is provided 
on an AS-IS basis without any express or implied warranties or liabilities. To 
the extent you are relying on this information, you are doing so at your own 
risk. If you are not the intended recipient, please notify the sender 
immediately by replying to this message and destroy all copies of this message 
and any attachments. Neither the sender nor the company/group of companies he 
or she represents shall be liable for the proper and complete transmission of 
the information contained in this communication, or for any delay in its 
receipt. ___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Local event

2018-04-04 Thread Buterbaugh, Kevin L
Hi All,

According to the man page for mmaddcallback:

A local
 event triggers a callback only on the node on which the
 event occurred, such as mounting a file system on one of
 the nodes.


We have two GPFS clusters here (well, three if you count our small test 
cluster).  Cluster one has 8 NSD servers and one client, which is used only for 
tape backup … i.e. no one logs on to any of the nodes in the cluster.  Files on 
it are accessed one of three ways:  1) CNFS mount to local computer, 2) SAMBA 
mount to local computer, 3) GPFS multi-cluster remote mount to cluster two.  On 
cluster one there is a user callback for softQuotaExceeded that e-mails the 
user … and that we know works.

Cluster two has two local GPFS filesystems and over 600 clients natively 
mounting those filesystems (it’s our HPC cluster).  I’m trying to implement a 
similar callback for softQuotaExceeded events on cluster two as well.  I’ve 
tested the callback by manually running the (Python) script and passing it in 
the parameters I want and it works - I get the e-mail.  Then I added it via 
mmaddcallback, but only on the GPFS servers.
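
For context, the registration was along these lines (a sketch; the callback name, script path, and parameter list are placeholders rather than the exact ones from our cluster):

    mmaddcallback quotaEmail \
        --command /usr/local/sbin/quota_exceeded_mail.py \
        --event softQuotaExceeded \
        --parms "%eventName %fsName" \
        -N nsdNodes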

I did that because I thought that since callbacks work on cluster one with no 
local access to the GPFS servers that “local” must mean “when an NSD server 
does a write that puts the user over quota”.  However, on cluster two the 
callback is not being triggered.  Does this mean that I actually need to 
install the callback on every node in cluster two?  If so, then how / why are 
callbacks working on cluster one?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmfind performance

2018-03-07 Thread Buterbaugh, Kevin L
Hi Marc,

Thanks, I’m going to give this a try as the first mmfind finally finished 
overnight, but produced no output:

/root
root@gpfsmgrb# bash -x ~/bin/klb.sh
+ cd /usr/lpp/mmfs/samples/ilm
+ ./mmfind /gpfs23 -inum 113769917 -o -inum 132539418 -o -inum 135584191 -o 
-inum 136471839 -o -inum 137009371 -o -inum 137314798 -o -inum 137939675 -o 
-inum 137997971 -o -inum 138013736 -o -inum 138029061 -o -inum 138029065 -o 
-inum 138029076 -o -inum 138029086 -o -inum 138029093 -o -inum 138029099 -o 
-inum 138029101 -o -inum 138029102 -o -inum 138029106 -o -inum 138029112 -o 
-inum 138029113 -o -inum 138029114 -o -inum 138029119 -o -inum 138029120 -o 
-inum 138029121 -o -inum 138029130 -o -inum 138029131 -o -inum 138029132 -o 
-inum 138029141 -o -inum 138029146 -o -inum 138029147 -o -inum 138029152 -o 
-inum 138029153 -o -inum 138029154 -o -inum 138029163 -o -inum 138029164 -o 
-inum 138029165 -o -inum 138029174 -o -inum 138029175 -o -inum 138029176 -o 
-inum 138083075 -o -inum 138083148 -o -inum 138083149 -o -inum 138083155 -o 
-inum 138216465 -o -inum 138216483 -o -inum 138216507 -o -inum 138216535 -o 
-inum 138235320 -ls
/root
root@gpfsmgrb#

BTW, I had put that in a simple script simply because I had a list of those 
inodes and it was easier for me to get that in the format I wanted via a script 
that I was editing than trying to do that on the command line.

However, the log file it produced shows that it “hit” on 48 files:

[I] Inodes scan: 978275821 files, 99448202 directories, 37189547 other objects, 
1967508 'skipped' files and/or errors.
[I] 2018-03-06@23:43:15.988 Policy evaluation. 1114913570 files scanned.
[I] 2018-03-06@23:43:16.016 Sorting 48 candidate file list records.
[I] 2018-03-06@23:43:16.040 Sorting 48 candidate file list records.
[I] 2018-03-06@23:43:16.065 Choosing candidate files. 0 records scanned.
[I] 2018-03-06@23:43:16.066 Choosing candidate files. 48 records scanned.
[I] Summary of Rule Applicability and File Choices:
 Rule#Hit_Cnt KB_Hit Chosen  KB_Chosen KB_Ill Rule
 0 48 1274453504 48 1274453504  0 RULE 'mmfind' 
LIST 'mmfindList' DIRECTORIES_PLUS SHOW(.) WHERE(.)

[I] Filesystem objects with no applicable rules: 1112946014.

[I] GPFS Policy Decisions and File Choice Totals:
 Chose to list 1274453504KB: 48 of 48 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name  KB_Occupied   KB_Total Percent_Occupied
gpfs23capacity 564722407424   62491774976090.367477583%
gpfs23data 304797672448   53120350617657.378701177%
system0  0 0.0% (no user 
data)
[I] 2018-03-06@23:43:16.066 Policy execution. 0 files dispatched.
[I] 2018-03-06@23:43:16.102 Policy execution. 0 files dispatched.
[I] A total of 0 files have been migrated, deleted or processed by an EXTERNAL 
EXEC/script;
0 'skipped' files and/or errors.

While I’m going to follow your suggestion next, if you (or anyone else on the 
list) can explain why the “Hit_Cnt” is 48 but the “-ls” I passed to mmfind 
didn’t result in anything being listed, my curiosity is piqued.

And I’ll go ahead and say it before someone else does … I haven’t just chosen a 
special case, I AM a special case… ;-)

Kevin

On Mar 6, 2018, at 4:27 PM, Marc A Kaplan 
<makap...@us.ibm.com<mailto:makap...@us.ibm.com>> wrote:

Please try:

mmfind --polFlags '-N a_node_list  -g /gpfs23/tmp'  directory find-flags ...

Where a_node_list is a node list of your choice and /gpfs23/tmp is a temp 
directory of your choice...

And let us know how that goes.

Also, you have chosen a special case, just looking for some inode numbers -- so 
find can skip stating the other inodes...
whereas mmfind is not smart enough to do that -- but still with parallelism, 
I'd guess mmapplypolicy might still beat find in elapsed time to complete, even 
for this special case.

-- Marc K of GPFS



From:    "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:03/06/2018 01:52 PM
Subject:[gpfsug-discuss] mmfind performance
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

In the README for the mmfind command it says:

mmfind
  A highly efficient file system traversal tool, designed to serve
   as a drop-in replacement for the 'find' command as used against GPFS FSes.

And:

mmfind is expected to be slower than find on file systems with relatively few 
inodes.
This is due to the overhead of using mmapplypolicy.
However, if you make use of the -exec flag to carry out a relatively expensive 
operation
on each file (e.g. compute a checksum), using mmfind should yield a significant 
performance
improvement, even on a fi

[gpfsug-discuss] mmfind performance

2018-03-06 Thread Buterbaugh, Kevin L
Hi All,

In the README for the mmfind command it says:

mmfind
  A highly efficient file system traversal tool, designed to serve
   as a drop-in replacement for the 'find' command as used against GPFS FSes.

And:

mmfind is expected to be slower than find on file systems with relatively few 
inodes.
This is due to the overhead of using mmapplypolicy.
However, if you make use of the -exec flag to carry out a relatively expensive 
operation
on each file (e.g. compute a checksum), using mmfind should yield a significant 
performance
improvement, even on a file system with relatively few inodes.

I have a list of just shy of 50 inode numbers that I need to figure out what 
file they correspond to, so I decided to give mmfind a try:

+ cd /usr/lpp/mmfs/samples/ilm
+ ./mmfind /gpfs23 -inum 113769917 -o -inum 132539418 -o -inum 135584191 -o 
-inum 136471839 -o -inum 137009371 -o -inum 137314798 -o -inum 137939675 -o 
-inum 137997971 -o -inum 138013736 -o -inum 138029061 -o -inum 138029065 -o 
-inum 138029076 -o -inum 138029086 -o -inum 138029093 -o -inum 138029099 -o 
-inum 138029101 -o -inum 138029102 -o -inum 138029106 -o -inum 138029112 -o 
-inum 138029113 -o -inum 138029114 -o -inum 138029119 -o -inum 138029120 -o 
-inum 138029121 -o -inum 138029130 -o -inum 138029131 -o -inum 138029132 -o 
-inum 138029141 -o -inum 138029146 -o -inum 138029147 -o -inum 138029152 -o 
-inum 138029153 -o -inum 138029154 -o -inum 138029163 -o -inum 138029164 -o 
-inum 138029165 -o -inum 138029174 -o -inum 138029175 -o -inum 138029176 -o 
-inum 138083075 -o -inum 138083148 -o -inum 138083149 -o -inum 138083155 -o 
-inum 138216465 -o -inum 138216483 -o -inum 138216507 -o -inum 138216535 -o 
-inum 138235320 -ls

I kicked that off last Friday and it is _still_ running.  By comparison, I have 
a Perl script that I have run in the past that simply traverses the entire 
filesystem tree and stat’s each file and outputs that to a log file.  That 
script would “only” run ~24 hours.

Clearly mmfind as I invoked it is much slower than the corresponding Perl 
script, so what am I doing wrong?  Thanks…
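
For comparison, the same search expressed directly as a policy run, which is roughly what mmfind builds under the covers; a sketch with the inode list abbreviated and the temp paths made up:

    /* by-inode.pol */
    RULE EXTERNAL LIST 'hits' EXEC ''
    RULE 'byInode' LIST 'hits' DIRECTORIES_PLUS
         WHERE INODE = 113769917 OR INODE = 132539418 OR INODE = 135584191
            /* ... and so on for the rest of the inode numbers ... */

Run with something like "mmapplypolicy /gpfs23 -P by-inode.pol -I defer -f /gpfs23/tmp/byinode -g /gpfs23/tmp"; the matching path names should end up in /gpfs23/tmp/byinode.list.hits.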

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS

2018-03-06 Thread Buterbaugh, Kevin L
Hi Leandro,

I think the silence in response to your question says a lot, don’t you?  :-O

IBM has said (on this list, I believe) that the Meltdown / Spectre patches do 
not impact GPFS functionality.  They’ve been silent as to performance impacts, 
which can and will be taken various ways.

In the absence of information from IBM, the approach we have chosen to take is 
to patch everything except our GPFS servers … only we (the SysAdmins, oh, and 
the NSA, of course!) can log in to them, so we feel that the risk of not 
patching them is minimal.

HTHAL…

Kevin

On Mar 1, 2018, at 9:02 AM, Avila-Diaz, Leandro 
<lav...@illinois.edu<mailto:lav...@illinois.edu>> wrote:

Good morning,

Does anyone know if IBM has an official statement and/or perhaps a FAQ document 
about the Spectre/Meltdown impact on GPFS?
Thank you

From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of IBM Spectrum Scale <sc...@us.ibm.com<mailto:sc...@us.ibm.com>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Thursday, January 4, 2018 at 20:36
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS

Kevin,

The team is aware of Meltdown and Spectre. Due to the late availability of 
production-ready test patches (they became available today) we started today 
working on evaluating the impact of applying these patches. The focus would be 
both on any potential functional impacts (especially to the kernel modules 
shipped with GPFS) and on the performance degradation which affects user/kernel 
mode transitions. Performance characterization will be complex, as some system 
calls which may get invoked often by the mmfsd daemon will suddenly become 
significantly more expensive because of the kernel changes. Depending on the 
main areas affected, code changes might be possible to alleviate the impact, by 
reducing frequency of certain calls, etc. Any such changes will be deployed 
over time.

At this point, we can't say what impact this will have on stability or 
Performance on systems running GPFS — until IBM issues an official statement on 
this topic. We hope to have some basic answers soon.



Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum 
at https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.

"Buterbaugh, Kevin L" ---01/04/2018 01:11:59 PM---Happy New Year 
everyone, I’m sure that everyone is aware of Meltdown and Spectre by now … we, 
like m

From: "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: 01/04/2018 01:11 PM
Subject: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Happy New Year everyone,

I’m sure that everyone is aware of Meltdown and Spectre by now … we, like many 
other institutions, will be patching for it at the earliest possible 
opportunity.

Our understanding is that the most serious of the negative performance impacts 
of these patches will be for things like I/O (disk / network) … given that, we 
are curious if IBM has any plans for a GPFS update that could help mitigate 
those impacts? Or is there simply nothing that can be done?

If there is a GPFS update planned for this we’d be interested i

Re: [gpfsug-discuss] mmchdisk suspend / stop

2018-02-13 Thread Buterbaugh, Kevin L
Hi JAB,

OK, let me try one more time to clarify.  I’m not naming the vendor … they’re a 
small maker of commodity storage and we’ve been using their stuff for years 
and, overall, it’s been very solid.  The problem in this specific case is that 
a major version firmware upgrade is required … if the controllers were only a 
minor version apart we could do it live.

And yes, we can upgrade our QLogic SAN switches firmware live … in fact, we’ve 
done that in the past.  Should’ve been more clear there … we just try to do 
that as infrequently as possible.

So the bottom line here is that we were unaware that “major version” firmware 
upgrades could not be done live on our storage, but we’ve got a plan to work 
around this this time.

Kevin

> On Feb 13, 2018, at 7:43 AM, Jonathan Buzzard <jonathan.buzz...@strath.ac.uk> 
> wrote:
> 
> On Fri, 2018-02-09 at 15:07 +, Buterbaugh, Kevin L wrote:
>> Hi All,
>> 
>> Since several people have made this same suggestion, let me respond
>> to that.  We did ask the vendor - twice - to do that.  Their response
>> boils down to, “No, the older version has bugs and we won’t send you
>> a controller with firmware that we know has bugs in it.”
>> 
>> We have not had a full cluster downtime since the summer of 2016 -
>> and then it was only a one day downtime to allow the cleaning of our
>> core network switches after an electrical fire in our data center!
>>  So the firmware on not only our storage arrays, but our SAN switches
>> as well, is a bit out of date, shall we say…
>> 
>> That is an issue we need to address internally … our users love us
>> not having regularly scheduled downtimes quarterly, yearly, or
>> whatever, but there is a cost to doing business that way...
>> 
> 
> What sort of storage arrays are you using that don't allow you to do a
> live update of the controller firmware? Heck these days even cheapy
> Dell MD3 series storage arrays allow you to do live drive firmware
> updates.
> 
> Similarly with SAN switches surely you have separate A/B fabrics and
> can upgrade them one at a time live.
> 
> In a properly designed system one should not need to schedule downtime
> for firmware updates. He says as he plans a firmware update on his
> routers for next Tuesday morning, with no scheduled downtime and no
> interruption to service.
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmchdisk suspend / stop

2018-02-09 Thread Buterbaugh, Kevin L
Hi All,

Since several people have made this same suggestion, let me respond to that.  
We did ask the vendor - twice - to do that.  Their response boils down to, “No, 
the older version has bugs and we won’t send you a controller with firmware 
that we know has bugs in it.”

We have not had a full cluster downtime since the summer of 2016 - and then it 
was only a one day downtime to allow the cleaning of our core network switches 
after an electrical fire in our data center!  So the firmware on not only our 
storage arrays, but our SAN switches as well, is a bit out of date, shall we 
say…

That is an issue we need to address internally … our users love us not having 
regularly scheduled downtimes quarterly, yearly, or whatever, but there is a 
cost to doing business that way...

Kevin

On Feb 8, 2018, at 10:46 AM, Paul Ward wrote:

We tend to get the maintenance company to down-grade the firmware to match what 
we have for our aging hardware, before sending it to us.
I assume this isn’t an option?

Paul Ward
Technical Solutions Infrastructure Architect
Natural History Museum
T: 02079426450
E: p.w...@nhm.ac.uk

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L)

2018-02-08 Thread Buterbaugh, Kevin L
Hi again all,

It sounds like doing the “mmchconfig unmountOnDiskFail=meta -i” suggested by 
Steve and Bob followed by using mmchdisk to stop the disks temporarily is the 
way we need to go.  We will, as an aside, also run a mmapplypolicy first to 
pull any files users have started accessing again back to the “regular” pool 
before doing any of this.

Given that this is our “capacity” pool and files have to have an atime > 90 
days to get migrated there in the 1st place I think this is reasonable.  
Especially since users will get an I/O error if they happen to try to access 
one of those NSDs during the brief maintenance window.
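
For anyone finding this thread later, the sequence we have in mind looks 
roughly like the following.  This is only a sketch - the filesystem name 
(gpfs23), the pool names, the NSD names, and the policy file are all 
illustrative, not our real configuration:

# pull anything accessed within the last 90 days back off the capacity pool
cat > pullback.pol <<'EOF'
RULE 'pullback' MIGRATE FROM POOL 'capacity' TO POOL 'data'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) <= 90
EOF
mmapplypolicy gpfs23 -P pullback.pol -I yes

# keep the filesystem mounted as long as metadata stays reachable,
# then take the five capacity-pool NSDs down for the firmware work
mmchconfig unmountOnDiskFail=meta -i
mmchdisk gpfs23 stop -d "nsd31;nsd32;nsd33;nsd34;nsd35"
#  ... controller replacement / firmware upgrade happens here ...
mmchdisk gpfs23 start -d "nsd31;nsd32;nsd33;nsd34;nsd35"
# and set unmountOnDiskFail back to its previous value afterwards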

As to naming and shaming the vendor … I’m not going to do that at this point in 
time.  We’ve been using their stuff for well over a decade at this point and 
have had a generally positive experience with them.  In fact, I have spoken 
with them via phone since my original post today and they have clarified that 
the problem with the mismatched firmware is only an issue because we are a 
major version off of what is current due to us choosing to not have a downtime 
and therefore not having done any firmware upgrades in well over 18 months.

Thanks, all...

Kevin

On Feb 8, 2018, at 11:17 AM, Steve Xiao 
<sx...@us.ibm.com<mailto:sx...@us.ibm.com>> wrote:

You can change the cluster configuration so that the file system is only 
unmounted when there is an error accessing metadata.  This can be done by 
running the following command:
   mmchconfig unmountOnDiskFail=meta -i

After this configuration change, you should be able to stop all 5 NSDs with the 
mmchdisk stop command.  While these NSDs are in the down state, any user I/O to 
files residing on these disks will fail, but your file system should stay 
mounted and usable.

Steve Y. Xiao

> Date: Thu, 8 Feb 2018 15:59:44 +
> From: "Buterbaugh, Kevin L" 
> <kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
> To: gpfsug main discussion list 
> <gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
> Subject: [gpfsug-discuss] mmchdisk suspend / stop
> Message-ID: 
> <8dca682d-9850-4c03-8930-ea6c68b41...@vanderbilt.edu<mailto:8dca682d-9850-4c03-8930-ea6c68b41...@vanderbilt.edu>>
> Content-Type: text/plain; charset="utf-8"
>
> Hi All,
>
> We are in a bit of a difficult situation right now with one of our
> non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware!
> ) and are looking for some advice on how to deal with this
> unfortunate situation.
>
> We have a non-IBM FC storage array with dual-?redundant?
> controllers.  One of those controllers is dead and the vendor is
> sending us a replacement.  However, the replacement controller will
> have mis-matched firmware with the surviving controller and - long
> story short - the vendor says there is no way to resolve that
> without taking the storage array down for firmware upgrades.
> Needless to say there?s more to that story than what I?ve included
> here, but I won?t bore everyone with unnecessary details.
>
> The storage array has 5 NSDs on it, but fortunately enough they are
> part of our ?capacity? pool ? i.e. the only way a file lands here is
> if an mmapplypolicy scan moved it there because the *access* time is
> greater than 90 days.  Filesystem data replication is set to one.
>
> So ? what I was wondering if I could do is to use mmchdisk to either
> suspend or (preferably) stop those NSDs, do the firmware upgrade,
> and resume the NSDs?  The problem I see is that suspend doesn?t stop
> I/O, it only prevents the allocation of new blocks ? so, in theory,
> if a user suddenly decided to start using a file they hadn?t needed
> for 3 months then I?ve got a problem.  Stopping all I/O to the disks
> is what I really want to do.  However, according to the mmchdisk man
> page stop cannot be used on a filesystem with replication set to one.
>
> There?s over 250 TB of data on those 5 NSDs, so restriping off of
> them or setting replication to two are not options.
>
> It is very unlikely that anyone would try to access a file on those
> NSDs during the hour or so I?d need to do the firmware upgrades, but
> how would GPFS itself react to those (suspended) disks going away
> for a while?  I?m thinking I could be OK if there was just a way to
> actually stop them rather than suspend them.  Any undocumented
> options to mmchdisk that I?m not aware of???
>
> Are there other options - besides buying IBM hardware - that I am
> overlooking?  Thanks...
>
> ?
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and Education
> kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu><mailto:kevin.buterba...@vanderbilt.edu
> > - (615)87

[gpfsug-discuss] mmchdisk suspend / stop

2018-02-08 Thread Buterbaugh, Kevin L
Hi All,

We are in a bit of a difficult situation right now with one of our non-IBM 
hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are 
looking for some advice on how to deal with this unfortunate situation.

We have a non-IBM FC storage array with dual-“redundant” controllers.  One of 
those controllers is dead and the vendor is sending us a replacement.  However, 
the replacement controller will have mis-matched firmware with the surviving 
controller and - long story short - the vendor says there is no way to resolve 
that without taking the storage array down for firmware upgrades.  Needless to 
say there’s more to that story than what I’ve included here, but I won’t bore 
everyone with unnecessary details.

The storage array has 5 NSDs on it, but fortunately enough they are part of our 
“capacity” pool … i.e. the only way a file lands here is if an mmapplypolicy 
scan moved it there because the *access* time is greater than 90 days.  
Filesystem data replication is set to one.

So … what I was wondering if I could do is to use mmchdisk to either suspend or 
(preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs?  
The problem I see is that suspend doesn’t stop I/O, it only prevents the 
allocation of new blocks … so, in theory, if a user suddenly decided to start 
using a file they hadn’t needed for 3 months then I’ve got a problem.  Stopping 
all I/O to the disks is what I really want to do.  However, according to the 
mmchdisk man page stop cannot be used on a filesystem with replication set to 
one.

There’s over 250 TB of data on those 5 NSDs, so restriping off of them or 
setting replication to two are not options.

It is very unlikely that anyone would try to access a file on those NSDs during 
the hour or so I’d need to do the firmware upgrades, but how would GPFS itself 
react to those (suspended) disks going away for a while?  I’m thinking I could 
be OK if there was just a way to actually stop them rather than suspend them.  
Any undocumented options to mmchdisk that I’m not aware of???

Are there other options - besides buying IBM hardware - that I am overlooking?  
Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Metadata only system pool

2018-01-23 Thread Buterbaugh, Kevin L
Hi All,

This is all making sense and I appreciate everyone’s responses … and again I 
apologize for not thinking about the indirect blocks.

Marc - we specifically chose 4K inodes when we created this filesystem a little 
over a year ago so that small files could fit in the inode and therefore be 
stored on the metadata SSDs.

This is more of a curiosity question … is it documented somewhere how a 4K 
inode is used?  I understand that for very small files up to 3.5K of that can 
be for data, but what about for large files?  I.e., how much of that 4K is used 
for block addresses  (3.5K plus whatever portion was already allocated to block 
addresses??) … or what I’m really asking is, given 4K inodes and a 1M block 
size how big does a file have to be before it will need to use indirect blocks?
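
For what it's worth, a back-of-envelope way to estimate it.  Both numbers 
below are assumptions on my part (the usable space left in a 4K inode after 
the header and EAs, and the size of a single data-block address), not figures 
from IBM documentation:

# all assumptions - adjust if you know the real values
ADDR_BYTES=12                  # assumed bytes per data block address
USABLE=3584                    # assumed usable bytes in a 4K inode
BLOCKSIZE=$((1024 * 1024))     # our 1M filesystem block size
echo "$(( USABLE / ADDR_BYTES )) direct block addresses"
echo "~$(( USABLE / ADDR_BYTES * BLOCKSIZE / 1048576 )) MiB before indirect blocks"

That works out to roughly 300 direct block addresses, i.e. a file somewhere in 
the few-hundred-megabyte range before indirect blocks come into play.  Treat 
it as an order-of-magnitude estimate only.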

Thanks again…

Kevin

On Jan 23, 2018, at 1:12 PM, Marc A Kaplan 
<makap...@us.ibm.com<mailto:makap...@us.ibm.com>> wrote:

If one were starting over, it might make sense to use a  smaller inode size.  I 
believe we still support 512, 1K, 2K.
Tradeoff with the fact that inodes can store data and EAs.
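
For anyone starting over: the inode size can only be chosen at filesystem 
creation time, and cannot be changed afterwards.  A minimal sketch, with the 
filesystem name, stanza file, and values purely illustrative:

mmcrfs gpfs24 -F nsd.stanza -i 1024 -B 1M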




From:"Uwe Falke" <uwefa...@de.ibm.com<mailto:uwefa...@de.ibm.com>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:01/23/2018 04:04 PM
Subject:Re: [gpfsug-discuss] Metadata only system pool
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




rough calculation (assuming 4k inodes):
350 x 10^6 x 4096 bytes = 1.434 TB = 1.304 TiB. With replication that uses
2.868 TB or 2.608 TiB.
As already mentioned here, directory and indirect blocks come on top. Even
if you could get rid of a portion of the allocated and unused inodes, that
metadata pool appears a bit small to me.
If that is a large filesystem there should be some funding to extend it.
If you have such a many-but-small-files system as discussed recently in
this theatre, you might still beg for more MD storage, but that makes up
a larger portion of the total cost (assuming data storage is on HDD and md
storage on SSD) and that again reduces your chances.




Mit freundlichen Grüßen / Kind regards


Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services /
Data Center Services
---
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefa...@de.ibm.com<mailto:uwefa...@de.ibm.com>
---
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:
Thomas Wolter, Sven Schooß
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 17122




From:   "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date:   01/23/2018 06:17 PM
Subject:[gpfsug-discuss] Metadata only system pool
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi All,

I was under the (possibly false) impression that if you have a filesystem
where the system pool contains metadata only then the only thing that
would cause the amount of free space in that pool to change is the
creation of more inodes ? is that correct?  In other words, given that I
have a filesystem with 130 million free (but allocated) inodes:

Inode Information
-
Number of used inodes:   218635454
Number of free inodes:   131364674
Number of allocated inodes:  350000128
Maximum number of inodes:    350000128

I would not expect that a user creating a few hundred or thousands of
files could cause a ?no space left on device? error (which I?ve got one
user getting).  There?s plenty of free data space, BTW.

Now my system pool is almost ?full?:

(pool total)   2.878T   34M (  0%)   140.9M ( 0%)

But again, what - outside of me creating more inodes - would cause that to
change??

Thanks?

Kevin

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and
Education
kevin.buterba...@vanderbilt.edu - (615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Metadata only system pool

2018-01-23 Thread Buterbaugh, Kevin L
Hi All,

I do have metadata replication set to two, so Alex, does that make more sense?

And I had forgotten about indirect blocks for large files, which actually makes 
sense with the user in question … my apologies for that … due to a very gravely 
ill pet and a family member recovering at home from pneumonia, I'm way more 
sleep deprived right now than I'd like.  :-(

Fred - I think you’ve already answered this … but mmchfs can only create / 
allocate more inodes … it cannot be used to shrink the number of inodes?  That 
would make sense, and if that’s the case then I can allocate more NSDs to the 
system pool.
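
For the record (and for anyone searching the archives later), growing the 
inode limit looks like this; the filesystem name and numbers are illustrative 
only:

mmchfs gpfs23 --inode-limit 400000000:360000000
mmdf gpfs23        # the Inode Information section should show the new values

The first number is the new maximum and the optional second number is how many 
inodes to pre-allocate; as Fred says, going the other direction is not possible.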

Thanks…

Kevin

On Jan 23, 2018, at 11:27 AM, Alex Chekholko 
<a...@calicolabs.com<mailto:a...@calicolabs.com>> wrote:

2.8TB seems quite high for only 350M inodes.  Are you sure you only have 
metadata in there?

On Tue, Jan 23, 2018 at 9:25 AM, Frederick Stock 
<sto...@us.ibm.com<mailto:sto...@us.ibm.com>> wrote:
One possibility is the creation/expansion of directories or allocation of 
indirect blocks for large files.

Not sure if this is the issue here but at one time inode allocation was 
considered slow and so folks may have pre-allocated inodes to avoid that 
overhead during file creation.  To my understanding inode creation time is not 
so slow that users need to pre-allocate inodes.  Yes, there are likely some 
applications where pre-allocating may be necessary but I expect they would be 
the exception.  I mention this because you have a lot of free inodes and of 
course once they are allocated they cannot be de-allocated.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821<tel:(720)%20430-8821>
sto...@us.ibm.com<mailto:sto...@us.ibm.com>



From:"Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:01/23/2018 12:17 PM
Subject:[gpfsug-discuss] Metadata only system pool
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

I was under the (possibly false) impression that if you have a filesystem where 
the system pool contains metadata only then the only thing that would cause the 
amount of free space in that pool to change is the creation of more inodes … is 
that correct?  In other words, given that I have a filesystem with 130 million 
free (but allocated) inodes:

Inode Information
-
Number of used inodes:   218635454
Number of free inodes:   131364674
Number of allocated inodes:  350000128
Maximum number of inodes:    350000128

I would not expect that a user creating a few hundred or thousands of files 
could cause a “no space left on device” error (which I’ve got one user 
getting).  There’s plenty of free data space, BTW.

Now my system pool is almost “full”:

(pool total)   2.878T   34M (  0%)   140.9M ( 0%)

But again, what - outside of me creating more inodes - would cause that to 
change??

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>- 
(615)875-9633<tel:(615)%20875-9633>


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org

[gpfsug-discuss] Metadata only system pool

2018-01-23 Thread Buterbaugh, Kevin L
Hi All,

I was under the (possibly false) impression that if you have a filesystem where 
the system pool contains metadata only then the only thing that would cause the 
amount of free space in that pool to change is the creation of more inodes … is 
that correct?  In other words, given that I have a filesystem with 130 million 
free (but allocated) inodes:

Inode Information
-
Number of used inodes:   218635454
Number of free inodes:   131364674
Number of allocated inodes:  350000128
Maximum number of inodes:    350000128

I would not expect that a user creating a few hundred or thousands of files 
could cause a “no space left on device” error (which I’ve got one user 
getting).  There’s plenty of free data space, BTW.

Now my system pool is almost “full”:

(pool total)   2.878T   34M (  0%)   140.9M ( 0%)

But again, what - outside of me creating more inodes - would cause that to 
change??
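
For anyone hitting the same symptom later: directory blocks, indirect blocks 
for large files, and extended-attribute overflow all live in the metadata-only 
system pool too, so its free space can shrink even when no new inodes are 
created.  A quick way to keep an eye on it - the filesystem name is 
illustrative, and the -P option assumes your level of GPFS supports 
restricting mmdf to a single pool:

mmdf gpfs23 -P system     # free space in just the system (metadata) pool
mmdf gpfs23               # full report, including the Inode Information section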

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS best practises : end user standpoint

2018-01-17 Thread Buterbaugh, Kevin L
Inline…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Jan 16, 2018, at 11:25 AM, Jonathan Buzzard 
<jonathan.buzz...@strath.ac.uk<mailto:jonathan.buzz...@strath.ac.uk>> wrote:

On Tue, 2018-01-16 at 16:35 +, Buterbaugh, Kevin L wrote:

[SNIP]

I am quite sure someone storing 1PB has to pay more than someone
storing 1TB, so why should someone storing 20 million files not have to
pay more than someone storing 100k files?


Because they won’t … they’ll do something more brain dead like put a WD MyBook 
they bought at Costco on their desk and expect their job to copy data back and 
forth from it to /tmp on the compute node.  We have to offer a service that 
users are willing to pay for … we can’t dictate to them the way things WILL be.

There’s a big difference between the way things should be and the way things 
actually are … trust me, those of us in the United States know that better than 
most people around the world after the past year!  Bigly!  Buh-leave ME!  :-O


JAB.


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS best practises : end user standpoint

2018-01-16 Thread Buterbaugh, Kevin L
Hi Jonathan,

Comments / questions inline.  Thanks!

Kevin

> On Jan 16, 2018, at 10:08 AM, Jonathan Buzzard 
>  wrote:
> 
> On Tue, 2018-01-16 at 15:47 +, Carl Zetie wrote:
>> Maybe this would make for a good session at a future user group
>> meeting -- perhaps as an interactive session? IBM could potentially
>> provide a facilitator from our Design practice.
>>  
> 
> Most of it in my view is standard best practice regardless of the file
> system in use.
> 
> So in our mandatory training for the HPC, we tell our users don't use
> whacked out characters in your file names and directories. Specifically
> no backticks, no asterisks, no question marks, no newlines (yes
> really), no slashes (either forward or backward) and for Mac users
> don't start the name with a space (forces sorting to the top). We
> recommend sticking to plain ASCII so no accented characters either
> (harder if your native language is not English I guess but we are UK
> based so...). We don't enforce that but if it causes the user problems
> then they are on their own.

We’re in Tennessee, so not only do we not speak English, we barely speak 
American … y’all will just have to understand, bless your hearts!  ;-). 

But seriously, like most Universities, we have a ton of users for whom English 
is not their “primary” language, so dealing with “interesting” filenames is 
pretty hard to avoid.  And users’ problems are our problems whether or not 
they’re our problem.

> 
> We also strongly recommend using ISO 8601 date formats in file names to
> get date sorting from a directory listing too. Surprisingly not widely
> known about, but a great "life hack".
> 
> Then it boils down to don't create zillions of files. I would love to
> be able to somehow do per directory file number quotas where one could
> say set a default of a few thousand. Users would then have to justify
> needing a larger quota. Sure you can set a file number quota but that
> does not stop them putting them all in one directory.

If you’ve got (bio)medical users using your cluster I don’t see how you avoid 
this … they’re using commercial apps that do this kind of stupid stuff (10’s of 
thousands of files in a directory and the full path to each file is longer than 
the contents of the files themselves!).

This reminds me of way back in 2005 when we moved from an NFS server to GPFS … 
I was moving users over by tarring up their home directories on the NFS server, 
copying the tarball over to GPFS and untarring it there … worked great for 699 
out of 700 users.  But there was one user for whom the untar would fail every 
time I tried … turned out that back in early versions of GPFS 2.3 IBM hadn’t 
considered that someone would put 6 million files in one directory!  :-O

> 
> If users really need to have zillions of files then charge them more so
> you can afford to beef up your metadata disks to SSD.

OK, so here’s my main question … you’re right that SSD’s are the answer … but 
how do you charge them more?  SSDs are more expensive than hard disks, and
enterprise SSDs are stupid expensive … and users barely want to pay hard drive 
prices for their storage.  If you’ve got the magic answer to how to charge them 
enough to pay for SSDs I’m sure I’m not the only one who’d love to hear how you 
do it?!?!

> 
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS

2018-01-08 Thread Buterbaugh, Kevin L
Hi GPFS Team,

Thanks for this response.  If it is at all possible I know that we (and I would 
suspect many others are in this same boat) would greatly appreciate a update 
from IBM on how a patched kernel impacts GPFS functionality.  Yes, we’d love to 
know the performance impact of the patches on GPFS, but that pales in 
significance to knowing whether GPFS version 4.x.x.x will even *start* with the 
patched kernel(s).

Thanks again…

Kevin

On Jan 4, 2018, at 4:55 PM, IBM Spectrum Scale 
<sc...@us.ibm.com<mailto:sc...@us.ibm.com>> wrote:


Kevin,

The team is aware of Meltdown and Spectre. Due to the late availability of 
production-ready test patches (they became available today) we started today 
working on evaluating the impact of applying these patches. The focus would be 
both on any potential functional impacts (especially to the kernel modules 
shipped with GPFS) and on the performance degradation which affects user/kernel 
mode transitions. Performance characterization will be complex, as some system 
calls which may get invoked often by the mmfsd daemon will suddenly become 
significantly more expensive because of the kernel changes. Depending on the 
main areas affected, code changes might be possible to alleviate the impact, by 
reducing frequency of certain calls, etc. Any such changes will be deployed 
over time.

At this point, we can't say what impact this will have on stability or 
performance on systems running GPFS until IBM issues an official statement on 
this topic. We hope to have some basic answers soon.



Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.

"Buterbaugh, Kevin L" ---01/04/2018 01:11:59 PM---Happy New Year 
everyone, I’m sure that everyone is aware of Meltdown and Spectre by now … we, 
like m

From: "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: 01/04/2018 01:11 PM
Subject: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





Happy New Year everyone,

I’m sure that everyone is aware of Meltdown and Spectre by now … we, like many 
other institutions, will be patching for it at the earliest possible 
opportunity.

Our understanding is that the most serious of the negative performance impacts 
of these patches will be for things like I/O (disk / network) … given that, we 
are curious if IBM has any plans for a GPFS update that could help mitigate 
those impacts? Or is there simply nothing that can be done?

If there is a GPFS update planned for this we’d be interested in knowing so 
that we could coordinate the kernel and GPFS upgrades on our cluster.

Thanks…

Kevin

P.S. The “Happy New Year” wasn’t intended as sarcasm … I hope it is a good year 
for everyone despite how it’s starting out. :-O

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Password to GUI forgotten

2018-01-05 Thread Buterbaugh, Kevin L
Hi GPFS team,

I did open a PMR and they (mainly Matthais) did help me get that issue 
resolved.  Thanks for following up!

Kevin

On Jan 5, 2018, at 6:39 AM, IBM Spectrum Scale 
<sc...@us.ibm.com<mailto:sc...@us.ibm.com>> wrote:

Hi Kevin,

If you are stuck then please open a PMR and work with the IBM support folks to 
get this resolved.

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact  1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.



From:    "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To:"Hanley, Jesse A." <hanle...@ornl.gov<mailto:hanle...@ornl.gov>>
Cc:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:12/19/2017 01:42 AM
Subject:Re: [gpfsug-discuss] Password to GUI forgotten
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi Jesse,

Thanks for the suggestion … I find the following error very interesting:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/rmuser admin
EFSSP0010C CLI parser: The object "admin" specified for "userID" does not exist.
/root
root@testnsd1#

That says to me that I don’t have an admin user, which - if true - would 
explain why not a single password I can think of works.  ;-)

But as I mentioned in my original post I had this up and working earlier this 
fall.  While I can’t prove anything, I can’t imagine a scenario where I would 
deliberately choose a non-default username.  So if “admin” has been the default 
login for the GPFS GUI all along then I am really mystified.

Thanks!

Kevin

On Dec 18, 2017, at 1:58 PM, Hanley, Jesse A. 
<hanle...@ornl.gov<mailto:hanle...@ornl.gov>> wrote:

Kevin,

I ran into this a couple times using 4.2.3.  This is what we used to get around 
it:

/usr/lpp/mmfs/gui/cli/rmuser admin
/usr/lpp/mmfs/gui/cli/mkuser admin -p  -g Administrator,SecurityAdmin

You may need to run the initgui command if those objects are present.  That 
typically gets run on first login to the GUI.

Thanks,
--
Jesse
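
With a placeholder password filled in (Passw0rd below is only an example, not 
a suggestion), the whole sequence from Jesse's note, plus a check afterwards, 
looks like:

/usr/lpp/mmfs/gui/cli/rmuser admin
/usr/lpp/mmfs/gui/cli/mkuser admin -p Passw0rd -g Administrator,SecurityAdmin
/usr/lpp/mmfs/gui/cli/lsuser     # confirm the admin account exists again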


From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Monday, December 18, 2017 at 2:52 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

Hi All,

Sorry for the delay in getting back with you all … didn’t mean to leave this 
hanging, but some higher priority things came up.

Bottom line - I’m still stuck and probably going to open up a PMR with IBM 
after sending this.  Richards’ suggestion below errors for me on the “-g 
Administrator” part.  Other suggestions sent directly to me up to and including 
completely deleting the GPFS GUI and reinstalling have also not worked.

No matter what I do, I cannot log in to the GUI.  Thanks for the suggestions, 
though…

Kevin


On Dec 7, 2017, at 6:10 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

Sorry I need to learn to read… didn’t see the “object ‘Administrator’ does not 
exist” error.

That said, my workaround for the problem of forgetting the password was to 
create a new “admin2” user and use that to reset the password on admin itself.

[root@gpfs cli]# ./mkuser admin2 -p Passw0rd -g Administrator,SecurityAdmin
EFSSG0019I The user admin2 has been successfully created.
EFSSG1000I The command completed successfully.


Cheers
Richard

From: 
gpfsu

Re: [gpfsug-discuss] Password to GUI forgotten

2017-12-18 Thread Buterbaugh, Kevin L
Hi Jesse,

Thanks for the suggestion … I find the following error very interesting:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/rmuser admin
EFSSP0010C CLI parser: The object "admin" specified for "userID" does not exist.
/root
root@testnsd1#

That says to me that I don’t have an admin user, which - if true - would 
explain why not a single password I can think of works.  ;-)

But as I mentioned in my original post I had this up and working earlier this 
fall.  While I can’t prove anything, I can’t imagine a scenario where I would 
deliberately choose a non-default username.  So if “admin” has been the default 
login for the GPFS GUI all along then I am really mystified.

Thanks!

Kevin

On Dec 18, 2017, at 1:58 PM, Hanley, Jesse A. 
<hanle...@ornl.gov<mailto:hanle...@ornl.gov>> wrote:

Kevin,

I ran into this a couple times using 4.2.3.  This is what we used to get around 
it:

/usr/lpp/mmfs/gui/cli/rmuser admin
/usr/lpp/mmfs/gui/cli/mkuser admin -p  -g Administrator,SecurityAdmin

You may need to run the initgui command if those objects are present.  That 
typically gets run on first login to the GUI.

Thanks,
--
Jesse


From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Monday, December 18, 2017 at 2:52 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

Hi All,

Sorry for the delay in getting back with you all … didn’t mean to leave this 
hanging, but some higher priority things came up.

Bottom line - I’m still stuck and probably going to open up a PMR with IBM 
after sending this.  Richards’ suggestion below errors for me on the “-g 
Administrator” part.  Other suggestions sent directly to me up to and including 
completely deleting the GPFS GUI and reinstalling have also not worked.

No matter what I do, I cannot log in to the GUI.  Thanks for the suggestions, 
though…

Kevin


On Dec 7, 2017, at 6:10 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

Sorry I need to learn to read… didn’t see the “object ‘Administrator’ does not 
exist” error.

That said, my workaround for the problem of forgetting the password was to 
create a new “admin2” user and use that to reset the password on admin itself.

[root@gpfs cli]# ./mkuser admin2 -p Passw0rd -g Administrator,SecurityAdmin
EFSSG0019I The user admin2 has been successfully created.
EFSSG1000I The command completed successfully.


Cheers
Richard

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: 07 December 2017 11:57
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

This happened to me a while back, I opened a pmr to get it sorted but it's just 
a case of running some cli commands. I'll dig it out.
Get Outlook for Android


From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Sent: Wednesday, December 6, 2017 10:41:12 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

All,

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g 
Administrator,SecurityAdmin
EFSSP0010C CLI parser: The object "Administrator" specified for "-g" does not 
exist.
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g SecurityAdmin
EFSSP0010C CLI parser: The object "SecurityAdmin" specified for "-g" does not 
exist.
/root
root@testnsd1#

I’ll also add that all of the work I did earlier in the fall was with the test 
cluster running an earlier version of GPFS and it’s subsequently been updated 
to GPFS 4.2.3.5 … not sure that’s relevant but wanted to mention it just in 
case.

Thanks!

Kevin



On Dec 6, 2017, at 4:32 PM, Joshua Kwedar 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>&g

Re: [gpfsug-discuss] FW: Spectrum Scale 5.0 now available on Fix Central

2017-12-18 Thread Buterbaugh, Kevin L
Hi All,

GPFS 5.0 was announced on Friday … and today:

IBM Spectrum Scale: NFS operations may fail with IO-Error

IBM has identified an issue with IBM Spectrum Scale 5.0.0.0 Protocol support 
for NFSv3/v4 in which IO-errors may be returned to the NFS client if the NFS 
server accumulates file-descriptor resources beyond the defined limit. 
Accumulation of file descriptor resources will occur when NFSv3 file create 
operations are sent against files that are already in use.
Bob’s suggestion in a previous e-mail to the list about installing this on a 
test cluster is almost certainly very, VERY good advice.  That’s certainly what 
we will do after the holiday break...

Kevin

On Dec 18, 2017, at 1:43 PM, Oesterlin, Robert wrote:

The Scale 5.0 fix level is now up on Fix Central.

You need to be at Scale 4.2.3 (cluster level) to do a rolling upgrade to this 
level.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: "dw-not...@us.ibm.com" 
>
Reply-To: "dw-not...@us.ibm.com" 
>
Date: Monday, December 18, 2017 at 1:27 PM
Subject: [EXTERNAL] [Forums] 'g...@us.ibm.com' replied 
to the 'IBM Spectrum Scale V5.0 announcements' topic thread in the 'General 
Parallel File System - Announce (GPFS - Announce)' forum.

 
g...@us.ibm.com replied to the IBM Spectrum Scale V5.0 announcements topic 
thread in the General Parallel File System - Announce (GPFS - Announce) forum.

IBM Spectrum Scale 5.0.0.0 is now available from IBM Fix Central:

http://www-933.ibm.com/support/fixcentral

This topic summarizes changes to the IBM Spectrum Scale licensed
program and the IBM Spectrum Scale library.

Summary of changes
for IBM Spectrum Scale version 5 release 0.0
as updated, April 2017

Changes to this release of the IBM Spectrum Scale licensed
program and the IBM Spectrum Scale library include the following:


Re: [gpfsug-discuss] Password to GUI forgotten

2017-12-18 Thread Buterbaugh, Kevin L
Hi All,

Sorry for the delay in getting back with you all … didn’t mean to leave this 
hanging, but some higher priority things came up.

Bottom line - I’m still stuck and probably going to open up a PMR with IBM 
after sending this.  Richards’ suggestion below errors for me on the “-g 
Administrator” part.  Other suggestions sent directly to me up to and including 
completely deleting the GPFS GUI and reinstalling have also not worked.

No matter what I do, I cannot log in to the GUI.  Thanks for the suggestions, 
though…

Kevin

On Dec 7, 2017, at 6:10 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

Sorry I need to learn to read… didn’t see the “object ‘Administrator’ does not 
exist” error.

That said, my workaround for the problem of forgetting the password was to 
create a new “admin2” user and use that to reset the password on admin itself.

[root@gpfs cli]# ./mkuser admin2 -p Passw0rd -g Administrator,SecurityAdmin
EFSSG0019I The user admin2 has been successfully created.
EFSSG1000I The command completed successfully.


Cheers
Richard

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: 07 December 2017 11:57
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

This happened to me a while back, I opened a pmr to get it sorted but it's just 
a case of running some cli commands. I'll dig it out.
Get Outlook for Android


From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Sent: Wednesday, December 6, 2017 10:41:12 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

All,

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g 
Administrator,SecurityAdmin
EFSSP0010C CLI parser: The object "Administrator" specified for "-g" does not 
exist.
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g SecurityAdmin
EFSSP0010C CLI parser: The object "SecurityAdmin" specified for "-g" does not 
exist.
/root
root@testnsd1#

I’ll also add that all of the work I did earlier in the fall was with the test 
cluster running an earlier version of GPFS and it’s subsequently been updated 
to GPFS 4.2.3.5 … not sure that’s relevant but wanted to mention it just in 
case.

Thanks!

Kevin


On Dec 6, 2017, at 4:32 PM, Joshua Kwedar 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

Hmm.. odd.

Here’s what the lsuser output should look like.

# /usr/lpp/mmfs/gui/cli/lsuser
Name  Long name Password status Group names Failed login 
attempts
admin   active  Administrator,SecurityAdmin 0
EFSSG1000I The command completed successfully.

Can you try something like…

# /usr/lpp/mmfs/gui/cli/mkuser admin -p abc1231 -g Administrator,SecurityAdmin



From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Wednesday, December 6, 2017 at 5:15 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

All,

Sorry - should’ve mentioned that:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231
EFSSG0001C Cannot validate option: login
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/lsuser -Y
lsuser:user:HEADER:version:reserved:reserved:Name:Long name:Password 
status:Group names:Failed login attempts:
/root
root@testnsd1#

Weird - it’s like the login doesn’t exist … but like I said, I had logged into 
it prior to November.  Thanks...

Kevin

On Dec 6, 2017, at 4:10 PM, Joshua Kwedar (froz1) 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

The GUI password can be changed via command line using chuser.

/usr/lpp/mmfs/gui/cli/chuser

Usage is as follows (where userID = admin)


chuser userID {-p  | -l  | -a  | -

[gpfsug-discuss] mmbackup log file size after GPFS 4.2.3.5 upgrade

2017-12-14 Thread Buterbaugh, Kevin L
Hi All,

 26 mmbackupDors-20171023.log
 26 mmbackupDors-20171024.log
 26 mmbackupDors-20171025.log
 26 mmbackupDors-20171026.log
2922752 mmbackupDors-20171027.log
137 mmbackupDors-20171028.log
  59328 mmbackupDors-20171029.log
2748095 mmbackupDors-20171030.log
 124953 mmbackupDors-20171031.log

That’s “wc -l” output … and the difference in size occurred with the GPFS 
4.2.3.5 upgrade.  I’m not technically “responsible” for mmbackup here, so I’m 
not at all familiar with it.  However, we’ve asked a certain vendor (not IBM) 
about it and they don’t know either, so I don’t feel too awfully bad.

And we have looked at the man page and didn’t see any obvious options to 
decrease the verbosity.  We did not make any changes to the backup script 
itself, so the mmbackup invocation is the same.  Any ideas?  Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Password to GUI forgotten

2017-12-06 Thread Buterbaugh, Kevin L
All,

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g 
Administrator,SecurityAdmin
EFSSP0010C CLI parser: The object "Administrator" specified for "-g" does not 
exist.
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g SecurityAdmin
EFSSP0010C CLI parser: The object "SecurityAdmin" specified for "-g" does not 
exist.
/root
root@testnsd1#

I’ll also add that all of the work I did earlier in the fall was with the test 
cluster running an earlier version of GPFS and it’s subsequently been updated 
to GPFS 4.2.3.5 … not sure that’s relevant but wanted to mention it just in 
case.

Thanks!

Kevin

On Dec 6, 2017, at 4:32 PM, Joshua Kwedar 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

Hmm.. odd.

Here’s what the lsuser output should look like.

# /usr/lpp/mmfs/gui/cli/lsuser
Name  Long name Password status Group names Failed login 
attempts
admin   active  Administrator,SecurityAdmin 0
EFSSG1000I The command completed successfully.

Can you try something like…

# /usr/lpp/mmfs/gui/cli/mkuser admin -p abc1231 -g Administrator,SecurityAdmin



From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Wednesday, December 6, 2017 at 5:15 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

All,

Sorry - should’ve mentioned that:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231
EFSSG0001C Cannot validate option: login
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/lsuser -Y
lsuser:user:HEADER:version:reserved:reserved:Name:Long name:Password 
status:Group names:Failed login attempts:
/root
root@testnsd1#

Weird - it’s like the login doesn’t exist … but like I said, I had logged into 
it prior to November.  Thanks...

Kevin


On Dec 6, 2017, at 4:10 PM, Joshua Kwedar (froz1) 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

The GUI password can be changed via command line using chuser.

/usr/lpp/mmfs/gui/cli/chuser


Usage is as follows (where userID = admin)



chuser userID {-p  | -l  | -a  | -d 
 | -g  | --expirePassword} [-o ]



Josh K

On Dec 6, 2017, at 4:56 PM, Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>> wrote:
Hi All,

So this is embarrassing to admit but I was playing around with setting up the 
GPFS GUI on our test cluster earlier this fall.  However, I was gone pretty 
much the entire month of November for a combination of vacation and SC17 and 
the vacation was so relaxing that I’ve forgotten the admin password for the 
GPFS GUI.  :-(

Is there anything I can do to recover from this short of deleting the GPFS GUI 
related RPM’s, re-installing, and starting over from scratch?  If that’s what I 
have to do, it’s no big deal as this is just our little 6-node test cluster, 
but I thought I’d ask before going down that route.

Oh, and if someone has a way to accomplish this that they’d rather not share in 
a public mailing list for any reason, please feel free to e-mail me directly, 
let me know, and I won’t tell if you won’t tell (and hopefully Michael Flynn 
won’t tell either!)…. ;-)

Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org

Re: [gpfsug-discuss] Password to GUI forgotten

2017-12-06 Thread Buterbaugh, Kevin L
All,

Sorry - should’ve mentioned that:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231
EFSSG0001C Cannot validate option: login
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/lsuser -Y
lsuser:user:HEADER:version:reserved:reserved:Name:Long name:Password 
status:Group names:Failed login attempts:
/root
root@testnsd1#

Weird - it’s like the login doesn’t exist … but like I said, I had logged into 
it prior to November.  Thanks...

Kevin

On Dec 6, 2017, at 4:10 PM, Joshua Kwedar (froz1) 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

The GUI password can be changed via command line using chuser.

/usr/lpp/mmfs/gui/cli/chuser

Usage is as follows (where userID = admin)


chuser userID  {-p  | -l  | -a  | -d 
 | -g  | --expirePassword} [-o ]


Josh K

On Dec 6, 2017, at 4:56 PM, Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi All,

So this is embarrassing to admit but I was playing around with setting up the 
GPFS GUI on our test cluster earlier this fall.  However, I was gone pretty 
much the entire month of November for a combination of vacation and SC17 and 
the vacation was so relaxing that I’ve forgotten the admin password for the 
GPFS GUI.  :-(

Is there anything I can do to recover from this short of deleting the GPFS GUI 
related RPM’s, re-installing, and starting over from scratch?  If that’s what I 
have to do, it’s no big deal as this is just our little 6-node test cluster, 
but I thought I’d ask before going down that route.

Oh, and if someone has a way to accomplish this that they’d rather not share in 
a public mailing list for any reason, please feel free to e-mail me directly, 
let me know, and I won’t tell if you won’t tell (and hopefully Michael Flynn 
won’t tell either!)…. ;-)

Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Password to GUI forgotten

2017-12-06 Thread Buterbaugh, Kevin L
Hi All,

So this is embarrassing to admit but I was playing around with setting up the 
GPFS GUI on our test cluster earlier this fall.  However, I was gone pretty 
much the entire month of November for a combination of vacation and SC17 and 
the vacation was so relaxing that I’ve forgotten the admin password for the 
GPFS GUI.  :-(

Is there anything I can do to recover from this short of deleting the GPFS GUI 
related RPM’s, re-installing, and starting over from scratch?  If that’s what I 
have to do, it’s no big deal as this is just our little 6-node test cluster, 
but I thought I’d ask before going down that route.

Oh, and if someone has a way to accomplish this that they’d rather not share in 
a public mailing list for any reason, please feel free to e-mail me directly, 
let me know, and I won’t tell if you won’t tell (and hopefully Michael Flynn 
won’t tell either!)…. ;-)

Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] 5.0 features?

2017-11-29 Thread Buterbaugh, Kevin L
Simon is correct … I’d love to be able to support a larger block size for my 
users who have sane workflows while still not wasting a ton of space for the 
biomedical folks…. ;-)

A question … will the new, much improved, much faster mmrestripefs that was 
touted at SC17 require a filesystem that was created with GPFS / Tiger Shark / 
Spectrum Scale / Multi-media filesystem () version 5 or simply one that 
has been “upgraded” to that format?
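
As a side note, a quick sketch of how one might check this on an existing filesystem (commands not taken from this thread; 'gpfs23' is a placeholder device name):

  mmlsfs gpfs23 -V        # shows the current and original file system format versions
  mmchfs gpfs23 -V full   # upgrades the on-disk format in place to the latest version (one-way operation)

Whether an in-place format upgrade is enough for the faster mmrestripefs, or whether a filesystem created at version 5 is required, is exactly the question above.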

Thanks…

Kevin

> On Nov 29, 2017, at 11:43 AM, Simon Thompson (IT Research Support) 
>  wrote:
> 
> You can in place upgrade.
> 
> I think what people are referring to is likely things like the new sub block 
> sizing for **new** filesystems.
> 
> Simon
> 
> From: gpfsug-discuss-boun...@spectrumscale.org 
> [gpfsug-discuss-boun...@spectrumscale.org] on behalf of 
> jfosb...@mdanderson.org [jfosb...@mdanderson.org]
> Sent: 29 November 2017 17:40
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] 5.0 features?
> 
> I haven’t even heard it’s been released or has been announced.  I’ve 
> requested a roadmap discussion.
> 
> From:  on behalf of Marc A Kaplan 
> 
> Reply-To: gpfsug main discussion list 
> Date: Wednesday, November 29, 2017 at 11:38 AM
> To: gpfsug main discussion list 
> Subject: Re: [gpfsug-discuss] 5.0 features?
> 
> Which features of 5.0 require a not-in-place upgrade of a file system?  Where 
> has this information been published?
> 
> 
> The information contained in this e-mail message may be privileged, 
> confidential, and/or protected from disclosure. This e-mail message may 
> contain protected health information (PHI); dissemination of PHI should 
> comply with applicable federal and state laws. If you are not the intended 
> recipient, or an authorized representative of the intended recipient, any 
> further review, disclosure, use, dissemination, distribution, or copying of 
> this message or any attachment (or the information contained therein) is 
> strictly prohibited. If you think that you have received this e-mail message 
> in error, please notify the sender by return e-mail and delete all references 
> to it and its contents from your systems.
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Online data migration tool

2017-11-29 Thread Buterbaugh, Kevin L
Hi All,

Well, actually a year ago we started the process of doing pretty much what 
Richard describes below … the exception being that we rsync’d data over to the 
new filesystem group by group.  It was no fun but it worked.  And now GPFS (and 
it will always be GPFS … it will never be Spectrum Scale) version 5 is coming 
and there are compelling reasons to want to do the same thing over again … 
despite the pain.
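
For anyone curious, the group-by-group copy was essentially of this shape (a minimal sketch with placeholder paths; note that rsync carries POSIX ACLs and extended attributes but not GPFS NFSv4 ACLs, so those would need separate handling):

  rsync -aHAX --numeric-ids /gpfs23/groupX/ /gpfs24/groupX/            # initial copy while the group keeps working
  rsync -aHAX --numeric-ids --delete /gpfs23/groupX/ /gpfs24/groupX/   # final pass once the group is quiesced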

Having said all that, I think it would be interesting to have someone from IBM 
give an explanation of why Apple can migrate millions of devices to a new 
filesystem with 99.99% of the users never even knowing they did it … but 
IBM can’t provide a way to migrate to a new filesystem “in place.”

And to be fair to IBM, they do ship AIX with root having a password and Apple 
doesn’t, so we all have our strengths and weaknesses!  ;-)

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Nov 29, 2017, at 10:39 AM, Sobey, Richard A 
> wrote:

Could we utilise free capacity in the existing filesystem and empty NSDs, 
create a new FS and AFM migrate data in stages? Terribly long winded and 
fraught with danger and peril... do not pass go... ah, answered my own question.



Richard

-Original Message-
From: 
gpfsug-discuss-boun...@spectrumscale.org
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Jonathan Buzzard
Sent: 29 November 2017 16:35
To: gpfsug main discussion list 
>
Subject: Re: [gpfsug-discuss] Online data migration tool

On Wed, 2017-11-29 at 11:00 -0500, Yugendra Guvvala wrote:
Hi,

I am trying to understand the technical challenges to migrate to GPFS
5.0 from GPFS 4.3. We currently run GPFS 4.3 and i was all exited to
see 5.0 release and hear about some promising features available. But
not sure about complexity involved to migrate.


Oh that's simple. You copy all your data somewhere else (good luck if you 
happen to have a few hundred TB or maybe a PB or more), then reformat your file 
system with the new disk format, then restore all your data to your shiny new 
file system.

Over the years there have been a number of these "reformats" to get all the new 
shiny features, which is the cause of the grumbles: it is not funny, most people 
don't have the disk space to just hold another copy of the data, and even if 
they did it would be extremely disruptive.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Rainy days and Mondays and GPFS lying to me always get me down...

2017-10-23 Thread Buterbaugh, Kevin L
Hi All,

And I’m not really down, but it is a rainy Monday morning here and GPFS did 
give me a scare in the last hour, so I thought that was a funny subject line.

So I have a >1 PB filesystem with 3 pools:  1) the system pool, which contains 
metadata only,  2) the data pool, which is where all I/O goes to by default, 
and 3) the capacity pool, which is where old crap gets migrated to.

I logged on this morning to see an alert that my data pool was 100% full.  I 
ran an mmdf from the cluster manager and, sure enough:

(pool total)             509.3T                0 (  0%)                 0 ( 0%)

I immediately tried copying a file to there and it worked, so I figured GPFS 
must be failing writes over to the capacity pool, but an mmlsattr on the file I 
copied showed it being in the data pool.  Hmmm.

I also noticed that “df -h” said that the filesystem had 399 TB free, while 
mmdf said it only had 238 TB free.  Hmmm.

So after some fruitless poking around I decided that whatever was going to 
happen, I should kill the mmrestripefs I had running on the capacity pool … let 
me emphasize that … I had a restripe running on the capacity pool only (via the 
“-P” option to mmrestripefs) but it was the data pool that said it was 100% 
full.
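
For context, the sort of cross-checks involved here look roughly like this (a sketch; the filesystem and pool names are placeholders, not taken from the message):

  mmdf gpfs23                                 # per-pool capacity, the source of the "(pool total)" lines above
  mmlsattr -L /gpfs23/path/to/newfile         # shows which storage pool a given file actually landed in
  mmrestripefs gpfs23 -b -P gpfs23capacity    # a rebalance restricted to a single pool via -P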

I’m sure many of you have already figured out where this is going … after 
killing the restripe I ran mmdf again and:

(pool total)             509.3T             159T ( 31%)            1.483T ( 0%)

I have never seen anything like this before … any ideas, anyone?  PMR time?

Thanks!

Kevin
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] CCR cluster down for the count?

2017-09-21 Thread Buterbaugh, Kevin L
Hi All,

Ralf Eberhard of IBM helped me resolve this off list.  The key was to 
temporarily make testnsd1 and testnsd3 not be quorum nodes by making sure GPFS 
was down and then executing:

mmchnode --nonquorum -N testnsd1,testnsd3 --force

That gave me some scary messages about overriding normal GPFS quorum semantics, 
but once that was done I was able to run an “mmstartup -a” and bring up the 
cluster!  Once it was up and I had verified things were working properly I then 
shut it back down so that I could rerun the mmchnode (without the —force) to 
make testnsd1 and testnsd3 quorum nodes again.
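
Put together as a rough transcript (a sketch reconstructed from the description above, not a verbatim session; the whole cluster must be down before the forced mmchnode):

  mmshutdown -a                                        # make sure GPFS is down everywhere
  mmchnode --nonquorum -N testnsd1,testnsd3 --force    # expect the scary quorum-override warning here
  mmstartup -a                                         # bring the cluster up and verify it is healthy
  mmshutdown -a                                        # once verified, take it back down...
  mmchnode --quorum -N testnsd1,testnsd3               # ...and restore the original quorum node roles
  mmstartup -a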

Thanks to all who helped me out here…

Kevin

On Sep 20, 2017, at 2:07 PM, Edward Wahl <ew...@osc.edu<mailto:ew...@osc.edu>> 
wrote:


So who was the ccrmaster before?
What is/was the quorum config?  (tiebreaker disks?)

what does 'mmccr check' say?


Have you set DEBUG=1 and tried mmstartup to see if it teases out any more info
from the error?


Ed


On Wed, 20 Sep 2017 16:27:48 +
"Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi Ed,

Thanks for the suggestion … that’s basically what I had done yesterday after
Googling and getting a hit or two on the IBM DeveloperWorks site.  I’m
including some output below which seems to show that I’ve got everything set
up but it’s still not working.

Am I missing something?  We don’t use CCR on our production cluster (and this
experience doesn’t make me eager to do so!), so I’m not that familiar with
it...

Kevin

/var/mmfs/gen
root@testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort
testdellnode1:  root  2583     1  0 May30 ?   00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testdellnode1:  root  6694  2583  0 11:19 ?   00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testgateway:  root  2023  5828  0 11:19 ?     00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testgateway:  root  5828     1  0 Sep18 ?     00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd1:  root 19356  4628  0 11:19 tty1     00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd1:  root  4628     1  0 Sep19 tty1     00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd2:  root 22149  2983  0 11:16 ?        00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd2:  root  2983     1  0 Sep18 ?        00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd3:  root 15685  6557  0 11:19 ?        00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd3:  root  6557     1  0 Sep19 ?        00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testsched:  root 29424  6512  0 11:19 ?       00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testsched:  root  6512     1  0 Sep18 ?       00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
/var/mmfs/gen
root@testnsd2# mmstartup -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmstartup: Command failed. Examine previous error messages to determine cause.
/var/mmfs/gen
root@testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort
testdellnode1:  drwxr-xr-x 2 root root 4096 Mar  3  2017 cached
testdellnode1:  drwxr-xr-x 2 root root 4096 Nov 10  2016 committed
testdellnode1:  -rw-r--r-- 1 root root   99 Nov 10  2016 ccr.nodes
testdellnode1:  total 12
testgateway:  drwxr-xr-x. 2 root root 4096 Jun 29  2016 committed
testgateway:  drwxr-xr-x. 2 root root 4096 Mar  3  2017 cached
testgateway:  -rw-r--r--. 1 root root   99 Jun 29  2016 ccr.nodes
testgateway:  total 12
testnsd1:  drwxr-xr-x 2 root root  6 Sep 19 15:38 cached
testnsd1:  drwxr-xr-x 2 root root  6 Sep 19 15:38 committed
testnsd1:  -rw-r--r-- 1 root root  0 Sep 19 15:39 ccr.disks
testnsd1:  -rw-r--r-- 1 root root  4 Sep 19 15:38 ccr.noauth
testnsd1:  -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes
testnsd1:  total 8
testnsd2:  drwxr-xr-x 2 root root   22 Mar  3  2017 cached
testnsd2:  drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed
testnsd2:  -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1
testnsd2:  -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2
testnsd2:  -rw-r--r-- 1 root root    0 Jun 29  2016 ccr.disks
testnsd2:  -rw-r--r-- 1 root root   99 Jun 29  2016 ccr.nodes
testnsd2:  total 16
testnsd3:  drwxr-xr-x 2 root root  6 Sep 19 15:41 cached
testnsd3:  drwxr-xr-x 2 root root  6 Sep 19 15:41 committed
testnsd3:  -rw-r--r-- 1 root root  0 Jun 29  2016 ccr.disks
testnsd3:  -rw-r--r-- 1 root root  4 Sep 19 15:41 ccr.noauth
testnsd3:  -rw-r--r-- 1 root root 99 Jun 29  2016 ccr.nodes
testnsd3:  total 8
testsched:  drwxr-xr-x. 2 root root 4096 Jun 29  2016 committed
testsched:  drwxr-xr-x. 2 root root 4096 Mar  3  2017 cached
testsched:  -rw-r--r--. 1 root root

Re: [gpfsug-discuss] CCR cluster down for the count?

2017-09-20 Thread Buterbaugh, Kevin L
 code 255.
vmp608.vampire:  Host key verification failed.
mmdsh: vmp608.vampire remote shell process had return code 255.
vmp609.vampire:  Host key verification failed.
mmdsh: vmp609.vampire remote shell process had return code 255.
testnsd1.vampire:  Host key verification failed.
mmdsh: testnsd1.vampire remote shell process had return code 255.
vmp610.vampire:  Permission denied, please try again.
vmp610.vampire:  Permission denied, please try again.
vmp610.vampire:  Permission denied 
(publickey,gssapi-keyex,gssapi-with-mic,password).
mmdsh: vmp610.vampire remote shell process had return code 255.
mmchcluster: Command failed. Examine previous error messages to determine cause.
/var/mmfs/gen
root@testnsd2#

I believe that part of the problem may be that there are 4 client nodes that 
were taken out of service without ever being removed from the cluster configuration (done by 
another SysAdmin who was in a hurry to repurpose those machines).  They’re up 
and pingable but not reachable by GPFS anymore, which I’m pretty sure is making 
things worse.

Nor does Loic’s suggestion of running mmcommon work (but thanks for the 
suggestion!) … actually the mmcommon part worked, but a subsequent attempt to 
start the cluster up failed:

/var/mmfs/gen
root@testnsd2# mmstartup -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmstartup: Command failed. Examine previous error messages to determine cause.
/var/mmfs/gen
root@testnsd2#

Thanks.

Kevin

On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale 
<sc...@us.ibm.com<mailto:sc...@us.ibm.com>> wrote:


Hi Kevin,

Let me try to understand the problem you have. What is the meaning of “node 
died” here? Do you mean that there is a hardware/OS issue which cannot be 
fixed, so the OS cannot come up anymore?

I agree with Bob that you can try to disable CCR temporarily, restore the 
cluster configuration, and enable it again.

Such as:

1. Login to a node which has proper GPFS config, e.g NodeA
2. Shutdown daemon in all client cluster.
3. mmchcluster --ccr-disable -p NodeA
4. mmsdrrestore -a -p NodeA
5. mmauth genkey propagate -N testnsd1, testnsd3
6. mmchcluster --ccr-enable

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.

"Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK – I’ve run 
across this before, and it’s because of a bug (as I recall) having to do with 
CCR and

From: "Oesterlin, Robert" 
<robert.oester...@nuance.com<mailto:robert.oester...@nuance.com>>
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: 09/20/2017 07:39 AM
Subject: Re: [gpfsug-discuss] CCR cluster down for the count?
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





OK – I’ve run across this before, and it’s because of a bug (as I recall) 
having to do with CCR and quorum. What I think you can do is set the cluster to 
non-ccr (mmchcluster --ccr-disable) with all the nodes down, bring it back up 
and then re-enable ccr.

I’ll see if I can find this in one of the recent 4.2 release notes.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Tuesday, September 19, 2017 at 4:03 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count?

Hi All,

We ha

[gpfsug-discuss] CCR cluster down for the count?

2017-09-19 Thread Buterbaugh, Kevin L
Hi All,

We have a small test cluster that is CCR enabled.  It only had/has 3 NSD 
servers (testnsd1, 2, and 3) and maybe 3-6 clients.  testnsd3 died a while 
back.  I did nothing about it at the time because it was due to be life-cycled 
as soon as I finished a couple of higher priority projects.

Yesterday, testnsd1 also died, which took the whole cluster down.  So now 
resolving this has become higher priority… ;-)

I took two other boxes and set them up as testnsd1 and 3, respectively.  I’ve 
done a “mmsdrrestore -p testnsd2 -R /usr/bin/scp” on both of them.  I’ve also 
done a "mmccr setup -F” and copied the ccr.disks and ccr.nodes files from 
testnsd2 to them.  And I’ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to 
testnsd1 and 3.  In case it’s not obvious from the above, networking is fine … 
ssh without a password between those 3 boxes is fine.

However, when I try to startup GPFS … or run any GPFS command I get:

/root
root@testnsd2# mmstartup -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmstartup: Command failed. Examine previous error messages to determine cause.
/root
root@testnsd2#

I’ve got to run to a meeting right now, so I hope I’m not leaving out any 
crucial details here … does anyone have an idea what I need to do?  Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Permissions issue in GPFS 4.2.3-4?

2017-08-30 Thread Buterbaugh, Kevin L
Hi All,

We have a script that takes the output of mmlsfs and mmlsquota and formats a 
user’s GPFS quota usage into something a little “nicer” than what mmlsquota 
displays (and doesn’t display 50 irrelevant lines of output for filesets they 
don’t have access to).  After upgrading to 4.2.3-4 over the weekend it started 
throwing errors it hadn’t before:

awk: cmd. line:11: fatal: cannot open file `/var/mmfs/gen/mmfs.cfg.show' for 
reading (Permission denied)
mmlsfs: Unexpected error from awk. Return code: 2
awk: cmd. line:11: fatal: cannot open file `/var/mmfs/gen/mmfs.cfg.show' for 
reading (Permission denied)
mmlsfs: Unexpected error from awk. Return code: 2
Home (user): 11.82G 30G 40G 10807 20 30
awk: cmd. line:11: fatal: cannot open file `/var/mmfs/gen/mmfs.cfg.show' for 
reading (Permission denied)
mmlsquota: Unexpected error from awk. Return code: 2

It didn’t take long to track down that the mmfs.cfg.show file had permissions 
of 600 and a chmod 644 of it (on our login gateways only, which is the only 
place users run that script anyway) fixed the problem.
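
For reference, the fix amounted to a one-liner pushed out to the affected nodes, something like the sketch below (the hostnames file is a placeholder; the mmdsh -F usage follows the pattern shown elsewhere on this list):

  mmdsh -F /tmp/login_gateways.list "chmod 644 /var/mmfs/gen/mmfs.cfg.show"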

So I just wanted to see if this was a known issue in 4.2.3-4?  Notice that the 
error appears to be coming from the GPFS commands my script runs, not my script 
itself … I sure don’t call awk!  ;-)

Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS 4.2.3.4 question

2017-08-30 Thread Buterbaugh, Kevin L
Hi Bryan,

NO - it has the fix for the mmrestripefs data loss bug, but you need the efix 
on top of 4.2.3-4 for the mmadddisk / mmdeldisk issue.

Let me take this opportunity to also explain a workaround that has worked for 
us so far for that issue … the basic problem is two-fold (on our cluster, at 
least).  First, the /var/mmfs/gen/mmsdrfs file isn’t making it out to all nodes 
all the time.  That is simple enough to fix (mmrefresh -fa) and verify that 
it’s fixed (md5sum /var/mmfs/gen/mmsdrfs).

Second, however - and this is the real problem … some nodes are never actually 
rereading that file and therefore have incorrect information *in memory*.  This 
has been especially problematic for us as we are replacing a batch of 80 8 TB 
drives with bad firmware.  I am therefore deleting and subsequently recreating 
NSDs *with the same name*.  If a client node still has the “old” information in 
memory then it unmounts the filesystem when I try to mmadddisk the new NSD.

The workaround is to identify those nodes (mmfsadm dump nsd and grep for the 
identifier of the NSD(s) in question) and force them to reread the info (tsctl 
rereadnsd).
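
Spelled out as commands, the workaround above looks roughly like this (a sketch; the NSD name and hostnames file are placeholders):

  mmrefresh -fa                                                             # push a current mmsdrfs out to every node
  mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" | sort     # confirm every copy now matches
  mmdsh -F /tmp/cluster.hostnames "mmfsadm dump nsd | grep nsd_d12_08"      # find nodes still holding stale info in memory
  tsctl rereadnsd                                                           # run on each stale node to force a reread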

HTH…

Kevin

On Aug 30, 2017, at 9:21 AM, Bryan Banister 
<bbanis...@jumptrading.com<mailto:bbanis...@jumptrading.com>> wrote:

Ok, I’m completely confused… You’re saying 4.2.3-4 *has* the fix for 
adding/deleting NSDs?
-Bryan

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: Wednesday, August 30, 2017 9:13 AM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

Note: External Email

Aha, I’ve just realised what you actually said, having seen Simon’s response 
and twigged. The defect 1020461 matches what IBM has told me in my PMR about 
adding/deleting NSDs. I’m not sure why the description mentions networking 
though!

Richard

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: 30 August 2017 14:56
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

No worries, I’ve got it sorted and hopefully about to grab the 4.2.3-4 efix2.

Cheers for your help!
Richard

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Buterbaugh, 
Kevin L
Sent: 30 August 2017 14:55
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

Hi Richard,

Well, I’m not sure, which is why it’s taken me a while to respond.  In the 
README that comes with the efix it lists:

Defect   APAR   Description

1032655  None   AFM: Fix Truncate filtering Write incorrectly
1020461  None   FS can't be mounted after weird networking error

That 1st one is obviously not it and that 2nd one doesn’t reference mmadddisk / 
mmdeldisk.  Plus neither show an APAR number.

Sorry I can’t be of more help…

Kevin

On Aug 29, 2017, at 12:52 PM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

Thanks Kevin, that's good to know. Is there an apar I need to quote in my pmr?
Get Outlook for Android


From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Sent: Tuesday, August 29, 2017 4:53:51 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

Hi Richard,

Since I upgraded my cluster to GPFS 4.2.3.4 over the weekend IBM created an 
efix for it for the NSD deletion / creation fix.  I’m sure they’ll give it to 
you, too…  ;-)

Kevin

On Aug 29, 2017, at 9:30 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

So I can upgrade to 4.2.3-4 to get the mmrestripe fix, or 4.2.3-3 efix3 to get 
the NSD deletion and creation fix? Not great when on Monday I’m doing a load of 
all this. What’s the recommendation? Is there a one size fits all patch?

Re: [gpfsug-discuss] GPFS 4.2.3.4 question

2017-08-29 Thread Buterbaugh, Kevin L
Hi Richard,

Since I upgraded my cluster to GPFS 4.2.3.4 over the weekend IBM created an 
efix for it for the NSD deletion / creation fix.  I’m sure they’ll give it to 
you, too…  ;-)

Kevin

On Aug 29, 2017, at 9:30 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

So I can upgrade to 4.2.3-4 to get the mmrestripe fix, or 4.2.3-3 efix3 to get 
the NSD deletion and creation fix? Not great when on Monday I’m doing a load of 
all this. What’s the recommendation? Is there a one size fits all patch?

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Frederick Stock
Sent: 27 August 2017 01:35
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

The only change missing is the change delivered  in 4.2.3 PTF3 efix3 which was 
provided on August 22.  The problem had to do with NSD deletion and creation.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com<mailto:sto...@us.ibm.com>



From:"Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:08/26/2017 03:40 PM
Subject:[gpfsug-discuss] GPFS 4.2.3.4 question
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

Does anybody know if GPFS 4.2.3.4, which came out today, contains all the 
patches that are in GPFS 4.2.3.3 efix3?

If anybody does, and can respond, I’d greatly appreciate it.  Our cluster is in 
a very, very bad state right now and we may need to just take it down and bring 
it back up.  I was already planning on rolling out GPFS 4.2.3.3 efix 3 over the 
next few weeks anyway, so if I can just go to 4.2.3.4 that would be great…

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>- 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org/>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org/>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

