Re: [gpfsug-discuss] slow filesystem

2019-07-10 Thread Buterbaugh, Kevin L
Hi Damir,

Have you checked to see whether gssio4 might have a failing internal HD / SSD?  
Thanks…

Kevin
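As a hedged sketch, the kind of local-disk checks that suggestion implies, run on gssio4 itself (the device name /dev/sda and the scratch file path are assumptions, not from the thread):

# Find the device backing /var/mmfs, then look at its health and latency.
df /var/mmfs
lsblk
iostat -x 5 3                    # watch await/%util for the device found above
smartctl -a /dev/sda             # SMART status (assumed device; needs smartmontools)
# Time a small synchronous write next to the file GPFS complained about:
time dd if=/dev/zero of=/var/mmfs/gen/ddtest bs=4k count=256 oflag=sync
rm -f /var/mmfs/gen/ddtest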

On Jul 10, 2019, at 7:16 AM, Damir Krstic <damir.krs...@gmail.com> wrote:

Over the last couple of days, reads and writes on our compute cluster have been very 
slow on one of the filesystems in the cluster. We are working with IBM and have a 
Sev. 1 ticket open, but I figured I would ask here about a warning message we are 
seeing in the GPFS logs.

The cluster is configured as follows:

4 IO servers in the main gpfs cluster
700+ compute nodes in the gpfs cluster

The home filesystem is slow but the projects filesystem seems to be fast. There are 
not many waiters on the IO servers (almost none), but there are a lot of waiters on 
the remote cluster.

The message that is giving us pause is the following:
Jul 10 07:05:31 gssio4 mmfs: [N] Writing into file 
/var/mmfs/gen/LastLeaseRequestSent took 10.5 seconds

Why is it taking so long to write to this local file?

Thanks,
Damir


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Adding to an existing GPFS ACL

2019-03-27 Thread Buterbaugh, Kevin L
Hi Jonathan,

Thanks.  We have done a very similar thing when we’re dealing with a situation 
where:  1) all files and directories in the fileset are starting out with the 
same existing ACL, and 2) all need the same modification made to them.

Unfortunately, in this situation item 2 is true, but item 1 is _not_.  That’s 
what’s making this one a bit thorny…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Mar 27, 2019, at 11:33 AM, Fosburgh, Jonathan <jfosb...@mdanderson.org> wrote:

I misunderstood you.

Pretty much what we've been doing is maintaining "ACL template" files based on 
how our filesystem hierarchy is set up.  Basically, fileset foo has a foo.acl 
file that contains what the ACL is supposed to be.  If we need to change the 
ACL, we modify that file with the new ACL and then pass it through a simple 
(and expensive, I'm sure) script.  This wouldn't be necessary if inheritance 
flowed down on existing files and directories.  If you have CIFS access, you 
can also use Windows to do this, but it is MUCH slower.

--
Jonathan Fosburgh
Principal Application Systems Analyst
IT Operations Storage Team
The University of Texas MD Anderson Cancer Center
(713) 745-9346

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu>
Sent: Wednesday, March 27, 2019 11:19:03 AM
To: gpfsug main discussion list
Subject: [EXT] Re: [gpfsug-discuss] Adding to an existing GPFS ACL

Hi Jonathan,

Thanks for the response.  I did look at mmeditacl, but unless I’m missing 
something it’s interactive (kind of like mmedquota is by default).  If I had 
only a handful of files / directories to modify that would be fine, but in this 
case there are thousands of ACL’s that need modifying.

Am I missing something?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Mar 27, 2019, at 11:02 AM, Fosburgh, Jonathan <jfosb...@mdanderson.org> wrote:

Try mmeditacl.

--
Jonathan Fosburgh
Principal Application Systems Analyst
IT Operations Storage Team
The University of Texas MD Anderson Cancer Center
(713) 745-9346

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu>
Sent: Wednesday, March 27, 2019 10:59:17 AM
To: gpfsug main discussion list
Subject: [EXT] [gpfsug-discuss] Adding to an existing GPFS ACL

Hi All,

First off, I have very limited experience with GPFS ACL’s, so please forgive me 
if I’m missing something obvious here.  AFAIK, this is the first time we’ve hit 
something like this…

We have a fileset where all the files / directories have GPFS NFSv4 ACL’s set 
on them.  However, unlike most of our filesets where the same ACL is applied to 
every file / directory in the share, this one has different ACL’s on different 
files / directories.  Now we have the need to add to the existing ACL’s … 
another group needs access.  Unlike regular Unix / Linux ACLs, where setfacl 
can be used to just add to an ACL (e.g. setfacl -R -m g:group_name:rwx), I'm not 
seeing where GPFS has a similar command … i.e. mmputacl seems to expect the 
_entire_ new ACL to be supplied via either manual entry or an input file.  
That’s obviously problematic in this scenario.

So am I missing something?  Is there an easier solution than writing a script 
which recurses over the fileset, gets the existing ACL with mmgetacl and 
outputs that to a file, edits that file to add in the new group, and passes 
that as input to mmputacl?  That seems very cumbersome and error prone, 
especially if I’m the one writing the script!
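For reference, a minimal sketch of that cumbersome approach (the fileset path, group name, and especially the appended ACE syntax are assumptions; copy the exact NFSv4 entry format that mmgetacl emits on your system and test on a scratch directory first):

#!/bin/bash
# Recurse over a fileset, dump each NFSv4 ACL, append an entry for the new
# group, and push the edited ACL back.  Not production code.
FILESET_PATH=/gpfs/fs0/somefileset     # placeholder
NEWGROUP=newgroup                      # placeholder
TMP=$(mktemp)

find "$FILESET_PATH" -print0 | while IFS= read -r -d '' f; do
    mmgetacl -k nfs4 -o "$TMP" "$f" || continue
    # Skip objects that already carry an entry for the group.
    grep -q "^group:${NEWGROUP}:" "$TMP" && continue
    # The ACE appended below is an assumed format -- match your mmgetacl output.
    echo "group:${NEWGROUP}:rwxc:allow" >> "$TMP"
    mmputacl -i "$TMP" "$f"
done
rm -f "$TMP"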

Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633


Re: [gpfsug-discuss] Adding to an existing GPFS ACL

2019-03-27 Thread Buterbaugh, Kevin L
Hi Jonathan,

Thanks for the response.  I did look at mmeditacl, but unless I’m missing 
something it’s interactive (kind of like mmedquota is by default).  If I had 
only a handful of files / directories to modify that would be fine, but in this 
case there are thousands of ACL’s that need modifying.

Am I missing something?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Mar 27, 2019, at 11:02 AM, Fosburgh, Jonathan <jfosb...@mdanderson.org> wrote:

Try mmeditacl.

--
Jonathan Fosburgh
Principal Application Systems Analyst
IT Operations Storage Team
The University of Texas MD Anderson Cancer Center
(713) 745-9346

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu>
Sent: Wednesday, March 27, 2019 10:59:17 AM
To: gpfsug main discussion list
Subject: [EXT] [gpfsug-discuss] Adding to an existing GPFS ACL

Hi All,

First off, I have very limited experience with GPFS ACL’s, so please forgive me 
if I’m missing something obvious here.  AFAIK, this is the first time we’ve hit 
something like this…

We have a fileset where all the files / directories have GPFS NFSv4 ACL’s set 
on them.  However, unlike most of our filesets where the same ACL is applied to 
every file / directory in the share, this one has different ACL’s on different 
files / directories.  Now we have the need to add to the existing ACL’s … 
another group needs access.  Unlike regular Unix / Linux ACLs, where setfacl 
can be used to just add to an ACL (e.g. setfacl -R -m g:group_name:rwx), I'm not 
seeing where GPFS has a similar command … i.e. mmputacl seems to expect the 
_entire_ new ACL to be supplied via either manual entry or an input file.  
That’s obviously problematic in this scenario.

So am I missing something?  Is there an easier solution than writing a script 
which recurses over the fileset, gets the existing ACL with mmgetacl and 
outputs that to a file, edits that file to add in the new group, and passes 
that as input to mmputacl?  That seems very cumbersome and error prone, 
especially if I’m the one writing the script!

Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

The information contained in this e-mail message may be privileged, 
confidential, and/or protected from disclosure. This e-mail message may contain 
protected health information (PHI); dissemination of PHI should comply with 
applicable federal and state laws. If you are not the intended recipient, or an 
authorized representative of the intended recipient, any further review, 
disclosure, use, dissemination, distribution, or copying of this message or any 
attachment (or the information contained therein) is strictly prohibited. If 
you think that you have received this e-mail message in error, please notify 
the sender by return e-mail and delete all references to it and its contents 
from your systems.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Adding to an existing GPFS ACL

2019-03-27 Thread Buterbaugh, Kevin L
Hi All,

First off, I have very limited experience with GPFS ACL’s, so please forgive me 
if I’m missing something obvious here.  AFAIK, this is the first time we’ve hit 
something like this…

We have a fileset where all the files / directories have GPFS NFSv4 ACL’s set 
on them.  However, unlike most of our filesets where the same ACL is applied to 
every file / directory in the share, this one has different ACL’s on different 
files / directories.  Now we have the need to add to the existing ACL’s … 
another group needs access.  Unlike regular Unix / Linux ACLs, where setfacl 
can be used to just add to an ACL (e.g. setfacl -R -m g:group_name:rwx), I'm not 
seeing where GPFS has a similar command … i.e. mmputacl seems to expect the 
_entire_ new ACL to be supplied via either manual entry or an input file.  
That’s obviously problematic in this scenario.

So am I missing something?  Is there an easier solution than writing a script 
which recurses over the fileset, gets the existing ACL with mmgetacl and 
outputs that to a file, edits that file to add in the new group, and passes 
that as input to mmputacl?  That seems very cumbersome and error prone, 
especially if I’m the one writing the script!

Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS v5: Blocksizes and subblocks

2019-03-27 Thread Buterbaugh, Kevin L
Hi All,

So I was looking at the presentation referenced below and it states - on 
multiple slides - that there is one system storage pool per cluster.  Really?  
Shouldn’t that be one system storage pool per filesystem?!?  If not, please 
explain how in my GPFS cluster with two (local) filesystems I see two different 
system pools with two different sets of NSDs, two different capacities, and two 
different percentages full???

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Mar 26, 2019, at 11:27 AM, Dorigo Alvise (PSI) <alvise.dor...@psi.ch> wrote:

Hi Marc,
"Indirect block size" is well explained in this presentation:

http://files.gpfsug.org/presentations/2016/south-bank/D2_P2_A_spectrum_scale_metadata_dark_V2a.pdf

pages 37-41

Cheers,

   Alvise


From: gpfsug-discuss-boun...@spectrumscale.org [gpfsug-discuss-boun...@spectrumscale.org] on behalf of Caubet Serrabou Marc (PSI) [marc.cau...@psi.ch]
Sent: Tuesday, March 26, 2019 4:39 PM
To: gpfsug main discussion list
Subject: [gpfsug-discuss] GPFS v5: Blocksizes and subblocks

Hi all,

according to several GPFS presentations as well as according to the man pages:

 Table 1. Block sizes and subblock sizes

+---+---+
| Block size| Subblock size |
+---+---+
| 64 KiB| 2 KiB |
+---+---+
| 128 KiB   | 4 KiB |
+---+---+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+---+---+
| 8 MiB, 16 MiB | 16 KiB|
+---+---+

A block size of 8MiB or 16MiB should contain subblocks of 16KiB.

However, when creating a new filesystem with a 16 MiB block size, it looks like it is 
using 128 KiB subblocks:

[root@merlindssio01 ~]# mmlsfs merlin
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    131072                   Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 .
 .
 .
 -n                 128                      Estimated number of nodes that will mount file system
 -B                 1048576                  Block size (system pool)
                    16777216                 Block size (other pools)
 .
 .
 .

What am I missing?  According to the documentation I expected this to be a fixed 
value; is it not?

On the other hand, I don't really understand the concept of 'Indirect block size 
in bytes'; can somebody clarify or provide some details about this setting?

Thanks a lot and best regards,
Marc
_
Paul Scherrer Institut
High Performance Computing
Marc Caubet Serrabou
Building/Room: WHGA/019A
Forschungsstrasse, 111
5232 Villigen PSI
Switzerland

Telephone: +41 56 310 46 67
E-Mail: marc.cau...@psi.ch

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] SSDs for data - DWPD?

2019-03-18 Thread Buterbaugh, Kevin L
Thanks for the suggestion, Simon.  Yes, we've looked at that, but we think that 
we're going to potentially be in a situation where we're using fairly big SSDs 
already.  For example, if we bought 30 6.4 TB SSDs rated at 1 DWPD and 
configured them as 6 4+1P RAID 5 LUNs, then we'd end up with a usable capacity 
of 6 * 4 * 6 = ~144 TB in our “hot” pool (rounding the 6.4 TB drives down to 6 TB).  
That would satisfy our capacity needs and also not exceed the 1 DWPD rating of the drives.

BTW, we noticed with one particular vendor that their 3 DWPD drives were 
exactly 1/3rd the size of their 1 DWPD drives … which makes us wonder if that’s 
coincidence or not.  Anybody know for sure?

Thanks…

Kevin

> On Mar 18, 2019, at 4:13 PM, Simon Thompson  wrote:
> 
> Did you look at pricing larger SSDs than you need and only using partial 
> capacity to get more DWPD out of them?
> 
> I.e. a 1 TB drive at 3 DWPD = 3 TB of writes per day;
> a 2 TB drive at 3 DWPD (using only 1/2 its capacity) = 6 TB of writes per day.
> 
> Simon
> 
> From: gpfsug-discuss-boun...@spectrumscale.org 
> [gpfsug-discuss-boun...@spectrumscale.org] on behalf of Buterbaugh, Kevin L 
> [kevin.buterba...@vanderbilt.edu]
> Sent: 18 March 2019 19:09
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] SSDs for data - DWPD?
> 
> Hi All,
> 
> Just wanted to follow up with the results of my survey … I received a grand 
> total of two responses (Thanks Alex and John).  In their case, they’re using 
> SSDs with a 10 DWPD rating.
> 
> The motivation behind my asking this question was … money!  ;-).  Seriously, 
> 10 DWPD drives are still very expensive, while 3 DWPD drives are 
> significantly less expensive and 1 DWPD drives are even cheaper still.  While 
> we would NOT feel comfortable using anything less than 10 DWPD drives for 
> metadata, we’re wondering about using less expensive drives for data.
> 
> For example, let’s just say that you’re getting ready to set up a brand new 
> GPFS 5 formatted filesystem of 1-2 PB in size.  You’re considering having 3 
> pools:
> 
> 1) a metadata only system pool of 10 DWPD SSDs.  4K inodes, and a ton of 
> small files that’ll fit in the inode.
> 2) a data only “hot” pool (i.e. the default pool for writes) of SSDs.
> 3) a data only “capacity” pool of 12 TB spinning disks.
> 
> And let’s just say that you have looked back at the historical data you’ve 
> collected and you see that over the last 6 months or so you’ve been averaging 
> 10-12 TB of data being written into your existing filesystem per day.  You 
> want to do migrations between pools only on the weekends if at all possible.
> 
> 12 * 7 = 84 TB.  So if you had somewhere between 125 - 150 TB of SSDs ... 1 
> DWPD SSDs … then in theory you should easily be able to handle your 
> anticipated workload without coming close to exceeding the 1 DWPD rating of 
> the SSDs.
> 
> However, as the saying goes, while in theory there’s no difference between 
> theory and practice, in practice there is ... so am I overlooking anything 
> here from a GPFS perspective???
> 
> If anybody still wants to respond on the DWPD rating of the SSDs they use for 
> data, I’m still listening.
> 
> Thanks…
> 
> Kevin
> 
> P.S.  I still have a couple of “outstanding issues” to respond to that I’ve 
> posted to the list about previously:
> 
> 1) the long I/O’s we see occasionally in the output of “mmdiag —iohist” on 
> our NSD servers.  We’re still trying to track that down … it seems to happen 
> only with a subset of our hardware - most of the time at least - but we’re 
> still working to track down what triggers it … i.e. at this point I can’t say 
> whether it’s really the hardware or a user abusing the hardware.
> 
> 2) I promised to post benchmark results of 3 different metadata configs:  a) 
> RAID 1 mirrors, b) a RAID 5 stripe, c) no RAID, but GPFS metadata replication 
> of 3.  That benchmarking has been put on hold for reasons I can’t really 
> discuss on this mailing list at this time … but hopefully soon.
> 
> I haven’t forgotten the above and will respond back on the list when it’s 
> appropriate.  Thanks...
> 
> On Mar 8, 2019, at 10:24 AM, Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu> wrote:
> 
> Hi All,
> 
> This is kind of a survey if you will, so for this one it might be best if you 
> responded directly to me and I’ll summarize the results next week.
> 
> Question 1 - do you use SSDs for data?  If not - i.e. if you only use SSDs 
> for metadata (as we currently do) - thanks, that’s all!  If, however, you do 
> use SSDs for data, please see Question 2.
> 
> Question 2 - what is the DWPD (drive writes per day) of the SSDs that you use 
> for data?
> 
> Question 3 - is that different than the DWPD o

Re: [gpfsug-discuss] SSDs for data - DWPD?

2019-03-18 Thread Buterbaugh, Kevin L
Hi All,

Just wanted to follow up with the results of my survey … I received a grand 
total of two responses (Thanks Alex and John).  In their case, they’re using 
SSDs with a 10 DWPD rating.

The motivation behind my asking this question was … money!  ;-).  Seriously, 10 
DWPD drives are still very expensive, while 3 DWPD drives are significantly 
less expensive and 1 DWPD drives are even cheaper still.  While we would NOT 
feel comfortable using anything less than 10 DWPD drives for metadata, we’re 
wondering about using less expensive drives for data.

For example, let’s just say that you’re getting ready to set up a brand new 
GPFS 5 formatted filesystem of 1-2 PB in size.  You’re considering having 3 
pools:

1) a metadata only system pool of 10 DWPD SSDs.  4K inodes, and a ton of small 
files that’ll fit in the inode.
2) a data only “hot” pool (i.e. the default pool for writes) of SSDs.
3) a data only “capacity” pool of 12 TB spinning disks.

And let’s just say that you have looked back at the historical data you’ve 
collected and you see that over the last 6 months or so you’ve been averaging 
10-12 TB of data being written into your existing filesystem per day.  You want 
to do migrations between pools only on the weekends if at all possible.

12 * 7 = 84 TB.  So if you had somewhere between 125 - 150 TB of SSDs ... 1 
DWPD SSDs … then in theory you should easily be able to handle your anticipated 
workload without coming close to exceeding the 1 DWPD rating of the SSDs.
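A back-of-the-envelope check of that sizing, as a hedged sketch (the numbers are the ones above; substitute your own):

daily_writes_tb=12          # observed average ingest per day
days_between_migrations=7   # migrate to the capacity pool on weekends
hot_pool_tb=144             # proposed usable capacity of the 1 DWPD "hot" pool

awk -v d="$daily_writes_tb" -v n="$days_between_migrations" -v c="$hot_pool_tb" 'BEGIN {
    printf "capacity needed to hold a week of writes: %d TB\n", d * n
    printf "implied utilization of the hot pool: %.2f drive writes per day\n", d / c
}'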

However, as the saying goes, while in theory there’s no difference between 
theory and practice, in practice there is ... so am I overlooking anything here 
from a GPFS perspective???

If anybody still wants to respond on the DWPD rating of the SSDs they use for 
data, I’m still listening.

Thanks…

Kevin

P.S.  I still have a couple of “outstanding issues” to respond to that I’ve 
posted to the list about previously:

1) the long I/O’s we see occasionally in the output of “mmdiag —iohist” on our 
NSD servers.  We’re still trying to track that down … it seems to happen only 
with a subset of our hardware - most of the time at least - but we’re still 
working to track down what triggers it … i.e. at this point I can’t say whether 
it’s really the hardware or a user abusing the hardware.

2) I promised to post benchmark results of 3 different metadata configs:  a) 
RAID 1 mirrors, b) a RAID 5 stripe, c) no RAID, but GPFS metadata replication 
of 3.  That benchmarking has been put on hold for reasons I can’t really 
discuss on this mailing list at this time … but hopefully soon.

I haven’t forgotten the above and will respond back on the list when it’s 
appropriate.  Thanks...

On Mar 8, 2019, at 10:24 AM, Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu> wrote:

Hi All,

This is kind of a survey if you will, so for this one it might be best if you 
responded directly to me and I’ll summarize the results next week.

Question 1 - do you use SSDs for data?  If not - i.e. if you only use SSDs for 
metadata (as we currently do) - thanks, that’s all!  If, however, you do use 
SSDs for data, please see Question 2.

Question 2 - what is the DWPD (drive writes per day) of the SSDs that you use 
for data?

Question 3 - is that different than the DWPD of the SSDs for metadata?

Question 4 - any pertinent information in regards to your answers above (i.e. 
if you’ve got a filesystem that data is uploaded to only once and never 
modified after that then that’s useful to know!)?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] SSDs for data - DWPD?

2019-03-10 Thread Buterbaugh, Kevin L
Hi All,

This is kind of a survey if you will, so for this one it might be best if you 
responded directly to me and I’ll summarize the results next week.

Question 1 - do you use SSDs for data?  If not - i.e. if you only use SSDs for 
metadata (as we currently do) - thanks, that’s all!  If, however, you do use 
SSDs for data, please see Question 2.

Question 2 - what is the DWPD (drive writes per day) of the SSDs that you use 
for data?

Question 3 - is that different than the DWPD of the SSDs for metadata?

Question 4 - any pertinent information in regards to your answers above (i.e. 
if you’ve got a filesystem that data is uploaded to only once and never 
modified after that then that’s useful to know!)?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-21 Thread Buterbaugh, Kevin L
ectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact  1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.



From: "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 02/16/2019 08:18 PM
Subject: [gpfsug-discuss] Clarification of mmdiag --iohist output
Sent by: gpfsug-discuss-boun...@spectrumscale.org




Hi All,

Been reading man pages, docs, and Googling, and haven’t found a definitive 
answer to this question, so I knew exactly where to turn… ;-)

I’m dealing with some slow I/O’s to certain storage arrays in our environments 
… like really, really slow I/O’s … here’s just one example from one of my NSD 
servers of a 10 second I/O:

08:49:34.943186  Wdata   30:41615622144   2048 10115.192  srv   dm-92   
   

So here’s my question … when mmdiag —iohist tells me that that I/O took 
slightly over 10 seconds, is that:

1.  The time from when the NSD server received the I/O request from the client 
until it shipped the data back onto the wire towards the client?
2.  The time from when the client issued the I/O request until it received the 
data back from the NSD server?
3.  Something else?

I’m thinking it’s #1, but want to confirm.  Which one it is has very obvious 
implications for our troubleshooting steps.  Thanks in advance…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-20 Thread Buterbaugh, Kevin L
ing on fast condvar for signal 
0x7F6D7FA106F8 RdmaSend_NSD_SVC

0.000426716 16691   TRACE_RDMA: handleRecvComp: success 1 of 1 nWrSuccess 1 
index 11 cookie 12 wr_id 0xB0E6E bufferP 0x7F6D7EEE1700 byte_len 4144

0.000432140 37005   TRACE_NSD: nsdDoIO_ReadAndCheck: read complete, len 0 
status 6 err 0 bufP 0x180656C1058 dioIsOverRdma 1 ioDataP 0x20BC000 
ckSumType NsdCksum_None
0.000432163 37005   TRACE_NSD: nsdDoIO_ReadAndCheck: exit err 0
0.000433707 37005   TRACE_GCRYPTO: EncBufPool::releaseTmpBuf(): exit 
bsize=8192 err=0 inBufP=0x180656C1058 bufP=0x180656C1058 index=0
0.000433777 37005   TRACE_NSD: nsdDoIO exit: err 0 0

0.000433844 37005   TRACE_IO: FIO: read data tag 743942 108137 ioVecSize 1 
1st buf 0x122F000 nsdId 0A011103:5C59DBAC da 34:490710888 nSectors 8 err 0
0.000434236 37005   TRACE_DISK: postIO: qosid A00D91E read data disk 
 ioVecSize 1 1st buf 0x122F000 err 0 duration 0.000215000 by 
iocMBHandler (DioHandlerThread)

I'd suggest looking at "mmdiag --iohist" on the NSD server itself and see 
if/how that differs from the client. The other thing you could do is see if 
your NSD server queues are backing up (e.g. "mmfsadm saferdump nsd" and look 
for "requests pending" on queues where the "active" field is > 0). That doesn't 
necessarily mean you need to tune your queues but I'd suggest that if the disk 
I/O on your NSD server looks healthy (e.g. low latency, not overly-taxed) that 
you could benefit from queue tuning.

-Aaron
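A hedged one-liner version of that check (the literal strings come from Aaron's description above; verify them against the actual saferdump output on your release):

# On an NSD server, dump the NSD queue state and look for queues with work backing up.
mmfsadm saferdump nsd > /tmp/nsd.dump
grep -E 'requests pending|active' /tmp/nsd.dump | less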

On Sat, Feb 16, 2019 at 9:47 AM Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu> wrote:
Hi All,

Been reading man pages, docs, and Googling, and haven’t found a definitive 
answer to this question, so I knew exactly where to turn… ;-)

I’m dealing with some slow I/O’s to certain storage arrays in our environments 
… like really, really slow I/O’s … here’s just one example from one of my NSD 
servers of a 10 second I/O:

08:49:34.943186  Wdata   30:41615622144   2048 10115.192  srv   dm-92   
   

So here’s my question … when mmdiag —iohist tells me that that I/O took 
slightly over 10 seconds, is that:

1.  The time from when the NSD server received the I/O request from the client 
until it shipped the data back onto the wire towards the client?
2.  The time from when the client issued the I/O request until it received the 
data back from the NSD server?
3.  Something else?

I’m thinking it’s #1, but want to confirm.  Which one it is has very obvious 
implications for our troubleshooting steps.  Thanks in advance…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Clarification of mmdiag --iohist output

2019-02-16 Thread Buterbaugh, Kevin L
Hi All,

Been reading man pages, docs, and Googling, and haven’t found a definitive 
answer to this question, so I knew exactly where to turn… ;-)

I’m dealing with some slow I/O’s to certain storage arrays in our environments 
… like really, really slow I/O’s … here’s just one example from one of my NSD 
servers of a 10 second I/O:

08:49:34.943186  Wdata   30:41615622144   2048 10115.192  srv   dm-92   
   

So here’s my question … when mmdiag —iohist tells me that that I/O took 
slightly over 10 seconds, is that:

1.  The time from when the NSD server received the I/O request from the client 
until it shipped the data back onto the wire towards the client?
2.  The time from when the client issued the I/O request until it received the 
data back from the NSD server?
3.  Something else?

I’m thinking it’s #1, but want to confirm.  Which one it is has very obvious 
implications for our troubleshooting steps.  Thanks in advance…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Node ‘crash and restart’ event using GPFS callback?

2019-01-31 Thread Buterbaugh, Kevin L
Hi Bob,

We use the nodeLeave callback to detect node expels … for what you’re wanting 
to do I wonder if nodeJoin might work??  If a node joins the cluster and then 
has an uptime of a few minutes you could go looking in /tmp/mmfs.  HTH...
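A minimal sketch of that nodeJoin approach (the script path, the 10-minute threshold, and the use of ssh are assumptions; nodeJoin callbacks may execute on the cluster manager rather than on the joining node itself, so verify where the script actually runs in your environment):

# Register the callback once, as root; %eventNode expands to the joining node:
mmaddcallback checkRestart --command /usr/local/sbin/check_restart.sh \
    --event nodeJoin --parms "%eventNode"

#!/bin/bash
# /usr/local/sbin/check_restart.sh -- hedged sketch, not production code.
# If the host has been up for a while but mmfsd only just (re)joined, that
# smells like a daemon crash/restart rather than a reboot.
node="$1"
up=$(ssh -o BatchMode=yes "$node" 'cut -d. -f1 /proc/uptime') || exit 0
dumps=$(ssh -o BatchMode=yes "$node" 'ls -A /tmp/mmfs 2>/dev/null | wc -l')
if [ "${up:-0}" -gt 600 ] && [ "${dumps:-0}" -gt 0 ]; then
    logger -t gpfs-callback "$node rejoined with uptime ${up}s and ${dumps} item(s) in /tmp/mmfs (possible daemon crash/restart)"
fi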

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Jan 30, 2019, at 3:02 PM, Sanchez, Paul <paul.sanc...@deshaw.com> wrote:

There are some cases which I don’t believe can be caught with callbacks (e.g. 
DMS = Dead Man Switch).  But you could possibly use preStartup to check the 
host uptime to make an assumption if GPFS was restarted long after the host 
booted.  You could also peek in /tmp/mmfs and only report if you find something 
there.  That said, the docs say that preStartup fires after the node joins the 
cluster.  So if that means once the node is ‘active’ then you might miss out on 
nodes stuck in ‘arbitrating’ for a while due to a waiter problem.

We run a script with cron which monitors the myriad things which can go wrong 
and attempt to right those which are safe to fix, and raise alerts 
appropriately.  Something like that, outside the reach of GPFS, is often a good 
choice if you don’t need to know something the moment it happens.

Thx
Paul

From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of Oesterlin, Robert
Sent: Wednesday, January 30, 2019 3:52 PM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: [gpfsug-discuss] Node ‘crash and restart’ event using GPFS callback?

Anyone crafted a good way to detect a node ‘crash and restart’ event using GPFS 
callbacks? I’m thinking “preShutdown”, but I’m not sure that’s the best choice. What 
I’m really looking for is whether the node shut down (aborted) and created a dump in 
/tmp/mmfs.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-21 Thread Buterbaugh, Kevin L
Hi All,

I just wanted to follow up on this thread … the only way I have found to obtain 
a list of filesets and their associated junction paths as a non-root user is 
via the REST API (and thanks to those who suggested that).  However, AFAICT 
querying the REST API via a script would expose the username / password used to 
do so to anyone who bothered to look at the code, which would in turn allow a 
knowledgeable and curious user to query the REST API themselves for other 
information we do not necessarily want to expose to them.  Therefore, it is not 
an acceptable solution to us.

Therefore, unless someone responds with a way to allow a non-root user to 
obtain fileset junction paths that doesn’t involve the REST API, I’m afraid I’m 
at a dead end in terms of making our quota usage Python script something that I 
can share with the broader community.  It just has too much site-specific code 
in it.  Sorry…

Kevin

P.S.  In case you’re curious about how the quota script is obtaining those 
junction paths … we have a cron job that runs once per hour on the cluster 
manager that dumps the output of mmlsfileset to a text file, which the script 
then reads.  The cron job used to just run once per day and used to just run 
mmlsfileset.  I have modified it to be a shell script which checks for the load 
average on the cluster manager being less than 10 and that there are no waiters 
of more than 10 seconds duration.  If both of those conditions are true, it 
runs mmlsfileset.  If either are not, it simply exits … the idea being that one 
or both of those would likely be true if something were going on with the 
cluster manager that would cause the mmlsfileset to hang.
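As a hedged sketch, that wrapper might look something like this (thresholds from above; the filesystem name, the output path, and the parsing of "mmdiag --waiters" are assumptions to verify):

#!/bin/bash
# Hourly cron wrapper: refresh the fileset dump only when the cluster manager
# looks quiet.  Not production code.
OUT=/usr/local/etc/filesets.txt    # placeholder location for the dump
FS=gpfs0                           # placeholder filesystem name

load=$(awk '{print int($1)}' /proc/loadavg)
[ "$load" -ge 10 ] && exit 0

# Bail out if any waiter has been waiting 10 seconds or longer; the awk below
# assumes the seconds value is the second field of each "Waiting ..." line.
if mmdiag --waiters | awk '/Waiting/ { if ($2+0 >= 10) exit 1 }'; then
    mmlsfileset "$FS" > "${OUT}.tmp" && mv "${OUT}.tmp" "$OUT"
fi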

I have also modified the quota script itself so that it checks that the 
junction path for a fileset actually exists before attempting to stat it (duh - 
should’ve done that from the start), which handles the case where a user would 
run the quota script and it would bomb off with an exception because the 
fileset was deleted and the cron job hadn’t run yet.  If a new fileset is 
created, well, it just won’t get checked by the quota script until the cron job 
runs successfully.  We have decided that this is an acceptable compromise.

On Jan 15, 2019, at 8:46 AM, Marc A Kaplan <makap...@us.ibm.com> wrote:

Personally, I agree that there ought to be a way in the product.

In the meantime, you no doubt already have some ways to tell your users where 
to find their filesets as pathnames.
Otherwise, how are they accessing their files?

And to keep things somewhat sane, I'd bet the filesets are all linked under one or 
a small number of well-known paths in the filesystem, like /AGpfsFilesystem/filesets/...  
Plus you could add symlinks and/or, as has been suggested, post info extracted 
from mmlsfileset and/or mmlsquota.

So as a practical matter, is this an urgent problem...?  Why?  How?
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-15 Thread Buterbaugh, Kevin L
Hi Marc (All),

Yes, I can easily determine where filesets are linked here … it is, as you 
said, in just one or two paths.  The script as it stands now has been doing 
that for several years and only needs a couple of relatively minor tweaks to be 
even more useful to _us_ by whittling down a couple of edge cases relating to 
fileset creation / deletion.

However … there was a request to share the script with the broader community … 
something I’m willing to do if I can get it in a state where it would be useful 
to others with little or no modification.  Anybody who’s been on this list for 
any length of time knows how much help I’ve received from the community over 
the years.  I truly appreciate that and would like to give back, even in a 
minor way, if possible.

But in order to do that the script can’t be full of local assumptions … that’s 
it in a nutshell … that’s why I want to programmatically determine the junction 
path at run time as a non-root user.

I’ll also mention here that early on in this thread Simon Thompson suggested 
looking into the REST API.  Sure enough, you can get the information that way … 
but, AFAICT, that would require the script to contain a username / password 
combination that would allow anyone with access to the script to then use that 
authentication information to access other information within GPFS that we 
probably don’t want them to have access to.  If I’m mistaken about that, then 
please feel free to enlighten me.

Thanks again…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Jan 15, 2019, at 8:46 AM, Marc A Kaplan <makap...@us.ibm.com> wrote:

Personally, I agree that there ought to be a way in the product.

In the meantime, you no doubt already have some ways to tell your users where 
to find their filesets as pathnames.
Otherwise, how are they accessing their files?

And to keep things somewhat sane, I'd bet the filesets are all linked under one or 
a small number of well-known paths in the filesystem, like /AGpfsFilesystem/filesets/...  
Plus you could add symlinks and/or, as has been suggested, post info extracted 
from mmlsfileset and/or mmlsquota.

So as a practical matter, is this an urgent problem...?  Why?  How?

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-15 Thread Buterbaugh, Kevin L
Hi Scott and Valdis (and everyone else),

Thanks for your responses.

Yes, we _could_ easily build a local naming scheme … the name of the fileset 
matches the name of a folder in one of a couple of parent directories.  
However, an earlier response to my post asked if we’d be willing to share our 
script with the community and we would … _if_ we can make it generic enough to 
be useful.  Local naming schemes hardcoded in the script make it much less 
generically useful.

Plus, it just seems to me that there ought to be a way to do this … to get a 
list of fileset names from mmlsquota and then programmatically determine their 
junction path without having root privileges.  GPFS has got to be storing that 
information somewhere, and I’m frankly quite surprised that no IBMer has 
responded with an answer to that.  But I also know that when IBM is silent, 
there’s typically a reason.

And yes, we could regularly create a static file … in fact, that’s what we do 
now once per day (in the early morning hours).  While this is not a huge deal - 
we only create / delete filesets a handful of times per month - on the day we 
do the script won’t function properly unless we manually update the file.  I’m 
wanting to eliminate that, if possible … which as I stated in the preceding 
paragraph, I have a hard time believing is not possible.

I did look at the list of callbacks again (good thought!) and there’s not one 
specifically related to the creation / deletion of a fileset.  There was only 
one that I saw that I think could even possibly be of use … ccrFileChange.  Can 
anyone on the list confirm or deny that the creation / deletion of a fileset 
would cause that callback to be triggered??  If it is triggered, then we could 
use that to update the static filesets within a minute or two of the change 
being made, which would definitely be acceptable.  I realize that many things 
likely trigger a ccrFileChange, so I’m thinking of having a callback script 
that checks the current list of filesets against the static file and updates 
that appropriately.

Thanks again for the responses…

Kevin

> On Jan 13, 2019, at 10:09 PM, Scott Goldman  wrote:
> 
> Kevin,
> Something I've done in the past is to create a service that, once an 
> hour/day/week, builds a static file that consists of the needed 
> output.
> 
> As long as you can take the update delay (or perhaps trigger the update with 
> a callback), this should work and could actually be lighter on the system.
> 
> Sent from my BlackBerry - the most secure mobile device
> 
>   Original Message  
> From: valdis.kletni...@vt.edu
> Sent: January 12, 2019 4:07 PM
> To: gpfsug-discuss@spectrumscale.org
> Reply-to: gpfsug-discuss@spectrumscale.org
> Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running  
> mmlsfileset?
> 
> On Sat, 12 Jan 2019 03:07:29 +, "Buterbaugh, Kevin L" said:
>> But from there I need to then be able to find out where that fileset is
>> mounted in the directory tree so that I can see who the owner and group of 
>> that
>> directory are.
> 
> You're not able to leverage a local naming scheme? There's no connection 
> between
> the name of the fileset and where it is in the tree?  I would hope there is, 
> because
> otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user 
> will
> now be confused over what director(y/ies) need to be cleaned up.  If your tool
> says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at
> /gpfs/foo/bar/baz then it's actionable.
> 
> And if the user knows what the mapping is, your script can know it too
> 

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-12 Thread Buterbaugh, Kevin L
Hi All,

I appreciate the time several of you have taken to respond to my inquiry.  
However, unless I’m missing something - and my apologies if I am - none so far 
appear to allow me to obtain the list of junction paths as a non-root user.  
Yes, mmlsquota shows all the filesets.  But from there I need to then be able 
to find out where that fileset is mounted in the directory tree so that I can 
see who the owner and group of that directory are.  Only if the user running 
the script is either the owner or a member of the group do I want to display 
the fileset quota for that fileset to the user.

Thanks again…

Kevin

On Jan 11, 2019, at 10:24 AM, Jeffrey R. Lang <jrl...@uwyo.edu> wrote:

What we do is use “mmlsquota -Y ”, which will list out all the 
filesets in an easily parseable format.  And the command can be run by the 
user.
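For instance, a hedged sketch of pulling just the fileset names out of that output (the -Y format is colon-delimited with a HEADER record naming the fields; the labels "quotaType" and "name" are assumptions, so confirm them against the HEADER line your release prints):

mmlsquota -Y | awk -F: '
    /:HEADER:/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
    col["quotaType"] && $col["quotaType"] == "FILESET" { print $col["name"] }'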


From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of Peter Childs
Sent: Friday, January 11, 2019 6:50 AM
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running 
mmlsfileset?


We have a similar issue.  I'm wondering if getting mmlsfileset to work as a user 
is a reasonable "request for enhancement"; I suspect it would need better 
wording.


We too have a rather complex script to report on quotas that I suspect does a 
similar job.  It works by having all the filesets mounted in known locations with 
names matching the mount point names.  It then works out which ones are needed by 
looking at the group ownership.  It's very slow and a little cumbersome, not 
least because it was written ages ago in a mix of bash, sed, awk and find.

On Tue, 2019-01-08 at 22:12 +, Buterbaugh, Kevin L wrote:
Hi All,

Happy New Year to all!  Personally, I’ll gladly and gratefully settle for 2019 
not being a dumpster fire like 2018 was (those who attended my talk at the user 
group meeting at SC18 know what I’m referring to), but I certainly wish all of 
you the best!

Is there a way to get a list of the filesets in a filesystem without running 
mmlsfileset?  I was kind of expecting to find them in one of the config files 
somewhere under /var/mmfs but haven’t found them yet in the searching I’ve done.

The reason I’m asking is that we have a Python script that users can run that 
needs to get a list of all the filesets in a filesystem.  There are obviously 
multiple issues with that, so the workaround we’re using for now is to have a 
cron job which runs mmlsfileset once a day and dumps it out to a text file, 
which the script then reads.  That’s sub-optimal for any day on which a fileset 
gets created or deleted, so I’m looking for a better way … one which doesn’t 
require root privileges and preferably doesn’t involve running a GPFS command 
at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on 
that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks 
to the capacity pool and things are working fine.

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



--

Peter Childs
ITS Research Storage
Queen Mary, University of London


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-10 Thread Buterbaugh, Kevin L
Hi Andrew / All,

Well, it does _sound_ useful, but in its current state it’s really not for 
several reasons, mainly having to do with it being coded in a moderately 
site-specific way.  It needs an overhaul anyway, so I’m going to look at 
getting rid of as much of that as possible (there’s some definite low-hanging 
fruit there) and, for the site-specific things that can’t be gotten rid of, 
maybe consolidating them into one place in the code so that the script could be 
more generally useful if you just change those values.

If I can accomplish those things, then yes, we’d be glad to share the script.

But I’ve also realized that I didn’t _entirely_ answer my original question.  
Yes, mmlsquota will show me all the filesets … but I also need to know the 
junction path for each of those filesets.  One of the main reasons we wrote 
this script in the first place is that if you run mmlsquota you see that you 
have no limits on about 60 filesets (currently we use fileset quotas only on 
our filesets) … and that’s because there are no user (or group) quotas in those 
filesets.  The script, however, reads in the text file that is created nightly 
by root (nothing more than the output of “mmlsfileset ”), gets the junction path, looks up the GID of the junction path, and sees 
if you’re a member of that group.  If you’re not, well, no sense in showing you 
anything about that fileset.  But, of course, if you are a member of that 
group, then we do want to show you the fileset quota for that fileset.
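A hedged sketch of that membership test (the dump file location and its assumed two-column layout of fileset name and junction path are placeholders):

DUMP=/usr/local/etc/filesets.txt       # placeholder: the nightly mmlsfileset dump
while read -r fsname junction _; do
    [ -d "$junction" ] || continue     # fileset deleted since the dump was made
    gid=$(stat -c %g "$junction")
    if id -G | tr ' ' '\n' | grep -qx "$gid"; then
        echo "$fsname $junction"       # caller may see this fileset's quota
    fi
done < "$DUMP"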

So … my question now is: is there a way for a non-root user to get the 
junction path for the fileset(s)?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Jan 9, 2019, at 7:13 PM, Andrew Beattie <abeat...@au1.ibm.com> wrote:

Kevin,

That sounds like a useful script
would you care to share?

Thanks
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com


- Original message -
From: "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Cc:
Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running 
mmlsfileset?
Date: Thu, Jan 10, 2019 9:22 AM

Hi All,

Let me answer Skylar’s questions in another e-mail, which may also tell whether 
the rest API is a possibility or not.

The Python script in question is to display quota information for a user.  The 
mmlsquota command has a couple of issues:  1) its output is confusing to some 
of our users, 2) more significantly, it displays a ton of information that 
doesn’t apply to the user running it.  For example, it will display all the 
filesets in a filesystem whether or not the user has access to them.  So the 
Python script figures out what group(s) the user is a member of and only 
displays information pertinent to them (i.e. the group of the fileset junction 
path is a group this user is a member of) … and in a simplified (and 
potentially colorized) output format.

And typing that preceding paragraph caused the lightbulb to go off … I know the 
answer to my own question … have the script run mmlsquota and get the full list 
of filesets from that, then parse that to determine which ones I actually need 
to display quota information for.  Thanks!

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Jan 9, 2019, at 4:42 PM, Simon Thompson <s.j.thomp...@bham.ac.uk> wrote:

Hi Kevin,

Have you looked at the rest API?

https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_listofapicommands.htm

I don't know how much access control there is available in the API so not sure 
if you could lock some sort of service user down to just the get filesets 
command?

Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-09 Thread Buterbaugh, Kevin L
Hi All,

Let me answer Skylar’s questions in another e-mail, which may also tell whether 
the rest API is a possibility or not.

The Python script in question is to display quota information for a user.  The 
mmlsquota command has a couple of issues:  1) its output is confusing to some 
of our users, 2) more significantly, it displays a ton of information that 
doesn’t apply to the user running it.  For example, it will display all the 
filesets in a filesystem whether or not the user has access to them.  So the 
Python script figures out what group(s) the user is a member of and only 
displays information pertinent to them (i.e. the group of the fileset junction 
path is a group this user is a member of) … and in a simplified (and 
potentially colorized) output format.

And typing that preceding paragraph caused the lightbulb to go off … I know the 
answer to my own question … have the script run mmlsquota and get the full list 
of filesets from that, then parse that to determine which ones I actually need 
to display quota information for.  Thanks!
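
For anyone wanting to do something similar, here is a rough sketch of that parsing step in shell (it assumes your release supports the -Y colon-delimited output that most mm commands offer; the HEADER line that -Y prints names the fields, so look up positions there rather than hard-coding any field number):

    # rough illustration only, not the actual script:
    # dump the calling user's quota report in machine-readable form and keep
    # just the per-fileset lines, which also yields the list of fileset names
    mmlsquota -Y | grep -v ':HEADER:' | awk -F: 'tolower($0) ~ /fileset/ {print}'

A Python script can do the same thing by splitting each line on ':' instead of scraping the human-readable columns.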

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Jan 9, 2019, at 4:42 PM, Simon Thompson 
mailto:s.j.thomp...@bham.ac.uk>> wrote:

Hi Kevin,

Have you looked at the rest API?

https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_listofapicommands.htm

I don't know how much access control there is available in the API so not sure 
if you could lock some sort of service user down to just the get filesets 
command?

Simon
___
From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Buterbaugh, Kevin L 
[kevin.buterba...@vanderbilt.edu]
Sent: 08 January 2019 22:12
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

Hi All,

Happy New Year to all!  Personally, I’ll gladly and gratefully settle for 2019 
not being a dumpster fire like 2018 was (those who attended my talk at the user 
group meeting at SC18 know what I’m referring to), but I certainly wish all of 
you the best!

Is there a way to get a list of the filesets in a filesystem without running 
mmlsfileset?  I was kind of expecting to find them in one of the config files 
somewhere under /var/mmfs but haven’t found them yet in the searching I’ve done.

The reason I’m asking is that we have a Python script that users can run that 
needs to get a list of all the filesets in a filesystem.  There are obviously 
multiple issues with that, so the workaround we’re using for now is to have a 
cron job which runs mmlsfileset once a day and dumps it out to a text file, 
which the script then reads.  That’s sub-optimal for any day on which a fileset 
gets created or deleted, so I’m looking for a better way … one which doesn’t 
require root privileges and preferably doesn’t involve running a GPFS command 
at all.

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on 
that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks 
to the capacity pool and things are working fine.

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Get list of filesets _without_ running mmlsfileset?

2019-01-09 Thread Buterbaugh, Kevin L
Hi All,

Happy New Year to all!  Personally, I’ll gladly and gratefully settle for 2019 
not being a dumpster fire like 2018 was (those who attended my talk at the user 
group meeting at SC18 know what I’m referring to), but I certainly wish all of 
you the best!

Is there a way to get a list of the filesets in a filesystem without running 
mmlsfileset?  I was kind of expecting to find them in one of the config files 
somewhere under /var/mmfs but haven’t found them yet in the searching I’ve done.

The reason I’m asking is that we have a Python script that users can run that 
needs to get a list of all the filesets in a filesystem.  There are obviously 
multiple issues with that, so the workaround we’re using for now is to have a 
cron job which runs mmlsfileset once a day and dumps it out to a text file, 
which the script then reads.  That’s sub-optimal for any day on which a fileset 
gets created or deleted, so I’m looking for a better way … one which doesn’t 
require root privileges and preferably doesn’t involve running a GPFS command 
at all.
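
For what it's worth, the cron workaround can be as small as the sketch below (the paths, filesystem name, and hourly schedule are made up for illustration; mmlsfileset still has to run as root, which is exactly the limitation being described):

    # /etc/cron.d/dump-filesets  -- illustrative only
    # refresh the fileset list hourly so a created/deleted fileset is stale
    # for at most an hour; write to a temp file and rename so readers never
    # see a half-written list
    0 * * * * root /usr/lpp/mmfs/bin/mmlsfileset gpfs5 > /gpfs5/.filesets.tmp && mv /gpfs5/.filesets.tmp /gpfs5/.filesets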

Thanks in advance.

Kevin

P.S.  I am still working on metadata and iSCSI testing and will report back on 
that when complete.
P.P.S.  We ended up adding our new NSDs comprised of (not really) 12 TB disks 
to the capacity pool and things are working fine.

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Anybody running GPFS over iSCSI?

2018-12-15 Thread Buterbaugh, Kevin L
Hi All,

Googling “GPFS and iSCSI” doesn’t produce a ton of hits!  But we are interested 
to know if anyone is actually using GPFS over iSCSI?

The reason why I’m asking is that we currently use an 8 Gb FC SAN … QLogic 
SANbox 5800’s, QLogic HBA’s in our NSD servers … but we’re seeing signs that, 
especially when we start using beefier storage arrays with more disks behind 
the controllers, the 8 Gb FC could be a bottleneck.

As many / most of you are already aware, I’m sure, while 16 Gb FC exists, 
there’s basically only one vendor in that game.  And guess what happens to 
prices when there’s only one vendor???  We bought our 8 Gb FC switches for 
approximately $5K apiece.  List price on a  16 Gb FC switch - $40K. 
 Ouch.

So the idea of being able to use commodity 10 or 40 Gb Ethernet switches and 
HBA’s is very appealing … both from a cost and a performance perspective (last 
I checked 40 Gb was more than twice 16 Gb!).  Anybody doing this already?

As those of you who’ve been on this list for a while and don’t filter out 
e-mails from me () already know, we have a much beefier Infortrend 
storage array we’ve purchased that I’m currently using to test various metadata 
configurations (and I will report back results on that when done, I promise).  
That array also supports iSCSI, so I actually have our test cluster GPFS 
filesystem up and running over iSCSI.  It was surprisingly easy to set up.  But 
any tips, suggestions, warnings, etc. about running GPFS over iSCSI are 
appreciated!

Two things that I am already aware of are:  1) use jumbo frames, and 2) run 
iSCSI over its own private network.  Other things I should be aware of?!?
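
In case it helps anyone following along, the jumbo-frame piece is usually just a matter of raising the MTU end to end on the dedicated iSCSI interfaces (the interface name and address below are hypothetical, and the switch ports have to allow jumbo frames too, or this hurts rather than helps):

    # on each initiator and target, for the dedicated iSCSI NIC only
    ip link set dev eth2 mtu 9000
    # verify a full-size frame crosses without fragmenting (9000 - 28 = 8972)
    ping -M do -s 8972 10.10.10.20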

Thanks all…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Best way to migrate data

2018-10-18 Thread Buterbaugh, Kevin L
Hi Dwayne,

I’m assuming you can’t just let an rsync run, possibly throttled in some way?  
If not, and if you’re just tapping out your network, then would it be possible 
to go old school?  We have parts of the Medical Center here where their network 
connections are … um, less than robust.  So they tar stuff up to a portable HD, 
sneaker-net it to us, and we untar it from an NSD server.
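
Just to spell out the throttling option, rsync can be rate-limited directly; a sketch (the paths and the 200 MB/s figure are purely illustrative, so pick a limit that leaves the IB interface usable for everyone else):

    # copy one user's home at roughly 200 MB/s, preserving hard links, ACLs,
    # and xattrs, and restartable if interrupted
    rsync -aHAX --bwlimit=200m --partial /gpfs/home/user1/ /gpfs/research/home/user1/

Note that --bwlimit with a suffix like "200m" needs rsync 3.x; older versions take the value in KB/s.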

HTH, and I really hope that someone has a better idea than that!

Kevin

> On Oct 18, 2018, at 12:19 PM, dwayne.h...@med.mun.ca wrote:
> 
> Hi,
> 
> Just wondering what the best recipe for migrating a user’s home directory 
> content from one GFPS file system to another which hosts a larger research 
> GPFS file system? I’m currently using rsync and it has maxed out the client 
> system’s IB interface.
> 
> Best,
> Dwayne 
> —
> Dwayne Hart | Systems Administrator IV
> 
> CHIA, Faculty of Medicine 
> Memorial University of Newfoundland 
> 300 Prince Philip Drive
> St. John’s, Newfoundland | A1B 3V6
> Craig L Dobbin Building | 4M409
> T 709 864 6631
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Job vacancy @Birmingham

2018-10-18 Thread Buterbaugh, Kevin L
Hi Nathan,

Well, while I’m truly sorry for what you’re going thru, at least a majority of 
the voters in the UK did vote for it.  Keep in mind that things could be worse.

Some of us do happen to live in a country where a far worse thing has happened 
despite the fact that the majority of the voters were _against_ it…. ;-)

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Oct 18, 2018, at 4:23 AM, Nathan Harper 
mailto:nathan.har...@cfms.org.uk>> wrote:

Olaf - we don't need any reminders of Bre.. this 
morning

On Thu, 18 Oct 2018 at 10:15, Olaf Weiser 
mailto:olaf.wei...@de.ibm.com>> wrote:
Hi  Simon ..
well - I would love to .. .but .. ;-) hey - what do you think, how long a 
citizen from the EU can live (and work) in UK ;-)
don't take me too serious... see you soon, consider you invited for a coffee 
for my rude comment .. ;-)
olaf




From:    Simon Thompson mailto:s.j.thomp...@bham.ac.uk>>
To:      "gpfsug-discuss@spectrumscale.org" mailto:gpfsug-discuss@spectrumscale.org>>
Date:    10/17/2018 11:02 PM
Subject: [gpfsug-discuss] Job vacancy @Birmingham
Sent by: gpfsug-discuss-boun...@spectrumscale.org




We're looking for someone to join our systems team here at University of 
Birmingham. In case you didn't realise, we're pretty reliant on Spectrum Scale 
to deliver our storage systems.

https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3

Such a snappy URL :-)

Feel free to email me *OFFLIST* if you have informal enquiries!

Simon

___
gpfsug-discuss mailing list
gpfsug-discuss at 
spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at 
spectrumscale.org

Re: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously?

2018-10-15 Thread Buterbaugh, Kevin L
Marc,

Ugh - sorry, completely overlooked that…

Kevin

On Oct 15, 2018, at 1:44 PM, Marc A Kaplan 
mailto:makap...@us.ibm.com>> wrote:

How about using the -F option?
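
A hedged sketch of what that might look like (I have not verified the exact file format -F expects, so check the mmfileid man page before relying on this; the NSD and filesystem names are made up):

    # list both NSDs of interest in one file, one per line ...
    printf 'nsd21A3\nnsd21A4\n' > /tmp/nsd.list
    # ... and let a single mmfileid pass scan both
    mmfileid gpfs5 -F /tmp/nsd.list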

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] mmfileid on 2 NSDs simultaneously?

2018-10-15 Thread Buterbaugh, Kevin L
Hi All,

Is there a way to run mmfileid on two NSD’s simultaneously?  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Long I/O's on client but not on NSD server(s)

2018-10-04 Thread Buterbaugh, Kevin L
Hi All,

What does it mean if I have a few dozen very long I/O’s (50 - 75 seconds) on a 
gateway as reported by “mmdiag --iohist” and they all reference two of my eight 
NSD servers…

… but then I go to those 2 NSD servers and I don’t see any long I/O’s at all?

In other words, if the problem (this time) were the backend storage, I should 
see long I/O’s on the NSD servers, right?

I’m thinking this indicates that there is some sort of problem with either the 
client gateway itself or the network in between the gateway and the NSD 
server(s) … thoughts???

Thanks in advance…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] What is this error message telling me?

2018-09-27 Thread Buterbaugh, Kevin L
Hi Aaron,

No … just plain old ethernet.  Thanks!

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Sep 27, 2018, at 11:03 AM, Aaron Knister 
mailto:aaron.s.knis...@nasa.gov>> wrote:

Kevin,

Is the communication in this case by chance using IPoIB in connected mode?

-Aaron

On 9/27/18 11:04 AM, Buterbaugh, Kevin L wrote:
Hi All,
2018-09-27_09:48:50.923-0500: [E] The TCP connection to IP address 1.2.3.4 some 
client  (socket 442) state is unexpected: ca_state=1 unacked=3 
rto=27008000
Seeing errors like the above and trying to track down the root cause.  I know 
that at last weeks’ GPFS User Group meeting at ORNL this very error message was 
discussed, but I don’t recall the details and the slides haven’t been posted to 
the website yet.  IIRC, the “rto” is significant …
I’ve Googled, but haven’t gotten any hits, nor have I found anything in the 
GPFS 4.2.2 Problem Determination Guide.
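
One thing that may help while waiting for the slides: those fields look like they come from the kernel's per-socket TCP state, and some of the same counters (rto, unacked, retransmits) can be eyeballed live with ss (the address below is obviously a placeholder):

    # show TCP internals for connections to the suspect client
    ss -tino dst 1.2.3.4
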
Thanks in advance…
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> 
<mailto:kevin.buterba...@vanderbilt.edu> - (615)875-9633
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] RAID type for system pool

2018-09-11 Thread Buterbaugh, Kevin L
Hi Marc,

Understood … I’m just trying to understand why some I/O’s are flagged as 
metadata, while others are flagged as inode?!?  Since this filesystem uses 512 
byte inodes, there is no data content from any files involved (for a metadata 
only disk), correct?  Thanks…

Kevin

On Sep 11, 2018, at 9:12 AM, Marc A Kaplan 
mailto:makap...@us.ibm.com>> wrote:

Metadata is anything besides the data contents of your files.
Inodes, directories, indirect blocks, allocation maps, log data ...  are the 
biggies.

Apparently, --iohist may sometimes distinguish some metadata as "inode", 
"logData", ...  that doesn't mean those aren't metadata also.




From:    "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To:      gpfsug main discussion list mailto:gpfsug-discuss@spectrumscale.org>>
Date:    09/10/2018 03:12 PM
Subject: [gpfsug-discuss] RAID type for system pool
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





From: 
gpfsug-discuss-ow...@spectrumscale.org<mailto:gpfsug-discuss-ow...@spectrumscale.org>
Subject: Re: [gpfsug-discuss] RAID type for system pool
Date: September 10, 2018 at 11:35:05 AM CDT
To: k...@accre.vanderbilt.edu<mailto:k...@accre.vanderbilt.edu>

Hi All,

So while I’m waiting for the purchase of new hardware to go thru, I’m trying to 
gather more data about the current workload.  One of the things I’m trying to 
do is get a handle on the ratio of reads versus writes for my metadata.

I’m using “mmdiag --iohist” … in this case “dm-12” is one of my metadataOnly 
disks and I’m running this on the primary NSD server for that NSD.  I’m seeing 
output like:

11:22:13.931117  W   inode     4:299844163   1   0.448  srv   dm-12
11:22:13.932344  R   metadata  4:36659676    4   0.307  srv   dm-12
11:22:13.932005  W   logData   4:49676176    1   0.726  srv   dm-12

And I’m confused as to the difference between “inode” and “metadata” (I at 
least _think_ I understand “logData”)?!?  The man page for mmdiag doesn’t help 
and I’ve not found anything useful yet in my Googling.

This is on a filesystem that currently uses 512 byte inodes, if that matters.  
Thanks…

Kevin
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] RAID type for system pool

2018-09-10 Thread Buterbaugh, Kevin L

From: 
gpfsug-discuss-ow...@spectrumscale.org
Subject: Re: [gpfsug-discuss] RAID type for system pool
Date: September 10, 2018 at 11:35:05 AM CDT
To: k...@accre.vanderbilt.edu

Hi All,

So while I’m waiting for the purchase of new hardware to go thru, I’m trying to 
gather more data about the current workload.  One of the things I’m trying to 
do is get a handle on the ratio of reads versus writes for my metadata.

I’m using “mmdiag --iohist” … in this case “dm-12” is one of my metadataOnly 
disks and I’m running this on the primary NSD server for that NSD.  I’m seeing 
output like:

11:22:13.931117  W   inode     4:299844163   1   0.448  srv   dm-12
11:22:13.932344  R   metadata  4:36659676    4   0.307  srv   dm-12
11:22:13.932005  W   logData   4:49676176    1   0.726  srv   dm-12

And I’m confused as to the difference between “inode” and “metadata” (I at 
least _think_ I understand “logData”)?!?  The man page for mmdiag doesn’t help 
and I’ve not found anything useful yet in my Googling.

This is on a filesystem that currently uses 512 byte inodes, if that matters.  
Thanks…

Kevin

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] RAID type for system pool

2018-09-10 Thread Buterbaugh, Kevin L
Hi All,

So while I’m waiting for the purchase of new hardware to go thru, I’m trying to 
gather more data about the current workload.  One of the things I’m trying to 
do is get a handle on the ratio of reads versus writes for my metadata.

I’m using “mmdiag --iohist” … in this case “dm-12” is one of my metadataOnly 
disks and I’m running this on the primary NSD server for that NSD.  I’m seeing 
output like:

11:22:13.931117  W   inode     4:299844163   1   0.448  srv   dm-12
11:22:13.932344  R   metadata  4:36659676    4   0.307  srv   dm-12
11:22:13.932005  W   logData   4:49676176    1   0.726  srv   dm-12
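
Incidentally, a rough way to turn that history into read/write counts per buffer type is a one-liner like the following (the column positions assume the layout shown above and may differ between releases):

    # tally e.g. "R metadata", "W inode", "W logData" occurrences
    mmdiag --iohist | awk '$2 == "R" || $2 == "W" {n[$2 " " $3]++} END {for (k in n) print k, n[k]}'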

And I’m confused as to the difference between “inode” and “metadata” (I at 
least _think_ I understand “logData”)?!?  The man page for mmdiag doesn’t help 
and I’ve not found anything useful yet in my Googling.

This is on a filesystem that currently uses 512 byte inodes, if that matters.  
Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] RAID type for system pool

2018-09-06 Thread Buterbaugh, Kevin L
Hi All,

Wow - my query got more responses than I expected and my sincere thanks to all 
who took the time to respond!

At this point in time we do have two GPFS filesystems … one which is basically 
“/home” and some software installations and the other which is “/scratch” and 
“/data” (former backed up, latter not).  Both of them have their metadata on 
SSDs set up as RAID 1 mirrors and replication set to two.  But at this point in 
time all of the SSDs are in a single storage array (albeit with dual redundant 
controllers) … so the storage array itself is my only SPOF.

As part of the hardware purchase we are in the process of making we will be 
buying a 2nd storage array that can house 2.5” SSDs.  Therefore, we will be 
splitting our SSDs between chassis and eliminating that last SPOF.  Of course, 
this includes the new SSDs we are getting for our new /home filesystem.

Our plan right now is to buy 10 SSDs, which will allow us to test 3 
configurations:

1) two 4+1P RAID 5 LUNs split up into a total of 8 LV’s (with each of my 8 NSD 
servers as primary for one of those LV’s and the other 7 as backups) and GPFS 
metadata replication set to 2.

2) four RAID 1 mirrors (which obviously leaves 2 SSDs unused) and GPFS metadata 
replication set to 2.  This would mean that only 4 of my 8 NSD servers would be 
a primary.

3) nine RAID 0 / bare drives with GPFS metadata replication set to 3 (which 
leaves 1 SSD unused).  All 8 NSD servers primary for one SSD and 1 serving up 
two.

The responses I received concerning RAID 5 and performance were not a surprise 
to me.  The main advantage that option gives is the most usable storage space 
for the money (in fact, it gives us way more storage space than we currently 
need) … but if it tanks performance, then that’s a deal breaker.

Personally, I like the four RAID 1 mirrors config like we’ve been using for 
years, but it has the disadvantage of giving us the least usable storage space 
… that config would give us the minimum we need for right now, but doesn’t 
really allow for much future growth.

I have no experience with metadata replication of 3 (but had actually thought 
of that option, so feel good that others suggested it) so option 3 will be a 
brand new experience for us.  It is the most optimal in terms of meeting 
current needs plus allowing for future growth without giving us way more space 
than we are likely to need).  I will be curious to see how long it takes GPFS 
to re-replicate the data when we simulate a drive failure as opposed to how 
long a RAID rebuild takes.

I am a big believer in Murphy’s Law (Sunday I paid off a bill, Wednesday my 
refrigerator died!) … and also believe that the definition of a pessimist is 
“someone with experience”  … so we will definitely not set GPFS metadata 
replication to less than two, nor will we use non-Enterprise class SSDs for 
metadata … but I do still appreciate the suggestions.

If there is interest, I will report back on our findings.  If anyone has any 
additional thoughts or suggestions, I’d also appreciate hearing them.  Again, 
thank you!

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] RAID type for system pool

2018-09-05 Thread Buterbaugh, Kevin L
Hi All,

We are in the process of finalizing the purchase of some new storage arrays (so 
no sales people who might be monitoring this list need contact me) to 
life-cycle some older hardware.  One of the things we are considering is the 
purchase of some new SSD’s for our “/home” filesystem and I have a question or 
two related to that.

Currently, the existing home filesystem has it’s metadata on SSD’s … two RAID 1 
mirrors and metadata replication set to two.  However, the filesystem itself is 
old enough that it uses 512 byte inodes.  We have analyzed our users files and 
know that if we create a new filesystem with 4K inodes that a very significant 
portion of the files would now have their _data_ stored in the inode as well 
due to the files being 3.5K or smaller (currently all data is on spinning HD 
RAID 1 mirrors).

Of course, if we increase the size of the inodes by a factor of 8 then we also 
need 8 times as much space to store those inodes.  Given that Enterprise class 
SSDs are still very expensive and our budget is not unlimited, we’re trying to 
get the best bang for the buck.

We have always - even back in the day when our metadata was on spinning disk 
and not SSD - used RAID 1 mirrors and metadata replication of two.  However, we 
are wondering if it might be possible to switch to RAID 5?  Specifically, what 
we are considering doing is buying 8 new SSDs and creating two 3+1P RAID 5 LUNs 
(metadata replication would stay at two).  That would give us 50% more usable 
space than if we configured those same 8 drives as four RAID 1 mirrors.

Unfortunately, unless I’m misunderstanding something, that would mean that the RAID stripe 
size and the GPFS block size could not match.  Therefore, even though we don’t 
need the space, would we be much better off to buy 10 SSDs and create two 4+1P 
RAID 5 LUNs?
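
To make the mismatch concrete (assuming, purely for illustration, a 1 MiB segment size per data drive):

    3+1P RAID 5:  3 data drives x 1 MiB = 3 MiB full stripe  ->  a 4 MiB GPFS block can never align with it
    4+1P RAID 5:  4 data drives x 1 MiB = 4 MiB full stripe  ->  each 4 MiB GPFS block maps to exactly one full stripe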

I’ve searched the mailing list archives and scanned the DeveloperWorks wiki and 
even glanced at the GPFS documentation and haven’t found anything that says 
“bad idea, Kevin”… ;-)

Expanding on this further … if we just present those two RAID 5 LUNs to GPFS as 
NSDs then we can only have two NSD servers as primary for them.  So another 
thing we’re considering is to take those RAID 5 LUNs and further sub-divide 
them into a total of 8 logical volumes, each of which could be a GPFS NSD and 
therefore would allow us to have each of our 8 NSD servers be primary for one 
of them.  Even worse idea?!?  Good idea?

Anybody have any better ideas???  ;-)

Oh, and currently we’re on GPFS 4.2.3-10, but are also planning on moving to 
GPFS 5.0.1-x before creating the new filesystem.

Thanks much…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 79, Issue 21: mmaddcallback documentation issue

2018-08-07 Thread Buterbaugh, Kevin L
Hi All,

I was able to navigate down thru IBM’s website and find the GPFS 5.0.1 manuals 
but they contain the same typo, which Pete has correctly identified … and I 
have confirmed that his solution works.

Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 7, 2018, at 6:35 AM, Chase, Peter 
mailto:peter.ch...@metoffice.gov.uk>> wrote:

Hi Kevin,

I'm running policy migrations on Spectrum Scale 4.2.3, but I use mmapplypolicy 
to kick off the policy runs, not mmstartpolicy. Docs here (which I admit are 
not for your version of Spectrum Scale) state that mmstartpolicy is for 
internal GPFS use only: 
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Using+Policies

So if the above link is correct, I'd recommend switching to using 
mmapplypolicy, which handily comes with a man page, whereas mmstartpolicy 
doesn't and might have you fumbling around in the dark.

As for the issue you're experiencing with adding a callback, it looks like the 
mmaddcallback command is catching the --single-instance flag as an argument for 
it, not as a parameter for the mmstartpolicy command. After looking at the 
documentation you've referenced, I suspect that there's a typo/omission in the 
command and it should have a trailing double quote (") on the end of the parms 
argument list, i.e.:

mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event 
lowDiskSpace --parms "%eventName %fsName --single-instance"

I'm not sure how we go about asking IBM to correct their documentation, but 
expect someone in the user group will have some idea.

Regards,

Pete Chase
peter.ch...@metoffice.gov.uk<mailto:peter.ch...@metoffice.gov.uk>


-Original Message-
From: gpfsug-discuss-boun...@spectrumscale.org 
 On Behalf Of 
gpfsug-discuss-requ...@spectrumscale.org
Sent: 06 August 2018 23:47
To: gpfsug-discuss@spectrumscale.org
Subject: gpfsug-discuss Digest, Vol 79, Issue 21

Send gpfsug-discuss mailing list submissions to
gpfsug-discuss@spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-requ...@spectrumscale.org

You can reach the person managing the list at
gpfsug-discuss-ow...@spectrumscale.org

When replying, please edit your Subject line so it is more specific than "Re: 
Contents of gpfsug-discuss digest..."


Today's Topics:

  1. mmaddcallback documentation issue (Buterbaugh, Kevin L)
  2. Re: mmaddcallback documentation issue (Eric Sperley)


--

Message: 1
Date: Mon, 6 Aug 2018 21:42:54 +
From: "Buterbaugh, Kevin L" 
To: gpfsug main discussion list 
Subject: [gpfsug-discuss] mmaddcallback documentation issue
Message-ID: <735f4275-191a-4363-b98c-1ea289292...@vanderbilt.edu>
Content-Type: text/plain; charset="utf-8"

Hi All,

So I’m _still_ reading about and testing various policies for file placement 
and migration on our test cluster (which is now running GPFS 5).

On page 392 of the GPFS 5.0.0 Administration Guide it says:


To add a callback, run this command. The following command is on one line:

mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event 
lowDiskSpace --parms "%eventName %fsName --single-instance


The --single-instance flag is required to avoid running multiple migrations on 
the file system at the same time.

However, trying to issue that command gives:

mmaddcallback: Incorrect option: --single-instance

And the man page for mmaddcallback doesn’t mention it or anything similar to 
it.  Now my test cluster is running GPFS 5.0.1.1, so is this something that was 
added in GPFS 5.0.0 and then subsequently removed?

I can’t find the GPFS 5.0.1 Administration Guide with a Google search.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633




[gpfsug-discuss] mmaddcallback documentation issue

2018-08-06 Thread Buterbaugh, Kevin L
Hi All,

So I’m _still_ reading about and testing various policies for file placement 
and migration on our test cluster (which is now running GPFS 5).

On page 392 of the GPFS 5.0.0 Administration Guide it says:


To add a callback, run this command. The following command is on one line:

mmaddcallback MIGRATION --command /usr/lpp/mmfs/bin/mmstartpolicy --event 
lowDiskSpace
--parms "%eventName %fsName --single-instance


The --single-instance flag is required to avoid running multiple migrations on 
the file system at the same time.

However, trying to issue that command gives:

mmaddcallback: Incorrect option: --single-instance

And the man page for mmaddcallback doesn’t mention it or anything similar to 
it.  Now my test cluster is running GPFS 5.0.1.1, so is this something that was 
added in GPFS 5.0.0 and then subsequently removed?

I can’t find the GPFS 5.0.1 Administration Guide with a Google search.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?

2018-08-06 Thread Buterbaugh, Kevin L
Hi All,

So I was just reading the GPFS 5.0.0 Administration Guide (yes, I actually do 
look at the documentation even if it seems sometimes that I don’t!) for some 
other information and happened to come across this at the bottom of page 358:


The --metadata-block-size flag on the mmcrfs command can be used to create a 
system pool with a different block size from the user pools. This can be 
especially beneficial if the default block size is larger than 1 MB. If data 
and metadata block sizes differ, the system pool must contain only metadataOnly 
disks.

Given that one of the responses I received during this e-mail thread was from 
an IBM engineer basically pointing out that there is no benefit in setting the 
metadata-block-size to less than 4 MB if that’s what I want for the filesystem 
block size, this might be a candidate for a documentation update.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?

2018-08-03 Thread Buterbaugh, Kevin L
Hi All,

Aargh - now I really do feel like an idiot!  I had set up the stanza file over 
a week ago … then had to work on production issues … and completely forgot 
about setting the block size in the pool stanzas there.  But at least we all 
now know that stanza files override command line arguments to mmcrfs.
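
For anyone else who trips over this, the overriding lines in question are the pool stanzas; a minimal illustration (the attribute list is abbreviated - see the mmcrfs documentation for the full stanza syntax; the names below are taken from the output in this thread) looks like:

    %pool: pool=system blockSize=1M
    %pool: pool=raid6  blockSize=4M
    %nsd:  nsd=test23Ansd servers=testnsd3 usage=dataOnly pool=raid6

If a %pool stanza carries a blockSize, it takes precedence over -B and --metadata-block-size on the mmcrfs command line, which appears to be exactly what happened here.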

My apologies…

Kevin

On Aug 3, 2018, at 1:01 AM, Olaf Weiser 
mailto:olaf.wei...@de.ibm.com>> wrote:

Can u share your stanza file ?

Von meinem iPhone gesendet

Am 02.08.2018 um 23:15 schrieb Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>>:

OK, so hold on … NOW what’s going on???  I deleted the filesystem … went to 
lunch … came back an hour later … recreated the filesystem with a metadata 
block size of 4 MB … and I STILL have a 1 MB block size in the system pool and 
the wrong fragment size in other pools…

Kevin

/root/gpfs
root@testnsd1# mmdelfs gpfs5
All data on the following disks of gpfs5 will be destroyed:
test21A3nsd
test21A4nsd
test21B3nsd
test21B4nsd
test23Ansd
test23Bnsd
test23Cnsd
test24Ansd
test24Bnsd
test24Cnsd
test25Ansd
test25Bnsd
test25Cnsd
Completed deletion of file system /dev/gpfs5.
mmdelfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
/root/gpfs
root@testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 
-j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v 
yes --nofilesetdf --metadata-block-size 4M

The following disks of gpfs5 will be formatted on node testnsd3:
test21A3nsd: size 953609 MB
test21A4nsd: size 953609 MB
test21B3nsd: size 953609 MB
test21B4nsd: size 953609 MB
test23Ansd: size 15259744 MB
test23Bnsd: size 15259744 MB
test23Cnsd: size 1907468 MB
test24Ansd: size 15259744 MB
test24Bnsd: size 15259744 MB
test24Cnsd: size 1907468 MB
test25Ansd: size 15259744 MB
test25Bnsd: size 15259744 MB
test25Cnsd: size 1907468 MB
Formatting file system ...
Disks up to size 8.29 TB can be added to storage pool system.
Disks up to size 16.60 TB can be added to storage pool raid1.
Disks up to size 132.62 TB can be added to storage pool raid6.
Creating Inode File
  12 % complete on Thu Aug  2 13:16:26 2018
  25 % complete on Thu Aug  2 13:16:31 2018
  38 % complete on Thu Aug  2 13:16:36 2018
  50 % complete on Thu Aug  2 13:16:41 2018
  62 % complete on Thu Aug  2 13:16:46 2018
  74 % complete on Thu Aug  2 13:16:52 2018
  85 % complete on Thu Aug  2 13:16:57 2018
  96 % complete on Thu Aug  2 13:17:02 2018
 100 % complete on Thu Aug  2 13:17:03 2018
Creating Allocation Maps
Creating Log Files
   3 % complete on Thu Aug  2 13:17:09 2018
  28 % complete on Thu Aug  2 13:17:15 2018
  53 % complete on Thu Aug  2 13:17:20 2018
  78 % complete on Thu Aug  2 13:17:26 2018
 100 % complete on Thu Aug  2 13:17:27 2018
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
  98 % complete on Thu Aug  2 13:17:34 2018
 100 % complete on Thu Aug  2 13:17:34 2018
Formatting Allocation Map for storage pool raid1
  52 % complete on Thu Aug  2 13:17:39 2018
 100 % complete on Thu Aug  2 13:17:43 2018
Formatting Allocation Map for storage pool raid6
  24 % complete on Thu Aug  2 13:17:48 2018
  50 % complete on Thu Aug  2 13:17:53 2018
  74 % complete on Thu Aug  2 13:17:58 2018
  99 % complete on Thu Aug  2 13:18:03 2018
 100 % complete on Thu Aug  2 13:18:03 2018
Completed creation of file system /dev/gpfs5.
mmcrfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
/root/gpfs
root@testnsd1# mmlsfs gpfs5
flagvaluedescription
---  ---
 -f 8192                 Minimum fragment (subblock) size in bytes (system pool)
    32768                Minimum fragment (subblock) size in bytes (other pools)
 -i 4096 Inode size in bytes
 -I 32768Indirect block size in bytes
 -m 2Default number of metadata replicas
 -M 3Maximum number of metadata replicas
 -r 1Default number of data replicas
 -R 3Maximum number of data replicas
 -j scatter  Block allocation type
 -D nfs4 File locking semantics in effect
 -k all  ACL semantics in effect
 -n 32   Estimated number of nodes that 
will mount file system
 -B 1048576  Block size (system pool)
4194304  Block size (other pools)
 -Q user;g

Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?

2018-08-02 Thread Buterbaugh, Kevin L
 Yes  Exact mtime mount option
 -S relatime Suppress atime mount option
 -K whenpossible Strict replica allocation option
 --fastea   Yes  Fast external attributes enabled?
 --encryption   No   Encryption enabled?
 --inode-limit  101095424Maximum number of inodes
 --log-replicas 0Number of log replicas
 --is4KAligned  Yes  is4KAligned?
 --rapid-repair Yes  rapidRepair enabled?
 --write-cache-threshold 0   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128  Number of subblocks per full block
 -P system;raid1;raid6   Disk storage pools in file system
 --file-audit-log   No   File Audit Logging enabled?
 --maintenance-mode No   Maintenance Mode enabled?
 -d 
test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd
  Disks in file system
 -A yes  Automatic mount option
 -o none Additional mount options
 -T /gpfs5   Default mount point
 --mount-priority   0Mount priority
/root/gpfs
root@testnsd1#


—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi All,

Thanks for all the responses on this, although I have the sneaking suspicion 
that the most significant thing that is going to come out of this thread is the 
knowledge that Sven has left IBM for DDN.  ;-) or :-( or :-O depending on your 
perspective.

Anyway … we have done some testing which has shown that a 4 MB block size is 
best for those workloads that use “normal” sized files.  However, we - like 
many similar institutions - support a mixed workload, so the 128K fragment size 
that comes with that is not optimal for the primarily biomedical type 
applications that literally create millions of very small files.  That’s why we 
settled on 1 MB as a compromise.

So we’re very eager to now test with GPFS 5, a 4 MB block size, and a 8K 
fragment size.  I’m recreating my test cluster filesystem now with that config 
… so 4 MB block size on the metadata only system pool, too.

Thanks to all who took the time to respond to this thread.  I hope it’s been 
beneficial to others as well…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Aug 1, 2018, at 7:11 PM, Andrew Beattie 
mailto:abeat...@au1.ibm.com>> wrote:

I too would second the comment about doing testing specific to your environment

We recently deployed a number of ESS building blocks into a customer site that 
was specifically being used for a mixed HPC workload.

We spent more than a week playing with different block sizes for both data and 
metadata trying to identify which variation would provide the best mix of both 
metadata performance and data performance.  one thing we noticed very early on 
is that MDtest and IOR both respond very differently as you play with both 
block size and subblock size.  What works for one use case may be a very poor 
option for another use case.

Interestingly enough it turned out that the best overall option for our 
particular use case was an 8MB block size with 32k sub blocks -- as that gave 
us good Metadata performance and good sequential data performance

which is probably why 32k sub block was the default for so many years 
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com<mailto:abeat...@au1.ibm.com>


- Original message -
From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Cc:
Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 
filesystem?
Date: Thu, Aug 2, 2018 10:01 AM

Firstly, I do suggest that you run some tests and see how much, if any, 
difference the settings that are available make in performance and/or storage 
utilization.

Secondly, as I and others have hinted at, deeper in the system, there may be 
additional parameters and settings.  Sometimes they are available via commands, 
and/or configuration settings, sometimes not.

Sometimes that's

Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?

2018-08-02 Thread Buterbaugh, Kevin L
Hi All,

Thanks for all the responses on this, although I have the sneaking suspicion 
that the most significant thing that is going to come out of this thread is the 
knowledge that Sven has left IBM for DDN.  ;-) or :-( or :-O depending on your 
perspective.

Anyway … we have done some testing which has shown that a 4 MB block size is 
best for those workloads that use “normal” sized files.  However, we - like 
many similar institutions - support a mixed workload, so the 128K fragment size 
that comes with that is not optimal for the primarily biomedical type 
applications that literally create millions of very small files.  That’s why we 
settled on 1 MB as a compromise.

So we’re very eager to now test with GPFS 5, a 4 MB block size, and a 8K 
fragment size.  I’m recreating my test cluster filesystem now with that config 
… so 4 MB block size on the metadata only system pool, too.

Thanks to all who took the time to respond to this thread.  I hope it’s been 
beneficial to others as well…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Aug 1, 2018, at 7:11 PM, Andrew Beattie 
mailto:abeat...@au1.ibm.com>> wrote:

I too would second the comment about doing testing specific to your environment

We recently deployed a number of ESS building blocks into a customer site that 
was specifically being used for a mixed HPC workload.

We spent more than a week playing with different block sizes for both data and 
metadata trying to identify which variation would provide the best mix of both 
metadata performance and data performance.  one thing we noticed very early on 
is that MDtest and IOR both respond very differently as you play with both 
block size and subblock size.  What works for one use case may be a very poor 
option for another use case.

Interestingly enough it turned out that the best overall option for our 
particular use case was an 8MB block size with 32k sub blocks -- as that gave 
us good Metadata performance and good sequential data performance

which is probably why 32k sub block was the default for so many years 
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com


- Original message -
From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Cc:
Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 
filesystem?
Date: Thu, Aug 2, 2018 10:01 AM

Firstly, I do suggest that you run some tests and see how much, if any, 
difference the settings that are available make in performance and/or storage 
utilization.

Secondly, as I and others have hinted at, deeper in the system, there may be 
additional parameters and settings.  Sometimes they are available via commands, 
and/or configuration settings, sometimes not.

Sometimes that's just because we didn't want to overwhelm you or ourselves with 
yet more "tuning knobs".

Sometimes it's because we made some component more tunable than we really 
needed, but did not make all the interconnected components equally or as widely 
tunable.
Sometimes it's because we want to save you from making ridiculous settings that 
would lead to problems...

OTOH, as I wrote before, if a burning requirement surfaces, things may change 
from release to release... Just as for so many years subblocks per block seemed 
forever frozen at the number 32.  Now it varies... and then the discussion 
shifts to why can't it be even more flexible?


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
Hi Sven (and Stephen and everyone else),

I know there are certainly things you know but can’t talk about, but I suspect 
that I am not the only one to wonder about the possible significance of “with 
the released code” in your response below?!?

I understand the technical point you’re making and maybe the solution for me is 
to just use a 4 MB block size for my metadata only system pool?  As Stephen 
Ulmer said in his response … ("Why the desire for a 1MB block size for 
metadata? It is RAID1 so no re-write penalty or need to hit a stripe size. Are 
you just trying to save the memory?  If you had a 4MB block size, an 8KB 
sub-block size and things were 4K-aligned, you would always read 2 4K inodes,”) 
… so if I’m using RAID 1 with 4K inodes then am I gaining anything by going 
with a smaller block size for metadata?

So why was I choosing 1 MB in the first place?  Well, I was planning on doing 
some experimenting with different block sizes for metadata to see if it made 
any difference.  Historically, we had used a metadata block size of 64K to 
match the hardware “stripe” size on the storage arrays (RAID 1 mirrors of hard 
drives back in the day).  Now our metadata is on SSDs so with our latest 
filesystem we used 1 MB for both data and metadata because of the 1/32nd 
sub-block thing in GPFS 4.x.  Since GPFS 5 removes that restriction, I was 
going to do some experimenting, but if the correct answer is just “if 4 MB is 
what’s best for your data, then use it for metadata too” then I don’t mind 
saving some time…. ;-)

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Aug 1, 2018, at 4:01 PM, Sven Oehme 
mailto:oeh...@gmail.com>> wrote:

the only way to get max number of subblocks for a 5.0.x filesystem with the 
released code is to have metadata and data use the same blocksize.

sven

On Wed, Aug 1, 2018 at 11:52 AM Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:
All,

Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB … so does this 
go back to what Marc is saying that there’s really only one sub blocks per 
block parameter?  If so, is there any way to get what I want as described below?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi Sven,

OK … but why?  I mean, that’s not what the man page says.  Where does that “4 
x” come from?

And, most importantly … that’s not what I want.  I want a smaller block size 
for the system pool since it’s metadata only and on RAID 1 mirrors (HD’s on the 
test cluster but SSD’s on the production cluster).  So … side question … is 1 
MB OK there?

But I want a 4 MB block size for data with an 8 KB sub block … I want good 
performance for the sane people using our cluster without unduly punishing the 
… ahem … fine folks whose apps want to create a bazillion tiny files!

So how do I do that?

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 1, 2018, at 1:41 PM, Sven Oehme 
mailto:oeh...@gmail.com>> wrote:

the number of subblocks is derived by the smallest blocksize in any pool of a 
given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in 
the metadata pool, but 4 x of that in the data pool if your data pool is 4M.
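
Spelled out with the numbers from this thread (the subblocks-per-full-block count is a single value for the whole filesystem, which is why the arithmetic works this way):

    smallest block size = 1 MiB (metadata pool):  1 MiB / 8 KiB = 128 subblocks per block
    data pool block size = 4 MiB:                 4 MiB / 128   = 32 KiB subblock

which matches the 8192 and 32768 fragment sizes that mmlsfs reported elsewhere in this thread.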

sven
On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop 
mailto:k...@us.ibm.com>> wrote:

Marc, Kevin,

We'll be looking into this issue, since at least at a first glance, it does 
look odd. A 4MB block size should have resulted in an 8KB subblock size. I 
suspect that, somehow, the --metadata-block-size 1M may have resulted in

32768 Minimum fragment (subblock) size in bytes (other pools)


but I do not yet understand how.

The subblocks-per-full-block parameter is not supported with mmcrfs .

Felipe


Felipe Knop k...@us.ibm.com<mailto:k...@us.ibm.com>
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314




"Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into 
all the details but here's a clue -- notice there is only one "subblocks-per-

From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>


To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>

Date: 08/01/2018 01:21 PM
Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Sent by: gpfsug-discuss-boun...@spectrumscale.org

Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
Hi Sven,

OK … but why?  I mean, that’s not what the man page says.  Where does that “4 
x” come from?

And, most importantly … that’s not what I want.  I want a smaller block size 
for the system pool since it’s metadata only and on RAID 1 mirrors (HD’s on the 
test cluster but SSD’s on the production cluster).  So … side question … is 1 
MB OK there?

But I want a 4 MB block size for data with an 8 KB sub block … I want good 
performance for the sane people using our cluster without unduly punishing the 
… ahem … fine folks whose apps want to create a bazillion tiny files!

So how do I do that?

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 1, 2018, at 1:41 PM, Sven Oehme 
mailto:oeh...@gmail.com>> wrote:

the number of subblocks is derived by the smallest blocksize in any pool of a 
given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in 
the metadata pool, but 4 x of that in the data pool if your data pool is 4M.

sven


On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop 
mailto:k...@us.ibm.com>> wrote:

Marc, Kevin,

We'll be looking into this issue, since at least at a first glance, it does 
look odd. A 4MB block size should have resulted in an 8KB subblock size. I 
suspect that, somehow, the --metadata-block-size 1M may have resulted in

32768 Minimum fragment (subblock) size in bytes (other pools)


but I do not yet understand how.

The subblocks-per-full-block parameter is not supported with mmcrfs .

Felipe


Felipe Knop k...@us.ibm.com<mailto:k...@us.ibm.com>
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314




"Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into 
all the details but here's a clue -- notice there is only one "subblocks-per-

From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>


To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>

Date: 08/01/2018 01:21 PM
Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





I haven't looked into all the details but here's a clue -- notice there is only 
one "subblocks-per-full-block" parameter.

And it is the same for both metadata blocks and datadata blocks.

So maybe (MAYBE) that is a constraint somewhere...

Certainly, in the currently supported code, that's what you get.




From: "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date: 08/01/2018 12:55 PM
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am 
setting up a new filesystem there using hardware that we recently life-cycled 
out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong. I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

Table 1. Block sizes and subblock sizes

+‐‐‐+‐‐‐+
| Block size| Subblock size |
+‐‐‐+‐‐‐+
| 64 KiB| 2 KiB |
+‐‐‐+‐‐‐+
| 128 KiB   | 4 KiB |
+‐‐‐+‐‐‐+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+‐‐‐+‐‐‐+
| 8 MiB, 16 MiB | 16 KiB|
+‐‐‐+‐‐‐+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flag value description
---  ---
-f 8192 Minimum fragment (subblock) size in bytes (system pool)
32768 Minimum fragment (subblock) size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 2 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 1048576 Block size (system pool)
4194304 Block size (other pools)
-Q user;group;fileset Qu

Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
All,

Sorry for the 2nd e-mail but I realize that 4 MB is 4 times 1 MB … so does this 
go back to what Marc is saying that there’s really only one sub blocks per 
block parameter?  If so, is there any way to get what I want as described below?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633
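For reference, the arithmetic behind Sven's point can be checked from the numbers already in this thread (the 128 figure also shows up later as --subblocks-per-full-block in the mmlsfs output); a quick shell sketch:

# GPFS 5 derives subblocks-per-full-block from the SMALLEST block size in the
# filesystem: a 1 MiB metadata block gives an 8 KiB subblock, i.e. 128 subblocks.
echo $(( 1048576 / 8192 ))    # 128 subblocks per full block
# That same 128 then applies to every pool, so the 4 MiB data pool ends up with:
echo $(( 4194304 / 128 ))     # 32768 bytes = the 32 KiB subblock seen above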


On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi Sven,

OK … but why?  I mean, that’s not what the man page says.  Where does that “4 
x” come from?

And, most importantly … that’s not what I want.  I want a smaller block size 
for the system pool since it’s metadata only and on RAID 1 mirrors (HD’s on the 
test cluster but SSD’s on the production cluster).  So … side question … is 1 
MB OK there?

But I want a 4 MB block size for data with an 8 KB sub block … I want good 
performance for the sane people using our cluster without unduly punishing the 
… ahem … fine folks whose apps want to create a bazillion tiny files!

So how do I do that?

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Aug 1, 2018, at 1:41 PM, Sven Oehme 
mailto:oeh...@gmail.com>> wrote:

the number of subblocks is derived by the smallest blocksize in any pool of a 
given filesystem. so if you pick a metadata blocksize of 1M it will be 8k in 
the metadata pool, but 4 x of that in the data pool if your data pool is 4M.

sven


On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop 
mailto:k...@us.ibm.com>> wrote:

Marc, Kevin,

We'll be looking into this issue, since at least at a first glance, it does 
look odd. A 4MB block size should have resulted in an 8KB subblock size. I 
suspect that, somehow, the --metadata-block-size 1M may have resulted in

32768 Minimum fragment (subblock) size in bytes (other pools)


but I do not yet understand how.

The subblocks-per-full-block parameter is not supported with mmcrfs .

Felipe


Felipe Knop k...@us.ibm.com<mailto:k...@us.ibm.com>
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314




"Marc A Kaplan" ---08/01/2018 01:21:23 PM---I haven't looked into 
all the details but here's a clue -- notice there is only one "subblocks-per-

From: "Marc A Kaplan" mailto:makap...@us.ibm.com>>


To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>

Date: 08/01/2018 01:21 PM
Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





I haven't looked into all the details but here's a clue -- notice there is only 
one "subblocks-per-full-block" parameter.

And it is the same for both metadata blocks and datadata blocks.

So maybe (MAYBE) that is a constraint somewhere...

Certainly, in the currently supported code, that's what you get.




From: "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date: 08/01/2018 12:55 PM
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am 
setting up a new filesystem there using hardware that we recently life-cycled 
out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong. I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

Table 1. Block sizes and subblock sizes

+‐‐‐+‐‐‐+
| Block size| Subblock size |
+‐‐‐+‐‐‐+
| 64 KiB| 2 KiB |
+‐‐‐+‐‐‐+
| 128 KiB   | 4 KiB |
+‐‐‐+‐‐‐+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+‐‐‐+‐‐‐+
| 8 MiB, 16 MiB | 16 KiB|
+‐‐‐+‐‐‐+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flag value description
---  ---
-f 8192 Minimum 

Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
Hi Marc,

Thanks for the response … I understand what you’re saying, but since I’m asking 
for a 1 MB block size for metadata and a 4 MB block size for data and according 
to the chart in the mmcrfs man page both result in an 8 KB sub block size I’m 
still confused as to why I’ve got a 32 KB sub block size for my non-system 
(i.e. data) pools?  Especially when you consider that 32 KB isn’t the default 
even if I had chosen an 8 or 16 MB block size!

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Aug 1, 2018, at 12:21 PM, Marc A Kaplan 
mailto:makap...@us.ibm.com>> wrote:

I haven't looked into all the details but here's a clue -- notice there is only 
one "subblocks-per-full-block" parameter.

And it is the same for both metadata blocks and datadata blocks.

So maybe (MAYBE) that is a constraint somewhere...

Certainly, in the currently supported code, that's what you get.

From:    "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To:gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date:08/01/2018 12:55 PM
Subject:[gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1.  I 
am setting up a new filesystem there using hardware that we recently 
life-cycled out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong.  I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

 Table 1. Block sizes and subblock sizes

+‐‐‐+‐‐‐+
| Block size| Subblock size |
+‐‐‐+‐‐‐+
| 64 KiB| 2 KiB |
+‐‐‐+‐‐‐+
| 128 KiB   | 4 KiB |
+‐‐‐+‐‐‐+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+‐‐‐+‐‐‐+
| 8 MiB, 16 MiB | 16 KiB|
+‐‐‐+‐‐‐+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flagvaluedescription
---  ---
 -f 8192 Minimum fragment (subblock) size 
in bytes (system pool)
32768Minimum fragment (subblock) size 
in bytes (other pools)
 -i 4096 Inode size in bytes
 -I 32768Indirect block size in bytes
 -m 2Default number of metadata replicas
 -M 3Maximum number of metadata replicas
 -r 1Default number of data replicas
 -R 3Maximum number of data replicas
 -j scatter  Block allocation type
 -D nfs4 File locking semantics in effect
 -k all  ACL semantics in effect
 -n 32   Estimated number of nodes that 
will mount file system
 -B 1048576  Block size (system pool)
4194304  Block size (other pools)
 -Q user;group;fileset   Quotas accounting enabled
user;group;fileset   Quotas enforced
none Default quotas enabled
 --perfileset-quota No   Per-fileset quota enforcement
 --filesetdfNo   Fileset df enabled?
 -V 19.01 (5.0.1.0)  File system version
 --create-time  Wed Aug  1 11:39:39 2018 File system creation time
 -z No   Is DMAPI enabled?
 -L 33554432 Logfile size
 -E Yes  Exact mtime mount option
 -S relatime Suppress atime mount option
 -K whenpossible Strict

[gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

2018-08-01 Thread Buterbaugh, Kevin L
Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1.  I 
am setting up a new filesystem there using hardware that we recently 
life-cycled out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong.  I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

 Table 1. Block sizes and subblock sizes

+‐‐‐+‐‐‐+
| Block size| Subblock size |
+‐‐‐+‐‐‐+
| 64 KiB| 2 KiB |
+‐‐‐+‐‐‐+
| 128 KiB   | 4 KiB |
+‐‐‐+‐‐‐+
| 256 KiB, 512 KiB, 1 MiB, 2| 8 KiB |
| MiB, 4 MiB|   |
+‐‐‐+‐‐‐+
| 8 MiB, 16 MiB | 16 KiB|
+‐‐‐+‐‐‐+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flagvaluedescription
---  ---
 -f 8192 Minimum fragment (subblock) size 
in bytes (system pool)
32768Minimum fragment (subblock) size 
in bytes (other pools)
 -i 4096 Inode size in bytes
 -I 32768Indirect block size in bytes
 -m 2Default number of metadata replicas
 -M 3Maximum number of metadata replicas
 -r 1Default number of data replicas
 -R 3Maximum number of data replicas
 -j scatter  Block allocation type
 -D nfs4 File locking semantics in effect
 -k all  ACL semantics in effect
 -n 32   Estimated number of nodes that 
will mount file system
 -B 1048576  Block size (system pool)
4194304  Block size (other pools)
 -Q user;group;fileset   Quotas accounting enabled
user;group;fileset   Quotas enforced
none Default quotas enabled
 --perfileset-quota No   Per-fileset quota enforcement
 --filesetdfNo   Fileset df enabled?
 -V 19.01 (5.0.1.0)  File system version
 --create-time  Wed Aug  1 11:39:39 2018 File system creation time
 -z No   Is DMAPI enabled?
 -L 33554432 Logfile size
 -E Yes  Exact mtime mount option
 -S relatime Suppress atime mount option
 -K whenpossible Strict replica allocation option
 --fastea   Yes  Fast external attributes enabled?
 --encryption   No   Encryption enabled?
 --inode-limit  101095424Maximum number of inodes
 --log-replicas 0Number of log replicas
 --is4KAligned  Yes  is4KAligned?
 --rapid-repair Yes  rapidRepair enabled?
 --write-cache-threshold 0   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128  Number of subblocks per full block
 -P system;raid1;raid6   Disk storage pools in file system
 --file-audit-log   No   File Audit Logging enabled?
 --maintenance-mode No   Maintenance Mode enabled?
 -d 
test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd
  Disks in file system
 -A yes  Automatic mount option
 -o none Additional mount options
 -T /gpfs5   Default mount point
 --mount-priority   0Mount priority

Output of mmcrfs:

mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k 
all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes 
--nofilesetdf --metadata-block-size 1M

The following disks of gpfs5 will be formatted on node testnsd3:
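Following Sven's explanation earlier in the thread (the subblock size is derived from the smallest block size in the filesystem), one way to get an 8 KiB subblock in the 4 MiB data pool is to give the metadata pool the same 4 MiB block size, i.e. drop --metadata-block-size. Whether a 4 MiB metadata block is acceptable on RAID 1 metadata LUNs is a separate judgment call; this is only a sketch of the alternative create command:

# Same mmcrfs invocation as above, minus --metadata-block-size, so every pool
# uses a 4 MiB block and (per the mmcrfs table) an 8 KiB subblock.
mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter \
  -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes \
  --nofilesetdf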

Re: [gpfsug-discuss] Power9 / GPFS

2018-07-27 Thread Buterbaugh, Kevin L
Hi Simon,

Have you tried running it with the “—silent” flag, too?

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633
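A sketch of what that retry could look like; both flags appear elsewhere in this thread, and the manual unpack simply reuses the tail|tar pipeline the installer itself prints (it bypasses the interactive license acceptance, so treat it as a last resort):

# Retry the self-extractor non-interactively
./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --silent --dir 5.0.1.1

# Last-resort manual unpack, without the --exclude flags so the package
# repositories are actually extracted (offset 620 is what this build prints)
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | \
  tar -C 5.0.1.1 -xvz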

On Jul 27, 2018, at 10:18 AM, Simon Thompson 
mailto:s.j.thomp...@bham.ac.uk>> wrote:

I feel like I must be doing something stupid here but …

We’re trying to install GPFS onto some Power 9 AI systems we’ve just got…

So from Fix central, we download 
“Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install”, however we are 
failing to unpack the file:

./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install --dir 5.0.1.1 
--text-only

Extracting License Acceptance Process Tool to 5.0.1.1 ...
tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | 
tar -C 5.0.1.1 -xvz --exclude=installer --exclude=*_rpms --exclude=*rpm  
--exclude=*tgz --exclude=*deb 1> /dev/null

Installing JRE ...

If directory 5.0.1.1 has been created or was previously created during another 
extraction,
.rpm, .deb, and repository related files in it (if there were) will be removed 
to avoid conflicts with the ones being extracted.

tail -n +620 ./Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install | 
tar -C 5.0.1.1 --wildcards -xvz  ibm-java*tgz 1> /dev/null
tar -C 5.0.1.1/ -xzf 5.0.1.1/ibm-java*tgz

Invoking License Acceptance Process Tool ...
5.0.1.1/ibm-java-ppc64le-80/jre/bin/java -cp 5.0.1.1/LAP_HOME/LAPApp.jar 
com.ibm.lex.lapapp.LAP -l 5.0.1.1/LA_HOME -m 5.0.1.1 -s 5.0.1.1  -text_only
Unhandled exception
Type=Segmentation error vmState=0x
J9Generic_Signal_Number=0004 Signal_Number=000b Error_Value= 
Signal_Code=0001
Handler1=7FFFB194FC80 Handler2=7FFFB176EA40
R0=7FFFB176A0E8 R1=7FFFB23AC5D0 R2=7FFFB2737400 R3=
R4=7FFFB17D2AA4 R5=0006 R6= R7=7FFFAC12A3C0


This looks like the java runtime is failing during the license approval status.

First off, can someone confirm that 
“Spectrum_Scale_Data_Management-5.0.1.1-ppc64LE-Linux-install” is indeed the 
correct package we are downloading for Power9, and then any tips on how to 
extract the packages.

These systems are running the IBM factory shipped install of RedHat 7.5.

Thanks

Simon


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discussdata=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9660d98faa7b4241b52508d5f3d44462%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636683015365941338sdata=8%2BKtcv8Tm3S5OS67xX5lOZatL%2B7mHZ71HXgm6dalEmg%3Dreserved=0

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmdiag --iohist question

2018-07-23 Thread Buterbaugh, Kevin L
Hi GPFS team,

Yes, that’s what we see, too … thanks.

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633
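Given the lcl/srv distinction explained in the quoted reply below, a minimal sketch of the kind of wrapper this thread describes: skip local I/Os, then map slow server-side I/Os back to jobs. The 300 ms threshold, the column positions (taken from the sample output below), and the squeue lookup are all illustrative assumptions:

# Fields per the sample output: $6 = time in ms, $7 = I/O type (srv/lcl), $9 = client IP
mmdiag --iohist | awk '$7 == "srv" && $6+0 > 300 { print $9 }' | sort -u |
while read ip; do
    host=$(getent hosts "$ip" | awk '{ print $2 }')
    [ -n "$host" ] || continue               # nothing to look up for unresolvable entries
    echo "== slow I/O from $host =="
    squeue -w "$host" -o "%i %u %j"          # SLURM jobs currently on that client
done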

On Jul 23, 2018, at 1:51 AM, IBM Spectrum Scale 
mailto:sc...@us.ibm.com>> wrote:


Hi

Please check the IO type before examining the IP address for the output of 
mmdiag --iohist. For the "lcl"(local) IO, the IP address is not necessary and 
we don't show it. Please check whether this is your case.

=== mmdiag: iohist ===

I/O history:

I/O start time RW Buf type disk:sectorNum nSec time ms Type Device/NSD ID NSD 
node
--- -- --- - - ---  
-- ---
01:14:08.450177 R inode 6:189513568 8 4.920 srv dm-4 192.168.116.92
01:14:08.450448 R inode 6:189513664 8 4.968 srv dm-4 192.168.116.92
01:14:08.475689 R inode 6:189428264 8 0.230 srv dm-4 192.168.116.92
01:14:08.983587 W logData 4:30686784 8 0.216 lcl dm-0
01:14:08.983601 W logData 3:25468480 8 0.197 lcl dm-8
01:14:08.983961 W inode 2:188808504 8 0.142 lcl dm-11
01:14:08.984144 W inode 1:188808504 8 0.134 lcl dm-7



Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D----0479=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255263961402=adR3hLlARxW6mIqw%2Fw4e29V6QgBtkOvkAH8RgN2Tgeg%3D=0>.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.

"Buterbaugh, Kevin L" ---07/11/2018 10:34:32 PM---Hi All, Quick 
question about “mmdiag —iohist” that is not documented in the man page … what 
does it

From: "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date: 07/11/2018 10:34 PM
Subject: [gpfsug-discuss] mmdiag --iohist question
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





Hi All,

Quick question about “mmdiag —iohist” that is not documented in the man page … 
what does it mean if the client IP address field is blank? That the NSD server 
itself issued the I/O? Or ???

This only happens occasionally … and the way I discovered it was that our 
Python script that takes “mmdiag —iohist” output, looks up the client IP for 
any waits above the threshold, converts that to a hostname, and queries SLURM 
for whose jobs are on that client started occasionally throwing an exception … 
and when I started looking at the “mmdiag —iohist” output itself I do see times 
when there is no client IP address listed for an I/O wait.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss<https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255263971410=UjU%2BSAOBf4P9oCdFRxatJ58blR9YOgDKes3Y2%2FYRzV4%3D=0>



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discussdata=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbc6d7df8b9fb453b50bf08d5f068cc1d%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636679255264001433sdata=uSiXYheeOw%2F4%2BSls8lP3XO9w7i7dFc3UWEYa%2F8aIn%2B0%3Dreserved=0

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmhealth - where is the info hiding?

2018-07-19 Thread Buterbaugh, Kevin L
Hi Valdis,

Is this what you’re looking for (from an IBMer in response to another question 
a few weeks back)?

assuming 4.2.3 code level this can be done by deleting and recreating the rule 
with changed settings:

# mmhealth thresholds list
### Threshold Rules ###
rule_namemetricerror  warn  
direction  filterBy  groupBy   
sensitivity

InodeCapUtil_RuleFileset_inode 90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name  300
MetaDataCapUtil_Rule MetaDataPool_capUtil  90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
DataCapUtil_Rule DataPool_capUtil  90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
MemFree_Rule mem_memfree   5  10low 
 node   300

# mmhealth thresholds delete MetaDataCapUtil_Rule
The rule(s) was(were) deleted successfully


# mmhealth thresholds add MetaDataPool_capUtil --errorlevel 95.0 --warnlevel 
85.0 --direction high --sensitivity 300 --name MetaDataCapUtil_Rule --groupby 
gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name


#  mmhealth thresholds list
### Threshold Rules ###
rule_namemetricerror  warn  
direction  filterBy  groupBy 
sensitivity  

InodeCapUtil_RuleFileset_inode 90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name  300
MemFree_Rule mem_memfree   5  10low 
 node   300
DataCapUtil_Rule DataPool_capUtil  90.0   80.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
MetaDataCapUtil_Rule MetaDataPool_capUtil  95.0   85.0  high
 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633
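Before (or after) changing a rule like that, it can be worth confirming what the pool in question actually holds; a sketch against the filesystem/pool named in the events quoted below, assuming the event string is formatted as (filesystem/pool):

# Current fill level of the pool that keeps tripping the data-capacity threshold
mmdf archive -P system --block-size auto

# And the rule that is evaluating it
mmhealth thresholds list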



On Jul 19, 2018, at 4:25 PM, 
valdis.kletni...@vt.edu wrote:

So I'm trying to tidy up things like 'mmhealth' etc.  Got most of it fixed, but 
stuck on
one thing..

Note: I already did a 'mmhealth node eventlog --clear -N all' yesterday, which
cleaned out a bunch of other long-past events that were "stuck" as failed /
degraded even though they were corrected days/weeks ago - keep this in mind as
you read on

# mmhealth cluster show

Component   Total Failed   DegradedHealthy  
Other
-
NODE   10  0  0 10  
0
GPFS   10  0  0 10  
0
NETWORK10  0  0 10  
0
FILESYSTEM  1  0  1  0  
0
DISK  102  0  0102  
0
CES 4  0  0  4  
0
GUI 1  0  0  1  
0
PERFMON10  0  0 10  
0
THRESHOLD  10  0  0 10  
0

Great.  One hit for 'degraded' filesystem.

# mmhealth node show --unhealthy -N all
(skipping all the nodes that show healthy)

Node name:  arnsd3-vtc.nis.internal
Node status:HEALTHY
Status Change:  21 hours ago

Component  StatusStatus Change Reasons
---
FILESYSTEM FAILED24 days ago   
pool-data_high_error(archive/system)
(...)
Node name:  arproto2-isb.nis.internal
Node status:HEALTHY
Status Change:  21 hours ago

Component  StatusStatus Change Reasons
--
FILESYSTEM DEGRADED  6 days ago
pool-data_high_warn(archive/system)

mmdf tells me:
nsd_isb_01131030056961 No   Yes  1747905536 ( 13%) 
111667200 ( 1%)
nsd_isb_02

Re: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace?

2018-07-15 Thread Buterbaugh, Kevin L
Hi All,

So I had noticed some waiters on my NSD servers that I thought were unrelated 
to the mmchdisk.  However, I decided to try rebooting my NSD servers one at a 
time (mmshutdown failed!) to clear that up … and evidently one of them had 
things hung up because the mmchdisk start completed.

Thanks…

Kevin

On Jul 15, 2018, at 12:34 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE 
CORP] mailto:aaron.s.knis...@nasa.gov>> wrote:

Hmm...have you dumped waiters across the entire cluster or just on the NSD 
servers/fs managers? Maybe there’s a slow node out there participating in the 
suspend effort? Might be worth running some quick tracing on the FS manager to 
see what it’s up to.





On July 15, 2018 at 13:27:54 EDT, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:
Hi All,

We are in a partial cluster downtime today to do firmware upgrades on our 
storage arrays.  It is a partial downtime because we have two GPFS filesystems:

1.  gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which 
I’ve unmounted across the cluster because it has data replication set to 1.

2.  gpfs22 - 42 TB and which corresponds to /home.  It has data replication set 
to two, so what we’re doing is “mmchdisk gpfs22 suspend -d <disks on that array>”, 
then doing the firmware upgrade, and once the array is back we’re doing a 
“mmchdisk gpfs22 resume -d <disks>”, followed by “mmchdisk gpfs22 start -d <disks>”.

On the 1st storage array this went very smoothly … the mmchdisk took about 5 
minutes, which is what I would expect.

But on the 2nd storage array the mmchdisk appears to either be hung or 
proceeding at a glacial pace.  For more than an hour it’s been stuck at:

mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 …

There are no waiters of any significance and “mmdiag —iohist” doesn’t show any 
issues either.

Any ideas, anyone?  Unless I can figure this out I’m hosed for this downtime, 
as I’ve got 7 more arrays to do after this one!

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discussdata=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd518db52846a4be34e2208d5ea7a00d7%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636672732087040757sdata=m77IpWNOlODc%2FzLiYI2qiPo9Azs8qsIdXSY8%2FoC6Nn0%3Dreserved=0

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace?

2018-07-15 Thread Buterbaugh, Kevin L
Hi All,

We are in a partial cluster downtime today to do firmware upgrades on our 
storage arrays.  It is a partial downtime because we have two GPFS filesystems:

1.  gpfs23 - 900+ TB and which corresponds to /scratch and /data, and which 
I’ve unmounted across the cluster because it has data replication set to 1.

2.  gpfs22 - 42 TB and which corresponds to /home.  It has data replication set 
to two, so what we’re doing is “mmchdisk gpfs22 suspend -d <disks on that array>”, 
then doing the firmware upgrade, and once the array is back we’re doing a 
“mmchdisk gpfs22 resume -d <disks>”, followed by “mmchdisk gpfs22 start -d <disks>”.

On the 1st storage array this went very smoothly … the mmchdisk took about 5 
minutes, which is what I would expect.
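For anyone doing the same dance, a sketch of that per-array sequence as a script; the NSD names are placeholders and the final mmlsdisk check is just a convenient way to confirm nothing is left in a non-ready state:

DISKS="eonXXAnsd;eonXXBnsd"             # NSDs on the array being flashed (placeholder names)

mmchdisk gpfs22 suspend -d "$DISKS"     # stop new allocations on these disks
# ... perform the firmware upgrade on the array ...
mmchdisk gpfs22 resume -d "$DISKS"      # make the disks available for allocation again
mmchdisk gpfs22 start -d "$DISKS"       # bring them up and catch up the missed writes
mmlsdisk gpfs22 -e                      # lists any disks not up/ready; should be empty when done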

But on the 2nd storage array the mmchdisk appears to either be hung or 
proceeding at a glacial pace.  For more than an hour it’s been stuck at:

mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 …

There are no waiters of any significance and “mmdiag —iohist” doesn’t show any 
issues either.

Any ideas, anyone?  Unless I can figure this out I’m hosed for this downtime, 
as I’ve got 7 more arrays to do after this one!

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] mmdiag --iohist question

2018-07-11 Thread Buterbaugh, Kevin L
Hi All,

Quick question about “mmdiag —iohist” that is not documented in the man page … 
what does it mean if the client IP address field is blank?  That the NSD server 
itself issued the I/O?  Or ???

This only happens occasionally … and the way I discovered it was that our 
Python script that takes “mmdiag —iohist” output, looks up the client IP for 
any waits above the threshold, converts that to a hostname, and queries SLURM 
for whose jobs are on that client started occasionally throwing an exception … 
and when I started looking at the “mmdiag —iohist” output itself I do see times 
when there is no client IP address listed for an I/O wait.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] What NSDs does a file have blocks on?

2018-07-09 Thread Buterbaugh, Kevin L
Hi All,

I am still working on my issue of the occasional high I/O wait times and that 
has raised another question … I know that I can run mmfileid to see what files 
have a block on a given NSD, but is there a way to do the opposite?  I.e. I 
want to know what NSDs a single file has its blocks on?  The mmlsattr command 
does not appear to show this information unless it’s got an undocumented 
option.  Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] High I/O wait times

2018-07-06 Thread Buterbaugh, Kevin L
Hi All,

Another update on this issue as we have made significant progress today … but 
first let me address the two responses I received.

Alex - this is a good idea and yes, we did this today.  We did see some higher 
latencies on one storage array as compared to the others.  10-20 ms on the 
“good” storage arrays … 50-60 ms on the one storage array.  It took us a while 
to be able to do this because while the vendor provides a web management 
interface, that didn’t show this information.  But they have an actual app that 
will … and the Mac and Linux versions don’t work.  So we had to go scrounge up 
this thing called a Windows PC and get the software installed there.  ;-)

Jonathan - also a good idea and yes, we also did this today.  I’ll explain as 
part of the rest of this update.

The main thing that we did today that has turned out to be most revealing is to 
take a list of all the NSDs in the impacted storage pool … 19 devices spread 
out over 7 storage arrays … and run read dd tests on all of them (the 
/dev/dm-XX multipath device).  15 of them showed rates of 33 - 100+ MB/sec and 
the variation is almost definitely explained by the fact that they’re in 
production use and getting hit by varying amounts of “real” work.  But 4 of 
them showed rates of 2-10 MB/sec and those 4 all happen to be on storage array 
eon34.
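A sketch of that read test for anyone wanting to repeat it; the device names, block size, and count are placeholders, and iflag=direct is there so the page cache doesn't flatter the numbers:

# Sequential read test against each multipath device backing the suspect NSDs
for dev in /dev/dm-10 /dev/dm-11 /dev/dm-12 /dev/dm-13; do
    echo "== $dev =="
    dd if="$dev" of=/dev/null bs=1M count=4096 iflag=direct 2>&1 | tail -1
done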

So, to try to rule out everything but the storage array we replaced the FC 
cables going from the SAN switches to the array, plugging the new cables into 
different ports on the SAN switches.  Then we repeated the dd tests from a 
different NSD server, which both eliminated the NSD server and its FC cables 
as a potential cause … and saw results virtually identical to the previous 
test.  Therefore, we feel pretty confident that it is the storage array and 
have let the vendor know all of this.

And there’s another piece of quite possibly relevant info … the last week in 
May one of the controllers in this array crashed and rebooted (it’s an 
active-active dual controller array) … when that happened the failover occurred 
… with a major glitch.  One of the LUNs essentially disappeared … more 
accurately, it was there, but had no size!  We’ve been using this particular 
vendor for 15 years now and I have seen more than a couple of their controllers 
go bad during that time and nothing like this had ever happened before.  They 
were never able to adequately explain what happened there.  So what I am 
personally suspecting has happened is that whatever caused that one LUN to go 
MIA has caused these issues with the other LUNs on the array.  As an aside, we 
ended up using mmfileid to identify the files that had blocks on the MIA LUN 
and restored those from tape backup.

I want to thank everyone who has offered their suggestions so far.  I will 
update the list again once we have a definitive problem determination.

I hope that everyone has a great weekend.  In the immortal words of the wisest 
man who ever lived, “I’m kinda tired … think I’ll go home now.”  ;-)

Kevin

On Jul 6, 2018, at 12:13 PM, Alex Chekholko 
mailto:a...@calicolabs.com>> wrote:

Hi Kevin,

This is a bit of a "cargo cult" suggestion but one issue that I have seen is if 
a disk starts misbehaving a bit but does not fail, it slows down the whole raid 
group that it is in.  And the only way to detect it is to examine the 
read/write latencies on the individual disks.  Does your SAN allow you to do 
that?

That happened to me at least twice in my life and replacing the offending 
individual disk solved the issue.  This was on DDN, so the relevant command 
were something like 'show pd * counters write_lat' or similar, which showed the 
latency for the I/Os for each disk.  If one disk in the group is an outlier 
(e.g. 1s write latencies), then the whole raid array (LUN) is just waiting for 
that one disk.

Another possibility for troubleshooting, if you have sufficient free resources: 
you can just suspend the problematic LUNs in GPFS, as that will remove the 
write load from them, while still having them service read requests and not 
affecting users.

Regards,
Alex

On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:
Hi Jim,

Thank you for your response.  We are taking a two-pronged approach at this 
point:

1.  While I don’t see anything wrong with our storage arrays, I have opened a 
ticket with the vendor (not IBM) to get them to look at things from that angle.

2.  Since the problem moves around from time to time, we are enhancing our 
monitoring script to see if we can basically go from “mmdiag —iohist” to 
“clients issuing those I/O requests” to “jobs running on those clients” to see 
if there is any commonality there.

Thanks again - much appreciated!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderb

Re: [gpfsug-discuss] High I/O wait times

2018-07-06 Thread Buterbaugh, Kevin L
Hi Jim,

Thank you for your response.  We are taking a two-pronged approach at this 
point:

1.  While I don’t see anything wrong with our storage arrays, I have opened a 
ticket with the vendor (not IBM) to get them to look at things from that angle.

2.  Since the problem moves around from time to time, we are enhancing our 
monitoring script to see if we can basically go from “mmdiag —iohist” to 
“clients issuing those I/O requests” to “jobs running on those clients” to see 
if there is any commonality there.

Thanks again - much appreciated!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


On Jul 6, 2018, at 8:02 AM, Jim Doherty 
mailto:jjdohe...@yahoo.com>> wrote:

You may want to get an mmtrace,  but I suspect that the disk IOs are slow. 
The iohist is showing the time from when the start IO was issued until it was 
finished. Of course if you have disk IOs taking 10x too long then other IOs 
are going to queue up behind it. If there are more IOs than there are NSD 
server threads then there are going to be IOs that are queued and waiting for a 
thread.

Jim


On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:


Hi All,

First off, my apologies for the delay in responding back to the list … we’ve 
actually been working our tails off on this one trying to collect as much data 
as we can on what is a very weird issue.  While I’m responding to Aaron’s 
e-mail, I’m going to try to address the questions raised in all the responses.

Steve - this all started last week.  You’re correct about our mixed workload.  
There have been no new workloads that I am aware of.

Stephen - no, this is not an ESS.  We are running GPFS 4.2.3-8.

Aaron - no, this is not on a DDN, either.

The hardware setup is a vanilla 8 GB FC SAN.  Commodity hardware for the 
servers and storage.  We have two SAN “stacks” and all NSD servers and storage 
are connected to both stacks.  Linux multipathing handles path failures.  10 
GbE out to the network.

We first were alerted to this problem by one of our monitoring scripts which 
was designed to alert us to abnormally high I/O times, which, as I mentioned 
previously, in our environment has usually been caused by cache battery backup 
failures in the storage array controllers (but _not_ this time).  So I’m 
getting e-mails that in part read:

Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms.
Disk eon34Ensd on nsd4 has a service time of 3146.715 ms.

The “34” tells me what storage array and the “C” or “E” tells me what LUN on 
that storage array.  As I’ve mentioned, those two LUNs are by far and away my 
most frequent problem children, but here’s another report from today as well:

Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms.
Disk eon28Ansd on nsd7 has a service time of 1154.002 ms.
Disk eon31Ansd on nsd3 has a service time of 1068.987 ms.
Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms.

NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8.

Based on Fred’s excellent advice, we took a closer look at the “mmfsadm dump 
nsd” output.  We wrote a Python script to pull out what we think is the most 
pertinent information:

nsd1
29 SMALL queues, 50 requests pending, 3741 was the highest number of requests 
pending.
348 threads started, 1 threads active, 348 was the highest number of 
threads active.
29 LARGE queues, 0 requests pending, 5694 was the highest number of requests 
pending.
348 threads started, 124 threads active, 348 was the highest number of 
threads active.
nsd2
29 SMALL queues, 0 requests pending, 1246 was the highest number of requests 
pending.
348 threads started, 13 threads active, 348 was the highest number of 
threads active.
29 LARGE queues, 470 requests pending, 2404 was the highest number of requests 
pending.
348 threads started, 340 threads active, 348 was the highest number of 
threads active.
nsd3
29 SMALL queues, 108 requests pending, 1796 was the highest number of requests 
pending.
348 threads started, 0 threads active, 348 was the highest number of 
threads active.
29 LARGE queues, 35 requests pending, 3331 was the highest number of requests 
pending.
348 threads started, 4 threads active, 348 was the highest number of 
threads active.
nsd4
42 SMALL queues, 0 requests pending, 1529 was the highest number of requests 
pending.
504 threads started, 8 threads active, 504 was the highest number of 
threads active.
42 LARGE queues, 0 requests pending, 637 was the highest number of requests 
pending.
504 threads started, 211 threads active, 504 was the highest number of 
threads active.
nsd5
42 SMALL queues, 182 requests pending, 2798 was the highest number of requests 
pending.
504 threads started, 6 threads active, 5

Re: [gpfsug-discuss] High I/O wait times

2018-07-05 Thread Buterbaugh, Kevin L
” on the LARGE queue side of things and that nsd2 
and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently 
in our alerts) are the heaviest loaded.

One other thing we have noted is that our home grown RRDtool monitoring plots 
that are based on netstat, iostat, vmstat, etc. also show an oddity.  Most of 
our LUNs show up as 33 - 68% utilized … but all the LUNs on eon34 (there are 4 
in total) show up as 93 - 97% utilized.  And another oddity there is that 
eon34A and eon34B rarely show up on the alert e-mails, while eon34C and eon34E 
show up wyyy more than anything else … the difference between them is 
that A and B are on the storage array itself and C and E are on JBOD’s 
SAS-attached to the storage array (and yes, we’ve actually checked and reseated 
those connections).

Another reason why I could not respond earlier today is that one of the things 
which I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 / 24 
GB respectively to 64 GB each … and I then upped the pagepool on those two 
boxes to 40 GB.  That has not made a difference.  How can I determine how much 
of the pagepool is actually being used, BTW?  A quick Google search didn’t help 
me.
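On the pagepool question, I'm not aware of a single counter that answers it directly, but as a starting point the daemon's own memory statistics plus the configured limits are easy to pull; a sketch, nothing more:

# How big the pool is, and what share of it NSD buffers may use
mmlsconfig pagepool
mmlsconfig nsdBufSpace

# Daemon memory statistics (run on the NSD server itself)
mmdiag --memory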

So we’re trying to figure out if we have storage hardware issues causing GPFS 
issues or GPFS issues causing storage slowdowns.  The fact that I see slowdowns 
most often on one storage array points in one direction, while the fact that at 
times I see even worse slowdowns on multiple other arrays points the other way. 
 The fact that some NSD servers show better stats than others in the analysis 
of the “mmfsadm dump nsd” output tells me … well, I don’t know what it tells me.

I think that’s all for now.  If you have read this entire very long e-mail, 
first off, thank you!  If you’ve read it and have ideas for where I should go 
from here, T-H-A-N-K Y-O-U!

Kevin

> On Jul 4, 2018, at 7:34 AM, Aaron Knister  wrote:
> 
> Hi Kevin,
> 
> Just going out on a very weird limb here...but you're not by chance seeing 
> this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 
> 14K, etc.) We just started seeing some very weird and high latency on some of 
> our SFA12ks (that have otherwise been solid both in terms of stability and 
> performance) but only on certain volumes and the affected volumes change. 
> It's very bizzarre and we've been working closely with DDN to track down the 
> root cause but we've not yet found a smoking gun. The timing and description 
> of your problem sounded eerily similar to what we're seeing so I'd thought 
> I'd ask.
> 
> -Aaron
> 
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> 
> 
> On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote:
> 
>> Hi all,
>> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of 
>> our NSDs as reported by “mmdiag —iohist" and are struggling to understand 
>> why.  One of the
>> confusing things is that, while certain NSDs tend to show the problem more 
>> than others, the problem is not consistent … i.e. the problem tends to move 
>> around from
>> NSD to NSD (and storage array to storage array) whenever we check … which is 
>> sometimes just a few minutes apart.
>> In the past when I have seen “mmdiag —iohist” report high wait times like 
>> this it has *always* been hardware related.  In our environment, the most 
>> common cause has
>> been a battery backup unit on a storage array controller going bad and the 
>> storage array switching to write straight to disk.  But that’s *not* 
>> happening this time.
>> Is there anything within GPFS / outside of a hardware issue that I should be 
>> looking for??  Thanks!
>> —
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and Education
>> kevin.buterba...@vanderbilt.edu - (615)875-9633
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C9c1c75becd20479479a608d5e1ab43ec%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636663048058564742=if1uC53Y7K3D%2FMuVMskzsYqPx9qftU1ICQfP23c7bI0%3D=0

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] High I/O wait times

2018-07-03 Thread Buterbaugh, Kevin L
Hi Fred,

I have a total of 48 NSDs served up by 8 NSD servers.  12 of those NSDs are in 
our small /home filesystem, which is performing just fine.  The other 36 are in 
our ~1 PB /scratch and /data filesystem, which is where the problem is.  Our 
max filesystem block size parameter is set to 16 MB, but the aforementioned 
filesystem uses a 1 MB block size.

nsdMaxWorkerThreads is set to 1024 as shown below.  Since each NSD server 
serves an average of 6 NSDs and 6 x 12 = 72 we’re OK if I’m understanding the 
calculation correctly.  Even multiplying 48 x 12 = 576, so we’re good?!?

Your help is much appreciated!  Thanks again…

Kevin

On Jul 3, 2018, at 4:53 PM, Frederick Stock 
mailto:sto...@us.ibm.com>> wrote:

How many NSDs are served by the NSD servers and what is your maximum file 
system block size?  Have you confirmed that you have sufficient NSD worker 
threads to handle the maximum number of IOs you are configured to have active?  
That would be the number of NSDs served times 12 (you have 12 threads per 
queue).

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com<mailto:sto...@us.ibm.com>



From:    "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>>
To:gpfsug main discussion list 
mailto:gpfsug-discuss@spectrumscale.org>>
Date:07/03/2018 05:41 PM
Subject:Re: [gpfsug-discuss] High I/O wait times
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi Fred,

Thanks for the response.  I have been looking at the “mmfsadm dump nsd” data 
from the two NSD servers that serve up the two NSDs that most commonly 
experience high wait times (although, again, this varies from time to time).  
In addition, I have been reading:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fwikis%2Fhome%3Flang%3Den%23!%2Fwiki%2FGeneral+Parallel+File+System+(GPFS)%2Fpage%2FNSD+Server+Design+and+Tuning=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7658e1b458b147ad8a3908d5e12f6982%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662516110903567=cWw5UipcO7HgupLQTFgOWVwXF%2B9b8S%2Fw935%2FeqG6xIY%3D=0>

And:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fwikis%2Fhome%3Flang%3Den%23!%2Fwiki%2FGeneral%2520Parallel%2520File%2520System%2520(GPFS)%2Fpage%2FNSD%2520Server%2520Tuning=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C7658e1b458b147ad8a3908d5e12f6982%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662516110903567=CAuOPOhC1MXdZW2e2HaVOY0PmySwP6FzlsvNNlteWZw%3D=0>

Which seem to be the most relevant documents on the Wiki.

I would like to do a more detailed analysis of the “mmfsadm dump nsd” output, 
but my preliminary looks at it seems to indicate that I see I/O’s queueing in 
the 50 - 100 range for the small queues and the 60 - 200 range on the large 
queues.

In addition, I am regularly seeing all 12 threads on the LARGE queues active, 
while it is much more rare that I see all - or even close to all - the threads 
on the SMALL queues active.

As far as the parameters Scott and Yuri mention, on our cluster they are set 
thusly:

[common]
nsdMaxWorkerThreads 640
[]
nsdMaxWorkerThreads 1024
[common]
nsdThreadsPerQueue 4
[]
nsdThreadsPerQueue 12
[common]
nsdSmallThreadRatio 3
[]
nsdSmallThreadRatio 1

So to me it sounds like I need more resources on the LARGE queue side of things 
… i.e. it sure doesn’t sound like I want to change my small thread ratio.  If I 
increase the amount of threads it sounds like that might help, but that also 
takes more pagepool, and I’ve got limited RAM in these (old) NSD servers.  I do 
have nsdbufspace set to 70, but I’ve only got 16-24 GB RAM each in these NSD 
servers.  And a while back I did try increase the page pool on them (very 
slightly) and ended up causing problems because then they ran out of physical 
RAM.

Thoughts?  Followup questions?  Thanks!

Kevin

On Jul 3, 2018, at 3:11 PM, Frederick Stock 
mailto:sto...@us.ibm.com>> wrote:

Are you seeing similar values for all the nodes or just some of them?  One 
possible issue is how the NSD queues are configured on the NSD servers.  You 
can see this with the output of "mmfsadm dump nsd".  There are queues for LARGE 
IOs (greater than 64K) and queues for SMALL IOs (64K or less).  Check the 
highest pending values to see if many IOs are queueing.  There are a couple of 
options to fix this but rather than explain them I suggest you look for 
information about N

Re: [gpfsug-discuss] High I/O wait times

2018-07-03 Thread Buterbaugh, Kevin L
Hi Fred,

Thanks for the response.  I have been looking at the “mmfsadm dump nsd” data 
from the two NSD servers that serve up the two NSDs that most commonly 
experience high wait times (although, again, this varies from time to time).  
In addition, I have been reading:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Design%20and%20Tuning

And:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20Server%20Tuning

Which seem to be the most relevant documents on the Wiki.

I would like to do a more detailed analysis of the “mmfsadm dump nsd” output, 
but my preliminary looks at it seems to indicate that I see I/O’s queueing in 
the 50 - 100 range for the small queues and the 60 - 200 range on the large 
queues.

In addition, I am regularly seeing all 12 threads on the LARGE queues active, 
while it is much more rare that I see all - or even close to all - the threads 
on the SMALL queues active.

As far as the parameters Scott and Yuri mention, on our cluster they are set 
thusly:

[common]
nsdMaxWorkerThreads 640
[]
nsdMaxWorkerThreads 1024
[common]
nsdThreadsPerQueue 4
[]
nsdThreadsPerQueue 12
[common]
nsdSmallThreadRatio 3
[]
nsdSmallThreadRatio 1

So to me it sounds like I need more resources on the LARGE queue side of things 
… i.e. it sure doesn’t sound like I want to change my small thread ratio.  If I 
increase the amount of threads it sounds like that might help, but that also 
takes more pagepool, and I’ve got limited RAM in these (old) NSD servers.  I do 
have nsdbufspace set to 70, but I’ve only got 16-24 GB RAM each in these NSD 
servers.  And a while back I did try increase the page pool on them (very 
slightly) and ended up causing problems because then they ran out of physical 
RAM.
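If the conclusion were to give the LARGE queues more threads, the change itself is only configuration; a sketch with example values and an assumed node class name ("nsdnodes"). These NSD settings take effect when mmfsd restarts on those servers, and more threads/buffers means more pagepool, which is exactly the constraint described above:

# Raise the per-queue thread count on the NSD servers only (values are examples)
mmchconfig nsdThreadsPerQueue=16,nsdMaxWorkerThreads=1280 -N nsdnodes

# Pick up the change one NSD server at a time
mmshutdown -N nsd2 && mmstartup -N nsd2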

Thoughts?  Followup questions?  Thanks!

Kevin

On Jul 3, 2018, at 3:11 PM, Frederick Stock  wrote:

Are you seeing similar values for all the nodes or just some of them?  One 
possible issue is how the NSD queues are configured on the NSD servers.  You 
can see this with the output of "mmfsadm dump nsd".  There are queues for LARGE 
IOs (greater than 64K) and queues for SMALL IOs (64K or less).  Check the 
highest pending values to see if many IOs are queueing.  There are a couple of 
options to fix this but rather than explain them I suggest you look for 
information about NSD queueing on the developerWorks site.  There has been 
information posted there that should prove helpful.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com



From:"Buterbaugh, Kevin L" 
To:gpfsug main discussion list 
Date:07/03/2018 03:49 PM
Subject:[gpfsug-discuss] High I/O wait times
Sent by:gpfsug-discuss-boun...@spectrumscale.org




Hi all,

We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our 
NSDs as reported by “mmdiag —iohist" and are struggling to understand why.  One 
of the confusing things is that, while certain NSDs tend to show the problem 
more than others, the problem is not consistent … i.e. the problem tends to 
move around from NSD to NSD (and storage array to storage array) whenever we 
check … which is sometimes just a few minutes apart.

In the past when I have seen “mmdiag —iohist” report high wait times like this 
it has *always* been hardware related.  In our environment, the most common 
cause has been a battery backup unit on a storage array controller going bad 
and the storage array switching to write straight to disk.  But that’s *not* 
happening this time.

Is there anything within GPFS / outside of a hardware issue that I should be 
looking for??  Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>- 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss<https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938066010=jL0pB5MEaWtJZjMbS8JzhsKGvwmYB6qV%2FVyosdUKcSU%3D=0>



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cd3d7ff675bb440286cb908d5e1212b66%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636662454938076014=wIyB66HoqvL13

[gpfsug-discuss] High I/O wait times

2018-07-03 Thread Buterbaugh, Kevin L
Hi all,

We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our 
NSDs as reported by “mmdiag —iohist" and are struggling to understand why.  One 
of the confusing things is that, while certain NSDs tend to show the problem 
more than others, the problem is not consistent … i.e. the problem tends to 
move around from NSD to NSD (and storage array to storage array) whenever we 
check … which is sometimes just a few minutes apart.

In the past when I have seen “mmdiag —iohist” report high wait times like this 
it has *always* been hardware related.  In our environment, the most common 
cause has been a battery backup unit on a storage array controller going bad 
and the storage array switching to write straight to disk.  But that’s *not* 
happening this time.

Is there anything within GPFS / outside of a hardware issue that I should be 
looking for??  Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] File system manager - won't change to new node

2018-06-22 Thread Buterbaugh, Kevin L
Hi Bob,

Have you tried explicitly moving it to a specific manager node?  That’s what I 
always do … I personally never let GPFS pick when I’m moving the management 
functions for some reason.  Thanks…

Kevin
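In other words, name the target rather than letting GPFS choose; the node name here is only an example:

mmchmgr dataeng nrg1-gpfs05     # move the manager for dataeng to a specific node (example name)
mmlsmgr dataeng                 # confirm where it landed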

On Jun 22, 2018, at 8:13 AM, Oesterlin, Robert 
mailto:robert.oester...@nuance.com>> wrote:

Any idea why I can’t force the file system manager off this node? I turned off 
the manager on the node (mmchnode --client) and used mmchmgr to move the other 
file systems off, but I can’t move this one. There are 6 other good choices for 
file system managers. I’ve never seen this message before.

[root@nrg1-gpfs01 ~]# mmchmgr dataeng
The best choice node 10.30.43.136 (nrg1-gpfs13) is already the manager for 
dataeng.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Capacity pool filling

2018-06-07 Thread Buterbaugh, Kevin L
Hi Uwe,

Thanks for your response.

So our restore software lays down the metadata first, then the data.  While it 
has no specific knowledge of the extended attributes, it does back them up and 
restore them.  So the only explanation that makes sense to me is that since the 
inode for the file says that the file should be in the gpfs23capacity pool, the 
data gets written there.

Right now I don’t have time to do an analysis of the “live” version of a 
fileset and the “restored” version of that same fileset to see if the placement 
of the files matches up.  My quick and dirty checks seem to show files getting 
written to all 3 pools.  Unfortunately, we have no way to tell our tape 
software to ignore files from the gpfs23capacity pool (and we’re aware that we 
won’t need those files).  We’ve also determined that it is actually quicker to 
tell our tape system to restore all files from a fileset than to take the time 
to tell it to selectively restore only certain files … and the same amount of 
tape would have to be read in either case.

Our SysAdmin who is primary on tape backup and restore was going on vacation 
the latter part of the week, so he decided to be helpful and just queue up all 
the restores to run one right after the other.  We didn’t realize that, so we 
are solving our disk space issues by slowing down the restores until we can run 
more instances of the script that replaces the corrupted files and deletes the 
unneeded restored files.
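
If the data pool ends up with enough free space, Uwe's suggestion below of a backward migration could be expressed as a small policy file along these lines (a sketch only; the pool names and the RESTORE path are the ones from this thread, while the policy file name and temp directory are made up):

    /* unmigrate.pol - push restored files back out of the capacity pool */
    RULE 'unmigrateRestores' MIGRATE
         FROM POOL 'gpfs23capacity'
         TO POOL 'gpfs23data'
         WHERE PATH_NAME LIKE '/gpfs23/RESTORE/%'

run via something like "mmapplypolicy /gpfs23 -P unmigrate.pol -I yes -N nsdNodes -g /gpfs23/tmp".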

Thanks again…

Kevin

> On Jun 7, 2018, at 1:34 PM, Uwe Falke  wrote:
> 
>> However, I took a look in one of the restore directories under 
>> /gpfs23/ RESTORE using mmlsattr and I see files in all 3 pools! 
> 
> 
>> So ? I don?t think GPFS is doing this but the next thing I am 
>> going to do is follow up with our tape software vendor ? I bet 
>> they preserve the pool attribute on files and - like Jaime said - 
>> old stuff is therefore hitting the gpfs23capacity pool.
> 
> Hm, then the backup/restore must be doing very funny things. Usually, GPFS 
> should rule the 
> placement of new files, and I assume that a restore of a file, in 
> particular under a different name, 
> creates a new file. So, if your backup tool does override that GPFS 
> placement, it must be very 
> intimate with Scale :-). 
> I'd do some list scans of the capacity pool just to see what the files 
> appearing there from tape have in common. 
> If it's really that these files' data were on the capacity pool at the 
> last backup, they should not be affected by your dead NSD and a restore is 
> in vain anyway.
> 
> If that doesn't help or give no clue, then, if the data pool has some more 
> free  space, you might try to run an upward/backward migration from 
> capacity to data . 
> 
> And, yeah, as GPFS tends to stripe over all NSDs, all files in data large 
> enough plus some smaller ones would have data on your broken NSD. That's 
> the drawback of parallelization.
> Maybe you'd ask the storage vendor whether they supply some more storage 
> for the fault of their (redundant?) device to alleviate your current 
> storage shortage ?
> 
> Mit freundlichen Grüßen / Kind regards
> 
> 
> Dr. Uwe Falke
> 
> IT Specialist
> High Performance Computing Services / Integrated Technology Services / 
> Data Center Services
> ---
> IBM Deutschland
> Rathausstr. 7
> 09111 Chemnitz
> Phone: +49 371 6978 2165
> Mobile: +49 175 575 2877
> E-Mail: uwefa...@de.ibm.com
> ---
> IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: 
> Thomas Wolter, Sven Schooß
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
> HRB 17122 
> 
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Capacity pool filling

2018-06-07 Thread Buterbaugh, Kevin L
Hi again all,

I received a direct response and am not sure whether that means the sender did 
not want to be identified, but they asked good questions that I wanted to 
answer on list…

No, we do not use snapshots on this filesystem.

No, we’re not using HSM … our tape backup system is a traditional backup system 
not named TSM.  We’ve created a top level directory in the filesystem called 
“RESTORE” and are restoring everything under that … then doing our moves / 
deletes of what we’ve restored … so I *think* that means all of that should be 
written to the gpfs23data pool?!?

On the “plus” side, I may figure this out myself soon when someone / something 
starts getting I/O errors!  :-O

In the meantime, other ideas are much appreciated!

Kevin


Do you have a job that’s creating snapshots?  That’s an easy one to overlook.

Not sure if you are using an HSM. Any new file that gets generated should 
follow the default rule in ILM unless it meets a placement condition. It would 
only be if you're using an HSM that files would land in a pool other than the 
placement pool, and that is purely because the file's location has already been 
updated to the capacity pool.




On Thu, Jun 7, 2018 at 8:17 AM -0600, "Buterbaugh, Kevin L" 
mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi All,

First off, I’m on day 8 of dealing with two different mini-catastrophes at work 
and am therefore very sleep deprived and possibly missing something obvious … 
with that disclaimer out of the way…

We have a filesystem with 3 pools:  1) system (metadata only), 2) gpfs23data 
(the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with 
an atime - yes, atime - of more than 90 days get migrated to by a script that 
runs out of cron each weekend).

However … this morning the free space in the gpfs23capacity pool is dropping … 
I’m down to 0.5 TB free in a 582 TB pool … and I cannot figure out why.  The 
migration script is NOT running … in fact, it’s currently disabled.  So I can 
only think of two possible explanations for this:

1.  There are one or more files already in the gpfs23capacity pool that someone 
has started updating.  Is there a way to check for that … i.e. a way to run 
something like “find /gpfs23 -mtime -7 -ls” but restricted to only files in the 
gpfs23capacity pool.  Marc Kaplan - can mmfind do that??  ;-)

2.  We are doing a large volume of restores right now because one of the 
mini-catastrophes I’m dealing with is one NSD (gpfs23data pool) down due to an 
issue with the storage array.  We’re working with the vendor to try to resolve 
that but are not optimistic, so we have started doing restores in case they come 
back and tell us it’s not recoverable.  We did run “mmfileid” to identify the 
files that have one or more blocks on the down NSD, but there are so many that 
what we’re doing is actually restoring all the files to an alternate path 
(easier for our tape system), then replacing the corrupted files, then deleting 
any restores we don’t need.  But shouldn’t all of that be going to the 
gpfs23data pool?  I.e. even if we’re restoring files that are in the 
gpfs23capacity pool shouldn’t the fact that we’re restoring to an alternate 
path (i.e. not overwriting files with the tape restores) and the default pool 
is the gpfs23data pool mean that nothing is being restored to the 
gpfs23capacity pool???

Is there a third explanation I’m not thinking of?

Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Capacity pool filling

2018-06-07 Thread Buterbaugh, Kevin L
Hi All,

So in trying to prove Jaime wrong I proved him half right … the cron job is 
stopped:

#13 22 * * 5 /root/bin/gpfs_migration.sh

However, I took a look in one of the restore directories under /gpfs23/RESTORE 
using mmlsattr and I see files in all 3 pools!  So that explains why the 
capacity pool is filling, but mmlspolicy says:

Policy for file system '/dev/gpfs23':
   Installed by root@gpfsmgr on Wed Jan 25 10:17:01 2017.
   First line of policy 'gpfs23.policy' is:
RULE 'DEFAULT' SET POOL 'gpfs23data'

So … I don’t think GPFS is doing this but the next thing I am going to do is 
follow up with our tape software vendor … I bet they preserve the pool 
attribute on files and - like Jaime said - old stuff is therefore hitting the 
gpfs23capacity pool.
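
As an aside, the check asked about in the original note (recently changed files, but only in one pool) can be done with a LIST rule rather than find.  A rough sketch, using the pool name from this thread and made-up file and directory names:

    /* recent-capacity.pol - the empty EXEC makes mmapplypolicy just write the list */
    RULE EXTERNAL LIST 'recentCap' EXEC ''
    RULE 'findRecent' LIST 'recentCap'
         FROM POOL 'gpfs23capacity'
         WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '7' DAYS

Running "mmapplypolicy /gpfs23 -P recent-capacity.pol -I defer -f /gpfs23/tmp/recent -g /gpfs23/tmp" should leave the matching path names in /gpfs23/tmp/recent.list.recentCap.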

Thanks Jaime and everyone else who has responded so far…

Kevin

> On Jun 7, 2018, at 9:53 AM, Jaime Pinto  wrote:
> 
> I think the restore is bringing back a lot of material with atime > 90, so 
> it is passing through gpfs23data and going directly to gpfs23capacity.
> 
> I also think you may not have stopped the crontab script as you believe you 
> did.
> 
> Jaime
> 
> Quoting "Buterbaugh, Kevin L" :
> 
>> Hi All,
>> 
>> First off, I’m on day 8 of dealing with two different mini-catastrophes at 
>> work and am therefore very sleep deprived and possibly missing something 
>> obvious … with that disclaimer out of the way…
>> 
>> We have a filesystem with 3 pools:  1) system (metadata only), 2) 
>> gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity 
>> (where files with an atime - yes atime - of more than 90 days get migrated 
>> to by a script that runs out of cron each weekend.
>> 
>> However … this morning the free space in the gpfs23capacity pool is 
>> dropping … I’m down to 0.5 TB free in a 582 TB pool … and I cannot figure 
>> out why.  The migration script is NOT running … in fact, it’s currently 
>> disabled.  So I can only think of two possible explanations for this:
>> 
>> 1.  There are one or more files already in the gpfs23capacity pool that 
>> someone has started updating.  Is there a way to check for that … i.e. a 
>> way to run something like “find /gpfs23 -mtime -7 -ls” but restricted to 
>> only files in the gpfs23capacity pool.  Marc Kaplan - can mmfind do that??  
>> ;-)
>> 
>> 2.  We are doing a large volume of restores right now because one of the 
>> mini-catastrophes I’m dealing with is one NSD (gpfs23data pool) down due to 
>> a issue with the storage array.  We’re working with the vendor to try to 
>> resolve that but are not optimistic so we have started doing restores in 
>> case they come back and tell us it’s not recoverable.  We did run 
>> “mmfileid” to identify the files that have one or more blocks on the down 
>> NSD, but there are so many that what we’re doing is actually restoring all 
>> the files to an alternate path (easier for out tape system), then replacing 
>> the corrupted files, then deleting any restores we don’t need.  But 
>> shouldn’t all of that be going to the gpfs23data pool?  I.e. even if we’re 
>> restoring files that are in the gpfs23capacity pool shouldn’t the fact that 
>> we’re restoring to an alternate path (i.e. not overwriting files with the 
>> tape restores) and the default pool is the gpfs23data pool mean that 
>> nothing is being restored to the gpfs23capacity pool???
>> 
>> Is there a third explanation I’m not thinking of?
>> 
>> Thanks...
>> 
>> —
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and Education
>> kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> -  
>> (615)875-9633
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 
> 
>  TELL US ABOUT YOUR SUCCESS STORIES
> 
> http://www.scinethpc.ca/testimonials
> 
> ---
> Jaime Pinto - Storage Analyst
> SciNet HPC Consortium - Compute/Calcul Canada
> www.scinet.utoronto.ca

[gpfsug-discuss] Capacity pool filling

2018-06-07 Thread Buterbaugh, Kevin L
Hi All,

First off, I’m on day 8 of dealing with two different mini-catastrophes at work 
and am therefore very sleep deprived and possibly missing something obvious … 
with that disclaimer out of the way…

We have a filesystem with 3 pools:  1) system (metadata only), 2) gpfs23data 
(the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with 
an atime - yes, atime - of more than 90 days get migrated to by a script that 
runs out of cron each weekend).

However … this morning the free space in the gpfs23capacity pool is dropping … 
I’m down to 0.5 TB free in a 582 TB pool … and I cannot figure out why.  The 
migration script is NOT running … in fact, it’s currently disabled.  So I can 
only think of two possible explanations for this:

1.  There are one or more files already in the gpfs23capacity pool that someone 
has started updating.  Is there a way to check for that … i.e. a way to run 
something like “find /gpfs23 -mtime -7 -ls” but restricted to only files in the 
gpfs23capacity pool.  Marc Kaplan - can mmfind do that??  ;-)

2.  We are doing a large volume of restores right now because one of the 
mini-catastrophes I’m dealing with is one NSD (gpfs23data pool) down due to an 
issue with the storage array.  We’re working with the vendor to try to resolve 
that but are not optimistic, so we have started doing restores in case they come 
back and tell us it’s not recoverable.  We did run “mmfileid” to identify the 
files that have one or more blocks on the down NSD, but there are so many that 
what we’re doing is actually restoring all the files to an alternate path 
(easier for our tape system), then replacing the corrupted files, then deleting 
any restores we don’t need.  But shouldn’t all of that be going to the 
gpfs23data pool?  I.e. even if we’re restoring files that are in the 
gpfs23capacity pool shouldn’t the fact that we’re restoring to an alternate 
path (i.e. not overwriting files with the tape restores) and the default pool 
is the gpfs23data pool mean that nothing is being restored to the 
gpfs23capacity pool???

Is there a third explanation I’m not thinking of?

Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working withkernel 3.10.0-862.2.3.el7

2018-05-15 Thread Buterbaugh, Kevin L
All,

I have to kind of agree with Andrew … it seems that there is a broad range of 
takes on kernel upgrades … everything from “install the latest kernel the day 
it comes out” to “stick with this kernel, we know it works.”

Related to that, let me throw out this question … what about those who haven’t 
upgraded their kernel in a while, at least in part because they’re concerned about 
the negative performance impacts of the meltdown / spectre patches???  So let’s 
just say a customer has upgraded the non-GPFS servers in their cluster, but 
they’ve left their NSD servers unpatched (I’m talking about the kernel only 
here; all other updates are applied) due to the aforementioned performance 
concerns … as long as they restrict access (i.e. who can log in) and use 
appropriate host-based firewall rules, is there some risk that they should be 
aware of?

Discuss.  Thanks!

Kevin

On May 15, 2018, at 4:45 PM, Andrew Beattie 
> wrote:

this thread is mildly amusing, given we regularly get customers asking why we 
are dropping support for versions of linux
that they "just can't move off"


Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com


- Original message -
From: Stijn De Weirdt >
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug-discuss@spectrumscale.org
Cc:
Subject: Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working withkernel 
3.10.0-862.2.3.el7
Date: Wed, May 16, 2018 5:35 AM

so this means running out-of-date kernels for at least another month? oh
boy...

i hope this is not some new trend in gpfs support. otherwise all RHEL
based sites will have to start adding EUS as default cost to run gpfs
with basic security compliance.

stijn


On 05/15/2018 09:02 PM, Felipe Knop wrote:
> All,
>
> Validation of RHEL 7.5 on Scale is currently under way, and we are
> currently targeting mid June to release the PTFs on 4.2.3 and 5.0 which
> will include the corresponding fix.
>
> Regards,
>
>   Felipe
>
> 
> Felipe Knop 
> k...@us.ibm.com
> GPFS Development and Security
> IBM Systems
> IBM Building 008
> 2455 South Rd, Poughkeepsie, NY 12601
> (845) 433-9314  T/L 293-9314
>
>
>
>
>
> From: Ryan Novosielski >
> To: gpfsug main discussion list 
> >
> Date: 05/15/2018 12:56 PM
> Subject: Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working withkernel
> 3.10.0-862.2.3.el7
> Sent by: 
> gpfsug-discuss-boun...@spectrumscale.org
>
>
>
> I know these dates can move, but any vague idea of a timeframe target for
> release (this quarter, next quarter, etc.)?
>
> Thanks!
>
> --
> 
> || \\UTGERS,
> |---*O*---
> ||_// the State  | Ryan Novosielski - 
> novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\of NJ  | Office of Advanced Research Computing - MSB
> C630, Newark
>  `'
>
>> On May 14, 2018, at 9:30 AM, Felipe Knop 
>> > wrote:
>>
>> All,
>>
>> Support for RHEL 7.5 and kernel level 3.10.0-862 in Spectrum Scale is
> planned for upcoming PTFs on 4.2.3 and 5.0. Since code changes are needed
> in Scale to support this kernel level, upgrading to one of those upcoming
> PTFs will be required in order to run with that kernel.
>>
>> Regards,
>>
>> Felipe
>>
>> 
>> Felipe Knop  k...@us.ibm.com
>> GPFS Development and Security
>> IBM Systems
>> IBM Building 008
>> 2455 South Rd, Poughkeepsie, NY 12601
>> (845) 433-9314 T/L 293-9314
>>
>>
>>
>> Andi Rhod Christiansen ---05/14/2018 08:15:25 AM---You are
> welcome. I see your concern but as long as IBM has not released spectrum
> scale for 7.5 that
>>
>> From:  Andi Rhod Christiansen >
>> To:  gpfsug main discussion list 
>> >
>> Date:  05/14/2018 08:15 AM
>> Subject:  Re: [gpfsug-discuss] gpfs 4.2.3.6 stops working with kernel
> 3.10.0-862.2.3.el7
>> Sent by:  
>> gpfsug-discuss-boun...@spectrumscale.org
>>
>>
>>
>>
>> You are welcome.
>>
>> I see your concern but as long as IBM has not released spectrum scale for
> 7.5 that is their only solution, in regards to them caring about security I
> would say yes they do care, but from their point of view either they tell
> the customer to upgrade as soon as red hat releases new versions and
> forcing the customer to be down until they 

Re: [gpfsug-discuss] FYI, Spectrum Scale 5.0.1 is out

2018-05-11 Thread Buterbaugh, Kevin L
On the other hand, we are very excited by this (from the README):

File systems: Traditional NSD nodes and servers can use checksums
NSD clients and servers that are configured with IBM Spectrum Scale 
can use checksums
to verify data integrity and detect network corruption of file data 
that the client
reads from or writes to the NSD server. For more information, see 
the
nsdCksumTraditional and nsdDumpBuffersOnCksumError attributes in 
the topic mmchconfig command.


Finally!  Thanks, IBM (seriously)…
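
For reference, enabling it should be a one-line mmchconfig once you are on 5.0.1 (a sketch; check the 5.0.1 documentation for whether the setting takes effect immediately or needs a daemon restart, and for the node scope that makes sense in your cluster):

    mmchconfig nsdCksumTraditional=yes
    mmlsconfig nsdCksumTraditional    # verify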

Kevin

On May 11, 2018, at 12:11 PM, Sanchez, Paul 
> wrote:

I’d normally be excited by this, since we do aggressively apply GPFS upgrades.  
But it’s worth noting that no released version of Scale works with the latest 
RHEL7 kernel yet (anything >= 3.10.0-780). So if you’re also in the habit of 
aggressively upgrading RedHat then you’re going to have to wait for 5.0.1-1 
before you can resume that practice.

From: 
gpfsug-discuss-boun...@spectrumscale.org
 
>
 On Behalf Of Bryan Banister
Sent: Friday, May 11, 2018 12:25 PM
To: gpfsug main discussion list 
(gpfsug-discuss@spectrumscale.org) 
>
Subject: [gpfsug-discuss] FYI, Spectrum Scale 5.0.1 is out

It’s on fix central, 
https://www-945.ibm.com/support/fixcentral

Cheers,
-Bryan



Note: This email is for the confidential use of the named addressee(s) only and 
may contain proprietary, confidential or privileged information. If you are not 
the intended recipient, you are hereby notified that any review, dissemination 
or copying of this email is strictly prohibited, and to please notify the 
sender immediately and destroy this email and any attachments. Email 
transmission cannot be guaranteed to be secure or error-free. The Company, 
therefore, does not make any guarantees as to the completeness or accuracy of 
this email or any attachments. This email is for informational purposes only 
and does not constitute a recommendation, offer, request or solicitation of any 
kind to buy, sell, subscribe, redeem or perform any type of transaction of a 
financial product.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Node list error

2018-05-10 Thread Buterbaugh, Kevin L
Hi Yaron,

Thanks for the response … no firewalld nor SELinux.  I went ahead and opened up 
a PMR and it turns out this is a known defect (at least in GPFS 5, I may have 
been the first to report it in GPFS 4.2.3.x) and IBM is working on a fix.  
Thanks…

Kevin

On May 10, 2018, at 7:51 AM, Yaron Daniel 
<y...@il.ibm.com<mailto:y...@il.ibm.com>> wrote:

Hi

Just to verify - there is no Firewalld running or Selinux ?



Regards





Yaron Daniel 94 Em Ha'Moshavot Rd


Storage ArchitectPetach Tiqva, 49527
IBM Global Markets, Systems HW Sales Israel

Phone:  +972-3-916-5672
Fax:+972-3-916-5672
Mobile: +972-52-8395593
e-mail: y...@il.ibm.com<mailto:y...@il.ibm.com>
IBM 
Israel<http://www.ibm.com/il/he/>



   



From:Bryan Banister 
<bbanis...@jumptrading.com<mailto:bbanis...@jumptrading.com>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:05/08/2018 11:51 PM
Subject:Re: [gpfsug-discuss] Node list error
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>



What does `mmlsnodeclass -N ` give you?
-B



From:gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Buterbaugh, 
Kevin L
Sent: Tuesday, May 08, 2018 1:24 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: [gpfsug-discuss] Node list error



Note: External Email

Hi All,



I can open a PMR for this if necessary, but does anyone know offhand what the 
following messages mean:



2018-05-08_12:16:39.567-0500: [I] Calling user exit script mmNodeRoleChange: 
event ccrFileChange, Async command /usr/lpp/mmfs/bin/mmsysmonc.
2018-05-08_12:16:39.719-0500: [I] Calling user exit script GUI_CCR_CHANGE: 
event ccrFileChange, Async command 
/usr/lpp/mmfs/gui/callbacks/global/ccrChangedCallback_421.sh.
2018-05-08_12:16:46.325-0500: [E] Node list error. Can not find all nodes in 
list 
1,1415,1515,1517,1519,1569,1571,1572,1573,1574,1575,1576,1577,1578,1579,1580,1581,1582,1583,1584,1585,1586,1587,1588,1589,1590,1591,1592,1783,1784,1786,1787,1788,1789,1793,1794,1795,1796,1797,1798,1799,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1812,1813,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,1839,1840,1841,1842,1843,1844,1888,1889,1908,1909,1910,1911,1912,1913,1914,1915,1916,1917,1918,1919,1920,1921,1922,1923,1924,1925,1926,1927,1928,1929,1930,1931,1932,1933,1934,1935,1936,1937,1938,1939,1940,1941,1942,1943,1966,2,2223,2235,2399,2400,2401,2402,2403,2404,2405,2407,2408,2409,2410,2411,2413,2414,2415,2416,2418,2419,2420,2421,2423,2424,2425,2426,2427,2428,2429,2430,2432,2436,2437,2438,2439,2440,2441,2442,2443,2444,2445,2446,2447,2448,2449,2450,2451,2452,2453,2454,2455,2456,2457,2458,2459,2460,2461,2462,2463,2464,2465,2466,2467,2468,2469,2470,2471,2472,2473,2474,2475,2476,2477,2478,2479,2480,2481,2482,2483,2520,2521,2522,2523,2524,2525,2526,2527,2528,2529,2530,2531,2532,2533,2534,2535,2536,2537,2538,2539,2540,2541,2542,2543,2544,2545,2546,2547,2548,2549,2550,2551,2552,2553,2554,2555,2556,2557,2558,2559,2560,2561,2562,2563,2564,2565,2566,2567,2568,2569,2570,2571,2572,2573,2574,2575,2604,2605,2607,2608,2609,2611,2612,2613,2614,2615,2616,2617,2618,2619,2620,2621,2622,2623,2624,2625,2626,2627,2628,2629,2630,2631,2632,2634,2635,2636,2637,2638,2640,2641,2642,2643,2650,2651,2652,2653,2654,2656,2657,2658,2660,2661,2662,2663,2664,2665,2666,2667,2668,2669,2670,2671,2672,2673,2674,2675,2676,2677,2679,2680,2681,2682,2683,2684,2685,2686,2687,2688,2689,2690,2691,2692,2693,2694,2695,2696,2697,2698,2699,2700,2702,2703,2704,2705,2706,2707,2708,2709,2710,2711,2712,2713,2714,2715,2716,2717,2718,2719,2720,2721,2722,2723,2724,2725,2726,2727,2728,2729,2730,2740,2741,2742,2743,2744,2745,2746,2754,2796,2797,2799,2800,2801,2802,2804,2805,2807,2808,2809,2812,2814,2815,2816,2817,2818,2819,2820,2823,
2018-05-08_12:16:46.340-0500: [E] Read Callback err 2. No user exit event is 
registered



This is GPFS 4.2.3-8.  We have not done any addition or deletion of nodes and 
have not had a bunch of nodes go offline, either.  Thanks…



Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>- 
(615)875-9633









[gpfsug-discuss] Node list error

2018-05-08 Thread Buterbaugh, Kevin L
Hi All,

I can open a PMR for this if necessary, but does anyone know offhand what the 
following messages mean:

2018-05-08_12:16:39.567-0500: [I] Calling user exit script mmNodeRoleChange: 
event ccrFileChange, Async command /usr/lpp/mmfs/bin/mmsysmonc.
2018-05-08_12:16:39.719-0500: [I] Calling user exit script GUI_CCR_CHANGE: 
event ccrFileChange, Async command 
/usr/lpp/mmfs/gui/callbacks/global/ccrChangedCallback_421.sh.
2018-05-08_12:16:46.325-0500: [E] Node list error. Can not find all nodes in 
list 
1,1415,1515,1517,1519,1569,1571,1572,1573,1574,1575,1576,1577,1578,1579,1580,1581,1582,1583,1584,1585,1586,1587,1588,1589,1590,1591,1592,1783,1784,1786,1787,1788,1789,1793,1794,1795,1796,1797,1798,1799,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1812,1813,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,1839,1840,1841,1842,1843,1844,1888,1889,1908,1909,1910,1911,1912,1913,1914,1915,1916,1917,1918,1919,1920,1921,1922,1923,1924,1925,1926,1927,1928,1929,1930,1931,1932,1933,1934,1935,1936,1937,1938,1939,1940,1941,1942,1943,1966,2,2223,2235,2399,2400,2401,2402,2403,2404,2405,2407,2408,2409,2410,2411,2413,2414,2415,2416,2418,2419,2420,2421,2423,2424,2425,2426,2427,2428,2429,2430,2432,2436,2437,2438,2439,2440,2441,2442,2443,2444,2445,2446,2447,2448,2449,2450,2451,2452,2453,2454,2455,2456,2457,2458,2459,2460,2461,2462,2463,2464,2465,2466,2467,2468,2469,2470,2471,2472,2473,2474,2475,2476,2477,2478,2479,2480,2481,2482,2483,2520,2521,2522,2523,2524,2525,2526,2527,2528,2529,2530,2531,2532,2533,2534,2535,2536,2537,2538,2539,2540,2541,2542,2543,2544,2545,2546,2547,2548,2549,2550,2551,2552,2553,2554,2555,2556,2557,2558,2559,2560,2561,2562,2563,2564,2565,2566,2567,2568,2569,2570,2571,2572,2573,2574,2575,2604,2605,2607,2608,2609,2611,2612,2613,2614,2615,2616,2617,2618,2619,2620,2621,2622,2623,2624,2625,2626,2627,2628,2629,2630,2631,2632,2634,2635,2636,2637,2638,2640,2641,2642,2643,2650,2651,2652,2653,2654,2656,2657,2658,2660,2661,2662,2663,2664,2665,2666,2667,2668,2669,2670,2671,2672,2673,2674,2675,2676,2677,2679,2680,2681,2682,2683,2684,2685,2686,2687,2688,2689,2690,2691,2692,2693,2694,2695,2696,2697,2698,2699,2700,2702,2703,2704,2705,2706,2707,2708,2709,2710,2711,2712,2713,2714,2715,2716,2717,2718,2719,2720,2721,2722,2723,2724,2725,2726,2727,2728,2729,2730,2740,2741,2742,2743,2744,2745,2746,2754,2796,2797,2799,2800,2801,2802,2804,2805,2807,2808,2809,2812,2814,2815,2816,2817,2818,2819,2820,2823,
2018-05-08_12:16:46.340-0500: [E] Read Callback err 2. No user exit event is 
registered

This is GPFS 4.2.3-8.  We have not done any addition or deletion of nodes and 
have not had a bunch of nodes go offline, either.  Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Not recommended, but why not?

2018-05-07 Thread Buterbaugh, Kevin L
Hi All,

I want to thank all of you who took the time to respond to this question … your 
thoughts / suggestions are much appreciated.

What I’m taking away from all of this is that it is OK to run CES on NSD 
servers as long as you are very careful in how you set things up.  This would 
include:

1.  Making sure you have enough CPU horsepower and using cgroups to limit how 
much CPU SMB and NFS can utilize (see the sketch after this list).
2.  Making sure you have enough RAM … 256 GB sounds like it should be “enough” 
when using SMB.
3.  Making sure you have your network config properly set up.  We would be able 
to provide three separate, dedicated 10 GbE links for GPFS daemon 
communication, GPFS multi-cluster link to our HPC cluster, and SMB / NFS 
communication.
4.  Making sure you have good monitoring of all of the above in place.
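
Regarding item 1, on a systemd-based distro the simplest way to "jail" the protocol daemons is a CPU quota on their units.  This is only a sketch: the unit names below (smbd.service and nfs-ganesha.service) are assumptions that may not match how CES names its services, and 800% (roughly 8 cores) is an arbitrary cap:

    # cap Samba and Ganesha at ~8 cores each; drop --runtime to make it persistent
    systemctl set-property --runtime smbd.service CPUQuota=800%
    systemctl set-property --runtime nfs-ganesha.service CPUQuota=800%
    systemctl show smbd.service -p CPUQuotaPerSecUSec    # verify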

Have I missed anything or does anyone have any additional thoughts?  Thanks…

Kevin

On May 4, 2018, at 11:26 AM, Sven Oehme 
<oeh...@gmail.com<mailto:oeh...@gmail.com>> wrote:

there is nothing wrong with running CES on NSD Servers, in fact if all CES 
nodes have access to all LUN's of the filesystem thats the fastest possible 
configuration as you eliminate 1 network hop.
the challenge is always to do the proper sizing, so you don't run out of CPU 
and memory on the nodes as you overlay functions. as long as you have good 
monitoring in place you are good. if you want to do the extra precaution, you 
could 'jail' the SMB and NFS daemons into a c-group on the node, i probably 
wouldn't limit memory but CPU as this is the more critical resource  to prevent 
expels and other time sensitive issues.

sven

On Fri, May 4, 2018 at 8:39 AM Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>> wrote:
Hi All,

In doing some research, I have come across numerous places (IBM docs, 
DeveloperWorks posts, etc.) where it is stated that it is not recommended to 
run CES on NSD servers … but I’ve not found any detailed explanation of why not.

I understand that CES, especially if you enable SMB, can be a resource hog.  
But if I size the servers appropriately … say, late model boxes with 2 x 8 core 
CPU’s, 256 GB RAM, 10 GbE networking … is there any reason why I still should 
not combine the two?

To answer the question of why I would want to … simple, server licenses.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633<tel:(615)%20875-9633>



___
gpfsug-discuss mailing list
gpfsug-discuss at 
spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Not recommended, but why not?

2018-05-04 Thread Buterbaugh, Kevin L
Hi Anderson,

Thanks for the response … however, the scenario you describe below wouldn’t 
impact us.  We have 8 NSD servers and they can easily provide the needed 
performance to native GPFS clients.  We could also take a downtime if we ever 
did need to expand in the manner described below.

In fact, one of the things that’s kinda surprising to me is that upgrading the 
SMB portion of CES requires a downtime.  Let’s just say that I know for a fact 
that sernet-samba can be done rolling / live.

Kevin

On May 4, 2018, at 10:52 AM, Anderson Ferreira Nobre 
<ano...@br.ibm.com<mailto:ano...@br.ibm.com>> wrote:

Hi Kevin,

I think one of the reasons is that if you need to add or remove nodes from the 
cluster you will start to face the constraints of this kind of solution. Let's say 
you have a cluster with two nodes that share the same set of LUNs through the SAN, 
and for some reason you need to add two more nodes that are NSD Servers and 
Protocol nodes. For the new nodes to become NSD Servers, you will have to 
redistribute the NSD disks among the four nodes. But to do that you will have 
to unmount the filesystems, and to unmount the filesystems you would need to 
stop protocol services. In the end you will realize that a simple task like 
that is disruptive. You won't be able to do it online.


Abraços / Regards / Saludos,

Anderson Nobre
AIX & Power Consultant
Master Certified IT Specialist
IBM Systems Hardware Client Technical Team – IBM Systems Lab Services

[community_general_lab_services]



Phone: 55-19-2132-4317
E-mail: ano...@br.ibm.com<mailto:ano...@br.ibm.com> [IBM]


- Original message -----
From: "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Cc:
Subject: [gpfsug-discuss] Not recommended, but why not?
Date: Fri, May 4, 2018 12:39 PM

Hi All,

In doing some research, I have come across numerous places (IBM docs, 
DeveloperWorks posts, etc.) where it is stated that it is not recommended to 
run CES on NSD servers … but I’ve not found any detailed explanation of why not.

I understand that CES, especially if you enable SMB, can be a resource hog.  
But if I size the servers appropriately … say, late model boxes with 2 x 8 core 
CPU’s, 256 GB RAM, 10 GbE networking … is there any reason why I still should 
not combine the two?

To answer the question of why I would want to … simple, server licenses.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Not recommended, but why not?

2018-05-04 Thread Buterbaugh, Kevin L
Hi All,

In doing some research, I have come across numerous places (IBM docs, 
DeveloperWorks posts, etc.) where it is stated that it is not recommended to 
run CES on NSD servers … but I’ve not found any detailed explanation of why not.

I understand that CES, especially if you enable SMB, can be a resource hog.  
But if I size the servers appropriately … say, late model boxes with 2 x 8 core 
CPU’s, 256 GB RAM, 10 GbE networking … is there any reason why I still should 
not combine the two?

To answer the question of why I would want to … simple, server licenses.

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] GPFS GUI - DataPool_capUtil error

2018-04-09 Thread Buterbaugh, Kevin L
Hi All,

I’m pretty new to using the GPFS GUI for health and performance monitoring, but 
am finding it very useful.  I’ve got an issue that I can’t figure out.  In my 
events I see:

Event name: pool-data_high_error
Component: File System
Entity type: Pool
Entity name: 
Event time: 3/26/18 4:44:10 PM
Message: The pool  of file system  reached a nearly exhausted data level. DataPool_capUtil
Description: The pool reached a nearly exhausted level.
Cause: The pool reached a nearly exhausted level.
User action: Add more capacity to pool or move data to different pool or delete data and/or snapshots.
Reporting node: 
Event type: Active health state of an entity which is monitored by the system.

Now this is for a “capacity” pool … i.e. one that mmapplypolicy is going to 
fill up to 97% full.  Therefore, I’ve modified the thresholds:

### Threshold Rules ###
rule_name             metric                error  warn  direction  filterBy  groupBy                                            sensitivity
---------------------------------------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name      300
MemFree_Rule          mem_memfree           5      10    low                  node                                               300
MetaDataCapUtil_Rule  MetaDataPool_capUtil  90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
DataCapUtil_Rule      DataPool_capUtil      99.0   90.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300

But it’s still in an “Error” state.  I see that the time of the event is March 
26th at 4:44 PM, so I’m thinking this is something that’s just stale, but I 
can’t figure out how to clear it.  The mmhealth command shows the error, too, 
and from that message it appears as if the event was triggered prior to my 
adjusting the thresholds:

Event                 Parameter  Severity  Active Since         Event Message
------------------------------------------------------------------------------------------------------------------
pool-data_high_error  redacted   ERROR     2018-03-26 16:44:10  The pool redacted of file system redacted reached a nearly exhausted data level. 90.0

What do I need to do to get the GUI / mmhealth to recognize the new thresholds 
and clear this error?  I’ve searched and searched in the GUI for a way to clear 
it.  I’ve read the “Monitoring and Managing IBM Spectrum Scale Using the GUI” 
Redbook pretty much cover to cover and haven’t found anything there about how to 
clear this.  Thanks...
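
One thing that might be worth trying, if you have not already, is removing and re-adding the data-pool rule so the monitor re-evaluates against the new limits.  This is only a sketch; the exact mmhealth thresholds options vary by release, so treat the flag names below as assumptions to check against the 4.2.3 man page:

    mmhealth thresholds list
    mmhealth thresholds delete DataCapUtil_Rule
    mmhealth thresholds add DataPool_capUtil --errorlevel 99.0 --warnlevel 90.0 \
        --name DataCapUtil_Rule \
        --groupby gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name
    mmhealth node show filesystem    # on the reporting node, to see whether the event clears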

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Dual server NSDs

2018-04-04 Thread Buterbaugh, Kevin L
Hi John,

Yes, you can remove one of the servers and yes, we’ve done it and yes, the 
documentation is clear and correct.  ;-)

Last time I did this we were in a full cluster downtime, so unmounting wasn’t 
an issue.  We were changing our network architecture and so the IP addresses of 
all NSD servers save one were changing.  It was a bit … uncomfortable … for the 
brief period of time I had to make the one NSD server the one and only NSD 
server for ~1 PB of storage!  But it worked just fine…
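
For the archives, the actual change is small once the file system is unmounted everywhere.  A sketch using John's nsd1 example (the file system name here is made up):

    mmumount gpfs0 -a                # unmount on all nodes first
    mmchnsd "nsd1:sn008"             # nsd1 is now served by sn008 only
    mmlsnsd -d nsd1                  # verify the new server list
    mmmount gpfs0 -a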

HTHAL…

Kevin

On Apr 4, 2018, at 4:11 AM, John Hearns 
> wrote:

I should say I already have a support ticket open for advice on this issue.
We have a filesystem which has NSDs which have two servers defined, for 
instance:
nsd:
  device=/dev/sdb
  servers=sn007,sn008
  nsd=nsd1
  usage=dataOnly

Can I remove one of these servers?  The object is to upgrade this server and 
change its hostname, the physical server will stay in place.
Has anyone carried out an operation similar to this?

I guess the documentation here is quite clear:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/NSD%20server%20balance
“If you want to change configuration for a NSD which is already belongs to a 
file system, you need to unmount the file system before running mmchnsd 
command.”
-- The information contained in this communication and any attachments is 
confidential and may be privileged, and is for the sole use of the intended 
recipient(s). Any unauthorized review, use, disclosure or distribution is 
prohibited. Unless explicitly stated otherwise in the body of this 
communication or the attachment thereto (if any), the information is provided 
on an AS-IS basis without any express or implied warranties or liabilities. To 
the extent you are relying on this information, you are doing so at your own 
risk. If you are not the intended recipient, please notify the sender 
immediately by replying to this message and destroy all copies of this message 
and any attachments. Neither the sender nor the company/group of companies he 
or she represents shall be liable for the proper and complete transmission of 
the information contained in this communication, or for any delay in its 
receipt. ___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Local event

2018-04-04 Thread Buterbaugh, Kevin L
Hi All,

According to the man page for mmaddcallback:

A local
 event triggers a callback only on the node on which the
 event occurred, such as mounting a file system on one of
 the nodes.


We have two GPFS clusters here (well, three if you count our small test 
cluster).  Cluster one has 8 NSD servers and one client, which is used only for 
tape backup … i.e. no one logs on to any of the nodes in the cluster.  Files on 
it are accessed one of three ways:  1) CNFS mount to local computer, 2) SAMBA 
mount to local computer, 3) GPFS multi-cluster remote mount to cluster two.  On 
cluster one there is a user callback for softQuotaExceeded that e-mails the 
user … and that we know works.

Cluster two has two local GPFS filesystems and over 600 clients natively 
mounting those filesystems (it’s our HPC cluster).  I’m trying to implement a 
similar callback for softQuotaExceeded events on cluster two as well.  I’ve 
tested the callback by manually running the (Python) script and passing it in 
the parameters I want and it works - I get the e-mail.  Then I added it via 
mmaddcallback, but only on the GPFS servers.
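
For context, the registration was along these lines (a sketch; the callback name, script path, and parameter list are placeholders rather than the exact ones from our cluster):

    mmaddcallback quotaEmail \
        --command /usr/local/sbin/quota_exceeded_mail.py \
        --event softQuotaExceeded \
        --parms "%eventName %fsName" \
        -N nsdNodes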

I did that because I thought that since callbacks work on cluster one with no 
local access to the GPFS servers that “local” must mean “when an NSD server 
does a write that puts the user over quota”.  However, on cluster two the 
callback is not being triggered.  Does this mean that I actually need to 
install the callback on every node in cluster two?  If so, then how / why are 
callbacks working on cluster one?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmfind performance

2018-03-07 Thread Buterbaugh, Kevin L
Hi Marc,

Thanks, I’m going to give this a try as the first mmfind finally finished 
overnight, but produced no output:

/root
root@gpfsmgrb# bash -x ~/bin/klb.sh
+ cd /usr/lpp/mmfs/samples/ilm
+ ./mmfind /gpfs23 -inum 113769917 -o -inum 132539418 -o -inum 135584191 -o 
-inum 136471839 -o -inum 137009371 -o -inum 137314798 -o -inum 137939675 -o 
-inum 137997971 -o -inum 138013736 -o -inum 138029061 -o -inum 138029065 -o 
-inum 138029076 -o -inum 138029086 -o -inum 138029093 -o -inum 138029099 -o 
-inum 138029101 -o -inum 138029102 -o -inum 138029106 -o -inum 138029112 -o 
-inum 138029113 -o -inum 138029114 -o -inum 138029119 -o -inum 138029120 -o 
-inum 138029121 -o -inum 138029130 -o -inum 138029131 -o -inum 138029132 -o 
-inum 138029141 -o -inum 138029146 -o -inum 138029147 -o -inum 138029152 -o 
-inum 138029153 -o -inum 138029154 -o -inum 138029163 -o -inum 138029164 -o 
-inum 138029165 -o -inum 138029174 -o -inum 138029175 -o -inum 138029176 -o 
-inum 138083075 -o -inum 138083148 -o -inum 138083149 -o -inum 138083155 -o 
-inum 138216465 -o -inum 138216483 -o -inum 138216507 -o -inum 138216535 -o 
-inum 138235320 -ls
/root
root@gpfsmgrb#

BTW, I had put that in a simple script simply because I had a list of those 
inodes and it was easier for me to get that in the format I wanted via a script 
that I was editing than trying to do that on the command line.

However, the log file it produced shows that it “hit” on 48 files:

[I] Inodes scan: 978275821 files, 99448202 directories, 37189547 other objects, 
1967508 'skipped' files and/or errors.
[I] 2018-03-06@23:43:15.988 Policy evaluation. 1114913570 files scanned.
[I] 2018-03-06@23:43:16.016 Sorting 48 candidate file list records.
[I] 2018-03-06@23:43:16.040 Sorting 48 candidate file list records.
[I] 2018-03-06@23:43:16.065 Choosing candidate files. 0 records scanned.
[I] 2018-03-06@23:43:16.066 Choosing candidate files. 48 records scanned.
[I] Summary of Rule Applicability and File Choices:
 Rule#Hit_Cnt KB_Hit Chosen  KB_Chosen KB_Ill Rule
 0 48 1274453504 48 1274453504  0 RULE 'mmfind' 
LIST 'mmfindList' DIRECTORIES_PLUS SHOW(.) WHERE(.)

[I] Filesystem objects with no applicable rules: 1112946014.

[I] GPFS Policy Decisions and File Choice Totals:
 Chose to list 1274453504KB: 48 of 48 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name  KB_Occupied   KB_Total Percent_Occupied
gpfs23capacity 564722407424   62491774976090.367477583%
gpfs23data 304797672448   53120350617657.378701177%
system0  0 0.0% (no user 
data)
[I] 2018-03-06@23:43:16.066 Policy execution. 0 files dispatched.
[I] 2018-03-06@23:43:16.102 Policy execution. 0 files dispatched.
[I] A total of 0 files have been migrated, deleted or processed by an EXTERNAL 
EXEC/script;
0 'skipped' files and/or errors.

While I’m going to follow your suggestion next, if you (or anyone else on the 
list) can explain why the “Hit_Cnt” is 48 but the “-ls” I passed to mmfind 
didn’t result in anything being listed, my curiosity is piqued.

And I’ll go ahead and say it before someone else does … I haven’t just chosen a 
special case, I AM a special case… ;-)

Kevin

On Mar 6, 2018, at 4:27 PM, Marc A Kaplan 
<makap...@us.ibm.com<mailto:makap...@us.ibm.com>> wrote:

Please try:

mmfind --polFlags '-N a_node_list  -g /gpfs23/tmp'  directory find-flags ...

Where a_node_list is a node list of your choice and /gpfs23/tmp is a temp 
directory of your choice...

And let us know how that goes.

Also, you have chosen a special case, just looking for some inode numbers -- so 
find can skip stating the other inodes...
whereas mmfind is not smart enough to do that -- but still with parallelism, 
I'd guess mmapplypolicy might still beat find in elapsed time to complete, even 
for this special case.

-- Marc K of GPFS



From:    "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:03/06/2018 01:52 PM
Subject:[gpfsug-discuss] mmfind performance
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

In the README for the mmfind command it says:

mmfind
  A highly efficient file system traversal tool, designed to serve
   as a drop-in replacement for the 'find' command as used against GPFS FSes.

And:

mmfind is expected to be slower than find on file systems with relatively few 
inodes.
This is due to the overhead of using mmapplypolicy.
However, if you make use of the -exec flag to carry out a relatively expensive 
operation
on each file (e.g. compute a checksum), using mmfind should yield a significant 
performance
improvement, even on a fi

[gpfsug-discuss] mmfind performance

2018-03-06 Thread Buterbaugh, Kevin L
Hi All,

In the README for the mmfind command it says:

mmfind
  A highly efficient file system traversal tool, designed to serve
   as a drop-in replacement for the 'find' command as used against GPFS FSes.

And:

mmfind is expected to be slower than find on file systems with relatively few 
inodes.
This is due to the overhead of using mmapplypolicy.
However, if you make use of the -exec flag to carry out a relatively expensive 
operation
on each file (e.g. compute a checksum), using mmfind should yield a significant 
performance
improvement, even on a file system with relatively few inodes.

I have a list of just shy of 50 inode numbers that I need to figure out what 
file they correspond to, so I decided to give mmfind a try:

+ cd /usr/lpp/mmfs/samples/ilm
+ ./mmfind /gpfs23 -inum 113769917 -o -inum 132539418 -o -inum 135584191 -o 
-inum 136471839 -o -inum 137009371 -o -inum 137314798 -o -inum 137939675 -o 
-inum 137997971 -o -inum 138013736 -o -inum 138029061 -o -inum 138029065 -o 
-inum 138029076 -o -inum 138029086 -o -inum 138029093 -o -inum 138029099 -o 
-inum 138029101 -o -inum 138029102 -o -inum 138029106 -o -inum 138029112 -o 
-inum 138029113 -o -inum 138029114 -o -inum 138029119 -o -inum 138029120 -o 
-inum 138029121 -o -inum 138029130 -o -inum 138029131 -o -inum 138029132 -o 
-inum 138029141 -o -inum 138029146 -o -inum 138029147 -o -inum 138029152 -o 
-inum 138029153 -o -inum 138029154 -o -inum 138029163 -o -inum 138029164 -o 
-inum 138029165 -o -inum 138029174 -o -inum 138029175 -o -inum 138029176 -o 
-inum 138083075 -o -inum 138083148 -o -inum 138083149 -o -inum 138083155 -o 
-inum 138216465 -o -inum 138216483 -o -inum 138216507 -o -inum 138216535 -o 
-inum 138235320 -ls

I kicked that off last Friday and it is _still_ running.  By comparison, I have 
a Perl script that I have run in the past that simply traverses the entire 
filesystem tree and stat’s each file and outputs that to a log file.  That 
script would “only” run ~24 hours.

Clearly mmfind as I invoked it is much slower than the corresponding Perl 
script, so what am I doing wrong?  Thanks…
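
For comparison, the same search expressed directly as a policy run, which is roughly what mmfind builds under the covers; a sketch with the inode list abbreviated and the temp paths made up:

    /* by-inode.pol */
    RULE EXTERNAL LIST 'hits' EXEC ''
    RULE 'byInode' LIST 'hits' DIRECTORIES_PLUS
         WHERE INODE = 113769917 OR INODE = 132539418 OR INODE = 135584191
            /* ... and so on for the rest of the inode numbers ... */

Run with something like "mmapplypolicy /gpfs23 -P by-inode.pol -I defer -f /gpfs23/tmp/byinode -g /gpfs23/tmp"; the matching path names should end up in /gpfs23/tmp/byinode.list.hits.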

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS

2018-03-06 Thread Buterbaugh, Kevin L
Hi Leandro,

I think the silence in response to your question says a lot, don’t you?  :-O

IBM has said (on this list, I believe) that the Meltdown / Spectre patches do 
not impact GPFS functionality.  They’ve been silent as to performance impacts, 
which can and will be taken various ways.

In the absence of information from IBM, the approach we have chosen to take is 
to patch everything except our GPFS servers … only we (the SysAdmins, oh, and 
the NSA, of course!) can log in to them, so we feel that the risk of not 
patching them is minimal.

HTHAL…

Kevin

On Mar 1, 2018, at 9:02 AM, Avila-Diaz, Leandro 
<lav...@illinois.edu<mailto:lav...@illinois.edu>> wrote:

Good morning,

Does anyone know if IBM has an official statement and/or perhaps a FAQ document 
about the Spectre/Meltdown impact on GPFS?
Thank you

From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of IBM Spectrum Scale <sc...@us.ibm.com<mailto:sc...@us.ibm.com>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Thursday, January 4, 2018 at 20:36
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS

Kevin,

The team is aware of Meltdown and Spectre. Due to the late availability of 
production-ready test patches (they became available today) we started today 
working on evaluating the impact of applying these patches. The focus would be 
both on any potential functional impacts (especially to the kernel modules 
shipped with GPFS) and on the performance degradation which affects user/kernel 
mode transitions. Performance characterization will be complex, as some system 
calls which may get invoked often by the mmfsd daemon will suddenly become 
significantly more expensive because of the kernel changes. Depending on the 
main areas affected, code changes might be possible to alleviate the impact, by 
reducing frequency of certain calls, etc. Any such changes will be deployed 
over time.

At this point, we can't say what impact this will have on stability or 
Performance on systems running GPFS — until IBM issues an official statement on 
this topic. We hope to have some basic answers soon.



Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum 
at https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.

"Buterbaugh, Kevin L" ---01/04/2018 01:11:59 PM---Happy New Year 
everyone, I’m sure that everyone is aware of Meltdown and Spectre by now … we, 
like m

From: "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: 01/04/2018 01:11 PM
Subject: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Happy New Year everyone,

I’m sure that everyone is aware of Meltdown and Spectre by now … we, like many 
other institutions, will be patching for it at the earliest possible 
opportunity.

Our understanding is that the most serious of the negative performance impacts 
of these patches will be for things like I/O (disk / network) … given that, we 
are curious if IBM has any plans for a GPFS update that could help mitigate 
those impacts? Or is there simply nothing that can be done?

If there is a GPFS update planned for this we’d be interested i

Re: [gpfsug-discuss] mmchdisk suspend / stop

2018-02-13 Thread Buterbaugh, Kevin L
Hi JAB,

OK, let me try one more time to clarify.  I’m not naming the vendor … they’re a 
small maker of commodity storage and we’ve been using their stuff for years 
and, overall, it’s been very solid.  The problem in this specific case is that 
a major version firmware upgrade is required … if the controllers were only a 
minor version apart we could do it live.

And yes, we can upgrade our QLogic SAN switches firmware live … in fact, we’ve 
done that in the past.  Should’ve been more clear there … we just try to do 
that as infrequently as possible.

So the bottom line here is that we were unaware that “major version” firmware 
upgrades could not be done live on our storage, but we’ve got a plan to work 
around this this time.

Kevin

> On Feb 13, 2018, at 7:43 AM, Jonathan Buzzard <jonathan.buzz...@strath.ac.uk> 
> wrote:
> 
> On Fri, 2018-02-09 at 15:07 +, Buterbaugh, Kevin L wrote:
>> Hi All,
>> 
>> Since several people have made this same suggestion, let me respond
>> to that.  We did ask the vendor - twice - to do that.  Their response
>> boils down to, “No, the older version has bugs and we won’t send you
>> a controller with firmware that we know has bugs in it.”
>> 
>> We have not had a full cluster downtime since the summer of 2016 -
>> and then it was only a one day downtime to allow the cleaning of our
>> core network switches after an electrical fire in our data center!
>>  So the firmware on not only our storage arrays, but our SAN switches
>> as well, is a bit out of date, shall we say…
>> 
>> That is an issue we need to address internally … our users love us
>> not having regularly scheduled downtimes quarterly, yearly, or
>> whatever, but there is a cost to doing business that way...
>> 
> 
> What sort of storage arrays are you using that don't allow you to do a
> live update of the controller firmware? Heck these days even cheapy
> Dell MD3 series storage arrays allow you to do live drive firmware
> updates.
> 
> Similarly with SAN switches surely you have separate A/B fabrics and
> can upgrade them one at a time live.
> 
> In a properly designed system one should not need to schedule downtime
> for firmware updates. He says as he plans a firmware update on his
> routers for next Tuesday morning, with no scheduled downtime and no
> interruption to service.
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmchdisk suspend / stop

2018-02-09 Thread Buterbaugh, Kevin L
Hi All,

Since several people have made this same suggestion, let me respond to that.  
We did ask the vendor - twice - to do that.  Their response boils down to, “No, 
the older version has bugs and we won’t send you a controller with firmware 
that we know has bugs in it.”

We have not had a full cluster downtime since the summer of 2016 - and then it 
was only a one day downtime to allow the cleaning of our core network switches 
after an electrical fire in our data center!  So the firmware on not only our 
storage arrays, but our SAN switches as well, is a bit out of date, shall we 
say…

That is an issue we need to address internally … our users love us not having 
regularly scheduled downtimes quarterly, yearly, or whatever, but there is a 
cost to doing business that way...

Kevin

On Feb 8, 2018, at 10:46 AM, Paul Ward wrote:

We tend to get the maintenance company to down-grade the firmware to match what 
we have for our aging hardware, before sending it to us.
I assume this isn’t an option?

Paul Ward
Technical Solutions Infrastructure Architect
Natural History Museum
T: 02079426450
E: p.w...@nhm.ac.uk

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] hdisk suspend / stop (Buterbaugh, Kevin L)

2018-02-08 Thread Buterbaugh, Kevin L
Hi again all,

It sounds like doing the “mmchconfig unmountOnDiskFail=meta -i” suggested by 
Steve and Bob followed by using mmchdisk to stop the disks temporarily is the 
way we need to go.  We will, as an aside, also run a mmapplypolicy first to 
pull any files users have started accessing again back to the “regular” pool 
before doing any of this.

Given that this is our “capacity” pool and files have to have an atime > 90 
days to get migrated there in the 1st place I think this is reasonable.  
Especially since users will get an I/O error if they happen to try to access 
one of those NSDs during the brief maintenance window.
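
For anyone finding this thread later, the sequence we have in mind looks 
roughly like the following.  This is only a sketch - the filesystem name 
(gpfs23), the pool names, the NSD names, and the policy file are all 
illustrative, not our real configuration:

# pull anything accessed within the last 90 days back off the capacity pool
cat > pullback.pol <<'EOF'
RULE 'pullback' MIGRATE FROM POOL 'capacity' TO POOL 'data'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) <= 90
EOF
mmapplypolicy gpfs23 -P pullback.pol -I yes

# keep the filesystem mounted as long as metadata stays reachable,
# then take the five capacity-pool NSDs down for the firmware work
mmchconfig unmountOnDiskFail=meta -i
mmchdisk gpfs23 stop -d "nsd31;nsd32;nsd33;nsd34;nsd35"
#  ... controller replacement / firmware upgrade happens here ...
mmchdisk gpfs23 start -d "nsd31;nsd32;nsd33;nsd34;nsd35"
# and set unmountOnDiskFail back to its previous value afterwards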

As to naming and shaming the vendor … I’m not going to do that at this point in 
time.  We’ve been using their stuff for well over a decade at this point and 
have had a generally positive experience with them.  In fact, I have spoken 
with them via phone since my original post today and they have clarified that 
the problem with the mismatched firmware is only an issue because we are a 
major version off of what is current due to us choosing to not have a downtime 
and therefore not having done any firmware upgrades in well over 18 months.

Thanks, all...

Kevin

On Feb 8, 2018, at 11:17 AM, Steve Xiao 
<sx...@us.ibm.com<mailto:sx...@us.ibm.com>> wrote:

You can change the cluster configuration so that the file system is only 
unmounted when there is an error accessing metadata.  This can be done by 
running the following command:
   mmchconfig unmountOnDiskFail=meta -i

After this configuration change, you should be able to stop all 5 NSDs with the 
mmchdisk stop command.  While these NSDs are in the down state, any user I/O to 
files residing on these disks will fail, but your file system should stay 
mounted and usable.

Steve Y. Xiao

> Date: Thu, 8 Feb 2018 15:59:44 +
> From: "Buterbaugh, Kevin L" 
> <kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
> To: gpfsug main discussion list 
> <gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
> Subject: [gpfsug-discuss] mmchdisk suspend / stop
> Message-ID: 
> <8dca682d-9850-4c03-8930-ea6c68b41...@vanderbilt.edu<mailto:8dca682d-9850-4c03-8930-ea6c68b41...@vanderbilt.edu>>
> Content-Type: text/plain; charset="utf-8"
>
> Hi All,
>
> We are in a bit of a difficult situation right now with one of our
> non-IBM hardware vendors (I know, I know, I KNOW - buy IBM hardware!
> ) and are looking for some advice on how to deal with this
> unfortunate situation.
>
> We have a non-IBM FC storage array with dual-?redundant?
> controllers.  One of those controllers is dead and the vendor is
> sending us a replacement.  However, the replacement controller will
> have mis-matched firmware with the surviving controller and - long
> story short - the vendor says there is no way to resolve that
> without taking the storage array down for firmware upgrades.
> Needless to say there?s more to that story than what I?ve included
> here, but I won?t bore everyone with unnecessary details.
>
> The storage array has 5 NSDs on it, but fortunately enough they are
> part of our ?capacity? pool ? i.e. the only way a file lands here is
> if an mmapplypolicy scan moved it there because the *access* time is
> greater than 90 days.  Filesystem data replication is set to one.
>
> So ? what I was wondering if I could do is to use mmchdisk to either
> suspend or (preferably) stop those NSDs, do the firmware upgrade,
> and resume the NSDs?  The problem I see is that suspend doesn?t stop
> I/O, it only prevents the allocation of new blocks ? so, in theory,
> if a user suddenly decided to start using a file they hadn?t needed
> for 3 months then I?ve got a problem.  Stopping all I/O to the disks
> is what I really want to do.  However, according to the mmchdisk man
> page stop cannot be used on a filesystem with replication set to one.
>
> There?s over 250 TB of data on those 5 NSDs, so restriping off of
> them or setting replication to two are not options.
>
> It is very unlikely that anyone would try to access a file on those
> NSDs during the hour or so I?d need to do the firmware upgrades, but
> how would GPFS itself react to those (suspended) disks going away
> for a while?  I?m thinking I could be OK if there was just a way to
> actually stop them rather than suspend them.  Any undocumented
> options to mmchdisk that I?m not aware of???
>
> Are there other options - besides buying IBM hardware - that I am
> overlooking?  Thanks...
>
> ?
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and Education
> kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu><mailto:kevin.buterba...@vanderbilt.edu
> > - (615)87

[gpfsug-discuss] mmchdisk suspend / stop

2018-02-08 Thread Buterbaugh, Kevin L
Hi All,

We are in a bit of a difficult situation right now with one of our non-IBM 
hardware vendors (I know, I know, I KNOW - buy IBM hardware! ) and are 
looking for some advice on how to deal with this unfortunate situation.

We have a non-IBM FC storage array with dual-“redundant” controllers.  One of 
those controllers is dead and the vendor is sending us a replacement.  However, 
the replacement controller will have mis-matched firmware with the surviving 
controller and - long story short - the vendor says there is no way to resolve 
that without taking the storage array down for firmware upgrades.  Needless to 
say there’s more to that story than what I’ve included here, but I won’t bore 
everyone with unnecessary details.

The storage array has 5 NSDs on it, but fortunately enough they are part of our 
“capacity” pool … i.e. the only way a file lands here is if an mmapplypolicy 
scan moved it there because the *access* time is greater than 90 days.  
Filesystem data replication is set to one.

So … what I was wondering if I could do is to use mmchdisk to either suspend or 
(preferably) stop those NSDs, do the firmware upgrade, and resume the NSDs?  
The problem I see is that suspend doesn’t stop I/O, it only prevents the 
allocation of new blocks … so, in theory, if a user suddenly decided to start 
using a file they hadn’t needed for 3 months then I’ve got a problem.  Stopping 
all I/O to the disks is what I really want to do.  However, according to the 
mmchdisk man page stop cannot be used on a filesystem with replication set to 
one.

There’s over 250 TB of data on those 5 NSDs, so restriping off of them or 
setting replication to two are not options.

It is very unlikely that anyone would try to access a file on those NSDs during 
the hour or so I’d need to do the firmware upgrades, but how would GPFS itself 
react to those (suspended) disks going away for a while?  I’m thinking I could 
be OK if there was just a way to actually stop them rather than suspend them.  
Any undocumented options to mmchdisk that I’m not aware of???

Are there other options - besides buying IBM hardware - that I am overlooking?  
Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Metadata only system pool

2018-01-23 Thread Buterbaugh, Kevin L
Hi All,

This is all making sense and I appreciate everyone’s responses … and again I 
apologize for not thinking about the indirect blocks.

Marc - we specifically chose 4K inodes when we created this filesystem a little 
over a year ago so that small files could fit in the inode and therefore be 
stored on the metadata SSDs.

This is more of a curiosity question … is it documented somewhere how a 4K 
inode is used?  I understand that for very small files up to 3.5K of that can 
be for data, but what about for large files?  I.e., how much of that 4K is used 
for block addresses  (3.5K plus whatever portion was already allocated to block 
addresses??) … or what I’m really asking is, given 4K inodes and a 1M block 
size how big does a file have to be before it will need to use indirect blocks?
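
For what it's worth, a back-of-envelope way to estimate it.  Both numbers 
below are assumptions on my part (the usable space left in a 4K inode after 
the header and EAs, and the size of a single data-block address), not figures 
from IBM documentation:

# all assumptions - adjust if you know the real values
ADDR_BYTES=12                  # assumed bytes per data block address
USABLE=3584                    # assumed usable bytes in a 4K inode
BLOCKSIZE=$((1024 * 1024))     # our 1M filesystem block size
echo "$(( USABLE / ADDR_BYTES )) direct block addresses"
echo "~$(( USABLE / ADDR_BYTES * BLOCKSIZE / 1048576 )) MiB before indirect blocks"

That works out to roughly 300 direct block addresses, i.e. a file somewhere in 
the few-hundred-megabyte range before indirect blocks come into play.  Treat 
it as an order-of-magnitude estimate only.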

Thanks again…

Kevin

On Jan 23, 2018, at 1:12 PM, Marc A Kaplan 
<makap...@us.ibm.com<mailto:makap...@us.ibm.com>> wrote:

If one were starting over, it might make sense to use a  smaller inode size.  I 
believe we still support 512, 1K, 2K.
Tradeoff with the fact that inodes can store data and EAs.
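
For anyone starting over: the inode size can only be chosen at filesystem 
creation time, and cannot be changed afterwards.  A minimal sketch, with the 
filesystem name, stanza file, and values purely illustrative:

mmcrfs gpfs24 -F nsd.stanza -i 1024 -B 1M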




From:"Uwe Falke" <uwefa...@de.ibm.com<mailto:uwefa...@de.ibm.com>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:01/23/2018 04:04 PM
Subject:Re: [gpfsug-discuss] Metadata only system pool
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




rough calculation (assuming 4k inodes):
350 x 10^6 x 4096 bytes = 1.434 TB = 1.304 TiB. With replication that uses
2.868 TB or 2.608 TiB.
As already mentioned here, directory and indirect blocks come on top. Even
if you could get rid of a portion of the allocated and unused inodes, that
metadata pool appears a bit small to me.
If that is a large filesystem there should be some funding to extend it.
If you have such a many-but-small-files system as discussed recently in
this theatre, you might still beg for more MD storage, but that makes up
a larger portion of the total cost (assuming data storage is on HDD and md
storage on SSD) and that again reduces your chances.




Mit freundlichen Grüßen / Kind regards


Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services /
Data Center Services
---
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefa...@de.ibm.com<mailto:uwefa...@de.ibm.com>
---
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:
Thomas Wolter, Sven Schooß
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 17122




From:   "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date:   01/23/2018 06:17 PM
Subject:[gpfsug-discuss] Metadata only system pool
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi All,

I was under the (possibly false) impression that if you have a filesystem
where the system pool contains metadata only then the only thing that
would cause the amount of free space in that pool to change is the
creation of more inodes ? is that correct?  In other words, given that I
have a filesystem with 130 million free (but allocated) inodes:

Inode Information
-
Number of used inodes:   218635454
Number of free inodes:   131364674
Number of allocated inodes:  350000128
Maximum number of inodes:    350000128

I would not expect that a user creating a few hundred or thousands of
files could cause a ?no space left on device? error (which I?ve got one
user getting).  There?s plenty of free data space, BTW.

Now my system pool is almost ?full?:

(pool total)   2.878T   34M (  0%)   140.9M ( 0%)

But again, what - outside of me creating more inodes - would cause that to
change??

Thanks?

Kevin

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and
Education
kevin.buterba...@vanderbilt.edu - (615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] Metadata only system pool

2018-01-23 Thread Buterbaugh, Kevin L
Hi All,

I do have metadata replication set to two, so Alex, does that make more sense?

And I had forgotten about indirect blocks for large files, which actually makes 
sense with the user in question … my apologies for that … due to a very gravely 
ill pet and a family member recovering at home from pneumonia, I'm way more 
sleep deprived right now than I'd like.  :-(

Fred - I think you’ve already answered this … but mmchfs can only create / 
allocate more inodes … it cannot be used to shrink the number of inodes?  That 
would make sense, and if that’s the case then I can allocate more NSDs to the 
system pool.
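
For the record (and for anyone searching the archives later), growing the 
inode limit looks like this; the filesystem name and numbers are illustrative 
only:

mmchfs gpfs23 --inode-limit 400000000:360000000
mmdf gpfs23        # the Inode Information section should show the new values

The first number is the new maximum and the optional second number is how many 
inodes to pre-allocate; as Fred says, going the other direction is not possible.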

Thanks…

Kevin

On Jan 23, 2018, at 11:27 AM, Alex Chekholko 
<a...@calicolabs.com<mailto:a...@calicolabs.com>> wrote:

2.8TB seems quite high for only 350M inodes.  Are you sure you only have 
metadata in there?

On Tue, Jan 23, 2018 at 9:25 AM, Frederick Stock 
<sto...@us.ibm.com<mailto:sto...@us.ibm.com>> wrote:
One possibility is the creation/expansion of directories or allocation of 
indirect blocks for large files.

Not sure if this is the issue here but at one time inode allocation was 
considered slow and so folks may have pre-allocated inodes to avoid that 
overhead during file creation.  To my understanding inode creation time is not 
so slow that users need to pre-allocate inodes.  Yes, there are likely some 
applications where pre-allocating may be necessary but I expect they would be 
the exception.  I mention this because you have a lot of free inodes and of 
course once they are allocated they cannot be de-allocated.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821<tel:(720)%20430-8821>
sto...@us.ibm.com<mailto:sto...@us.ibm.com>



From:"Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:01/23/2018 12:17 PM
Subject:[gpfsug-discuss] Metadata only system pool
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

I was under the (possibly false) impression that if you have a filesystem where 
the system pool contains metadata only then the only thing that would cause the 
amount of free space in that pool to change is the creation of more inodes … is 
that correct?  In other words, given that I have a filesystem with 130 million 
free (but allocated) inodes:

Inode Information
-
Number of used inodes:   218635454
Number of free inodes:   131364674
Number of allocated inodes:  350000128
Maximum number of inodes:    350000128

I would not expect that a user creating a few hundred or thousands of files 
could cause a “no space left on device” error (which I’ve got one user 
getting).  There’s plenty of free data space, BTW.

Now my system pool is almost “full”:

(pool total)   2.878T   34M (  0%)   140.9M ( 0%)

But again, what - outside of me creating more inodes - would cause that to 
change??

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>- 
(615)875-9633<tel:(615)%20875-9633>


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org

[gpfsug-discuss] Metadata only system pool

2018-01-23 Thread Buterbaugh, Kevin L
Hi All,

I was under the (possibly false) impression that if you have a filesystem where 
the system pool contains metadata only then the only thing that would cause the 
amount of free space in that pool to change is the creation of more inodes … is 
that correct?  In other words, given that I have a filesystem with 130 million 
free (but allocated) inodes:

Inode Information
-
Number of used inodes:   218635454
Number of free inodes:   131364674
Number of allocated inodes:  350000128
Maximum number of inodes:    350000128

I would not expect that a user creating a few hundred or thousands of files 
could cause a “no space left on device” error (which I’ve got one user 
getting).  There’s plenty of free data space, BTW.

Now my system pool is almost “full”:

(pool total)   2.878T   34M (  0%)   140.9M ( 0%)

But again, what - outside of me creating more inodes - would cause that to 
change??
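
For anyone hitting the same symptom later: directory blocks, indirect blocks 
for large files, and extended-attribute overflow all live in the metadata-only 
system pool too, so its free space can shrink even when no new inodes are 
created.  A quick way to keep an eye on it - the filesystem name is 
illustrative, and the -P option assumes your level of GPFS supports 
restricting mmdf to a single pool:

mmdf gpfs23 -P system     # free space in just the system (metadata) pool
mmdf gpfs23               # full report, including the Inode Information section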

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS best practises : end user standpoint

2018-01-17 Thread Buterbaugh, Kevin L
Inline…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633

On Jan 16, 2018, at 11:25 AM, Jonathan Buzzard 
<jonathan.buzz...@strath.ac.uk<mailto:jonathan.buzz...@strath.ac.uk>> wrote:

On Tue, 2018-01-16 at 16:35 +, Buterbaugh, Kevin L wrote:

[SNIP]

I am quite sure someone storing 1PB has to pay more than someone
storing 1TB, so why should someone storing 20 million files not have to
pay more than someone storing 100k files?


Because they won’t … they’ll do something more brain dead like put a WD MyBook 
they bought at Costco on their desk and expect their job to copy data back and 
forth from it to /tmp on the compute node.  We have to offer a service that 
users are willing to pay for … we can’t dictate to them the way things WILL be.

There’s a big difference between the way things should be and the way things 
actually are … trust me, those of us in the United States know that better than 
most people around the world after the past year!  Bigly!  Buh-leave ME!  :-O


JAB.


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS best practises : end user standpoint

2018-01-16 Thread Buterbaugh, Kevin L
Hi Jonathan,

Comments / questions inline.  Thanks!

Kevin

> On Jan 16, 2018, at 10:08 AM, Jonathan Buzzard 
>  wrote:
> 
> On Tue, 2018-01-16 at 15:47 +, Carl Zetie wrote:
>> Maybe this would make for a good session at a future user group
>> meeting -- perhaps as an interactive session? IBM could potentially
>> provide a facilitator from our Design practice.
>>  
> 
> Most of it in my view is standard best practice regardless of the file
> system in use.
> 
> So in our mandatory training for the HPC, we tell our users don't use
> whacked out characters in your file names and directories. Specifically
> no backticks, no asterisks, no question marks, no newlines (yes
> really), no slashes (either forward or backward) and for Mac users
> don't start the name with a space (forces sorting to the top). We
> recommend sticking to plain ASCII so no accented characters either
> (harder if your native language is not English I guess but we are UK
> based so...). We don't enforce that but if it causes the user problems
> then they are on their own.

We’re in Tennessee, so not only do we not speak English, we barely speak 
American … y’all will just have to understand, bless your hearts!  ;-). 

But seriously, like most Universities, we have a ton of users for whom English 
is not their “primary” language, so dealing with “interesting” filenames is 
pretty hard to avoid.  And users’ problems are our problems whether or not 
they’re our problem.

> 
> We also strongly recommend using ISO 8601 date formats in file names to
> get date sorting from a directory listing too. Surprisingly not widely
> known about, but a great "life hack".
> 
> Then it boils down to don't create zillions of files. I would love to
> be able to somehow do per directory file number quotas where one could
> say set a default of a few thousand. Users would then have to justify
> needing a larger quota. Sure you can set a file number quota but that
> does not stop them putting them all in one directory.

If you’ve got (bio)medical users using your cluster I don’t see how you avoid 
this … they’re using commercial apps that do this kind of stupid stuff (10’s of 
thousands of files in a directory and the full path to each file is longer than 
the contents of the files themselves!).

This reminds me of way back in 2005 when we moved from an NFS server to GPFS … 
I was moving users over by tarring up their home directories on the NFS server, 
copying the tarball over to GPFS and untarring it there … worked great for 699 
out of 700 users.  But there was one user for whom the untar would fail every 
time I tried … turned out that back in early versions of GPFS 2.3 IBM hadn’t 
considered that someone would put 6 million files in one directory!  :-O

> 
> If users really need to have zillions of files then charge them more so
> you can afford to beef up your metadata disks to SSD.

OK, so here’s my main question … you’re right that SSD’s are the answer … but 
how do you charge them more?  SSDs are more expensive than hard disks, and
enterprise SSDs are stupid expensive … and users barely want to pay hard drive 
prices for their storage.  If you’ve got the magic answer to how to charge them 
enough to pay for SSDs I’m sure I’m not the only one who’d love to hear how you 
do it?!?!

> 
> 
> JAB.
> 
> -- 
> Jonathan A. Buzzard Tel: +44141-5483420
> HPC System Administrator, ARCHIE-WeSt.
> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS

2018-01-08 Thread Buterbaugh, Kevin L
Hi GPFS Team,

Thanks for this response.  If it is at all possible I know that we (and I would 
suspect many others are in this same boat) would greatly appreciate a update 
from IBM on how a patched kernel impacts GPFS functionality.  Yes, we’d love to 
know the performance impact of the patches on GPFS, but that pales in 
significance to knowing whether GPFS version 4.x.x.x will even *start* with the 
patched kernel(s).

Thanks again…

Kevin

On Jan 4, 2018, at 4:55 PM, IBM Spectrum Scale 
<sc...@us.ibm.com<mailto:sc...@us.ibm.com>> wrote:


Kevin,

The team is aware of Meltdown and Spectre. Due to the late availability of 
production-ready test patches (they became available today) we started today 
working on evaluating the impact of applying these patches. The focus would be 
both on any potential functional impacts (especially to the kernel modules 
shipped with GPFS) and on the performance degradation which affects user/kernel 
mode transitions. Performance characterization will be complex, as some system 
calls which may get invoked often by the mmfsd daemon will suddenly become 
significantly more expensive because of the kernel changes. Depending on the 
main areas affected, code changes might be possible to alleviate the impact, by 
reducing frequency of certain calls, etc. Any such changes will be deployed 
over time.

At this point, we can't say what impact this will have on stability or 
performance on systems running GPFS until IBM issues an official statement on 
this topic. We hope to have some basic answers soon.



Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.

"Buterbaugh, Kevin L" ---01/04/2018 01:11:59 PM---Happy New Year 
everyone, I’m sure that everyone is aware of Meltdown and Spectre by now … we, 
like m

From: "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: 01/04/2018 01:11 PM
Subject: [gpfsug-discuss] Meltdown, Spectre, and impacts on GPFS
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





Happy New Year everyone,

I’m sure that everyone is aware of Meltdown and Spectre by now … we, like many 
other institutions, will be patching for it at the earliest possible 
opportunity.

Our understanding is that the most serious of the negative performance impacts 
of these patches will be for things like I/O (disk / network) … given that, we 
are curious if IBM has any plans for a GPFS update that could help mitigate 
those impacts? Or is there simply nothing that can be done?

If there is a GPFS update planned for this we’d be interested in knowing so 
that we could coordinate the kernel and GPFS upgrades on our cluster.

Thanks…

Kevin

P.S. The “Happy New Year” wasn’t intended as sarcasm … I hope it is a good year 
for everyone despite how it’s starting out. :-O

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Password to GUI forgotten

2018-01-05 Thread Buterbaugh, Kevin L
Hi GPFS team,

I did open a PMR and they (mainly Matthais) did help me get that issue 
resolved.  Thanks for following up!

Kevin

On Jan 5, 2018, at 6:39 AM, IBM Spectrum Scale 
<sc...@us.ibm.com<mailto:sc...@us.ibm.com>> wrote:

Hi Kevin,

If you are stuck then please open a PMR and work with the IBM support folks to 
get this resolved.

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact  1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.



From:    "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To:"Hanley, Jesse A." <hanle...@ornl.gov<mailto:hanle...@ornl.gov>>
Cc:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:12/19/2017 01:42 AM
Subject:Re: [gpfsug-discuss] Password to GUI forgotten
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi Jesse,

Thanks for the suggestion … I find the following error very interesting:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/rmuser admin
EFSSP0010C CLI parser: The object "admin" specified for "userID" does not exist.
/root
root@testnsd1#

That says to me that I don’t have an admin user, which - if true - would 
explain why not a single password I can think of works.  ;-)

But as I mentioned in my original post I had this up and working earlier this 
fall.  While I can’t prove anything, I can’t imagine a scenario where I would 
deliberately choose a non-default username.  So if “admin” has been the default 
login for the GPFS GUI all along then I am really mystified.

Thanks!

Kevin

On Dec 18, 2017, at 1:58 PM, Hanley, Jesse A. 
<hanle...@ornl.gov<mailto:hanle...@ornl.gov>> wrote:

Kevin,

I ran into this a couple times using 4.2.3.  This is what we used to get around 
it:

/usr/lpp/mmfs/gui/cli/rmuser admin
/usr/lpp/mmfs/gui/cli/mkuser admin -p  -g Administrator,SecurityAdmin

You may need to run the initgui command if those objects are present.  That 
typically gets run on first login to the GUI.

Thanks,
--
Jesse
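
With a placeholder password filled in (Passw0rd below is only an example, not 
a suggestion), the whole sequence from Jesse's note, plus a check afterwards, 
looks like:

/usr/lpp/mmfs/gui/cli/rmuser admin
/usr/lpp/mmfs/gui/cli/mkuser admin -p Passw0rd -g Administrator,SecurityAdmin
/usr/lpp/mmfs/gui/cli/lsuser     # confirm the admin account exists again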


From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Monday, December 18, 2017 at 2:52 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

Hi All,

Sorry for the delay in getting back with you all … didn’t mean to leave this 
hanging, but some higher priority things came up.

Bottom line - I’m still stuck and probably going to open up a PMR with IBM 
after sending this.  Richards’ suggestion below errors for me on the “-g 
Administrator” part.  Other suggestions sent directly to me up to and including 
completely deleting the GPFS GUI and reinstalling have also not worked.

No matter what I do, I cannot log in to the GUI.  Thanks for the suggestions, 
though…

Kevin


On Dec 7, 2017, at 6:10 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

Sorry I need to learn to read… didn’t see the “object ‘Administrator’ does not 
exist” error.

That said, my workaround for the problem of forgetting the password was to 
create a new “admin2” user and use that to reset the password on admin itself.

[root@gpfs cli]# ./mkuser admin2 -p Passw0rd -g Administrator,SecurityAdmin
EFSSG0019I The user admin2 has been successfully created.
EFSSG1000I The command completed successfully.


Cheers
Richard

From: 
gpfsu

Re: [gpfsug-discuss] Password to GUI forgotten

2017-12-18 Thread Buterbaugh, Kevin L
Hi Jesse,

Thanks for the suggestion … I find the following error very interesting:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/rmuser admin
EFSSP0010C CLI parser: The object "admin" specified for "userID" does not exist.
/root
root@testnsd1#

That says to me that I don’t have an admin user, which - if true - would 
explain why not a single password I can think of works.  ;-)

But as I mentioned in my original post I had this up and working earlier this 
fall.  While I can’t prove anything, I can’t imagine a scenario where I would 
deliberately choose a non-default username.  So if “admin” has been the default 
login for the GPFS GUI all along then I am really mystified.

Thanks!

Kevin

On Dec 18, 2017, at 1:58 PM, Hanley, Jesse A. 
<hanle...@ornl.gov<mailto:hanle...@ornl.gov>> wrote:

Kevin,

I ran into this a couple times using 4.2.3.  This is what we used to get around 
it:

/usr/lpp/mmfs/gui/cli/rmuser admin
/usr/lpp/mmfs/gui/cli/mkuser admin -p  -g Administrator,SecurityAdmin

You may need to run the initgui command if those objects are present.  That 
typically gets run on first login to the GUI.

Thanks,
--
Jesse


From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Monday, December 18, 2017 at 2:52 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

Hi All,

Sorry for the delay in getting back with you all … didn’t mean to leave this 
hanging, but some higher priority things came up.

Bottom line - I’m still stuck and probably going to open up a PMR with IBM 
after sending this.  Richards’ suggestion below errors for me on the “-g 
Administrator” part.  Other suggestions sent directly to me up to and including 
completely deleting the GPFS GUI and reinstalling have also not worked.

No matter what I do, I cannot log in to the GUI.  Thanks for the suggestions, 
though…

Kevin


On Dec 7, 2017, at 6:10 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

Sorry I need to learn to read… didn’t see the “object ‘Administrator’ does not 
exist” error.

That said, my workaround for the problem of forgetting the password was to 
create a new “admin2” user and use that to reset the password on admin itself.

[root@gpfs cli]# ./mkuser admin2 -p Passw0rd -g Administrator,SecurityAdmin
EFSSG0019I The user admin2 has been successfully created.
EFSSG1000I The command completed successfully.


Cheers
Richard

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: 07 December 2017 11:57
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

This happened to me a while back, I opened a pmr to get it sorted but it's just 
a case of running some cli commands. I'll dig it out.
Get Outlook for Android


From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Sent: Wednesday, December 6, 2017 10:41:12 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

All,

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g 
Administrator,SecurityAdmin
EFSSP0010C CLI parser: The object "Administrator" specified for "-g" does not 
exist.
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g SecurityAdmin
EFSSP0010C CLI parser: The object "SecurityAdmin" specified for "-g" does not 
exist.
/root
root@testnsd1#

I’ll also add that all of the work I did earlier in the fall was with the test 
cluster running an earlier version of GPFS and it’s subsequently been updated 
to GPFS 4.2.3.5 … not sure that’s relevant but wanted to mention it just in 
case.

Thanks!

Kevin



On Dec 6, 2017, at 4:32 PM, Joshua Kwedar 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>&g

Re: [gpfsug-discuss] FW: Spectrum Scale 5.0 now available on Fix Central

2017-12-18 Thread Buterbaugh, Kevin L
Hi All,

GPFS 5.0 was announced on Friday … and today:

IBM Spectrum Scale: NFS operations may fail with IO-Error

IBM has identified an issue with IBM Spectrum Scale 5.0.0.0 Protocol support 
for NFSv3/v4 in which IO-errors may be returned to the NFS client if the NFS 
server accumulates file-descriptor resources beyond the defined limit. 
Accumulation of file descriptor resources will occur when NFSv3 file create 
operations are sent against files that are already in use.
Bob’s suggestion in a previous e-mail to the list about installing this on a 
test cluster is almost certainly very, VERY good advice.  That’s certainly what 
we will do after the holiday break...

Kevin

On Dec 18, 2017, at 1:43 PM, Oesterlin, Robert wrote:

The Scale 5.0 fix level is now up on Fix Central.

You need to be at Scale 4.2.3 (cluster level) to do a rolling upgrade to this 
level.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: "dw-not...@us.ibm.com" 
>
Reply-To: "dw-not...@us.ibm.com" 
>
Date: Monday, December 18, 2017 at 1:27 PM
Subject: [EXTERNAL] [Forums] 'g...@us.ibm.com' replied 
to the 'IBM Spectrum Scale V5.0 announcements' topic thread in the 'General 
Parallel File System - Announce (GPFS - Announce)' forum.

 
g...@us.ibm.com replied to the IBM Spectrum Scale V5.0 announcements topic 
thread in the General Parallel File System - Announce (GPFS - Announce) forum.

IBM Spectrum Scale 5.0.0.0 is now available from IBM Fix Central:

http://www-933.ibm.com/support/fixcentral

This topic summarizes changes to the IBM Spectrum Scale licensed
program and the IBM Spectrum Scale library.

Summary of changes
for IBM Spectrum Scale version 5 release 0.0
as updated, April 2017

Changes to this release of the IBM Spectrum Scale licensed
program and the IBM Spectrum Scale library include the following:


Re: [gpfsug-discuss] Password to GUI forgotten

2017-12-18 Thread Buterbaugh, Kevin L
Hi All,

Sorry for the delay in getting back with you all … didn’t mean to leave this 
hanging, but some higher priority things came up.

Bottom line - I’m still stuck and probably going to open up a PMR with IBM 
after sending this.  Richards’ suggestion below errors for me on the “-g 
Administrator” part.  Other suggestions sent directly to me up to and including 
completely deleting the GPFS GUI and reinstalling have also not worked.

No matter what I do, I cannot log in to the GUI.  Thanks for the suggestions, 
though…

Kevin

On Dec 7, 2017, at 6:10 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

Sorry I need to learn to read… didn’t see the “object ‘Administrator’ does not 
exist” error.

That said, my workaround for the problem of forgetting the password was to 
create a new “admin2” user and use that to reset the password on admin itself.

[root@gpfs cli]# ./mkuser admin2 -p Passw0rd -g Administrator,SecurityAdmin
EFSSG0019I The user admin2 has been successfully created.
EFSSG1000I The command completed successfully.


Cheers
Richard

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: 07 December 2017 11:57
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

This happened to me a while back, I opened a pmr to get it sorted but it's just 
a case of running some cli commands. I'll dig it out.
Get Outlook for Android


From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Sent: Wednesday, December 6, 2017 10:41:12 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

All,

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g 
Administrator,SecurityAdmin
EFSSP0010C CLI parser: The object "Administrator" specified for "-g" does not 
exist.
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g SecurityAdmin
EFSSP0010C CLI parser: The object "SecurityAdmin" specified for "-g" does not 
exist.
/root
root@testnsd1#

I’ll also add that all of the work I did earlier in the fall was with the test 
cluster running an earlier version of GPFS and it’s subsequently been updated 
to GPFS 4.2.3.5 … not sure that’s relevant but wanted to mention it just in 
case.

Thanks!

Kevin


On Dec 6, 2017, at 4:32 PM, Joshua Kwedar 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

Hmm.. odd.

Here’s what the lsuser output should look like.

# /usr/lpp/mmfs/gui/cli/lsuser
Name  Long name Password status Group names Failed login 
attempts
admin   active  Administrator,SecurityAdmin 0
EFSSG1000I The command completed successfully.

Can you try something like…

# /usr/lpp/mmfs/gui/cli/mkuser admin -p abc1231 -g Administrator,SecurityAdmin



From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Wednesday, December 6, 2017 at 5:15 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

All,

Sorry - should’ve mentioned that:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231
EFSSG0001C Cannot validate option: login
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/lsuser -Y
lsuser:user:HEADER:version:reserved:reserved:Name:Long name:Password 
status:Group names:Failed login attempts:
/root
root@testnsd1#

Weird - it’s like the login doesn’t exist … but like I said, I had logged into 
it prior to November.  Thanks...

Kevin

On Dec 6, 2017, at 4:10 PM, Joshua Kwedar (froz1) 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

The GUI password can be changed via command line using chuser.

/usr/lpp/mmfs/gui/cli/chuser

Usage is as follows (where userID = admin)


chuser userID {-p  | -l  | -a  | -

[gpfsug-discuss] mmbackup log file size after GPFS 4.2.3.5 upgrade

2017-12-14 Thread Buterbaugh, Kevin L
Hi All,

 26 mmbackupDors-20171023.log
 26 mmbackupDors-20171024.log
 26 mmbackupDors-20171025.log
 26 mmbackupDors-20171026.log
2922752 mmbackupDors-20171027.log
137 mmbackupDors-20171028.log
  59328 mmbackupDors-20171029.log
2748095 mmbackupDors-20171030.log
 124953 mmbackupDors-20171031.log

That’s “wc -l” output … and the difference in size occurred with the GPFS 
4.2.3.5 upgrade.  I’m not technically “responsible” for mmbackup here, so I’m 
not at all familiar with it.  However, we’ve asked a certain vendor (not IBM) 
about it and they don’t know either, so I don’t feel too awfully bad.

And we have looked at the man page and didn’t see any obvious options to 
decrease the verbosity.  We did not make any changes to the backup script 
itself, so the mmbackup invocation is the same.  Any ideas?  Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Password to GUI forgotten

2017-12-06 Thread Buterbaugh, Kevin L
All,

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g 
Administrator,SecurityAdmin
EFSSP0010C CLI parser: The object "Administrator" specified for "-g" does not 
exist.
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231 -g SecurityAdmin
EFSSP0010C CLI parser: The object "SecurityAdmin" specified for "-g" does not 
exist.
/root
root@testnsd1#

I’ll also add that all of the work I did earlier in the fall was with the test 
cluster running an earlier version of GPFS and it’s subsequently been updated 
to GPFS 4.2.3.5 … not sure that’s relevant but wanted to mention it just in 
case.

Thanks!

Kevin

On Dec 6, 2017, at 4:32 PM, Joshua Kwedar 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

Hmm.. odd.

Here’s what the lsuser output should look like.

# /usr/lpp/mmfs/gui/cli/lsuser
Name  Long name Password status Group names Failed login 
attempts
admin   active  Administrator,SecurityAdmin 0
EFSSG1000I The command completed successfully.

Can you try something like…

# /usr/lpp/mmfs/gui/cli/mkuser admin -p abc1231 -g Administrator,SecurityAdmin



From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Wednesday, December 6, 2017 at 5:15 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] Password to GUI forgotten

All,

Sorry - should’ve mentioned that:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231
EFSSG0001C Cannot validate option: login
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/lsuser -Y
lsuser:user:HEADER:version:reserved:reserved:Name:Long name:Password 
status:Group names:Failed login attempts:
/root
root@testnsd1#

Weird - it’s like the login doesn’t exist … but like I said, I had logged into 
it prior to November.  Thanks...

Kevin


On Dec 6, 2017, at 4:10 PM, Joshua Kwedar (froz1) 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

The GUI password can be changed via command line using chuser.

/usr/lpp/mmfs/gui/cli/chuser


Usage is as follows (where userID = admin)



chuser userID {-p  | -l  | -a  | -d 
 | -g  | --expirePassword} [-o ]



Josh K

On Dec 6, 2017, at 4:56 PM, Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>> wrote:
Hi All,

So this is embarrassing to admit but I was playing around with setting up the 
GPFS GUI on our test cluster earlier this fall.  However, I was gone pretty 
much the entire month of November for a combination of vacation and SC17 and 
the vacation was so relaxing that I’ve forgotten the admin password for the 
GPFS GUI.  :-(

Is there anything I can do to recover from this short of deleting the GPFS GUI 
related RPM’s, re-installing, and starting over from scratch?  If that’s what I 
have to do, it’s no big deal as this is just our little 6-node test cluster, 
but I thought I’d ask before going down that route.

Oh, and if someone has a way to accomplish this that they’d rather not share in 
a public mailing list for any reason, please feel free to e-mail me directly, 
let me know, and I won’t tell if you won’t tell (and hopefully Michael Flynn 
won’t tell either!)…. ;-)

Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org

Re: [gpfsug-discuss] Password to GUI forgotten

2017-12-06 Thread Buterbaugh, Kevin L
All,

Sorry - should’ve mentioned that:

/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/chuser admin -p abc1231
EFSSG0001C Cannot validate option: login
/root
root@testnsd1# /usr/lpp/mmfs/gui/cli/lsuser -Y
lsuser:user:HEADER:version:reserved:reserved:Name:Long name:Password 
status:Group names:Failed login attempts:
/root
root@testnsd1#

Weird - it’s like the login doesn’t exist … but like I said, I had logged into 
it prior to November.  Thanks...

Kevin

On Dec 6, 2017, at 4:10 PM, Joshua Kwedar (froz1) 
<jdkwe...@gmail.com<mailto:jdkwe...@gmail.com>> wrote:

The GUI password can be changed via command line using chuser.

/usr/lpp/mmfs/gui/cli/chuser

Usage is as follows (where userID = admin)


chuser userID  {-p  | -l  | -a  | -d 
 | -g  | --expirePassword} [-o ]


Josh K

On Dec 6, 2017, at 4:56 PM, Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi All,

So this is embarrassing to admit but I was playing around with setting up the 
GPFS GUI on our test cluster earlier this fall.  However, I was gone pretty 
much the entire month of November for a combination of vacation and SC17 and 
the vacation was so relaxing that I’ve forgotten the admin password for the 
GPFS GUI.  :-(

Is there anything I can do to recover from this short of deleting the GPFS GUI 
related RPM’s, re-installing, and starting over from scratch?  If that’s what I 
have to do, it’s no big deal as this is just our little 6-node test cluster, 
but I thought I’d ask before going down that route.

Oh, and if someone has a way to accomplish this that they’d rather not share in 
a public mailing list for any reason, please feel free to e-mail me directly, 
let me know, and I won’t tell if you won’t tell (and hopefully Michael Flynn 
won’t tell either!)…. ;-)

Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Password to GUI forgotten

2017-12-06 Thread Buterbaugh, Kevin L
Hi All,

So this is embarrassing to admit but I was playing around with setting up the 
GPFS GUI on our test cluster earlier this fall.  However, I was gone pretty 
much the entire month of November for a combination of vacation and SC17 and 
the vacation was so relaxing that I’ve forgotten the admin password for the 
GPFS GUI.  :-(

Is there anything I can do to recover from this short of deleting the GPFS GUI 
related RPM’s, re-installing, and starting over from scratch?  If that’s what I 
have to do, it’s no big deal as this is just our little 6-node test cluster, 
but I thought I’d ask before going down that route.

Oh, and if someone has a way to accomplish this that they’d rather not share in 
a public mailing list for any reason, please feel free to e-mail me directly, 
let me know, and I won’t tell if you won’t tell (and hopefully Michael Flynn 
won’t tell either!)…. ;-)

Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] 5.0 features?

2017-11-29 Thread Buterbaugh, Kevin L
Simon is correct … I’d love to be able to support a larger block size for my 
users who have sane workflows while still not wasting a ton of space for the 
biomedical folks…. ;-)

A question … will the new, much improved, much faster mmrestripefs that was 
touted at SC17 require a filesystem that was created with GPFS / Tiger Shark / 
Spectrum Scale / Multi-media filesystem () version 5 or simply one that 
has been “upgraded” to that format?
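
As a side note, a quick sketch of how one might check this on an existing filesystem (commands not taken from this thread; 'gpfs23' is a placeholder device name):

  mmlsfs gpfs23 -V        # shows the current and original file system format versions
  mmchfs gpfs23 -V full   # upgrades the on-disk format in place to the latest version (one-way operation)

Whether an in-place format upgrade is enough for the faster mmrestripefs, or whether a filesystem created at version 5 is required, is exactly the question above.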

Thanks…

Kevin

> On Nov 29, 2017, at 11:43 AM, Simon Thompson (IT Research Support) 
>  wrote:
> 
> You can in place upgrade.
> 
> I think what people are referring to is likely things like the new sub block 
> sizing for **new** filesystems.
> 
> Simon
> 
> From: gpfsug-discuss-boun...@spectrumscale.org 
> [gpfsug-discuss-boun...@spectrumscale.org] on behalf of 
> jfosb...@mdanderson.org [jfosb...@mdanderson.org]
> Sent: 29 November 2017 17:40
> To: gpfsug main discussion list
> Subject: Re: [gpfsug-discuss] 5.0 features?
> 
> I haven’t even heard it’s been released or has been announced.  I’ve 
> requested a roadmap discussion.
> 
> From:  on behalf of Marc A Kaplan 
> 
> Reply-To: gpfsug main discussion list 
> Date: Wednesday, November 29, 2017 at 11:38 AM
> To: gpfsug main discussion list 
> Subject: Re: [gpfsug-discuss] 5.0 features?
> 
> Which features of 5.0 require a not-in-place upgrade of a file system?  Where 
> has this information been published?
> 
> 
> The information contained in this e-mail message may be privileged, 
> confidential, and/or protected from disclosure. This e-mail message may 
> contain protected health information (PHI); dissemination of PHI should 
> comply with applicable federal and state laws. If you are not the intended 
> recipient, or an authorized representative of the intended recipient, any 
> further review, disclosure, use, dissemination, distribution, or copying of 
> this message or any attachment (or the information contained therein) is 
> strictly prohibited. If you think that you have received this e-mail message 
> in error, please notify the sender by return e-mail and delete all references 
> to it and its contents from your systems.
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Online data migration tool

2017-11-29 Thread Buterbaugh, Kevin L
Hi All,

Well, actually a year ago we started the process of doing pretty much what 
Richard describes below … the exception being that we rsync’d data over to the 
new filesystem group by group.  It was no fun but it worked.  And now GPFS (and 
it will always be GPFS … it will never be Spectrum Scale) version 5 is coming 
and there are compelling reasons to want to do the same thing over again … 
despite the pain.
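
For anyone curious, the group-by-group copy was essentially of this shape (a minimal sketch with placeholder paths; note that rsync carries POSIX ACLs and extended attributes but not GPFS NFSv4 ACLs, so those would need separate handling):

  rsync -aHAX --numeric-ids /gpfs23/groupX/ /gpfs24/groupX/            # initial copy while the group keeps working
  rsync -aHAX --numeric-ids --delete /gpfs23/groupX/ /gpfs24/groupX/   # final pass once the group is quiesced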

Having said all that, I think it would be interesting to have someone from IBM 
give an explanation of why Apple can migrate millions of devices to a new 
filesystem with 99.99% of the users never even knowing they did it … but 
IBM can’t provide a way to migrate to a new filesystem “in place.”

And to be fair to IBM, they do ship AIX with root having a password and Apple 
doesn’t, so we all have our strengths and weaknesses!  ;-)

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633

On Nov 29, 2017, at 10:39 AM, Sobey, Richard A 
> wrote:

Could we utilise free capacity in the existing filesystem and empty NSDs, 
create a new FS and AFM migrate data in stages? Terribly long winded and 
fraught with danger and peril... do not pass go... ah, answered my own question.



Richard

-Original Message-
From: 
gpfsug-discuss-boun...@spectrumscale.org
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Jonathan Buzzard
Sent: 29 November 2017 16:35
To: gpfsug main discussion list 
>
Subject: Re: [gpfsug-discuss] Online data migration tool

On Wed, 2017-11-29 at 11:00 -0500, Yugendra Guvvala wrote:
Hi,

I am trying to understand the technical challenges to migrate to GPFS
5.0 from GPFS 4.3. We currently run GPFS 4.3 and i was all exited to
see 5.0 release and hear about some promising features available. But
not sure about complexity involved to migrate.


Oh that's simple. You copy all your data somewhere else (good luck if you 
happen to have a few hundred TB or maybe a PB or more), then reformat your file 
system with the new disk format, then restore all your data to your shiny new 
file system.

Over the years there have been a number of these "reformats" to get all the new 
shiny features, which is the cause of the grumbles: it is not funny, most people 
don't have the disk space to just hold another copy of the data, and even if 
they did it would be extremely disruptive.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Rainy days and Mondays and GPFS lying to me always get me down...

2017-10-23 Thread Buterbaugh, Kevin L
Hi All,

And I’m not really down, but it is a rainy Monday morning here and GPFS did 
give me a scare in the last hour, so I thought that was a funny subject line.

So I have a >1 PB filesystem with 3 pools:  1) the system pool, which contains 
metadata only,  2) the data pool, which is where all I/O goes to by default, 
and 3) the capacity pool, which is where old crap gets migrated to.

I logged on this morning to see an alert that my data pool was 100% full.  I 
ran an mmdf from the cluster manager and, sure enough:

(pool total)             509.3T                0 (  0%)                 0 ( 0%)

I immediately tried copying a file to there and it worked, so I figured GPFS 
must be failing writes over to the capacity pool, but an mmlsattr on the file I 
copied showed it being in the data pool.  Hmmm.

I also noticed that “df -h” said that the filesystem had 399 TB free, while 
mmdf said it only had 238 TB free.  Hmmm.

So after some fruitless poking around I decided that whatever was going to 
happen, I should kill the mmrestripefs I had running on the capacity pool … let 
me emphasize that … I had a restripe running on the capacity pool only (via the 
“-P” option to mmrestripefs) but it was the data pool that said it was 100% 
full.
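
For context, the sort of cross-checks involved here look roughly like this (a sketch; the filesystem and pool names are placeholders, not taken from the message):

  mmdf gpfs23                                 # per-pool capacity, the source of the "(pool total)" lines above
  mmlsattr -L /gpfs23/path/to/newfile         # shows which storage pool a given file actually landed in
  mmrestripefs gpfs23 -b -P gpfs23capacity    # a rebalance restricted to a single pool via -P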

I’m sure many of you have already figured out where this is going … after 
killing the restripe I ran mmdf again and:

(pool total)             509.3T             159T ( 31%)            1.483T ( 0%)

I have never seen anything like this before … any ideas, anyone?  PMR time?

Thanks!

Kevin
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] CCR cluster down for the count?

2017-09-21 Thread Buterbaugh, Kevin L
Hi All,

Ralf Eberhard of IBM helped me resolve this off list.  The key was to 
temporarily make testnsd1 and testnsd3 not be quorum nodes by making sure GPFS 
was down and then executing:

mmchnode --nonquorum -N testnsd1,testnsd3 --force

That gave me some scary messages about overriding normal GPFS quorum semantics, 
but once that was done I was able to run an “mmstartup -a” and bring up the 
cluster!  Once it was up and I had verified things were working properly I then 
shut it back down so that I could rerun the mmchnode (without the —force) to 
make testnsd1 and testnsd3 quorum nodes again.
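
Put together as a rough transcript (a sketch reconstructed from the description above, not a verbatim session; the whole cluster must be down before the forced mmchnode):

  mmshutdown -a                                        # make sure GPFS is down everywhere
  mmchnode --nonquorum -N testnsd1,testnsd3 --force    # expect the scary quorum-override warning here
  mmstartup -a                                         # bring the cluster up and verify it is healthy
  mmshutdown -a                                        # once verified, take it back down...
  mmchnode --quorum -N testnsd1,testnsd3               # ...and restore the original quorum node roles
  mmstartup -a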

Thanks to all who helped me out here…

Kevin

On Sep 20, 2017, at 2:07 PM, Edward Wahl <ew...@osc.edu<mailto:ew...@osc.edu>> 
wrote:


So who was the ccrmaster before?
What is/was the quorum config?  (tiebreaker disks?)

what does 'mmccr check' say?


Have you set DEBUG=1 and tried mmstartup to see if it teases out any more info
from the error?


Ed


On Wed, 20 Sep 2017 16:27:48 +
"Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>> wrote:

Hi Ed,

Thanks for the suggestion … that’s basically what I had done yesterday after
Googling and getting a hit or two on the IBM DeveloperWorks site.  I’m
including some output below which seems to show that I’ve got everything set
up but it’s still not working.

Am I missing something?  We don’t use CCR on our production cluster (and this
experience doesn’t make me eager to do so!), so I’m not that familiar with
it...

Kevin

/var/mmfs/gen
root@testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort
testdellnode1:  root  2583     1  0 May30 ?   00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testdellnode1:  root  6694  2583  0 11:19 ?   00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testgateway:  root  2023  5828  0 11:19 ?     00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testgateway:  root  5828     1  0 Sep18 ?     00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd1:  root 19356  4628  0 11:19 tty1     00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd1:  root  4628     1  0 Sep19 tty1     00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd2:  root 22149  2983  0 11:16 ?        00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd2:  root  2983     1  0 Sep18 ?        00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd3:  root 15685  6557  0 11:19 ?        00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testnsd3:  root  6557     1  0 Sep19 ?        00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testsched:  root 29424  6512  0 11:19 ?       00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
testsched:  root  6512     1  0 Sep18 ?       00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
/var/mmfs/gen
root@testnsd2# mmstartup -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmstartup: Command failed. Examine previous error messages to determine cause.
/var/mmfs/gen
root@testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort
testdellnode1:  drwxr-xr-x 2 root root 4096 Mar  3  2017 cached
testdellnode1:  drwxr-xr-x 2 root root 4096 Nov 10  2016 committed
testdellnode1:  -rw-r--r-- 1 root root   99 Nov 10  2016 ccr.nodes
testdellnode1:  total 12
testgateway:  drwxr-xr-x. 2 root root 4096 Jun 29  2016 committed
testgateway:  drwxr-xr-x. 2 root root 4096 Mar  3  2017 cached
testgateway:  -rw-r--r--. 1 root root   99 Jun 29  2016 ccr.nodes
testgateway:  total 12
testnsd1:  drwxr-xr-x 2 root root  6 Sep 19 15:38 cached
testnsd1:  drwxr-xr-x 2 root root  6 Sep 19 15:38 committed
testnsd1:  -rw-r--r-- 1 root root  0 Sep 19 15:39 ccr.disks
testnsd1:  -rw-r--r-- 1 root root  4 Sep 19 15:38 ccr.noauth
testnsd1:  -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes
testnsd1:  total 8
testnsd2:  drwxr-xr-x 2 root root   22 Mar  3  2017 cached
testnsd2:  drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed
testnsd2:  -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1
testnsd2:  -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2
testnsd2:  -rw-r--r-- 1 root root    0 Jun 29  2016 ccr.disks
testnsd2:  -rw-r--r-- 1 root root   99 Jun 29  2016 ccr.nodes
testnsd2:  total 16
testnsd3:  drwxr-xr-x 2 root root  6 Sep 19 15:41 cached
testnsd3:  drwxr-xr-x 2 root root  6 Sep 19 15:41 committed
testnsd3:  -rw-r--r-- 1 root root  0 Jun 29  2016 ccr.disks
testnsd3:  -rw-r--r-- 1 root root  4 Sep 19 15:41 ccr.noauth
testnsd3:  -rw-r--r-- 1 root root 99 Jun 29  2016 ccr.nodes
testnsd3:  total 8
testsched:  drwxr-xr-x. 2 root root 4096 Jun 29  2016 committed
testsched:  drwxr-xr-x. 2 root root 4096 Mar  3  2017 cached
testsched:  -rw-r--r--. 1 root root

Re: [gpfsug-discuss] CCR cluster down for the count?

2017-09-20 Thread Buterbaugh, Kevin L
 code 255.
vmp608.vampire:  Host key verification failed.
mmdsh: vmp608.vampire remote shell process had return code 255.
vmp609.vampire:  Host key verification failed.
mmdsh: vmp609.vampire remote shell process had return code 255.
testnsd1.vampire:  Host key verification failed.
mmdsh: testnsd1.vampire remote shell process had return code 255.
vmp610.vampire:  Permission denied, please try again.
vmp610.vampire:  Permission denied, please try again.
vmp610.vampire:  Permission denied 
(publickey,gssapi-keyex,gssapi-with-mic,password).
mmdsh: vmp610.vampire remote shell process had return code 255.
mmchcluster: Command failed. Examine previous error messages to determine cause.
/var/mmfs/gen
root@testnsd2#

I believe that part of the problem may be that there are 4 client nodes that 
were taken out of service without ever being removed from the cluster configuration (done by 
another SysAdmin who was in a hurry to repurpose those machines).  They’re up 
and pingable but not reachable by GPFS anymore, which I’m pretty sure is making 
things worse.

Nor does Loic’s suggestion of running mmcommon work (but thanks for the 
suggestion!) … actually the mmcommon part worked, but a subsequent attempt to 
start the cluster up failed:

/var/mmfs/gen
root@testnsd2# mmstartup -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmstartup: Command failed. Examine previous error messages to determine cause.
/var/mmfs/gen
root@testnsd2#

Thanks.

Kevin

On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale 
<sc...@us.ibm.com<mailto:sc...@us.ibm.com>> wrote:


Hi Kevin,

Let me try to understand the problem you have. What is the meaning of “node 
died” here? Do you mean that there is a hardware/OS issue which cannot be 
fixed, so the OS cannot come up anymore?

I agree with Bob that you can try to disable CCR temporarily, restore the 
cluster configuration, and enable it again.

Such as:

1. Login to a node which has proper GPFS config, e.g NodeA
2. Shutdown daemon in all client cluster.
3. mmchcluster --ccr-disable -p NodeA
4. mmsdrrestore -a -p NodeA
5. mmauth genkey propagate -N testnsd1, testnsd3
6. mmchcluster --ccr-enable

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of Spectrum Scale 
(GPFS), then please post it to the public IBM developerWroks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and 
you have an IBM software maintenance contract please contact 1-800-237-5511 in 
the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for 
priority messages to the Spectrum Scale (GPFS) team.

"Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK – I’ve run 
across this before, and it’s because of a bug (as I recall) having to do with 
CCR and

From: "Oesterlin, Robert" 
<robert.oester...@nuance.com<mailto:robert.oester...@nuance.com>>
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: 09/20/2017 07:39 AM
Subject: Re: [gpfsug-discuss] CCR cluster down for the count?
Sent by: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>





OK – I’ve run across this before, and it’s because of a bug (as I recall) 
having to do with CCR and quorum. What I think you can do is set the cluster to 
non-ccr (mmchcluster --ccr-disable) with all the nodes down, bring it back up 
and then re-enable ccr.

I’ll see if I can find this in one of the recent 4.2 release notes.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of "Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Reply-To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date: Tuesday, September 19, 2017 at 4:03 PM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count?

Hi All,

We ha

[gpfsug-discuss] CCR cluster down for the count?

2017-09-19 Thread Buterbaugh, Kevin L
Hi All,

We have a small test cluster that is CCR enabled.  It only had/has 3 NSD 
servers (testnsd1, 2, and 3) and maybe 3-6 clients.  testnsd3 died a while 
back.  I did nothing about it at the time because it was due to be life-cycled 
as soon as I finished a couple of higher priority projects.

Yesterday, testnsd1 also died, which took the whole cluster down.  So now 
resolving this has become higher priority… ;-)

I took two other boxes and set them up as testnsd1 and 3, respectively.  I’ve 
done a “mmsdrrestore -p testnsd2 -R /usr/bin/scp” on both of them.  I’ve also 
done a "mmccr setup -F” and copied the ccr.disks and ccr.nodes files from 
testnsd2 to them.  And I’ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to 
testnsd1 and 3.  In case it’s not obvious from the above, networking is fine … 
ssh without a password between those 3 boxes is fine.

However, when I try to startup GPFS … or run any GPFS command I get:

/root
root@testnsd2# mmstartup -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmstartup: Command failed. Examine previous error messages to determine cause.
/root
root@testnsd2#

I’ve got to run to a meeting right now, so I hope I’m not leaving out any 
crucial details here … does anyone have an idea what I need to do?  Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Permissions issue in GPFS 4.2.3-4?

2017-08-30 Thread Buterbaugh, Kevin L
Hi All,

We have a script that takes the output of mmlsfs and mmlsquota and formats a 
user’s GPFS quota usage into something a little “nicer” than what mmlsquota 
displays (and doesn’t display 50 irrelevant lines of output for filesets they 
don’t have access to).  After upgrading to 4.2.3-4 over the weekend it started 
throwing errors it hadn’t before:

awk: cmd. line:11: fatal: cannot open file `/var/mmfs/gen/mmfs.cfg.show' for 
reading (Permission denied)
mmlsfs: Unexpected error from awk. Return code: 2
awk: cmd. line:11: fatal: cannot open file `/var/mmfs/gen/mmfs.cfg.show' for 
reading (Permission denied)
mmlsfs: Unexpected error from awk. Return code: 2
Home (user): 11.82G 30G 40G 10807 20 30
awk: cmd. line:11: fatal: cannot open file `/var/mmfs/gen/mmfs.cfg.show' for 
reading (Permission denied)
mmlsquota: Unexpected error from awk. Return code: 2

It didn’t take long to track down that the mmfs.cfg.show file had permissions 
of 600 and a chmod 644 of it (on our login gateways only, which is the only 
place users run that script anyway) fixed the problem.
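
For reference, the fix amounted to a one-liner pushed out to the affected nodes, something like the sketch below (the hostnames file is a placeholder; the mmdsh -F usage follows the pattern shown elsewhere on this list):

  mmdsh -F /tmp/login_gateways.list "chmod 644 /var/mmfs/gen/mmfs.cfg.show"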

So I just wanted to see if this was a known issue in 4.2.3-4?  Notice that the 
error appears to be coming from the GPFS commands my script runs, not my script 
itself … I sure don’t call awk!  ;-)

Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS 4.2.3.4 question

2017-08-30 Thread Buterbaugh, Kevin L
Hi Bryan,

NO - it has the fix for the mmrestripefs data loss bug, but you need the efix 
on top of 4.2.3-4 for the mmadddisk / mmdeldisk issue.

Let me take this opportunity to also explain a workaround that has worked for 
us so far for that issue … the basic problem is two-fold (on our cluster, at 
least).  First, the /var/mmfs/gen/mmsdrfs file isn’t making it out to all nodes 
all the time.  That is simple enough to fix (mmrefresh -fa) and verify that 
it’s fixed (md5sum /var/mmfs/gen/mmsdrfs).

Second, however - and this is the real problem … some nodes are never actually 
rereading that file and therefore have incorrect information *in memory*.  This 
has been especially problematic for us as we are replacing a batch of 80 8 TB 
drives with bad firmware.  I am therefore deleting and subsequently recreating 
NSDs *with the same name*.  If a client node still has the “old” information in 
memory then it unmounts the filesystem when I try to mmadddisk the new NSD.

The workaround is to identify those nodes (mmfsadm dump nsd and grep for the 
identifier of the NSD(s) in question) and force them to reread the info (tsctl 
rereadnsd).
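
Spelled out as commands, the workaround above looks roughly like this (a sketch; the NSD name and hostnames file are placeholders):

  mmrefresh -fa                                                             # push a current mmsdrfs out to every node
  mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" | sort     # confirm every copy now matches
  mmdsh -F /tmp/cluster.hostnames "mmfsadm dump nsd | grep nsd_d12_08"      # find nodes still holding stale info in memory
  tsctl rereadnsd                                                           # run on each stale node to force a reread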

HTH…

Kevin

On Aug 30, 2017, at 9:21 AM, Bryan Banister 
<bbanis...@jumptrading.com<mailto:bbanis...@jumptrading.com>> wrote:

Ok, I’m completely confused… You’re saying 4.2.3-4 *has* the fix for 
adding/deleting NSDs?
-Bryan

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: Wednesday, August 30, 2017 9:13 AM
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

Note: External Email

Aha, I’ve just realised what you actually said, having seen Simon’s response 
and twigged. The defect 1020461 matches what IBM has told me in my PMR about 
adding/deleting NSDs. I’m not sure why the description mentions networking 
though!

Richard

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Sobey, Richard A
Sent: 30 August 2017 14:56
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

No worries, I’ve got it sorted and hopefully about to grab the 4.2.3-4 efix2.

Cheers for your help!
Richard

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Buterbaugh, 
Kevin L
Sent: 30 August 2017 14:55
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

Hi Richard,

Well, I’m not sure, which is why it’s taken me a while to respond.  In the 
README that comes with the efix it lists:

Defect   APAR   Description

1032655  None   AFM: Fix Truncate filtering Write incorrectly
1020461  None   FS can't be mounted after weird networking error

That 1st one is obviously not it and that 2nd one doesn’t reference mmadddisk / 
mmdeldisk.  Plus neither show an APAR number.

Sorry I can’t be of more help…

Kevin

On Aug 29, 2017, at 12:52 PM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

Thanks Kevin, that's good to know. Is there an apar I need to quote in my pmr?
Get Outlook for Android


From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 
<gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>>
 on behalf of Buterbaugh, Kevin L 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
Sent: Tuesday, August 29, 2017 4:53:51 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

Hi Richard,

Since I upgraded my cluster to GPFS 4.2.3.4 over the weekend IBM created an 
efix for it for the NSD deletion / creation fix.  I’m sure they’ll give it to 
you, too…  ;-)

Kevin

On Aug 29, 2017, at 9:30 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

So I can upgrade to 4.2.3-4 to get the mmrestripe fix, or 4.2.3-3 efix3 to get 
the NSD deletion and creation fix? Not great when on Monday I’m doing a load of 
all this. What’s the recommendation? Is there a one size fits all patch?

Re: [gpfsug-discuss] GPFS 4.2.3.4 question

2017-08-29 Thread Buterbaugh, Kevin L
Hi Richard,

Since I upgraded my cluster to GPFS 4.2.3.4 over the weekend IBM created an 
efix for it for the NSD deletion / creation fix.  I’m sure they’ll give it to 
you, too…  ;-)

Kevin

On Aug 29, 2017, at 9:30 AM, Sobey, Richard A 
<r.so...@imperial.ac.uk<mailto:r.so...@imperial.ac.uk>> wrote:

So I can upgrade to 4.2.3-4 to get the mmrestripe fix, or 4.2.3-3 efix3 to get 
the NSD deletion and creation fix? Not great when on Monday I’m doing a load of 
all this. What’s the recommendation? Is there a one size fits all patch?

From: 
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>
 [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Frederick Stock
Sent: 27 August 2017 01:35
To: gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Subject: Re: [gpfsug-discuss] GPFS 4.2.3.4 question

The only change missing is the change delivered  in 4.2.3 PTF3 efix3 which was 
provided on August 22.  The problem had to do with NSD deletion and creation.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com<mailto:sto...@us.ibm.com>



From:"Buterbaugh, Kevin L" 
<kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>
To:gpfsug main discussion list 
<gpfsug-discuss@spectrumscale.org<mailto:gpfsug-discuss@spectrumscale.org>>
Date:08/26/2017 03:40 PM
Subject:[gpfsug-discuss] GPFS 4.2.3.4 question
Sent by:
gpfsug-discuss-boun...@spectrumscale.org<mailto:gpfsug-discuss-boun...@spectrumscale.org>




Hi All,

Does anybody know if GPFS 4.2.3.4, which came out today, contains all the 
patches that are in GPFS 4.2.3.3 efix3?

If anybody does, and can respond, I’d greatly appreciate it.  Our cluster is in 
a very, very bad state right now and we may need to just take it down and bring 
it back up.  I was already planning on rolling out GPFS 4.2.3.3 efix 3 over the 
next few weeks anyway, so if I can just go to 4.2.3.4 that would be great…

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>- 
(615)875-9633


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org/>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org<http://spectrumscale.org/>
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

