Re: [gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace?

2018-07-15 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hmm...have you dumped waiters across the entire cluster or just on the NSD 
servers/fs managers? Maybe there’s a slow node out there participating in the 
suspend effort? Might be worth running some quick tracing on the FS manager to 
see what it’s up to.
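If it helps, a sketch of the cluster-wide sweep (assuming mmdsh/ssh access to
all nodes and default install paths):

  # dump current waiters from every node, not just the NSD servers
  mmdsh -N all /usr/lpp/mmfs/bin/mmdiag --waiters 2>/dev/null | sort

  # on recent releases this shortcut reports the same thing
  mmlsnode -N waiters -L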





On July 15, 2018 at 13:27:54 EDT, Buterbaugh, Kevin L 
 wrote:
Hi All,

We are in a partial cluster downtime today to do firmware upgrades on our 
storage arrays.  It is a partial downtime because we have two GPFS filesystems:

1.  gpfs23 - 900+ TB, which corresponds to /scratch and /data, and which I've
unmounted across the cluster because it has data replication set to 1.

2.  gpfs22 - 42 TB, which corresponds to /home. It has data replication set
to two, so what we're doing is "mmchdisk gpfs22 suspend -d ", then doing the
firmware upgrade, and once the array is back we're doing a
"mmchdisk gpfs22 resume -d ", followed by "mmchdisk gpfs22 start -d ".
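Spelled out, the per-array sequence is roughly this (a sketch; the disk names
below are placeholders, since the real ones were scrubbed from the archive):

  # stop new block allocations to the NSDs on the array being upgraded
  mmchdisk gpfs22 suspend -d "nsd_a;nsd_b"

  # ... storage array firmware upgrade happens here ...

  # bring the NSDs back online and replay/rebuild any updates they missed
  mmchdisk gpfs22 resume -d "nsd_a;nsd_b"
  mmchdisk gpfs22 start -d "nsd_a;nsd_b"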

On the 1st storage array this went very smoothly … the mmchdisk took about 5 
minutes, which is what I would expect.

But on the 2nd storage array the mmchdisk appears to either be hung or 
proceeding at a glacial pace.  For more than an hour it’s been stuck at:

mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 …

There are no waiters of any significance and "mmdiag --iohist" doesn't show any
issues either.

Any ideas, anyone?  Unless I can figure this out I’m hosed for this downtime, 
as I’ve got 7 more arrays to do after this one!

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - 
(615)875-9633



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] RFE Process ... Burning Issues

2018-04-18 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
While I don’t own a DeLorean I work with someone who once fixed one up, which I 
*think* effectively means I can jump back in time to before the deadline to 
submit. (And let’s be honest, with the way HPC is going it feels like we have 
the requisite 1.21GW of power...) However, since I can’t actually time travel 
back to last week, is there any possibility of an extension?




On April 5, 2018 at 05:30:42 EDT, Simon Thompson (Spectrum Scale User Group 
Chair)  wrote:

Just a reminder that if you want to submit for the pilot RFE process, 
submissions must be in by end of next week.



Judging by the responses so far, apparently the product is perfect 



Simon



From:  on behalf of 
"ch...@spectrumscale.org" 
Reply-To: "gpfsug-discuss@spectrumscale.org" 
Date: Monday, 26 March 2018 at 12:52
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [gpfsug-discuss] RFE Process ... Burning Issues



Hi All,



We’ve been talking with product management about the RFE process and have 
agreed that we’ll try out a community-voting process.



First up: we are piloting this idea. Hopefully it will work out, but it may
also need tweaks as we move forward.



One of the things we've been asking for is a better way for the Spectrum Scale
user group community to vote on RFEs. Sure, we get people posting to the list,
but we're looking at whether we can make it a better, more formal process to
support this. Talking with IBM, we also recognise that with a large number of
RFEs it can be difficult for them to track work tasks being completed, but with
the community RFEs there is a commitment to try and track them closely and
report back on progress later in the year.



To submit an RFE using this process, you must complete the form available at:

https://ibm.box.com/v/EnhBlitz

(Enhancement Blitz template v1.pptx)



The form provides some guidance on what makes a good or bad RFE. A lot of us
are techies/engineers, so please try to explain what problem you are solving
rather than prescribing a solution (i.e. leave the technical implementation
details to those with the source code).



Each site is limited to 2 submissions. They will be looked over by the Spectrum
Scale community leaders; we may ask people to merge requests or send them back
for more info, and there may be some that we know will just never be progressed
for various reasons.



At the April user group in the UK, we have an RFE (Burning issues) session 
planned. Submitters of the RFE will be expected to provide a 1-3 minute pitch 
for their RFE. We’ve placed the session at the end of the day (UK time) to try 
and ensure USA people can participate. Remote presentation of your RFE is fine 
and we plan to live-stream the session.



Each person will have 3 votes to choose what they think are their highest 
priority requests. Again remote voting is perfectly fine but only 3 votes per 
person.



The requests with the highest number of votes will then be given a higher 
chance of being implemented. There’s a possibility that some may even make the 
winter release cycle. Either way, we plan to track the “chosen” RFEs more 
closely and provide an update at the November USA meeting (likely the SC18 
one). The submission and voting process is also planned to be run again in time 
for the November meeting.



Anyone wanting to submit an RFE for consideration should submit the form by
email to r...@spectrumscaleug.org *before*
13th April. We'll be posting the submitted RFEs up at the box site as well; you
are encouraged to visit the site regularly and check the submissions, as you
may want to contact the author of an RFE to provide more information or support
the RFE. Anything received after this date will be held over to the November
cycle. The earlier you submit, the better chance it has of being included (we
plan to limit the number to be considered), and it will give us time to review
the RFE and come back for more information/clarification if needed.



You must also be prepared to provide a 1-3 minute pitch for your RFE (in person 
or remote) for the UK user group meeting.



You are welcome to submit any RFE you have already put into the RFE portal for 
this process to garner community votes for it. There is space on the form to 
provide the existing RFE number.



If you have any comments on the process, you can also email them to 
r...@spectrumscaleug.org as well.



Thanks to Carl Zeite for supporting this plan…



Get submitting!



Simon

(UK Group Chair)


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Confusing I/O Behavior

2018-04-10 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
I hate admitting this but I’ve found something that’s got me stumped.

We have a user running an MPI job on the system. Each rank opens several output
files to which it writes ASCII debug information. The net result across several
hundred ranks is an absolute smattering of teeny tiny I/O requests to the
underlying disks, which they don't appreciate. Performance plummets. The I/O
requests are 30 to 80 bytes in size. What I don't understand is why these write
requests aren't getting batched up into larger write requests to the underlying
disks.

If I do something like "dd if=/dev/zero of=foo bs=8k" on a node I see that the
nasty unaligned 8k I/O requests are batched up into nice 1M I/O requests before
they hit the NSD.

As best I can tell the application isn't doing any fsyncs and isn't doing
direct I/O to these files.
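One way to double-check that (a sketch; the PID and output-file name are
hypothetical):

  # watch one rank's syscalls: write sizes, plus any fsync/O_DIRECT activity
  strace -f -p 12345 -e trace=openat,write,fsync,fdatasync -s 0

  # lsof can show open-file flags after the fact (DIR = O_DIRECT)
  lsof +fg -p 12345 | grep debug_out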

Can anyone explain why seemingly very similar I/O workloads appear to result in
well-formed NSD I/O in one case and awful I/O in another?

Thanks!

-Stumped


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Preferred NSD

2018-03-12 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Lukas,

Check out FPO mode. That mimics Hadoop's data placement features. You can have
up to 3 replicas of both data and metadata, but the downside, as you say, is
that the wrong combination of node failures will still take your cluster down.
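For reference, FPO gets switched on per storage pool in the stanza file used to
create the NSDs/filesystem; a minimal sketch, with made-up pool/node/device
names:

  %pool:
    pool=fpodata
    blockSize=2M
    layoutMap=cluster
    allowWriteAffinity=yes   # FPO-style placement: prefer the writing node's local disks
    writeAffinityDepth=1     # first replica lands on the local NSD

  %nsd:
    nsd=node01_nvme0
    device=/dev/nvme0n1
    servers=node01
    usage=dataOnly
    failureGroup=1,0,1       # extended (rack,position,node) failure group used with FPO
    pool=fpodata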

You might want to check out something like Excelero’s NVMesh (note: not an 
endorsement since I can’t give such things) which can create logical volumes 
across all your NVMe drives. The product has erasure coding on their roadmap. 
I’m not sure if they’ve released that feature yet but in theory it will give 
better fault tolerance *and* you’ll get more efficient usage of your SSDs.

I’m sure there are other ways to skin this cat too.

-Aaron




On March 12, 2018 at 10:59:35 EDT, Lukas Hejtmanek  wrote:
Hello,

I'm thinking about the following setup:
~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB interconnected

I would like to set up a shared scratch area using GPFS and those NVMe SSDs,
each SSD as one NSD.

I don't think 5 or more data/metadata replicas are practical here. On the
other hand, multiple node failures are something we really do expect.

Is there a way to arrange that the local NSD is strongly preferred for storing
data, i.e. so that a node failure most probably does not result in unavailable
data for the other nodes?

Or is there any other recommendation/solution to build shared scratch with
GPFS in such a setup? ("Do not do it" included.)

--
Lukáš Hejtmánek
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] GPFS best practises : end user standpoint

2018-01-16 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Apologies for that. My mobile exchange email client has a like button you can 
tap in the email action menu. I just discovered it this morning accidentally 
and thought “wonder what this does. Better push it to find out.” Nothing 
happened or so I thought. Apparently all it does is make you look like a moron 
on public mailing lists. Doh!





On January 16, 2018 at 08:03:06 EST, Aaron Knister  
wrote:
Aaron Knister liked your message with Boxer.


On January 16, 2018 at 07:41:40 EST, Oesterlin, Robert 
 wrote:

Hi PMB



This is one of the areas where even the most experienced admins struggle. There
is no single answer here. Much of it depends on your particular use case; the
file system layout, storage choices and block sizes all go together. IBM has
started to provide some templates for this (the one out is for Genomics) but
much is left to do. If you could share some details on the overall user
environment and data, myself and others could offer some ideas.





Bob Oesterlin

Sr Principal Storage Engineer, Nuance



From:  on behalf of Brunet 
Pierre-Marie 
Reply-To: gpfsug main discussion list 
Date: Tuesday, January 16, 2018 at 4:51 AM
To: "gpfsug-discuss@spectrumscale.org" 
Subject: [EXTERNAL] [gpfsug-discuss] GPFS best practises : end user standpoint



Hi all GPFS-gurus !



We try (hard!) to train our users in order to improve how they use storage.



I found a lot of best practices on how to configure GPFS from the admin point
of view, but is there any documentation about how to best use a parallel
filesystem like GPFS?

I'm talking about very basic rules, such as the maximum number of
files/subdirectories in a single directory.



I know this is closely related to the FS and storage configuration, but there
may exist some common rules in order to achieve the best scalability, right?



Thanks for your advice,

PMB

--

HPC architect

CNES, French space agency
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] more than one mlx connectx-4 adapter in same host

2017-12-20 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]


We've done a fair amount of VPI work, but admittedly not with ConnectX-4. Is it
possible the cards are trying to talk IB rather than Eth? I figured you're
Ethernet based because of the mention of Juniper.

Are you attempting to do RoCE or just plain TCP/IP?
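One quick way to check what the ports think they are (a sketch; the mst device
name is just an example):

  # link_layer should say Ethernet, not InfiniBand
  ibv_devinfo | egrep 'hca_id|port:|link_layer'

  # VPI cards can be pinned per port with mlxconfig: IB(1) or ETH(2)
  mlxconfig -d /dev/mst/mt4115_pciconf0 query | grep LINK_TYPE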


On December 20, 2017 at 14:40:48 EST, J. Eric Wonderley  
wrote:
Hello:

Does anyone have this type of config?

The host configuration looks sane but we seem to observe link-down on all mlx 
adapters no matter what we do.

Big picture is that we are attempting to do mc(multichassis)-lags to a core 
switch.  I'm somewhat fearful as to how this is implemented in the juniper 
switch we are about to test.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] gpfsug-discuss Digest, Vol 71, Issue 35

2017-12-19 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
It’s not supported on SLES11 either.


IBM didn't (that I saw) talk much about this publicly or give customers a
chance to provide feedback about the decision. I know it was raised at the UG
in NY, and I recall a number of people saying it would be a significant issue
for them (myself included), as is the fact that they no longer support Debian
with Scale 5.0.

I'd raised the issue on the mailing list after the UG, trying to start the
discussion, but IBM said they weren't ready to talk about it publicly, and I
can only guess they had already set their sights and didn't actually want
feedback. This is actually pretty frustrating. I'm tempted to open an RFE, but
most of my RFEs have either been rejected or just sit idle, so I'm not clear
there's a benefit.



On December 19, 2017 at 03:08:27 EST, atmane khiredine  
wrote:
IBM Spectrum Scale V5.0 does not support RHEL 6.x, only RHEL 7.1 or later:

https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#linux


Atmane Khiredine
HPC System Administrator | Office National de la Météorologie
Tél : +213 21 50 73 93 # 303 | Fax : +213 21 50 79 40 | E-mail : 
a.khired...@meteo.dz


From: gpfsug-discuss-boun...@spectrumscale.org
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of
gpfsug-discuss-requ...@spectrumscale.org
[gpfsug-discuss-requ...@spectrumscale.org]
Sent: Monday, 18 December 2017 22:46
To: gpfsug-discuss@spectrumscale.org
Subject: gpfsug-discuss Digest, Vol 71, Issue 35

Send gpfsug-discuss mailing list submissions to
gpfsug-discuss@spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
gpfsug-discuss-requ...@spectrumscale.org

You can reach the person managing the list at
gpfsug-discuss-ow...@spectrumscale.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."


Today's Topics:

1. Re: FW: Spectrum Scale 5.0 now available on Fix Central
(Michael L Taylor)
2. Re: gpfs 4.2.3.5 and RHEL 7.4... (Frederick Stock)
3. Re: gpfs 4.2.3.5 and RHEL 7.4... (Eric Horst)


--

Message: 1
Date: Mon, 18 Dec 2017 13:27:42 -0700
From: "Michael L Taylor" 
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] FW: Spectrum Scale 5.0 now available on
Fix Central
Message-ID:


Content-Type: text/plain; charset="us-ascii"



Hi Bob,
Thanks for the note on 5.0.0
One correction, however: clusters can do a rolling upgrade to 5.0.0 from
any 4.2.x level (not just 4.2.3).




Today's Topics:

1. FW: Spectrum Scale 5.0 now available on Fix Central
(Oesterlin, Robert)


--

Message: 1
Date: Mon, 18 Dec 2017 19:43:35 +
From: "Oesterlin, Robert" 
To: gpfsug main discussion list 
Subject: [gpfsug-discuss] FW: Spectrum Scale 5.0 now available on Fix
Central
Message-ID: 
Content-Type: text/plain; charset="utf-8"

The Scale 5.0 fix level is now up on Fix Central.

You need to be at Scale 4.2.3 (cluster level) to do a rolling upgrade to
this level.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance

From: "dw-not...@us.ibm.com" 
Reply-To: "dw-not...@us.ibm.com" 
Date: Monday, December 18, 2017 at 1:27 PM
Subject: [EXTERNAL] [Forums] 'g...@us.ibm.com' replied to the 'IBM Spectrum
Scale V5.0 announcements' topic thread in the 'General Parallel File System
- Announce (GPFS - Announce)' forum.

g...@us.ibm.com replied to the 'IBM Spectrum Scale V5.0 announcements' topic
thread in the 'General Parallel File System - Announce (GPFS - Announce)'
forum.


Re: [gpfsug-discuss] Online data migration tool

2017-12-18 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Thanks Sven! That makes sense to me and is what I thought was the case, which
is why I was confused when I saw the reply to the thread that said the >32
subblocks code had no performance impact.

A couple more questions for you: in your presentation there's a benchmark that
shows the file create performance without the zero padding. Since you mention
this is done for security reasons, was that feature ever promoted to a GA Scale
release? I'm also wondering if you could explain the performance difference
between the no-zero-padding code and the >32 subblock code, since given your
example of 32K files and a 16MB block size I figure both cases ought to write
the same amount to disk.

Thanks!

-Aaron





On December 15, 2017 at 18:07:23 EST, Sven Oehme  wrote:
i thought i answered that already, but maybe i just thought about answering it 
and then forgot about it :-D

so yes, more than 32 subblocks per block significantly increases the
performance of filesystems with small files. for the sake of the argument let's
say 32k files in a large-block filesystem, again for sake of argument say 16MB.

you probably ask why ?

if you create a file and write 32k into it in a pre-5.0.0 version 16 MB
filesystem, your client actually doesn't write 32k to the NSD server, it writes
512k, because that's the subblock size and we need to write the full subblock
(for security reasons). so first you waste significant memory on the client to
cache that zero padding, you waste network bandwidth, and you waste NSD server
cache because you store it there too. this means you overrun the cache more
quickly, which means you start doing read/modify writes earlier on all your
nice large raid tracks... i guess you get the story by now.
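to put numbers on it, just arithmetic (subblock counts per the 5.0 defaults, if
i recall the table correctly):

  pre-5.0.0: 16MB block / 32 subblocks   = 512KB smallest write
             32KB file create -> 512KB written (~16x amplification)
  5.0.0:     16MB block / 1024 subblocks = 16KB smallest write
             32KB file create -> 32KB written (2 subblocks, no padding)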

in fact, if you have a good raid code that can drive really a lot of bandwidth
out of individual drives, like a GNR system, you get more performance for small
file writes the larger your blocksize is, because we can 'pack' more files into
larger i/os and therefore turn a small-file-create workload into a bandwidth
workload, essentially exactly what we did and what i demonstrated in the CORAL
presentation.

hope that makes this crystal clear now .

sven



On Fri, Dec 15, 2017 at 10:47 PM Aaron Knister 
> wrote:
Thanks, Alex. I'm all too familiar with the trade offs between large
blocks and small files and we do use pretty robust SSD storage for our
metadata. We support a wide range of workloads and we have some folks
with many small (<1M) files and other folks with many large (>256MB) files.

My point in this thread is that IBM has said over and over again in
presentations that there is a significant performance gain with the >32
subblocks code on filesystems with large block sizes (although to your
point I'm not clear on exactly what large means since I didn't define
large in this context). Therefore given that the >32 subblock code gives
a significant performance gain one could reasonably assume that having a
filesystem with >32 subblocks is required to see this gain (rather than
just running the >32 subblocks code on an fs w/o > 32 subblocks).

This led me to ask about a migration tool because in my mind if there's
a performance gain from having >32 subblocks on the FS I'd like that
feature and having to manually copy 10's of PB to new hardware to get
this performance boost is unacceptable. However, IBM can't seem to make
up their mind about whether or not the >32 subblocks code *actually*
provides a performance increase. This seems like a pretty
straightforward question.

-Aaron

On 12/15/17 3:48 PM, Alex Chekholko wrote:
> Hey Aaron,
>
> Can you define your sizes for "large blocks" and "small files"?  If you
> dial one up and the other down, your performance will be worse.  And in
> any case it's a pathological corner case so it shouldn't matter much for
> your workflow, unless you've designed your system with the wrong values.
>
> For example, for bioinformatics workloads, I prefer to use 256KB
> filesystem block size, and I'd consider 4MB+ to be "large block size",
> which would make the filesystem obviously unsuitable for processing
> millions of 8KB files.
>
> You can make a histogram of file sizes in your existing filesystems and
> then make your subblock size (1/32 of block size) on the smaller end of
> that.   Also definitely use the "small file in inode" feature and put
> your metadata on SSD.
>
> Regards,
> Alex
>
> On Fri, Dec 15, 2017 at 11:49 AM, Aaron Knister
>  
> >> wrote:
>
> Thanks, Bill.
>
> I still don't feel like I've got an clear answer from IBM and frankly
> the core issue of a lack of migration tool was totally dodged.
>
> Again in Sven's presentation from SSUG @ SC17
> (http://files.gpfsug.org/presentations/2017/SC17/SC17-UG-CORAL_V3.pdf 
> 

Re: [gpfsug-discuss] Infiniband connection rejected, ibv_create_qp err 13

2017-12-05 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]


Looks like 13 is EPERM, which means permissions apparently didn't exist to
create the QP of the desired type. That's odd since mmfsd runs as root. Is
there any remote chance SELinux is enabled (e.g. sestatus)? I'd think mmfsd
would run unconfined in the default policy, but maybe it didn't transition
correctly.
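A quick sketch of what I'd check:

  # is SELinux present and enforcing?
  sestatus        # or: getenforce

  # is mmfsd actually running unconfined?
  ps -eZ | grep mmfsd

  # any AVC denials logged around the time of the rejects?
  ausearch -m avc -ts recent | grep -i mmfs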

On December 5, 2017 at 08:16:49 EST, Andreas Mattsson 
 wrote:

Hi.



Has anyone here experienced VERBS RDMA connection request rejects on Scale NSD
servers with the error message "ibv_create_qp err 13"?

I'm having issues with this on an IBM ESS system.



The error mostly affects only one of the two GSSIO-nodes, and moves with the 
node even if I put all four of the infiniband links on the same infiniband 
switch as the working node is connected to.

The issue affects client nodes in different blade chassis, going through
different Infiniband switches and cables, and also non-blade nodes running a
slightly different OS setup and different Infiniband HCAs.

MPI-jobs on the client nodes can communicate over the infiniband fabric without 
issues.

Upgrading all switches and HCAs to the latest firmware and making sure that 
client nodes have the same OFED-version as the ESS has had no impact on the 
issue.

When the issue is there, I can still do ibping between the nodes, ibroute gives
me a working and correct path between the nodes that get connection rejects,
and if I set up IPoIB, IP traffic works on the afflicted interfaces.



I have opened a PMR with IBM on the issue, so asking here is a parallel track 
for trying to find a solution to this.



Any help or suggestions is appreciated.

Regards,

Andreas Mattsson

_


Andreas Mattsson
Systems Engineer



MAX IV Laboratory
Lund University
P.O. Box 118, SE-221 00 Lund, Sweden
Visiting address: Fotongatan 2, 225 94 Lund
Mobile: +46 706 64 95 44
andreas.matts...@maxiv.se
www.maxiv.se


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] mmauth/mmremotecluster wonkyness?

2017-11-30 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
It’s my understanding and experience that all member nodes of two clusters that 
are multi-clustered must be able to (and will eventually given enough 
time/activity) make connections to any and all nodes in both clusters. Even if 
you don’t designate the 2 protocol nodes as contact nodes I would expect to see 
connections from remote clusters to the protocol nodes just because of the 
nature of the beast. If you don’t want remote nodes to make connections to the 
protocol nodes then I believe you would need to put the protocol nodes in their 
own cluster. CES/CNFS hasn’t always supported this but I think it is now 
supported, at least with NFS.





On November 30, 2017 at 11:28:03 EST, valdis.kletni...@vt.edu wrote:
We have a 10-node cluster running gpfs 4.2.2.3, where 8 nodes are GPFS contact
nodes for 2 filesystems, and 2 are protocol nodes doing NFS exports of the
filesystems.

But we see some nodes in remote clusters trying to GPFS connect to
the 2 protocol nodes anyhow.

My reading of the manpages is that the remote cluster is responsible
for setting '-n contactNodes' when they do the 'mmremotecluster add',
and there's no way to sanity check or enforce that at the local end, and
fail/flag connections to unintended non-contact nodes if the remote
admin forgets/botches the -n.

Is that actually correct? If so, is it time for an RFE?
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] tar sparse file data loss

2017-11-22 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Somehow this nugget of joy (that’s most definitely sarcasm, this really sucks) 
slipped past my radar:

http://www-01.ibm.com/support/docview.wss?uid=isg1IV96475



Anyone know if there’s a fix in the 4.1 stream?

In my opinion this is 100% a tar bug as the APAR suggests but GPFS has 
implemented a workaround. See this post from the tar mailing list:

https://www.mail-archive.com/bug-tar@gnu.org/msg04209.html

It looks like the troublesome code may still exist upstream:

http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c#n273

No better way to ensure you’ll hit a problem than to assume you won’t :)

-Aaron
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Latest recommended 4.2 efix?

2017-09-28 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Everyone,

What’s the latest recommended efix release for 4.2.3.4?

I’m working on testing a 4.1 to 4.2 migration and was reminded today of some 
fun bugs in 4.2.3.4 for which I think there are efixes. Alternatively, any word 
on a 4.2.3.5 release date?

-Aaron


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] sas avago/lsi hba reseller recommendation

2017-08-28 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Eric,

I shot you an email directly with contact info.

-Aaron




On August 28, 2017 at 08:26:56 EDT, J. Eric Wonderley  
wrote:
We have several avago/lsi 9305-16e that I believe came from Advanced HPC.

Can someone recommend a another reseller of these hbas or a contact with 
Advance HPC?
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] NSD Server/FS Manager Memory Requirements

2017-08-17 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Everyone,

In the world of GPFS 4.2 is there a particular advantage to having a large 
amount of memory (e.g. > 64G) allocated to the pagepool on combination NSD 
Server/FS manager nodes? We currently have half of physical memory allocated to 
pagepool on these nodes.

For some historical context -- we had two incidents that drove us to increase
our NSD server/FS manager pagepools. One was a weird behavior in GPFS 3.5 that
was causing bouncing FS managers until we bumped the pagepool from a few gigs
to about half of the physical memory on the node. The other was a mass round of
parallel mmfsck's of all 20-something of our filesystems. It came highly
recommended to us to increase the pagepool to something very large for that.
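For anyone wanting to poke at their own setting, a quick sketch (the size and
node class below are examples, not recommendations):

  # what's configured now
  mmlsconfig pagepool

  # change it on a set of nodes; -i applies the change immediately where supported
  mmchconfig pagepool=64G -N nsdNodes -i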

I'm curious to hear what other folks do and what the recommendations from IBM 
folks are.

Thanks,
Aaron



___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Associating I/O operations with files/processes

2017-05-30 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Andreas,

I often start with an lsof to see who has files open on the troubled filesystem 
and then start stracing the various processes to see which is responsible. It 
ought to be a process blocked in uninterruptible sleep and ideally would be 
obvious but on a shared machine it might not be.

Something else you could do is a reverse lookup of the disk addresses in
iohist using mmfileid. This won't help if these are transient files, but it
could point you in the right direction. Careful though, it'll give your
metadata disks a tickle :) The syntax would be "mmfileid $FsName -d
:$DiskAddress", where $DiskAddress is the 4th field from the iohist output.
It's not a quick command to return -- it could easily take up to a half hour --
but it should tell you which file path contains that disk address.

Sometimes this is all too tedious, and in that case grabbing some trace data
can help. When you're experiencing I/O trouble you can run "mmtrace trace=def
start" on the node, wait about a minute or so, and then run "mmtrace stop". The
resulting trcrpt file is a bit of a monster to go through, but I do believe you
can see which PIDs are responsible for the I/O given some sleuthing. If it
comes to that, let me know and I'll see if I can point you at some phrases to
grep for. It's been a while since I've done it.
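Putting that together, roughly (the filesystem name is a placeholder; the disk
address is the 4th field of an iohist line, like the ones in your mail):

  # map a disk address back to a file path -- walks metadata, can take a while
  mmfileid gpfs23 -d :46:126258287872

  # or capture ~60s of trace while the problem is happening
  mmtrace trace=def start
  sleep 60
  mmtrace stop
  # the resulting trcrpt lands under /tmp/mmfs by default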

-Aaron




On May 30, 2017 at 09:13:09 EDT, Andreas Petzold (SCC) 
 wrote:
Hi John,

iotop wasn't helpful. It seems to be overwhelmed by what is going on on
the machine.

Cheers,

Andreas

On 05/30/2017 02:28 PM, John Hearns wrote:
> Andreas,
> This is a stupid reply, but please bear with me.
> Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS 
> filesystem) setup.
> We also had a new application which did post-processing. One of the users
> reported that a post-processing job would take about 30 minutes.
> However when two or more of the same application were running the job would 
> take several hours.
>
> We finally found that this slowdown was due to the IO size, the application 
> was using the default size.
> We only found this by stracing the application and spending hours staring at 
> the trace...
>
> I am sure there are better tools for this, and I do hope you don’t have to 
> strace every application really.
> A good tool to get a general feel for I/O patterns is 'iotop'. It might help?
>
>
>
>
> -Original Message-
> From: gpfsug-discuss-boun...@spectrumscale.org 
> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Andreas 
> Petzold (SCC)
> Sent: Tuesday, May 30, 2017 2:17 PM
> To: gpfsug-discuss@spectrumscale.org
> Subject: [gpfsug-discuss] Associating I/O operations with files/processes
>
> Dear group,
>
> first a quick introduction: at KIT we are running a 20+PB storage system with 
> several large (1-9PB) file systems. We have a 14 node NSD server cluster and 
> 5 small (~10 nodes) protocol node clusters which each mount one of the file 
> systems. The protocol nodes run server software (dCache, xrootd) specific to 
> our users which primarily are the LHC experiments at CERN. GPFS version is 
> 4.2.2 everywhere. All servers are connected via IB, while the protocol nodes 
> communicate via Ethernet to their clients.
>
> Now let me describe the problem we are facing. Since a few days, one of the 
> protocol nodes shows a very strange and as of yet unexplained I/O behaviour. 
> Before we were usually seeing reads like this (iohist example from a well 
> behaved node):
>
> 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19
> 14:03:37.660177 R data 18:12590325760  8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14
> 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14
> 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21
> 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21
>
> Since a few days we see this on the problematic node:
>
> 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13
> 14:06:27.268626 R data 40:137280768624 8 0.395  cli 0A4179AD:58E3ADE3 172.18.224.14
> 14:06:27.269056 R data 46:56452781528  8 0.427  cli 0A4179AB:58E3AE54 172.18.224.13
> 14:06:27.269417 R data 47:97273159640  8 0.293  cli 0A4179AD:58E3AE5A 172.18.224.14
> 14:06:27.269293 R data 49:59102786168  8 0.425  cli 0A4179AD:58E3AE72 172.18.224.14
> 14:06:27.269531 R data 46:142387326944 8 0.340  cli 0A4179AB:58E3AE54 172.18.224.13
> 14:06:27.269377 R data 28:102988517096 8 0.554  cli 0A417879:58E3AD08 172.18.224.10
>
> The number of read ops has gone up by O(1000) which is what one would expect 
> when going from 8192 sector reads to 8 sector reads.
>
> We have already excluded problems with the node itself, so we are focusing on the
> applications running on the node. What we'd like to do is to associate the
> I/O requests either with files or specific processes 

Re: [gpfsug-discuss] VERBS RDMA issue

2017-05-21 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Tushar,

For me the issue was an underlying performance bottleneck (some CPU frequency 
scaling problems causing cores to throttle back when it wasn't appropriate).

I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past
to turn this off under certain conditions, although I don't remember what those
were. Hopefully others can chime in and qualify that.

Are you seeing any RDMA errors in your logs? (e.g. grep IBV_ out of the 
mmfs.log).
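i.e. something like this on the NSD servers and a few clients (assuming the
standard log location):

  # any VERBS RDMA errors in the current daemon log?
  grep IBV_ /var/adm/ras/mmfs.log.latest

  # current RDMA/TCP connection state as GPFS sees it
  mmdiag --network | head -20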

-Aaron




On May 21, 2017 at 04:41:00 EDT, Tushar Pathare  wrote:

Hello Team,



We are seeing a lot of waiter messages related to "waiting for conn rdmas <
conn maxrdmas".



Are there any recommended settings to resolve this issue?

Our config for RDMA is as follows for 140 nodes(32 cores each)





VERBS RDMA Configuration:

  Status  : started

  Start time  : Thu

  Stats reset time: Thu

  Dump time   : Sun

  mmfs verbsRdma  : enable

  mmfs verbsRdmaCm: disable

  mmfs verbsPorts : mlx4_0/1 mlx4_0/2

  mmfs verbsRdmasPerNode  : 3200

  mmfs verbsRdmasPerNode (max): 3200

  mmfs verbsRdmasPerNodeOptimize  : yes

  mmfs verbsRdmasPerConnection: 16

  mmfs verbsRdmasPerConnection (max)  : 16

  mmfs verbsRdmaMinBytes  : 16384

  mmfs verbsRdmaRoCEToS   : -1

  mmfs verbsRdmaQpRtrMinRnrTimer  : 18

  mmfs verbsRdmaQpRtrPathMtu  : 2048

  mmfs verbsRdmaQpRtrSl   : 0

  mmfs verbsRdmaQpRtrSlDynamic: no

  mmfs verbsRdmaQpRtrSlDynamicTimeout : 10

  mmfs verbsRdmaQpRtsRnrRetry : 6

  mmfs verbsRdmaQpRtsRetryCnt : 6

  mmfs verbsRdmaQpRtsTimeout  : 18

  mmfs verbsRdmaMaxSendBytes  : 16777216

  mmfs verbsRdmaMaxSendSge: 27

  mmfs verbsRdmaSend  : yes

  mmfs verbsRdmaSerializeRecv : no

  mmfs verbsRdmaSerializeSend : no

  mmfs verbsRdmaUseMultiCqThreads : yes

  mmfs verbsSendBufferMemoryMB: 1024

  mmfs verbsLibName   : libibverbs.so

  mmfs verbsRdmaCmLibName : librdmacm.so

  mmfs verbsRdmaMaxReconnectInterval  : 60

  mmfs verbsRdmaMaxReconnectRetries   : -1

  mmfs verbsRdmaReconnectAction   : disable

  mmfs verbsRdmaReconnectThreads  : 32

  mmfs verbsHungRdmaTimeout   : 90

  ibv_fork_support: true

  Max connections : 196608

  Max RDMA size   : 16777216

  Target number of vsend buffs: 16384

  Initial vsend buffs per conn: 59

  nQPs: 140

  nCQs: 282

  nCMIDs  : 0

  nDtoThreads : 2

  nextIndex   : 141

  Number of Devices opened: 1

Device: mlx4_0

  vendor_id   : 713

  Device vendor_part_id   : 4099

  Device mem register chunk   : 8589934592 (0x2)

  Device max_sge  : 32

  Adjusted max_sge: 0

  Adjusted max_sge vsend  : 30

  Device max_qp_wr: 16351

  Device max_qp_rd_atom   : 16

  Open Connect Ports  : 1

verbsConnectPorts[0]  : mlx4_0/1/0

  lid : 129

  state   : IBV_PORT_ACTIVE

  path_mtu: 2048

  interface ID: 0xe41d2d030073b9d1

  sendChannel.ib_channel  : 0x7FA6CB816200

  sendChannel.dtoThreadP  : 0x7FA6CB821870

  sendChannel.dtoThreadId : 12540

  sendChannel.nFreeCq : 1

  recvChannel.ib_channel  : 0x7FA6CB81D590

  recvChannel.dtoThreadP  : 0x7FA6CB822BA0

  recvChannel.dtoThreadId : 12541

  recvChannel.nFreeCq : 1

  ibv_cq  : 0x7FA2724C81F8

  ibv_cq.cqP  : 0x0

  ibv_cq.nEvents  : 0

  ibv_cq.contextP : 0x0

  ibv_cq.ib_channel   : 0x0



Thanks





Tushar B Pathare MBA IT,BE IT

Bigdata & GPFS

Software Development & Databases

Scientific Computing

Bioinformatics Division

Research



"What ever the mind of man can conceive and believe, drill can query"



Sidra Medical and Research Centre

Sidra OPC Building

Sidra Medical & Research Center

PO Box 26999

Al Luqta Street

Education City North Campus

​Qatar Foundation, Doha, Qatar

Office 4003  ext 37443 | M +974 74793547


Re: [gpfsug-discuss] question on viewing block distribution across NSDs

2017-03-29 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
I don't necessarily think you need to run a snap prior, just the output of mmdf 
should be enough. Something to keep in mind that I should have said before-- an 
mmdf can be stressful on your system particularly if you have spinning disk for 
your metadata. We're fortunate enough to have all flash for our metadata and I 
tend to take it for granted some times :)
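i.e. just a sketch, with a made-up filesystem name:

  # per-NSD capacity and usage; run when the system is quiet if metadata is on spinning disk
  mmdf gpfs23 --block-size auto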



From: greg.lehm...@csiro.au
Sent: 3/29/17, 19:52
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] question on viewing block distribution across NSDs
Thanks. I don’t have a snap. I’ll keep that in mind for next time I do this.

From: [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Knister,
Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Sent: Thursday, 30 March 2017 9:45 AM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] question on viewing block distribution across NSDs

Hi Greg,

You could run an mmdf which will show you how full each NSD is. I'm not sure 
how to look back in time though to see the fs before the restripe. Do you 
perhaps have a gpfs.snap you took somewhat recently before the restripe? Maybe 
an internaldump in /tmp/mmfs somewhere?

From: greg.lehm...@csiro.au
Sent: 3/29/17, 19:21
To: gpfsug main discussion list
Subject: [gpfsug-discuss] question on viewing block distribution across NSDs
Hi All,
   I added some NSDs to an existing filesystem and ran 
mmrestripefs. I was sort of curious to see what the distribution looked like 
before and after the restripe. Is there any way of looking at it?

Cheers,

Greg.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] nodes being ejected out of the cluster

2017-01-11 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
The RDMA errors I think are secondary to what's going on with either your IPoIB
or Ethernet fabric, which I assume is causing the communication breakdowns and
expulsions. We've had entire IB fabrics go offline, and if the nodes weren't
depending on them for daemon communication nobody got expelled. Do you have a
subnet defined for your IPoIB network, or are your nodes' daemon interfaces
already set to their IPoIB interfaces? Have you checked your SM logs?
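A quick sketch of what I mean:

  # is there a subnets definition steering daemon traffic onto the IPoIB network?
  mmlsconfig subnets

  # which daemon interface/address is each node actually using?
  mmlscluster | head -20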



From: Damir Krstic
Sent: 1/11/17, 9:39 AM
To: gpfsug main discussion list
Subject: [gpfsug-discuss] nodes being ejected out of the cluster
We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our storage 
(ESS GL6) is also running GPFS 4.2. Compute nodes and storage are connected via 
Infiniband (FDR14). At the time of implementation of ESS, we were instructed to 
enable RDMA in addition to IPoIB. Previously we only ran IPoIB on our GPFS3.5 
cluster.

Ever since the implementation (sometime back in July of 2016) we have seen a
lot of compute nodes being ejected. What usually precedes the ejection are the
following messages:

Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error 
IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 
vendor_err 135
Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 
(gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error 
IBV_WC_RNR_RETRY_EXC_ERR index 2
Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error 
IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 
vendor_err 135
Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 
(gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR 
index 1
Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error 
IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 
vendor_err 135
Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 
(gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error 
IBV_WC_RNR_RETRY_EXC_ERR index 2
Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error 
IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 
vendor_err 135
Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to 172.41.2.5 
(gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error IBV_WC_WR_FLUSH_ERR 
index 400

Even our ESS IO server sometimes ends up being ejected (case in point - 
yesterday morning):

Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error 
IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 
vendor_err 135
Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 
(gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error 
IBV_WC_RNR_RETRY_EXC_ERR index 3001
Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error 
IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 
vendor_err 135
Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 
(gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error 
IBV_WC_RNR_RETRY_EXC_ERR index 2671
Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error 
IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 
vendor_err 135
Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 
(gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error 
IBV_WC_RNR_RETRY_EXC_ERR index 2495
Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error 
IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 
vendor_err 135
Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to 172.41.2.1 
(gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error 
IBV_WC_RNR_RETRY_EXC_ERR index 3077
Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease renewal is 
overdue. Pinging to check if it is alive

I've had multiple PMRs open for this issue, and I am told that our ESS needs 
code level upgrades in order to fix this issue. Looking at the errors, I think 
the issue is Infiniband related, and I am wondering if anyone on this list has 
seen similar issues?

Thanks for your help in advance.

Damir
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] LROC

2016-12-28 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Ouch... to quote Adam Savage, "well there's yer problem". Are you perhaps
running a version of GPFS 4.1 older than 4.1.1.9? Looks like there was an
LROC-related assert fixed in 4.1.1.9, but I can't find details on it.



From: Matt Weil
Sent: 12/28/16, 5:21 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] LROC
yes

> Wed Dec 28 16:17:07.507 2016: [X] *** Assert exp(ssd->state !=
> ssdActive) in line 427 of file
> /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C
> Wed Dec 28 16:17:07.508 2016: [E] *** Traceback:
> Wed Dec 28 16:17:07.509 2016: [E] 2:0x7FF1604F39B5
> logAssertFailed + 0x2D5 at ??:0
> Wed Dec 28 16:17:07.510 2016: [E] 3:0x7FF160CA8947
> fs_config_ssds(fs_config*) + 0x867 at ??:0
> Wed Dec 28 16:17:07.511 2016: [E] 4:0x7FF16009A749
> SFSConfigLROC() + 0x189 at ??:0
> Wed Dec 28 16:17:07.512 2016: [E] 5:0x7FF160E565CB
> NsdDiskConfig::readLrocConfig(unsigned int) + 0x2BB at ??:0
> Wed Dec 28 16:17:07.513 2016: [E] 6:0x7FF160E5EF41
> NsdDiskConfig::reReadConfig() + 0x771 at ??:0
> Wed Dec 28 16:17:07.514 2016: [E] 7:0x7FF160024E0E
> runTSControl(int, int, char**) + 0x80E at ??:0
> Wed Dec 28 16:17:07.515 2016: [E] 8:0x7FF1604FA6A5
> RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int,
> StripeGroup*, unsigned int*, RpcContext*) + 0x21F5 at ??:0
> Wed Dec 28 16:17:07.516 2016: [E] 9:0x7FF1604FBA36
> HandleCmdMsg(void*) + 0x1216 at ??:0
> Wed Dec 28 16:17:07.517 2016: [E] 10:0x7FF160039172
> Thread::callBody(Thread*) + 0x1E2 at ??:0
> Wed Dec 28 16:17:07.518 2016: [E] 11:0x7FF160027302
> Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0
> Wed Dec 28 16:17:07.519 2016: [E] 12:0x7FF15F73FDC5
> start_thread + 0xC5 at ??:0
> Wed Dec 28 16:17:07.520 2016: [E] 13:0x7FF15E84873D __clone +
> 0x6D at ??:0
> mmfsd:
> /project/sprelbmd1/build/rbmd11027d/src/avs/fs/mmfs/ts/flea/fs_agent_gpfs.C:427:
> void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32,
> UInt32, const char*, const char*): Assertion `ssd->state != ssdActive'
> failed.
> Wed Dec 28 16:17:07.521 2016: [E] Signal 6 at location 0x7FF15E7861D7
> in process 125345, link reg 0x.
> Wed Dec 28 16:17:07.522 2016: [I] rax0x  rbx
> 0x7FF15FD71000
> Wed Dec 28 16:17:07.523 2016: [I] rcx0x  rdx
> 0x0006
> Wed Dec 28 16:17:07.524 2016: [I] rsp0x7FEF34FBBF78  rbp
> 0x7FF15E8D03A8
> Wed Dec 28 16:17:07.525 2016: [I] rsi0x0001F713  rdi
> 0x0001E9A1
> Wed Dec 28 16:17:07.526 2016: [I] r8 0x0001  r9
> 0xFF092D63646B6860
> Wed Dec 28 16:17:07.527 2016: [I] r100x0008  r11
> 0x0202
> Wed Dec 28 16:17:07.528 2016: [I] r120x7FF1610C6847  r13
> 0x7FF161032EC0
> Wed Dec 28 16:17:07.529 2016: [I] r140x  r15
> 0x
> Wed Dec 28 16:17:07.530 2016: [I] rip0x7FF15E7861D7  eflags
> 0x0202
> Wed Dec 28 16:17:07.531 2016: [I] csgsfs 0x0033  err
> 0x
> Wed Dec 28 16:17:07.532 2016: [I] trapno 0x  oldmsk
> 0x10017807
> Wed Dec 28 16:17:07.533 2016: [I] cr20x
> Wed Dec 28 16:17:09.022 2016: [D] Traceback:
> Wed Dec 28 16:17:09.023 2016: [D] 0:7FF15E7861D7 raise + 37 at ??:0
> Wed Dec 28 16:17:09.024 2016: [D] 1:7FF15E7878C8 __GI_abort + 148
> at ??:0
> Wed Dec 28 16:17:09.025 2016: [D] 2:7FF15E77F146
> __assert_fail_base + 126 at ??:0
> Wed Dec 28 16:17:09.026 2016: [D] 3:7FF15E77F1F2
> __GI___assert_fail + 42 at ??:0
> Wed Dec 28 16:17:09.027 2016: [D] 4:7FF1604F39D9 logAssertFailed +
> 2F9 at ??:0
> Wed Dec 28 16:17:09.028 2016: [D] 5:7FF160CA8947
> fs_config_ssds(fs_config*) + 867 at ??:0
> Wed Dec 28 16:17:09.029 2016: [D] 6:7FF16009A749 SFSConfigLROC() +
> 189 at ??:0
> Wed Dec 28 16:17:09.030 2016: [D] 7:7FF160E565CB
> NsdDiskConfig::readLrocConfig(unsigned int) + 2BB at ??:0
> Wed Dec 28 16:17:09.031 2016: [D] 8:7FF160E5EF41
> NsdDiskConfig::reReadConfig() + 771 at ??:0
> Wed Dec 28 16:17:09.032 2016: [D] 9:7FF160024E0E runTSControl(int,
> int, char**) + 80E at ??:0
> Wed Dec 28 16:17:09.033 2016: [D] 10:7FF1604FA6A5
> RunClientCmd(MessageHeader*, IpAddr, unsigned short, int, int,
> StripeGroup*, unsigned int*, RpcContext*) + 21F5 at ??:0
> Wed Dec 28 16:17:09.034 2016: [D] 11:7FF1604FBA36
> HandleCmdMsg(void*) + 1216 at ??:0
> Wed Dec 28 16:17:09.035 2016: [D] 12:7FF160039172
> Thread::callBody(Thread*) + 1E2 at ??:0
> Wed Dec 28 16:17:09.036 2016: [D] 13:00007FF160027302
> Thread::callBodyWrapper(Thread*) + A2

Re: [gpfsug-discuss] Is anyone performing any kind of Charge back / Show back on Scale today and how do you collect the data

2016-11-18 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
I believe ARCAStream has a product that could facilitate this also. I also 
believe their engineers are on the list.



From: Andrew Beattie
Sent: 11/17/16, 3:56 PM
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Is anyone performing any kind of Charge back / Show 
back on Scale today and how do you collect the data
Good Morning,


I have a large managed services provider in Australia who are looking at the 
benefits of deploying Scale for a combination of Object storage and basic SMB 
file access.  This data is typically historical data rather than highly 
accessed production data and the proposed services is designed to be a low cost 
option - think a private version of Amazon's Glacier type offering. The 
proposed solution will have Platinum - Flash,  Gold - SAS,  Silver - NL-SAS and 
Bronze - Tape tiers with different cost's per Tier

One of the questions that they have asked is: how can they, on a regular basis
(6-10 minute increments), poll the storage array to determine what capacity is
stored in which tier of disk, by company/user, and export the results (ideally
via an API) into their reporting and billing system? They do something similar
today for their VMware farm (6 minute increments) to provide accountability for
the number of virtual machines they are providing, and would like to extend
this capability to their file storage offering, which today is based on basic
virtual Windows file servers.

Is anyone doing something similar today? and if so at what granularity?

Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] SGExceptionLogBufferFullThread waiter

2016-10-15 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Understood. Thank you for your help.

By the way, I was able to figure out by poking mmpmon gfis that the job is 
performing 20k a second each of inode creations, updates and deletions across 
64 nodes. There's my 60k iops on the backend. While I'm impressed and not 
surprised GPFS can keep up with this...that's a pretty hefty workload.
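For the curious: gfis isn't in the documented mmpmon request list; the
supported way to watch similar per-filesystem op counters is something like
this sketch:

  # 10 one-second samples of per-fs open/close/read/write/inode operation counts
  echo fs_io_s | /usr/lpp/mmfs/bin/mmpmon -p -r 10 -d 1000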



From: Olaf Weiser
Sent: 10/15/16, 12:47 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] SGExceptionLogBufferFullThread waiter
well - hard to say.. 60K IO may or may not be a problem... it depends on your
storage backends..

check the response times to the physical disks on the NSD servers... concerning
the output you provided, check particularly 10.1.53.5 and 10.1.53.7:

- if they are in the same (bad/poor) range, then your storage back end is in
trouble or maybe just too heavily utilized ...
- if the response times to physical disks on the NSD server are ok, then maybe
the network from client <--> NSD server is somehow in trouble ..
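e.g. on each NSD server, a quick sketch:

  # recent I/Os with per-I/O service times in ms, as seen by the server
  mmdiag --iohist | tail -50

  # cross-check the block devices at the OS level
  iostat -x 1 5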




From:Aaron Knister 
To:
Date:10/15/2016 08:28 AM
Subject:Re: [gpfsug-discuss] SGExceptionLogBufferFullThread waiter
Sent by:gpfsug-discuss-boun...@spectrumscale.org




It absolutely does, thanks Olaf!

The tasks running on these nodes are running on 63 other nodes and
generating ~60K iop/s of metadata writes and I *think* about the same in
reads. Do you think that could be contributing to the higher waiter
times? I'm not sure quite what the job is up to. It's seemingly doing
very little data movement, the cpu %used is very low but the load is
rather high.

-Aaron

On 10/15/16 11:23 AM, Olaf Weiser wrote:
> from your file system configuration (mmlsfs <device> -L) you'll find the
> size of the LOG
> since release 4.x you can change it, but you need to re-mount the FS
> on every client to make the change effective ...
>
> when a clients initiate writes/changes to GPFS  it needs to update its
> changes to the log -  if this narrows a certain filling degree, GPFS
> triggers so called logWrapThreads to write content to disk and so free
> space
>
> with your given numbers ... double digit [ms] waiter times .. your fs
> probably gets slowed down.. and there's something suspect with the
> storage, because LOG-IOs are rather small and should not take that long
>
> to give you an example from a healthy environment... the IO times are so
> small, that you usually don't see waiters for this..
>
> I/O start time   RW  Buf type   disk:sectorNum  nSec  time ms  tag1  tag2    Disk UID           typ  NSD node       context  thread
> ---------------  --  ---------  --------------  ----  -------  ----  ------  -----------------  ---  -------------  -------  ------------------------------
> 06:23:32.358851  W   logData    2:524306424     8     0.439    0     0       C0A70D08:57CF40D1  cli  192.167.20.17  LogData  SGExceptionLogBufferFullThread
> 06:23:33.576367  W   logData    1:524257280     8     0.646    0     0       C0A70D08:57CF40D0  cli  192.167.20.16  LogData  SGExceptionLogBufferFullThread
> 06:23:32.212426  W   iallocSeg  1:524490048     64    0.733    2     245     C0A70D08:57CF40D0  cli  192.167.20.16  Logwrap  LogWrapHelperThread
> 06:23:32.212412  W   logWrap    2:524552192     8     0.755    0     179200  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212432  W   logWrap    2:525162760     8     0.737    0     125473  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212416  W   iallocSeg  2:524488384     64    0.763    2     347     C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
> 06:23:32.212414  W   logWrap    2:525266944     8     2.160    0     177664  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
>
>
> hope this helps ..
>
>
> Mit freundlichen Grüßen / Kind regards
>
>
> Olaf Weiser
>
> EMEA Storage Competence Center Mainz, German / IBM Systems, Storage
> Platform,
> ---
> IBM Deutschland
> IBM Allee 1
> 71139 Ehningen
> Phone: +49-170-579-44-66
> E-Mail: olaf.wei...@de.ibm.com
> ---
> IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschäftsführung: Martina Koederitz (Vorsitzende), Susanne Peter,
> Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus 

Re: [gpfsug-discuss] GPFS Routers

2016-09-20 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Looks like the attachment got scrubbed. Here's the link:
http://docplayer.net/docs-images/39/19199001/images/7-0.png

From: aaron.s.knis...@nasa.gov
Sent: 9/20/16, 10:07 AM
To: gpfsug main discussion list, gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS Routers
Not sure if this image will go through but here's one I found:

[inline image scrubbed by the list archive; see the docplayer.net link above]

The "Routers" are LNET routers. LNET is just the name of lustre's network 
stack. The LNET routers "route" the Lustre protocol between disparate network 
types (quadrics, Ethernet, myrinet, carrier pigeon). Packet loss on carrier 
pigeon is particularly brutal, though.



From: Marc A Kaplan
Sent: 9/20/16, 10:02 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS Routers
Thanks for spelling out the situation more clearly.  This is beyond my 
knowledge and expertise.
But perhaps some other participants on this forum will chime in!

I may be missing something, but asking "What is Lustre LNET?" via google does 
not yield good answers.
It would be helpful to have some graphics (pictures!) of typical, useful 
configurations.  Limiting myself to a few minutes of searching, I couldn't find 
any.

I "get" that Lustre users/admin with lots of nodes and several switching 
fabrics find it useful, but beyond that...

I guess the answer will be "Performance!" -- but the obvious question is: Why 
not "just" use IP - that is the Internetworking Protocol!
So rather than sweat over LNET, why not improve IP to work better over several 
IBs?

From a user/customer point of view where "I needed this yesterday", short of
having an "LNET for GPFS", I suggest considering reconfiguring your nodes,
switches, and storage to get better performance. If you need to buy some more
hardware, so be it.

--marc



From:    Aaron Knister <aaron.s.knis...@nasa.gov>
To:      <gpfsug-discuss@spectrumscale.org>
Date:    09/20/2016 09:23 AM
Subject: Re: [gpfsug-discuss] GPFS Routers
Sent by: gpfsug-discuss-boun...@spectrumscale.org




Hi Marc,

Currently we serve three disparate InfiniBand fabrics with three
separate sets of NSD servers, all connected via FC to backend storage.

I was exploring the idea of flipping that on its head and having one set
of NSD servers, but I would like something akin to Lustre LNET routers to
connect each fabric to the back-end NSD servers over IB. I know there are
IB routers out there now, but I'm quite drawn to the idea of a GPFS
equivalent of Lustre LNET routers, having used them in the past.

I suppose I could always smush some extra HCAs into the NSD servers and do
it that way, but that got really ugly when I started factoring in
Omni-Path. Something like an LNET router would also be useful for GNR
users who would like to present to both an IB and an Omni-Path fabric
over RDMA.
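
For concreteness, that "extra HCAs" approach looks roughly like this in GPFS
terms -- a sketch only: verbsRdma, verbsPorts, and subnets are real mmchconfig
parameters, but the node class, device names, and subnets below are invented,
and none of this helps the Omni-Path half of the problem:

    # enable RDMA and give the NSD servers a port on each IB fabric
    mmchconfig verbsRdma=enable -N nsdservers
    mmchconfig verbsPorts="mlx5_0/1 mlx5_1/1" -N nsdservers

    # with subnets set, clients on each fabric prefer the NSD-server
    # daemon address that sits on their own IP subnet
    mmchconfig subnets="10.10.0.0 10.20.0.0"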

-Aaron

On 9/12/16 10:48 AM, Marc A Kaplan wrote:
> Perhaps if you clearly describe what equipment and connections you have
> in place and what you're trying to accomplish, someone on this board can
> propose a solution.
>
> In principle, it's always possible to insert proxies/routers to "fake"
> any two endpoints into "believing" they are communicating directly.
>
>
>
>
>
> From:    Aaron Knister <aaron.s.knis...@nasa.gov>
> To:      <gpfsug-discuss@spectrumscale.org>
> Date:    09/11/2016 08:01 PM
> Subject: Re: [gpfsug-discuss] GPFS Routers
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> 
>
>
>
> After some googling around, I wonder if perhaps what I'm thinking of was
> an I/O forwarding layer that I understood was being developed for x86_64
> type machines rather than some type of GPFS protocol router or proxy.
>
> -Aaron
>
> On 9/11/16 5:02 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE
> CORP] wrote:
>> Hi Everyone,
>>
>> A while back I seem to recall hearing about a mechanism being developed
>> that would function similarly to Lustre's LNET routers and effectively
>> allow a single set of NSD servers to talk to multiple RDMA fabrics
>> without requiring the NSD servers to have infiniband interfaces on each
>> RDMA fabric. Rather, one would have a set of GPFS gateway nodes on each
>> fabric that would in effect proxy the RDMA requests to the NSD server.
>> Does anyone know what I'm talking about? Just curious if it's still on
>> the roadmap.
>>
>> -Aaron
>>
>>
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>
> --
> Aaron Knister
> NASA Center for Clim

Re: [gpfsug-discuss] GPFS Routers

2016-09-20 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Not sure if this image will go through but here's one I found:

[image: LNET router topology diagram -- original attachment scrubbed by the list]

The "Routers" are LNET routers. LNET is just the name of lustre's network 
stack. The LNET routers "route" the Lustre protocol between disparate network 
types (quadrics, Ethernet, myrinet, carrier pigeon). Packet loss on carrier 
pigeon is particularly brutal, though.



From: Marc A Kaplan
Sent: 9/20/16, 10:02 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] GPFS Routers
Thanks for spelling out the situation more clearly.  This is beyond my 
knowledge and expertise.
But perhaps some other participants on this forum will chime in!

I may be missing something, but asking "What is Lustre LNET?" via google does 
not yield good answers.
It would be helpful to have some graphics (pictures!) of typical, useful 
configurations.  Limiting myself to a few minutes of searching, I couldn't find 
any.

I "get" that Lustre users/admin with lots of nodes and several switching 
fabrics find it useful, but beyond that...

I guess the answer will be "Performance!" -- but the obvious question is: why 
not "just" use IP - that is, the Internet Protocol?
So rather than sweat over LNET, why not improve IP to work better over several 
IB fabrics?

From a user/customer point of view where "I needed this yesterday", short of 
having an "LNET for GPFS", I suggest considering reconfiguring your nodes, 
switches, and storage to get better performance.  If you need to buy some more 
hardware, so be it.

--marc



From:    Aaron Knister <aaron.s.knis...@nasa.gov>
To:      <gpfsug-discuss@spectrumscale.org>
Date:    09/20/2016 09:23 AM
Subject: Re: [gpfsug-discuss] GPFS Routers
Sent by: gpfsug-discuss-boun...@spectrumscale.org




Hi Marc,

Currently we serve three disparate InfiniBand fabrics with three
separate sets of NSD servers, all connected via FC to backend storage.

I was exploring the idea of flipping that on its head and having one set
of NSD servers, but I would like something akin to Lustre LNET routers to
connect each fabric to the back-end NSD servers over IB. I know there are
IB routers out there now, but I'm quite drawn to the idea of a GPFS
equivalent of Lustre LNET routers, having used them in the past.

I suppose I could always smush some extra HCAs into the NSD servers and do
it that way, but that got really ugly when I started factoring in
Omni-Path. Something like an LNET router would also be useful for GNR
users who would like to present to both an IB and an Omni-Path fabric
over RDMA.

-Aaron

On 9/12/16 10:48 AM, Marc A Kaplan wrote:
> Perhaps if you clearly describe what equipment and connections you have
> in place and what you're trying to accomplish, someone on this board can
> propose a solution.
>
> In principle, it's always possible to insert proxies/routers to "fake"
> any two endpoints into "believing" they are communicating directly.
>
>
>
>
>
> From:    Aaron Knister <aaron.s.knis...@nasa.gov>
> To:      <gpfsug-discuss@spectrumscale.org>
> Date:    09/11/2016 08:01 PM
> Subject: Re: [gpfsug-discuss] GPFS Routers
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> 
>
>
>
> After some googling around, I wonder if perhaps what I'm thinking of was
> an I/O forwarding layer that I understood was being developed for x86_64
> type machines rather than some type of GPFS protocol router or proxy.
>
> -Aaron
>
> On 9/11/16 5:02 PM, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE
> CORP] wrote:
>> Hi Everyone,
>>
>> A while back I seem to recall hearing about a mechanism being developed
>> that would function similarly to Lustre's LNET routers and effectively
>> allow a single set of NSD servers to talk to multiple RDMA fabrics
>> without requiring the NSD servers to have infiniband interfaces on each
>> RDMA fabric. Rather, one would have a set of GPFS gateway nodes on each
>> fabric that would in effect proxy the RDMA requests to the NSD server.
>> Does anyone know what I'm talking about? Just curious if it's still on
>> the roadmap.
>>
>> -Aaron
>>
>>
>> ___
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>

Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale Data Protection"

2016-08-30 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Just want to add on to one of the points Sven touched on regarding metadata HW. 
We have a modest SSD infrastructure for our metadata disks and we can scan 500M 
inodes in parallel in about 5 hours if my memory serves me right (and I believe 
we could go faster if we really wanted to). I think having solid metadata disks 
(no pun intended) will really help with scan times.
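(For scale: 500 million inodes in 5 hours is 500,000,000 / 18,000 s, i.e. 
roughly 28,000 inodes scanned per second, sustained.)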


From: Sven Oehme
Sent: 8/30/16, 7:25 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] *New* IBM Spectrum Protect Whitepaper "Petascale 
Data Protection"
so let's start with some simple questions.

when you say mmbackup takes ages, what version of GPFS code are you running?
how do you execute the mmbackup command? exact parameters would be useful.
what HW are you using for the metadata disks?
how much capacity (df -h) and how many inodes (df -i) do you have in the
filesystem you are trying to back up?

sven
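
For reference, a minimal sketch of collecting those numbers in one pass (the
filesystem name fs1 and its mount point are made up -- substitute your own):

    mmdiag --version       # GPFS code level on this node
    mmlsfs fs1 -V          # file system format version
    df -h /gpfs/fs1        # capacity
    df -i /gpfs/fs1        # inode counts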


On Tue, Aug 30, 2016 at 3:02 PM, Lukas Hejtmanek wrote:
Hello,

On Mon, Aug 29, 2016 at 09:20:46AM +0200, Frank Kraemer wrote:
> Find the paper here:
>
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Petascale%20Data%20Protection

thank you for the paper, I appreciate it.

However, I wonder whether it could be extended a little. As it has the title
Petascale Data Protection, I think that at petascale you have to deal with
millions (well, rather hundreds of millions) of files stored in the
filesystem, and this is something where TSM does not scale well.

Could you give some hints:

On the backup side:
mmbackup takes ages for:
a) the scan (try to scan 500M files, even in parallel)
b) the backup - if, say, 10% of the files get changed, the backup process can
be blocked for several days, as mmbackup cannot run as several instances on
the same file system, so you have to wait until one run of mmbackup finishes.
How long could that take at petascale?
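
For what it's worth, the scan portion can at least be spread across several
nodes -- a hedged sketch (documented mmbackup options, but the node names and
paths below are made up):

    # drive the policy scan from three nodes, keeping the shared work
    # directory inside the filesystem being backed up
    mmbackup /gpfs/fs1 -t incremental -N nsd01,nsd02,nsd03 \
             -g /gpfs/fs1/.mmbackupWork --tsm-servers TSMSERVER1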

On the restore side:
how can I restore e.g. 40 million files efficiently? dsmc restore '/path/*'
runs into serious trouble after, say, 20M files (maybe the wrong internal
structures are used); moreover, scanning 1000 more files takes several
minutes, so the dsmc restore never reaches those 40M files.

Using filelists, the situation is even worse. I ran dsmc restore -filelist
with a filelist consisting of 2.4M files. It has been running for *two* days
without restoring even a single file; dsmc is consuming 100% CPU.
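
One workaround that gets suggested for exactly this is splitting the filelist
and running several dsmc sessions side by side -- a sketch, assuming the TSM
server permits enough parallel sessions (the filenames are made up):

    # break the 2.4M-entry filelist into 100k-entry chunks
    split -l 100000 restore.filelist chunk.

    # one dsmc per chunk -- keep the parallelism modest
    for f in chunk.*; do
        dsmc restore -filelist="$f" &
    done
    wait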

So any hints addressing these issues with really large numbers of files would
be even more appreciated.

--
Lukáš Hejtmánek
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Monitor NSD server queue?

2016-08-16 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Everyone,

We ran into a rather interesting situation over the past week. We had a job 
that was pounding the ever loving crap out of one of our filesystems (called 
dnb02) doing about 15GB/s of reads. We had other jobs experience a slowdown on 
a different filesystem (called dnb41) that uses entirely separate backend 
storage. What I can't figure out is why this other filesystem was affected. 
I've checked IB bandwidth and congestion, Fibre Channel bandwidth and errors, 
and Ethernet bandwidth and congestion; looked at the mmpmon nsd_ds counters 
(including disk request wait time); and checked the disk iowait values from 
collectl. 
I simply can't account for the slowdown on the other filesystem. The only thing 
I can think of is that the high latency on dnb02's NSDs caused the mmfsd NSD 
queues to back up.

Here's my question -- how can I monitor the state of the NSD queues? I can't 
find anything in mmdiag. An mmfsadm saferdump NSD shows me the queues and their 
status. I'm just not sure calling saferdump NSD every 10 seconds to monitor 
this data is going to end well. I've seen saferdump NSD cause mmfsd to die and 
that's from a task we only run every 6 hours that calls saferdump NSD.
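
If a periodic poll feels too risky, a single capture can still be summarized
offline -- a sketch only: the saferdump output format is undocumented and
release-dependent, and the "requests pending" field name below is illustrative
rather than guaranteed:

    # take one dump, then mine it without touching mmfsd again
    mmfsadm saferdump nsd > /tmp/nsd.dump
    awk '/requests pending/ { pending += $NF; queues++ }
         END { printf "queues: %d  total pending: %d\n", queues, pending }' /tmp/nsd.dump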

Any thoughts/ideas here would be great.

Thanks!

-Aaron
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] GPFS API O_NOFOLLOW support

2016-07-21 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Everyone,

I've noticed that many GPFS commands (mm*acl, mm*attr) and API calls (in 
particular the putacl and getacl functions) have no support for not following 
symlinks. Is there some hidden support for gpfs_putacl that will cause it not 
to dereference symbolic links? Something like the O_NOFOLLOW flag used 
elsewhere in Linux?
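
At the command level, the usual stopgap is an explicit symlink test before
touching the ACL -- a sketch only (note the unavoidable race window between
the test and the mmputacl call; the path and ACL file below are made up):

    f=/gpfs/fs1/some/path
    if [ -L "$f" ]; then
        echo "refusing to operate on symlink: $f" >&2
    else
        mmputacl -i /tmp/acl.txt "$f"    # apply an ACL prepared earlier
    fi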

Thanks!

-Aaron
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss