Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-06 Thread Luis Bolinches
Hi
 
On top of what has been mentioned here (RAID <-> block-size alignment, and many other things) I would suggest looking at the last 4/5 slides of a 2018 (disclaimer: my own) London UG presentation: http://files.gpfsug.org/presentations/2018/London/14_LuisBolinches_GPFSUG.pdf
 
It gives a starting point on different storage subsystems, full-stripe writes, and those things that do not matter... until they do.
 
But I agree that, on top of whatever that Protect process was doing, there might be room for improvement here. Enjoy the ride.
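As a quick, hedged illustration of that alignment check (gpfs1 is an assumed file system device name; the array-side stripe geometry has to come from your storage vendor's tools):

  # File system block size -- a full-stripe write is only possible when this is
  # a multiple of (data disks per RAID-6 LUN x segment size) on the back end.
  mmlsfs gpfs1 -B
  # Block allocation type, for completeness.
  mmlsfs gpfs1 -j

The segment size and the number of data drives per LUN are read from the storage controller, not from Scale.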
--
Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions
Luis Bolinches
Consultant IT Specialist
IBM Spectrum Scale development
ESS & client adoption teams
Mobile Phone: +358503112585
 
https://www.youracclaim.com/user/luis-bolinches
 
Ab IBM Finland Oy
Laajalahdentie 23
00330 Helsinki
Uusimaa - Finland
"If you always give you will always have" -- Anonymous
 
 
 
- Original message -
From: "Valdis Klētnieks"
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug main discussion list
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average
Date: Sat, Jun 6, 2020 08:38

On Fri, 05 Jun 2020 14:24:27 -, "Saula, Oluwasijibomi" said:
> But with the RAID 6 writing costs Valdis explained, it now makes sense why the write IO was badly affected...
> Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A]
> Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A]
> Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A]
> Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 2, 3, 4, A]

And a read-modify-write on each one... Ouch.

Stuff like that is why making sure program output goes to /var or other local file system is usually a good thing.

I seem to remember us getting bit by a similar misbehavior in TSM, but I don't know the details because I was busier with GPFS and LTFS/EE than TSM. Though I have to wonder how TSM could be a decades-old product and still have misbehaviors in basic things like failed reads on input prompts...
 
 

Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3 
Registered in Finland





Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-05 Thread Valdis Klētnieks
On Fri, 05 Jun 2020 14:24:27 -, "Saula, Oluwasijibomi" said:

> But with the RAID 6 writing costs Valdis explained, it now makes sense why 
> the write IO was badly affected...

> Action [1,2,3,4,A] : The only valid responses are characters from this set: 
> [1, 2, 3, 4, A]
> Action [1,2,3,4,A] : The only valid responses are characters from this set: 
> [1, 2, 3, 4, A]
> Action [1,2,3,4,A] : The only valid responses are characters from this set: 
> [1, 2, 3, 4, A]
> Action [1,2,3,4,A] : The only valid responses are characters from this set: 
> [1, 2, 3, 4, A]

And a read-modify-write on each one... Ouch.

Stuff like that is why making sure program output goes to /var or other local
file system is usually a good thing.
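For instance (purely illustrative job name and path), something as simple as

  # Keep the chatty per-prompt output off GPFS and on a local disk.
  some_long_restore_job > /var/tmp/restore.log 2>&1

avoids turning every prompt into a small synchronous write to the parallel file system.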

I seem to remember us getting bit by a similar misbehavior in TSM, but I don't
know the details because I was busier with GPFS and LTFS/EE than TSM. Though I
have to wonder how TSM could be a decades-old product and still have
misbehaviors in basic things like failed reads on input prompts...





Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-05 Thread Saula, Oluwasijibomi
Valdis/Kums/Fred/Kevin/Stephen,

Thanks so much for your insights, thoughts, and pointers! - They certainly increased 
my knowledge and understanding of potential culprits to watch for...

So we finally discovered the root cause of this problem: an unattended TSM 
restore exercise profusely writing to a single file, over and over again, into 
the GBs! I'm opening a ticket with TSM support to learn how to mitigate 
this in the future.
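For the record, a hedged sketch of the kind of invocation we expect to end up with (dsmc option names should be double-checked against the BA client docs for our level):

  # Restore non-interactively so write-protected files don't trigger an endless
  # "Action [1,2,3,4,A]" prompt loop, and send the log to local disk, not GPFS.
  dsmc restore "/gpfs1/X/Y/Z/*" -subdir=yes -replace=all > /var/tmp/tsm-restore.log 2>&1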

But with the RAID 6 write costs Valdis explained, it now makes sense why the 
write IO was badly affected...

Excerpt from output file:


--- User Action is Required ---

File '/gpfs1/X/Y/Z/fileABC' is write protected


Select an appropriate action

  1. Force an overwrite for this object

  2. Force an overwrite on all objects that are write protected

  3. Skip this object

  4. Skip all objects that are write protected

  A. Abort this operation

Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 
2, 3, 4, A]

Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 
2, 3, 4, A]
Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 
2, 3, 4, A]
Action [1,2,3,4,A] : The only valid responses are characters from this set: [1, 
2, 3, 4, A]
...


Thanks,


Oluwasijibomi (Siji) Saula

HPC Systems Administrator  /  Information Technology



Research 2 Building 220B / Fargo ND 58108-6050

p: 701.231.7749 / www.ndsu.edu<http://www.ndsu.edu/>







From: gpfsug-discuss-boun...@spectrumscale.org 
 on behalf of 
gpfsug-discuss-requ...@spectrumscale.org 

Sent: Friday, June 5, 2020 6:00 AM
To: gpfsug-discuss@spectrumscale.org 
Subject: gpfsug-discuss Digest, Vol 101, Issue 12


Today's Topics:

   1. Re: Client Latency and High NSD Server Load Average
  (Valdis Klētnieks)


--

Message: 1
Date: Thu, 04 Jun 2020 21:17:08 -0400
From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" 
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load
Average
Message-ID: <309214.1591319828@turing-police>
Content-Type: text/plain; charset="us-ascii"

On Thu, 04 Jun 2020 15:33:18 -, "Saula, Oluwasijibomi" said:

> However, I still can't understand why write IO operations are 5x more latent
> than read operations to the same class of disks.

Two things that may be biting you:

First, on a RAID 5 or 6 LUN, most of the time you only need to do 2 physical
reads (data and parity block). To do a write, you have to read the old parity
block, compute the new value, and write the data block and new parity block.
This is often called the "RAID write penalty".

Second, if a read size is smaller than the physical block size, the storage 
array can read
a block, and return only the fragment needed.  But on a write, it has to read
the whole block, splice in the new data, and write back the block - a RMW (read
modify write) cycle.



Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-04 Thread Valdis Klētnieks
On Thu, 04 Jun 2020 15:33:18 -, "Saula, Oluwasijibomi" said:

> However, I still can't understand why write IO operations are 5x more latent
> than read operations to the same class of disks.

Two things that may be biting you:

First, on a RAID 5 or 6 LUN, most of the time you only need to do 2 physical
reads (data and parity block). To do a write, you have to read the old parity
block, compute the new value, and write the data block and new parity block.
This is often called the "RAID write penalty".

Second, if a read size is smaller than the physical block size, the storage 
array can read
a block, and return only the fragment needed.  But on a write, it has to read
the whole block, splice in the new data, and write back the block - a RMW (read
modify write) cycle.
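A hedged back-of-the-envelope comparison (using the textbook small-write penalty of roughly 6 disk I/Os for RAID-6 and 4 for RAID-5, versus 1 for a small read):

  # 1000 host reads/s plus 1000 host writes/s against a RAID-6 LUN
  reads=1000; writes=1000; penalty=6
  echo "back-end disk IOPS needed: $(( reads + writes * penalty ))"   # -> 7000

which is why the same disks look several times slower as soon as the workload turns write-heavy.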




Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-04 Thread Saula, Oluwasijibomi
Stephen,

Looked into client requests, and it doesn't seem to lean heavily on any one NSD 
server. Of course, this is an eyeball assessment after reviewing IO request 
percentages to the different NSD servers from just a few nodes.

By the way, I later discovered our TSM/NSD server couldn't handle restoring a 
read-only file and ended up writing my output file into the GBs asking for my 
response... that seems to have contributed to some unnecessarily high write IO.

However, I still can't understand why write IO operations are 5x more latent 
than read operations to the same class of disks.

Maybe it's time for a GPFS support ticket...
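If we do open one, the usual first ask (hedged; check the current docs for your release) is a snap from the affected nodes:

  # Gathers logs, waiters, configuration and dump data into one archive for IBM support.
  gpfs.snap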


Thanks,


Oluwasijibomi (Siji) Saula

HPC Systems Administrator  /  Information Technology



Research 2 Building 220B / Fargo ND 58108-6050

p: 701.231.7749 / www.ndsu.edu<http://www.ndsu.edu/>







From: gpfsug-discuss-boun...@spectrumscale.org 
 on behalf of 
gpfsug-discuss-requ...@spectrumscale.org 

Sent: Wednesday, June 3, 2020 9:19 PM
To: gpfsug-discuss@spectrumscale.org 
Subject: gpfsug-discuss Digest, Vol 101, Issue 9


Today's Topics:

   1. Re: Client Latency and High NSD Server Load Average
  (Stephen Ulmer)


--

Message: 1
Date: Wed, 3 Jun 2020 22:19:49 -0400
From: Stephen Ulmer 
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load
Average
Message-ID: 
Content-Type: text/plain; charset="utf-8"

Note that if nsd02-ib is offline, that nsd03-ib is now servicing all of the 
NSDs for *both* servers, and that if nsd03-ib gets busy enough to appear 
offline, then nsd04-ib would be next in line to get the load of all 3. The two 
servers with the problems are in line after the one that is off.

This is based on the candy striping of the NSD server order (which I think most 
of us do).

NSD fail-over is "straight-forward" so to speak - the last I checked, it is 
really fail-over in the listed order not load balancing among the servers 
(which is why you stripe them). I do *not* know if individual clients make the 
decision that the I/O for a disk should go through the "next" NSD server, or if 
it is done cluster-wide (in the case of intermittently super-slow I/O). 
Hopefully someone with source code access will answer that, because now I'm 
curious...

Check what path the clients are using to the NSDs, i.e. which server. See if 
you are surprised. :)

 --
Stephen


> On Jun 3, 2020, at 6:03 PM, Saula, Oluwasijibomi 
>  wrote:
>
> ?
> Frederick,
>
> Yes on both counts! -  mmdf is showing pretty uniform (ie 5 NSDs out of 30 
> report 65% free; All others are uniform at 58% free)...
>
> NSD servers per disks are called in round-robin fashion as well, for example:
>
>  gpfs1 tier2_001nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib
>  gpfs1 tier2_002nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib
>  gpfs1 tier2_003nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib
>  gpfs1 tier2_004tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib
>
> Any other potential culprits to investigate?
>
> I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is 
> offline for now):
> [nsd03-ib ~]# mmdiag --waiters
> === mmdiag: waiters ===
> Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O 
> completion
> Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O 
> completion
> Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O 
> completion
>
> nsd04-ib:
> Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O 
> completion
> Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O 
> completion
> Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O 
> completion
>
> tsm01-ib:
> Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O 
> completion
> Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O 
> completion
> Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O 
> completion
> Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O 
> c

Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-04 Thread Kumaran Rajaram

Hi,

 >> I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib
 is offline for now):

Please issue "mmlsdisk  -m" in NSD client to ascertain the active NSD
server serving a NSD. Since nsd02-ib is offlined, it is possible that some
servers would be serving higher NSDs than the rest.

https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_PoorPerformanceDuetoDiskFailure.htm
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_HealthStateOfNSDserver.htm
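For example (gpfs1 is an assumed device name):

  # On an NSD client: shows, per disk, whether I/O is done locally or via an NSD
  # server, and which server is currently active. With nsd02-ib down, counting
  # disks per active server shows who inherited its load.
  mmlsdisk gpfs1 -m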

>> From the waiters you provided I would guess there is something amiss
with some of your storage systems.

Please ensure there are no disk rebuilds pertaining to certain
NSDs/storage volumes in progress (in the storage subsystem), as these can
sometimes impact block-level performance and thus latency,
especially for write operations. Please also ensure that the hardware components
constituting the Spectrum Scale stack are healthy and performing optimally.

https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_pspduetosyslevelcompissue.htm

Please refer to the Spectrum Scale documentation (link below) for potential
causes (e.g. Scale maintenance operations such as mmapplypolicy/mmrestripefs
in progress, or slow disks) that can be contributing to this issue:

https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1pdg_performanceissues.htm

Thanks and Regards,
-Kums

Kumaran Rajaram
Spectrum Scale Development, IBM Systems
k...@us.ibm.com




From:   "Frederick Stock" 
To: gpfsug-discuss@spectrumscale.org
Cc: gpfsug-discuss@spectrumscale.org
Date:   06/04/2020 07:08 AM
Subject:    [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD
Server Load Average
Sent by:gpfsug-discuss-boun...@spectrumscale.org



>From the waiters you provided I would guess there is something amiss with
some of your storage systems.  Since those waiters are on NSD servers they
are waiting for IO requests to the kernel to complete.  Generally IOs are
expected to complete in milliseconds, not seconds.  You could look at the
output of "mmfsadm dump nsd" to see how the GPFS IO queues are working but
that would be secondary to checking your storage systems.

Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com


 - Original message -
 From: "Saula, Oluwasijibomi" 
 Sent by: gpfsug-discuss-boun...@spectrumscale.org
 To: "gpfsug-discuss@spectrumscale.org" 
 Cc:
 Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD
 Server Load Average
 Date: Wed, Jun 3, 2020 6:24 PM

 Frederick,

 Yes on both counts! -  mmdf is showing pretty uniform (ie 5 NSDs out of 30
 report 65% free; All others are uniform at 58% free)...

 NSD servers per disks are called in round-robin fashion as well, for
 example:

  gpfs1 tier2_001nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib
  gpfs1 tier2_002nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib
  gpfs1 tier2_003nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib
  gpfs1 tier2_004tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib


 Any other potential culprits to investigate?

 I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is
 offline for now):
 [nsd03-ib ~]# mmdiag --waiters
 === mmdiag: waiters ===
 Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for
 I/O completion
 Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for
 I/O completion
 Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for
 I/O completion

 nsd04-ib:

 Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for
 I/O completion
 Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for
 I/O completion
 Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for
 I/O completion



 tsm01-ib:

 Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for
 I/O completion
 Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for
 I/O completion
 Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for
 I/O completion
 Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for
 I/O completion



 nsd01-ib:

 Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for
 I/O completion
 Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for
 I/O completion








 Thanks,

 Oluwasijibomi (Siji) Saula


 HPC Systems Administrator  /  Information Technology





 Research 2 Building 220B / Fargo ND 58108-6050


 p: 701.231.7749 / www.ndsu.edu














 From: gpfsug-discuss-boun...@spectrumscale.org
  on behalf of
 gpfsug-discuss-requ...@spectrumscale.org
 
 Sent: Wednesday, June 3, 2

Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-04 Thread Frederick Stock
From the waiters you provided I would guess there is something amiss with some of your storage systems.  Since those waiters are on NSD servers they are waiting for IO requests to the kernel to complete.  Generally IOs are expected to complete in milliseconds, not seconds.  You could look at the output of "mmfsadm dump nsd" to see how the GPFS IO queues are working but that would be secondary to checking your storage systems.
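A hedged example of that secondary check (mmfsadm is a low-level service tool; its output format is undocumented and varies between releases):

  # On an NSD server: dump the NSD queue state and skim for queues that show
  # large numbers of pending/queued requests.
  mmfsadm dump nsd | grep -i queue | less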
Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com
 
 
- Original message -
From: "Saula, Oluwasijibomi"
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: "gpfsug-discuss@spectrumscale.org"
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average
Date: Wed, Jun 3, 2020 6:24 PM
Frederick,
 
Yes on both counts! -  mmdf is showing pretty uniform (ie 5 NSDs out of 30 report 65% free; All others are uniform at 58% free)...
 
NSD servers per disks are called in round-robin fashion as well, for example:
 
 gpfs1         tier2_001    nsd02-ib,nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib 
 gpfs1         tier2_002    nsd03-ib,nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib 
 gpfs1         tier2_003    nsd04-ib,tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib 
 gpfs1         tier2_004    tsm01-ib,nsd01-ib,nsd02-ib,nsd03-ib,nsd04-ib 
Any other potential culprits to investigate?
 
I do notice nsd03/nsd04 have long waiters, but nsd01 doesn't (nsd02-ib is offline for now): 
[nsd03-ib ~]# mmdiag --waiters 
=== mmdiag: waiters === 
Waiting 6.5113 sec since 17:17:33, monitored, thread 4175 NSDThread: for I/O completion 
Waiting 6.3810 sec since 17:17:33, monitored, thread 4127 NSDThread: for I/O completion 
Waiting 6.1959 sec since 17:17:34, monitored, thread 4144 NSDThread: for I/O completion 
  
nsd04-ib: 
  
Waiting 13.1386 sec since 17:19:09, monitored, thread 9971 NSDThread: for I/O completion 
Waiting 10.3562 sec since 17:19:12, monitored, thread 9958 NSDThread: for I/O completion 
Waiting 10.0338 sec since 17:19:12, monitored, thread 9951 NSDThread: for I/O completion 
  
tsm01-ib: 
  
Waiting 8.1211 sec since 17:20:24, monitored, thread 3644 NSDThread: for I/O completion 
Waiting 7.6690 sec since 17:20:24, monitored, thread 3641 NSDThread: for I/O completion 
Waiting 7.4969 sec since 17:20:24, monitored, thread 3658 NSDThread: for I/O completion 
Waiting 7.3573 sec since 17:20:24, monitored, thread 3642 NSDThread: for I/O completion 
  
nsd01-ib: 
  
Waiting 0.2548 sec since 17:21:47, monitored, thread 30513 NSDThread: for I/O completion 
Waiting 0.1502 sec since 17:21:47, monitored, thread 30529 NSDThread: for I/O completion 
  
 
 
 
Thanks,
 
Oluwasijibomi (Siji) Saula
HPC Systems Administrator  /  Information Technology
 
Research 2 Building 220B / Fargo ND 58108-6050
p: 701.231.7749 / www.ndsu.edu
 

  

 
 
 
From: gpfsug-discuss-boun...@spectrumscale.org on behalf of gpfsug-discuss-requ...@spectrumscale.org
Sent: Wednesday, June 3, 2020 4:56 PM
To: gpfsug-discuss@spectrumscale.org
Subject: gpfsug-discuss Digest, Vol 101, Issue 6
 
Today's Topics:
   1. Introducing SSUG::Digital (Simon Thompson (Spectrum Scale User Group Chair))
   2. Client Latency and High NSD Server Load Average (Saula, Oluwasijibomi)
   3. Re: Client Latency and High NSD Server Load Average (Frederick Stock)

--

Message: 1
Date: Wed, 03 Jun 2020 20:11:17 +0100
From: "Simon Thompson (Spectrum Scale User Group Chair)"
To: "gpfsug-discuss@spectrumscale.org"
Subject: [gpfsug-discuss] Introducing SSUG::Digital
Content-Type: text/plain; charset="utf-8"

Hi All,

I'm happy that we can finally announce SSUG::Digital, which will be a series of online sessions based on the types of topics we present at our in-person events.

I know it's taken us a while to get this up and running, but we've been working on trying to get the format right. So save the date for the first SSUG::Digital event, which will take place on Thursday 18th June 2020 at 4pm BST. That's:
San Francisco, USA at 08:00 PDT
New York, USA at 11:00 EDT
London, United Kingdom at 16:00 BST
Frankfurt, Germany at 17:00 CEST
Pune, India at 20:30 IST
We estimate about 90 minutes for the first session, and please forgive any teething troubles as we get this going!

(I know the times don't work for everyone in the global community!)

Each of the sessions we run over the next few months will be a different Spectrum Scale Experts or Deep Dive session.

Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-04 Thread Stephen Ulmer
> 
> Hi All.,
> 
>  
> 
> I'm happy that we can finally announce SSUG::Digital, which will be a series of 
> online sessions based on the types of topics we present at our in-person events.
> 
>  
> 
> I know it's taken us a while to get this up and running, but we've been 
> working on trying to get the format right. So save the date for the first 
> SSUG::Digital event which will take place on Thursday 18th June 2020 at 4pm 
> BST. That's:
> San Francisco, USA at 08:00 PDT
> New York, USA at 11:00 EDT
> London, United Kingdom at 16:00 BST
> Frankfurt, Germany at 17:00 CEST
> Pune, India at 20:30 IST
> We estimate about 90 minutes for the first session, and please forgive any 
> teething troubles as we get this going!
> 
>  
> 
> (I know the times don't work for everyone in the global community!)
> 
>  
> 
> Each of the sessions we run over the next few months will be a different 
> Spectrum Scale Experts or Deep Dive session.
> 
> More details at:
> 
> https://www.spectrumscaleug.org/introducing-ssugdigital/
> 
>  
> 
> (We'll announce the speakers and topic of the first session in the next few 
> days...)
> 
>  
> 
> Thanks to Ulf, Kristy, Bill, Bob and Ted for their help and guidance in 
> getting this going.
> 
>  
> 
> We're keen to include some user talks and site updates later in the series, 
> so please let me know if you might be interested in presenting in this format.
> 
>  
> 
> Simon Thompson
> 
> SSUG Group Chair
> 
> 
> --
> 
> Message: 2
> Date: Wed, 3 Jun 2020 21:45:05 +
> From: "Saula, Oluwasijibomi" 
> To: "gpfsug-discuss@spectrumscale.org"
> 
> Subject: [gpfsug-discuss] Client Latency and High NSD Server Load
> Average
> Message-ID:
> 
> 
> 
> Content-Type: text/plain; charset="iso-8859-1"
> 
> 
> Hello,
> 
> Anyone faced a situation where a majority of NSDs have a high load average 
> and a minority don't?
> 
> Also, is 10x NSD server latency for write operations than for read operations 
> expected in any circumstance?
> 
> We are seeing client latency between 6 and 9 seconds and are wondering if 
> some GPFS configuration or NSD server condition may be triggering this poor 
> performance.
> 
> 
> 
> Thanks,
> 
> 
> Oluwasijibomi (Siji) Saula
> 
> HPC Systems Administrator  /  Information Technology
> 
> 
> 
> Research 2 Building 220B / Fargo ND 58108-6050
> 
> p: 701.231.7749 / www.ndsu.edu<http://www.ndsu.edu/>
> 
> 
> 
> 
> --
> 
> Message: 3
> Date: Wed, 3 Jun 2020 21:56:04 +
> From: "Frederick Stock" 
> To: gpfsug-discuss@spectrumscale.org
> Cc: gpfsug-discuss@spectrumscale.org
> Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load
> Average
> Message-ID:
> 
> 
> 
> Content-Type: text/plain; charset="us-ascii"
> 
> 
> --
> 


Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-03 Thread Wahl, Edward
I saw something EXACTLY like this way back in the 3.x days when I had a backend 
storage unit that had a flaky main memory issue and some enclosures were 
constantly flapping between controllers for ownership.  Some NSDs were 
affected, some were not.  I can imagine this could still happen in 4.x and 
5.0.x with the right hardware problem.

Were things working before or is this a new installation?

What is the backend storage?

If you are using device-mapper-multipath, look for events in the 
messages/syslog.  Incorrect path weighting? Using ALUA when it isn't supported? 
(that can be comically bad! helped a friend diagnose that one at a customer 
once)   Perhaps using the wrong rr_weight or rr_min_io so you have some wacky 
long io queueing issues where your path_selector cannot keep up with the IO 
queue?
Most of this is easily fixed by using the vendor's suggested settings these days, 
IF the hardware is healthy...
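A few hedged starting points for that check (stock device-mapper-multipath tooling; exact output depends on distro and storage vendor settings):

  # Path groups, priorities and per-path states; failed/ghost paths and odd
  # ALUA priority groupings show up here.
  multipath -ll
  # The multipath settings actually in effect (recent multipath-tools).
  multipathd show config | less
  # Path failure / failover noise in the logs.
  grep -i multipath /var/log/messages | tail -n 100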

Ed


From: gpfsug-discuss-boun...@spectrumscale.org 
 on behalf of Saula, Oluwasijibomi 

Sent: Wednesday, June 3, 2020 5:45 PM
To: gpfsug-discuss@spectrumscale.org 
Subject: [gpfsug-discuss] Client Latency and High NSD Server Load Average


Hello,

Anyone faced a situation where a majority of NSDs have a high load average and 
a minority don't?

Also, is 10x NSD server latency for write operations than for read operations 
expected in any circumstance?

We are seeing client latency between 6 and 9 seconds and are wondering if some 
GPFS configuration or NSD server condition may be triggering this poor 
performance.



Thanks,


Oluwasijibomi (Siji) Saula

HPC Systems Administrator  /  Information Technology



Research 2 Building 220B / Fargo ND 58108-6050

p: 701.231.7749 / 
www.ndsu.edu







Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-03 Thread Uwe Falke
Hello Oluwasijibomi,
I suppose you are not running ESS (might be wrong on this).

I'd check the IO history on the NSD servers (high IO times?) and, in
addition, the IO traffic at the block device level, e.g. with iostat or
the like (still high IO times there? Are the IO sizes OK, or too low on the
NSD servers with high write latencies?).

What's the picture on your storage back-end? All caches active? Is the
storage back-end fully loaded or rather idle? How is the storage connected?
SAS? FC? IB?

What is the actual IO pattern when you see these high latencies?

Do you run additional apps on some or all of your NSD servers?
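Concretely, that first pass could look something like this (hedged; exact iostat column names depend on the sysstat version):

  # GPFS view: recent I/O history on an NSD server, with per-I/O service times and sizes.
  mmdiag --iohist
  # OS view: extended device stats every 2 seconds; watch await/w_await, request
  # sizes and %util on the LUNs behind the slow NSDs.
  iostat -xm 2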

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Global Technology Services / Project Services Delivery / High Performance 
Computing
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Dr. Thomas Wolter, Sven Schooss
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   "Saula, Oluwasijibomi" 
To: "gpfsug-discuss@spectrumscale.org" 

Date:   03/06/2020 23:45
Subject:[EXTERNAL] [gpfsug-discuss] Client Latency and High NSD 
Server Load Average
Sent by:gpfsug-discuss-boun...@spectrumscale.org




Hello,

Anyone faced a situation where a majority of NSDs have a high load average 
and a minority don't?

Also, is 10x NSD server latency for write operations than for read 
operations expected in any circumstance? 

We are seeing client latency between 6 and 9 seconds and are wondering if 
some GPFS configuration or NSD server condition may be triggering this 
poor performance.


Thanks,

Oluwasijibomi (Siji) Saula
HPC Systems Administrator  /  Information Technology
 
Research 2 Building 220B / Fargo ND 58108-6050
p: 701.231.7749 / www.ndsu.edu
 




Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-03 Thread Kristy Kallback-Rose
Are you running ESS?


> On Jun 3, 2020, at 2:56 PM, Frederick Stock  wrote:
> 
> Does the output of mmdf show that data is evenly distributed across your 
> NSDs?  If not that could be contributing to your problem.  Also, are your 
> NSDs evenly distributed across your NSD servers, and the NSD configured so 
> the first NSD server for each is not the same one?
> 
> Fred
> __
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> sto...@us.ibm.com
>  
>  
> - Original message -
> From: "Saula, Oluwasijibomi" 
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: "gpfsug-discuss@spectrumscale.org" 
> Cc:
> Subject: [EXTERNAL] [gpfsug-discuss] Client Latency and High NSD Server Load 
> Average
> Date: Wed, Jun 3, 2020 5:45 PM
>  
>  
> Hello,
>  
> Anyone faced a situation where a majority of NSDs have a high load average 
> and a minority don't?
>  
> Also, is 10x NSD server latency for write operations than for read operations 
> expected in any circumstance? 
>  
> We are seeing client latency between 6 and 9 seconds and are wondering if 
> some GPFS configuration or NSD server condition may be triggering this poor 
> performance.
>  
>  
>  
>  
> Thanks,
>  
> Oluwasijibomi (Siji) Saula
> HPC Systems Administrator  /  Information Technology
>  
> Research 2 Building 220B / Fargo ND 58108-6050
> p: 701.231.7749 / www.ndsu.edu 
>  
> 
>  
>  


Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-03 Thread Saula, Oluwasijibomi

--

Message: 2
Date: Wed, 3 Jun 2020 21:45:05 +
From: "Saula, Oluwasijibomi" 
To: "gpfsug-discuss@spectrumscale.org"

Subject: [gpfsug-discuss] Client Latency and High NSD Server Load
Average
Message-ID:



Content-Type: text/plain; charset="iso-8859-1"


Hello,

Anyone faced a situation where a majority of NSDs have a high load average and 
a minority don't?

Also, is 10x NSD server latency for write operations than for read operations 
expected in any circumstance?

We are seeing client latency between 6 and 9 seconds and are wondering if some 
GPFS configuration or NSD server condition may be triggering this poor 
performance.



Thanks,


Oluwasijibomi (Siji) Saula

HPC Systems Administrator  /  Information Technology



Research 2 Building 220B / Fargo ND 58108-6050

p: 701.231.7749 / www.ndsu.edu<http://www.ndsu.edu/>




--

Message: 3
Date: Wed, 3 Jun 2020 21:56:04 +
From: "Frederick Stock" 
To: gpfsug-discuss@spectrumscale.org
Cc: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] Client Latency and High NSD Server Load
Average
Message-ID:



Content-Type: text/plain; charset="us-ascii"


--



Re: [gpfsug-discuss] Client Latency and High NSD Server Load Average

2020-06-03 Thread Frederick Stock
Does the output of mmdf show that data is evenly distributed across your NSDs?  If not that could be contributing to your problem.  Also, are your NSDs evenly distributed across your NSD servers, and the NSD configured so the first NSD server for each is not the same one?
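Hedged versions of both checks (gpfs1 assumed as the device name):

  # Free space per NSD -- uneven fill can concentrate new writes on a few disks.
  mmdf gpfs1
  # NSD-to-server assignment -- the first server listed for each NSD is the one
  # that normally serves it, so ideally that first position rotates across servers.
  mmlsnsd -f gpfs1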
Fred
__
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
sto...@us.ibm.com
 
 
- Original message -
From: "Saula, Oluwasijibomi"
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: "gpfsug-discuss@spectrumscale.org"
Cc:
Subject: [EXTERNAL] [gpfsug-discuss] Client Latency and High NSD Server Load Average
Date: Wed, Jun 3, 2020 5:45 PM
 
Hello,
 
Anyone faced a situation where a majority of NSDs have a high load average and a minority don't?
 
Also, is 10x NSD server latency for write operations than for read operations expected in any circumstance? 
 
We are seeing client latency between 6 and 9 seconds and are wondering if some GPFS configuration or NSD server condition may be triggering this poor performance.
 
 
 
 
Thanks,
 
Oluwasijibomi (Siji) Saula
HPC Systems Administrator  /  Information Technology
 
Research 2 Building 220B / Fargo ND 58108-6050
p: 701.231.7749 / www.ndsu.edu
 

  

 
 
