Re: [gpfsug-discuss] Long IO waiters and IBM Storwize V5030

2021-05-28 Thread Andrew Beattie


Hi Oluwasijibomi,

If you set up a Storage Insights Standard account, you can monitor the
performance of your 5030 and pull the performance metrics of the block
storage array when you see poor performance in your Scale cluster.
This will give you some idea as to what is happening,
but the 5030 is designed to be a backup / low-IOPS storage controller; the
processing power and memory in its controllers are very limited.

If you have significant workload happening on your file system in terms of
user access (reads / writes), I am not at all surprised you're seeing a
performance bottleneck from the 5030.

You could ask your local IBM presales team to perform a StorM disk model of
the expected performance using your current configuration, to show you what
your performance should look like.

Regards

Andrew Beattie
Technical Sales - Storage for Data and AI
IBM Australia and New Zealand

> On 29 May 2021, at 06:04, Uwe Falke  wrote:
>
> Hi, odd prefetch strategy would affect read performance, but write latency
> is claimed to be even worse ...
> Have you simply checked what the actual IO performance of the v5k box
> under that load is and how it compares to its nominal performance and that
> of its disks?
> How is the storage organised? How many LUNs/NSDs, what RAID code (V5k
> cannot do declustered RAID, can it?), any thin provisioning or other
> gimmicks in the game?
> What IO sizes?
> Tons of things to look at.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] Long IO waiters and IBM Storwize V5030

2021-05-28 Thread Uwe Falke
You say you see that every few months: does that mean that under roughly the
same load the system sometimes falls over and sometimes behaves OK?
Have you checked the V5k event log for anything going on? (Write performance
may suffer if the write cache is off, which can happen if the cache backup
batteries are low on charge - but again, that does not directly explain the
high read latency.)
Are the latencies you gave derived from GPFS, from the OS, or from the V5k?

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rochlitzer Str. 19, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122





Re: [gpfsug-discuss] Long IO waiters and IBM Storwize V5030

2021-05-28 Thread Uwe Falke
Hi, an odd prefetch strategy would affect read performance, but write latency
is claimed to be even worse ...
Have you simply checked what the actual IO performance of the V5k box under
that load is, and how it compares to its nominal performance and to that of
its disks?
How is the storage organised? How many LUNs/NSDs? What RAID code (the V5k
cannot do declustered RAID, can it)? Any thin provisioning or other gimmicks
in the game?
What IO sizes?
There are tons of things to look at.
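As a back-of-envelope sketch of that nominal-performance check: the script
below is illustrative only - the per-disk IOPS figure and the RAID write
penalties are generic assumptions, not V5030 specifications.

```python
# Rough host-visible IOPS an array can sustain for a given read/write mix.
# All numbers here are assumptions for illustration, not V5030 specs.

RAID_WRITE_PENALTY = {"raid1": 2, "raid5": 4, "raid6": 6, "raid10": 2}

def nominal_iops(n_disks, iops_per_disk, raid_code, read_fraction):
    """Estimate sustainable host IOPS given a RAID write penalty.

    Each host write costs 'penalty' back-end IOs; each read costs one.
    """
    raw = n_disks * iops_per_disk
    penalty = RAID_WRITE_PENALTY[raid_code]
    return raw / (read_fraction + (1 - read_fraction) * penalty)

# e.g. 24 NL-SAS drives (~100 IOPS each, an assumed figure) in RAID 6, 70% reads:
print(round(nominal_iops(24, 100, "raid6", 0.70)))  # -> 960
```

Comparing such an estimate against the measured IOPS from the array helps tell
an overloaded box apart from a misbehaving one.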






[gpfsug-discuss] Long IO waiters and IBM Storwize V5030

2021-05-28 Thread Saula, Oluwasijibomi
Hi Folks,

So, we are experiencing some very long IO waiters in our GPFS cluster:


#  mmdiag --waiters


=== mmdiag: waiters ===

Waiting 17.3823 sec since 10:41:01, monitored, thread 21761 NSDThread: for I/O 
completion

Waiting 16.6140 sec since 10:41:02, monitored, thread 21730 NSDThread: for I/O 
completion

Waiting 15.3004 sec since 10:41:03, monitored, thread 21763 NSDThread: for I/O 
completion

Waiting 15.2013 sec since 10:41:03, monitored, thread 22175
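For quick triage, waiter output like the above can be summarised with a short
script. This is a hypothetical helper, not a GPFS tool, and it assumes the
"Waiting N.NNNN sec ..." line format shown above:

```python
import re

# Matches the leading "Waiting 17.3823 sec" part of each mmdiag waiter line.
WAITER_RE = re.compile(r"Waiting\s+([\d.]+)\s+sec")

def summarize_waiters(text):
    """Return (count, max_wait_sec, mean_wait_sec) for waiter lines in text."""
    waits = [float(m.group(1)) for m in WAITER_RE.finditer(text)]
    if not waits:
        return (0, 0.0, 0.0)
    return (len(waits), max(waits), sum(waits) / len(waits))

sample = """\
=== mmdiag: waiters ===
Waiting 17.3823 sec since 10:41:01, monitored, thread 21761 NSDThread: for I/O completion
Waiting 16.6140 sec since 10:41:02, monitored, thread 21730 NSDThread: for I/O completion
Waiting 15.3004 sec since 10:41:03, monitored, thread 21763 NSDThread: for I/O completion
"""
count, worst, mean = summarize_waiters(sample)
print(count, worst, round(mean, 4))
```

Feeding it `mmdiag --waiters` output periodically gives a simple trend line of
how bad the stalls are getting.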

However, GPFS support is pointing to our IBM Storwize V5030 disk system as the
source of the latency. Unfortunately, we don't have paid support for the
system, so we are polling for anyone who might be able to assist.

Does anyone by chance have any experience with the IBM Storwize V5030, or
possess a problem determination guide for the V5030?

We've briefly reviewed the V5030 management portal, but we still haven't
identified a cause for the increased latencies (read ~129 ms, write ~198 ms).

Granted, we have some heavy client workloads, yet we seem to experience this 
drastic drop in performance every couple of months, probably exacerbated by 
heavy IO demands.

Any assistance would be much appreciated.



Thanks,


Oluwasijibomi (Siji) Saula

HPC Systems Administrator  /  Information Technology



Research 2 Building 220B / Fargo ND 58108-6050

p: 701.231.7749 / www.ndsu.edu







Re: [gpfsug-discuss] Long IO waiters and IBM Storwize V5030

2021-05-28 Thread Jan-Frode Myklebust
One thing to check: Storwize/SVC code will *always* guess wrong on
prefetching for GPFS. You can see this as much higher read data throughput
on the mdisks than on the vdisks in the web UI. To fix it, disable
cache_prefetch with "chsystem -cache_prefetch off".

This being a global setting, you should probably only set it if the system
is used exclusively for GPFS.
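The symptom Jan-Frode describes can be expressed as a simple ratio: the share
of back-end (mdisk) read throughput that never reaches the hosts (vdisks) is
prefetched data being thrown away. The sketch below is a hypothetical
illustration, and the 50% threshold is an assumption, not a Storwize rule:

```python
def prefetch_waste_ratio(mdisk_read_mbps, vdisk_read_mbps):
    """Fraction of back-end read throughput not delivered to hosts."""
    if mdisk_read_mbps <= 0:
        return 0.0
    return max(0.0, 1.0 - vdisk_read_mbps / mdisk_read_mbps)

def looks_like_bad_prefetch(mdisk_read_mbps, vdisk_read_mbps, threshold=0.5):
    """True when more than 'threshold' of back-end reads are wasted."""
    return prefetch_waste_ratio(mdisk_read_mbps, vdisk_read_mbps) > threshold

# e.g. 800 MB/s read from mdisks but only 200 MB/s served to vdisks:
print(looks_like_bad_prefetch(800, 200))  # -> True (75% of reads wasted)
```

Plugging in the mdisk/vdisk read rates from the web UI makes the comparison
concrete before touching the global cache_prefetch setting.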


   -jf
