Re: [gpfsug-discuss] bizarre performance behavior

2017-04-20 Thread Marcus Koenig1

Hi Kennmeth,

we also had similar performance numbers in our tests. Native was far
quicker than through GPFS. When we learned, though, that the client had
tested performance on the FS with a big blocksize (512k) and small files,
we were able to speed it up significantly by using a smaller FS blocksize
(obviously we had to recreate the FS).

So it really depends on how you do your tests.
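
A rough sketch of that kind of block-size comparison (filesystem name, stanza
file and test path are placeholders; recreating the filesystem destroys its data):

# current data block size
mmlsfs gpfs01 -B
# recreate the filesystem with a smaller block size (destroys everything on it!)
mmdelfs gpfs01
mmcrfs gpfs01 -F nsd.stanza -B 256K
# quick small-file write test against the new block size
for i in $(seq 1 1000); do dd if=/dev/zero of=/gpfs01/test/f$i bs=16k count=1 2>/dev/null; done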


Cheers,

Marcus Koenig
Lab Services Storage & Power Specialist
IBM Australia & New Zealand Advanced Technical Skills
IBM Systems-Hardware
Mobile: +64 21 67 34 27
E-mail: marc...@nz1.ibm.com
82 Wyndham Street
Auckland, AUK 1010
New Zealand





From:   "Uwe Falke" 
To: gpfsug main discussion list 
Date:   04/21/2017 03:07 AM
Subject:Re: [gpfsug-discuss] bizarre performance behavior
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi Kennmeth,

is prefetching off or on at your storage backend?
Raw sequential I/O is very different from GPFS sequential I/O as seen at the
storage device!
GPFS does its own prefetching; the storage would never know which sectors a
sequential read at the GPFS level maps to at the storage level!
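
A quick way to check what GPFS itself is working with, sketched with current
parameter names (adjust to your release):

# pagepool size, sequential I/O ceiling and prefetch threads as GPFS sees them
mmlsconfig pagepool maxMBpS prefetchThreads
# recent I/O history on an NSD server - shows the request sizes actually hitting the storage
mmdiag --iohist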


Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke

Re: [gpfsug-discuss] Spectrum Scale Slow to create directories

2017-04-20 Thread Peter Childs
Simon,

We've managed to resolve this issue by switching quotas off, switching them
back on again, and rebuilding the quota files.

Can I check whether you run quotas on your cluster?
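
Something like the following, sketched with a placeholder filesystem name, is
what that boils down to (changing -Q may require the filesystem to be
unmounted, depending on release):

# is quota enforcement enabled on the filesystem?
mmlsfs gpfs01 -Q
# switch quotas off and back on again
mmchfs gpfs01 -Q no
mmchfs gpfs01 -Q yes
# re-check and rebuild the quota files
mmcheckquota gpfs01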

See you in 2 weeks in Manchester.

Thanks in advance.

Peter Childs
Research Storage Expert
ITS Research Infrastructure
Queen Mary, University of London
Phone: 020 7882 8393


From: gpfsug-discuss-boun...@spectrumscale.org 
 on behalf of Simon Thompson (IT 
Research Support) 
Sent: Tuesday, April 11, 2017 4:55:35 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Spectrum Scale Slow to create directories

We actually saw this for a while on one of our clusters, which was new. But
by the time I'd got round to looking deeper it had gone; maybe we were
using the NSDs more heavily, or possibly we'd upgraded. We are at 4.2.2-2,
so it might be worth trying to bump the version and see if it goes away.

We saw it on the NSD servers directly as well, so it wasn't some client trying
to talk to them, so maybe there was some buggy code?
Simon

On 11/04/2017, 16:51, "gpfsug-discuss-boun...@spectrumscale.org on behalf
of Bryan Banister"  wrote:

>There are so many things to look at and many tools for doing so (iostat,
>htop, nsdperf, mmdiag, mmhealth, mmlsconfig, mmlsfs, etc).  I would
>recommend a review of the presentation that Yuri gave at the most recent
>GPFS User Group:
>https://drive.google.com/drive/folders/0B124dhp9jJC-UjFlVjJTa2ZaVWs
>
>Cheers,
>-Bryan
>
>-Original Message-
>From: gpfsug-discuss-boun...@spectrumscale.org
>[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Peter
>Childs
>Sent: Tuesday, April 11, 2017 3:58 AM
>To: gpfsug main discussion list 
>Subject: [gpfsug-discuss] Spectrum Scale Slow to create directories
>
>This is a curious issue which I'm trying to get to the bottom of.
>
>We currently have two Spectrum Scale file systems; both are running GPFS
>4.2.1-1, and some of the servers have been upgraded to 4.2.1-2.
>
>The older one, which was upgraded from GPFS 3.5, works fine: creating a
>directory is always fast and there are no issues.
>
>The new one, which has nice new SSDs for metadata and hence should be
>faster, can take up to 30 seconds to create a directory, although it usually
>takes less than a second. The longer directory creates usually happen on busy
>nodes that have not used the new storage in a while (it's new, so we've not
>moved much of the data over yet), but it can also happen randomly anywhere,
>including on the NSD servers themselves (times of 3-4 seconds for a single
>directory create have been seen from the NSD servers).
>
>We've been pointed at the network and told to check all network
>settings, and it's been suggested we build an admin network, but I'm not
>sure I entirely understand why and how this would help. It's a mixed
>1G/10G network with the NSD servers connected at 40G with an MTU of 9000.
>
>However, as I say, the older filesystem is fine, and it does not matter
>whether the nodes are connected to the old GPFS cluster or the new one
>(although the delay is worst on the old GPFS cluster). So I'm really playing
>spot the difference, and the network is not an obvious difference.
>
>It's been suggested to look at a trace when it occurs, but as it's difficult
>to recreate, collecting one is difficult.
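
One rough way to catch it in the act, assuming a test directory and node name
that are purely placeholders:

# leave low-level tracing running while timing a batch of creates, then switch it off
mmtracectl --start -N nsdserver01
for i in $(seq 1 100); do /usr/bin/time -f "%e s" mkdir /gpfs/new/tracetest.$i; done
mmtracectl --stop -N nsdserver01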
>
>Any ideas would be most helpful.
>
>Thanks
>
>
>
>Peter Childs
>ITS Research Infrastructure
>Queen Mary, University of London
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>___
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] RAID config for SSD's used for data

2017-04-20 Thread Uwe Falke
Some thoughts: 
you give typical cumulative usage values. However, a fast pool might 
matter most for traffic spikes. Do you have spikes driving your 
current system to the edge? 
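
A simple way to look for such spikes, sketched with standard tools (mmpmon
invocation from memory, so double-check against your docs):

# on the NSD servers: per-device utilisation and queue depths, 5-second samples
iostat -xm 5
# GPFS's own per-filesystem I/O counters
echo fs_io_s | mmpmon -p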

Then: using the SSD pool for writes is straightforward (placement); using 
it for reads will only pay off if data are either pre-fetched to the pool 
somehow or read more than once before being migrated back to the HDD 
pool(s). As you wrote, write traffic is less than read traffic. 

RAID1 vs RAID6: the RMW (read-modify-write) penalty of parity-based RAIDs was 
mentioned, which strikes at writes smaller than the full stripe width of your 
RAID - what type of write I/O do you have (or expect)? (This may also be 
important for choosing the quality of the SSDs: with RMW in mind, you will see 
a comparably huge amount of data written to the SSD devices if your I/O 
traffic consists of myriads of small I/Os and you organize the SSDs in a 
RAID5 or RAID6.)
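
Rough numbers for that point, assuming a small random write that fits inside a
single strip:

RAID6 (e.g. 8+2P): read old data + old P + old Q, write new data + new P + new Q
                   -> about 6 back-end I/Os per small host write
RAID1 mirror:      2 back-end writes per small host write
Full-stripe sequential writes on RAID6 avoid the RMW entirely, since parity is
computed from the new data alone.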

I suppose your current system is well set to provide the required 
aggregate throughput. Now, what kind of improvement do you expect? How are 
the clients connected? Would they have sufficient network bandwidth to see 
improvements at all?




 
Mit freundlichen Grüßen / Kind regards

 
Dr. Uwe Falke
 
IT Specialist
High Performance Computing Services / Integrated Technology Services / 
Data Center Services
---
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefa...@de.ibm.com
---
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: 
Andreas Hasse, Thorsten Moehring
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
HRB 17122 


gpfsug-discuss-boun...@spectrumscale.org wrote on 04/19/2017 09:53:42 PM:

> From: "Buterbaugh, Kevin L" 
> To: gpfsug main discussion list 
> Date: 04/19/2017 09:54 PM
> Subject: [gpfsug-discuss] RAID config for SSD's used for data
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> 
> Hi All, 
> 
> We currently have what I believe is a fairly typical setup: metadata 
> for our GPFS filesystems is the only thing in the system pool and it's 
> on SSD, while data is on spinning disk (RAID 6 LUNs). Everything is 
> connected via 8 Gb FC SAN.  8 NSD servers.  Roughly 1 PB usable space.
> 
> Now let's just say that you have a little bit of money to spend. Your 
> I/O demands aren't great - in fact, they're way on the low end: typical 
> (cumulative) usage is 200 - 600 MB/sec read, less than that for writes. 
> But while GPFS has always been great and therefore you don't need to 
> Make GPFS Great Again, you do want to provide your users with the best 
> possible environment.
> 
> So you're considering the purchase of a dual-controller FC storage 
> array with 12 or so 1.8 TB SSDs in it, with the idea being that that 
> storage would be in its own storage pool and that pool would be the 
> default location for I/O for your main filesystem, at least for smaller 
> files.  You intend to use mmapplypolicy nightly to move data to / from 
> this pool and the spinning disk pools.
> 
> Given all that, would you configure those disks as 6 RAID 1 mirrors 
> and have 6 different primary NSD servers, or would it be feasible to 
> configure one big RAID 6 LUN?  I'm thinking the latter is not a good 
> idea as there could only be one primary NSD server for that one LUN, 
> but given that: 1) I have no experience with this, and 2) I have been 
> wrong once or twice before, I'm looking for advice.  Thanks!
> 
> ?
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and 
> Education
> kevin.buterba...@vanderbilt.edu - (615)875-9633
> 
> ___
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] RAID config for SSD's used for data

2017-04-20 Thread Jonathan Buzzard
On Wed, 2017-04-19 at 14:23 -0700, Alex Chekholko wrote:
> On 04/19/2017 12:53 PM, Buterbaugh, Kevin L wrote:
> >
> > So you’re considering the purchase of a dual-controller FC storage array
> > with 12 or so 1.8 TB SSD’s in it, with the idea being that that storage
> > would be in its own storage pool and that pool would be the default
> > location for I/O for your main filesystem … at least for smaller files.
> >  You intend to use mmapplypolicy nightly to move data to / from this
> > pool and the spinning disk pools.
> 
> We did this and failed in interesting (but in retrospect obvious) ways. 
> You will want to ensure that your users cannot fill your write target 
> pool within a day.  The faster the storage, the more likely that is to 
> happen.  Or else your users will get ENOSPC.

Eh? Seriously, you should have a failover rule so that when your "fast"
pool fills up it starts allocating in the "slow" pool (nice, descriptive
names that are less than 8 characters including the terminating
character). Now, there are issues when you get very close to full, so you
need to set the failover threshold to a sizeable bit less than the full
size; 95% is a good starting point.

The pool name length is important because if the fast pool name is less than
eight characters and the slow one is longer, because you called it
"nearline" (which is 9 including the terminating character), then once the
files get migrated they get backed up again by TSM, yeah!!!

The 95% bit comes about from this: imagine you have 12KB left in the fast
pool and you go to write a file. You open the file at 0B in size and
then start writing. At 12KB you run out of space in the fast pool, and as
the file can only be in one pool you get an ENOSPC and the file gets
canned. This then starts repeating on a regular basis.

So if you start failing over at significantly less than 100%, say 95%,
where that 5% is larger than the largest file you expect, that file
works, but all subsequent files get allocated in the slow pool until you
flush the fast pool.

Something like this as the last two rules in your policy should do the
trick.

/* by default new files to the fast disk unless full, then to slow */
RULE 'new' SET POOL 'fast' LIMIT(95)
RULE 'spillover' SET POOL 'slow'

However in general your fast pool needs to have sufficient capacity to
take your daily churn and then some.
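
A sketch of the matching nightly flush, with made-up pool names, thresholds
and file names:

/* once 'fast' passes 80% full, migrate the oldest files to 'slow'
   until it is back down to 60% */
RULE 'flush' MIGRATE FROM POOL 'fast' THRESHOLD(80,60)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL 'slow'

run from cron with something like: mmapplypolicy gpfs01 -P flush.pol -I yes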

JAB.

-- 
Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] RAID config for SSD's used for data

2017-04-20 Thread Jonathan Buzzard
On Wed, 2017-04-19 at 20:05 +, Simon Thompson (IT Research Support)
wrote:
> By having many LUNs, you get many IO queues for Linux to play with. Also the 
> raid6 overhead can be quite significant, so it might be better to go with 
> raid1 anyway depending on the controller...
> 
> And if only gpfs had some sort of auto tiering back up the pools for hot data 
> or data caching :-)
> 

If you have sized the "fast" pool correctly then the "slow" pool will be
spending most of its time doing diddly squat, aka under 10 IOPS, unless
you are flushing the pool of old files to make space. I have graphs that
show this.

Then one of two things happens. If you are just reading the file then fine:
it is probably coming from the cache, or the disks are not very busy anyway,
so you won't notice.

If you happen to *change* the file and start doing things actively with
it again, then, because most programs approach this by creating an
entirely new file with a temporary name and then doing a rename-and-delete
shuffle (so that a crash will leave you with a valid file somewhere), the
changed version ends up on the fast disk by virtue of being a new file.
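
You can see that happening with mmlsattr, e.g. (path just an example):

mmlsattr -L /gpfs/data/some.file | grep -i "storage pool"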

JAB.

-- 
Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss