Re: [lustre-discuss] problem getting high performance output to single file

2015-05-19 Thread Schneider, David A.
Hi,

My first test was just to run the for loop where I allocate a 4MB buffer,
initialize it, and delete it. That program ran at about 6GB/sec. Once I write
to a file, I drop down to 370MB/sec. Our top performance for I/O to one file
has been about 400MB/sec.

As for the question of which versions are running on the servers and clients:
I don't know what command determines this; I suspect it is older since we are
on Red Hat 5. I will ask.

best,

David Schneider

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
John Bauer [bau...@iodoctors.com]
Sent: Tuesday, May 19, 2015 8:52 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem getting high performance output to single 
file

David

You note that you write a 6GB file.  I suspect that your Linux systems
have significantly more memory than 6GB, meaning your file will end up being
cached in the system buffers.  It won't matter how many OSTs you use, as
you are probably not measuring the speed to the OSTs but rather the
memory copy speed.
What transfer rate are you seeing?
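
A minimal sketch of one way to separate the two effects (the path is a
placeholder, not a tuned benchmark): time the write loop by itself, then again
after an fsync(), so the flush to the OSTs is part of the measurement rather
than just the memory copy.

/* Sketch only: write 6GB in 4MB chunks, then fsync() so the timing includes
 * flushing the page cache to the OSTs, not just the memory copy.
 * Compile with -lrt on older glibc (e.g. RHEL 5). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const size_t chunk = 4UL << 20;              /* 4MB per write */
    const long nchunks = 1536;                   /* 6GB total */
    char *buf = malloc(chunk);
    memset(buf, 1, chunk);

    /* placeholder path on the striped Lustre file system */
    int fd = open("/lustre/scratch/stripe_test", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1, t2;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < nchunks; i++)
        if (write(fd, buf, chunk) != (ssize_t)chunk) { perror("write"); return 1; }
    clock_gettime(CLOCK_MONOTONIC, &t1);         /* mostly page-cache copy speed */
    fsync(fd);                                   /* force the data out to the OSTs */
    clock_gettime(CLOCK_MONOTONIC, &t2);         /* closer to real OST bandwidth */
    close(fd);

    double s1 = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double s2 = (t2.tv_sec - t0.tv_sec) + (t2.tv_nsec - t0.tv_nsec) / 1e9;
    printf("write loop only: %.0f MB/s, including fsync: %.0f MB/s\n",
           6144.0 / s1, 6144.0 / s2);
    free(buf);
    return 0;
}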

John

On 5/19/2015 10:40 AM, Schneider, David A. wrote:
 I am trying to get good performance with parallel writing to one file through 
 MPI. Our cluster has high performance when I write to separate files, but 
 when I use one file - I see very little performance increase.

 As I understand it, our cluster defaults to using one OST per file. There are many 
 OSTs, though, which is how we get good performance when writing to multiple 
 files. I have been using the command

   lfs setstripe

 to change the stripe count and stripe size. I can see that this works: when I 
 do lfs getstripe, I see the output file is striped, but I get very little 
 I/O performance improvement with the striped file.

 When working with HDF5 and MPI, I have seen a number of references to 
 tuning parameters; I haven't dug into this yet. I first want to make sure 
 Lustre has high output performance at a basic level. I tried writing a C 
 program that uses simple POSIX calls (open and looping over writes) but I don't 
 see much increase in performance (I've tried 8 and 19 OSTs, 1MB and 4MB 
 chunks, writing a 6GB file).

 Does anyone know if this should work? What is the simplest C program I could 
 write to see an increase in output performance after I stripe? Do I need 
 separate processes/threads with separate file handles? I am on Linux Red Hat 
 5. I'm not sure what version of Lustre this is. I have skimmed through a 
 450-page PDF of Lustre documentation; I saw references to destructive testing 
 one does in the beginning, but I'm not sure what I can do now. I think this is 
 the first work we've done to get high performance when writing a single file, 
 so I'm worried there is something buried in the Lustre configuration that 
 needs to be changed. I can run /usr/sbin/lctl; maybe there are certain 
 parameters I should check?

 best,

 David Schneider

--
I/O Doctors, LLC
507-766-0378
bau...@iodoctors.com

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] problem getting high performance output to single file

2015-05-19 Thread Schneider, David A.
I am trying to get good performance with parallel writing to one file through 
MPI. Our cluster has high performance when I write to separate files, but when 
I use one file - I see very little performance increase.

As I understand it, our cluster defaults to using one OST per file. There are many 
OSTs, though, which is how we get good performance when writing to multiple 
files. I have been using the command

 lfs setstripe 

to change the stripe count and stripe size. I can see that this works: when I do 
lfs getstripe, I see the output file is striped, but I get very little 
I/O performance improvement with the striped file.
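
For reference, a typical invocation looks something like this (the path is a
placeholder; depending on the lfs release, the stripe-size option is lowercase
-s or uppercase -S):

$ lfs setstripe -c 8 -s 4m /lustre/scratch/stripe_test    # stripe over 8 OSTs, 4MB stripe size
$ lfs getstripe /lustre/scratch/stripe_test               # verify the count, size, and OST layout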

When working with HDF5 and MPI, I have seen a number of references to tuning 
parameters; I haven't dug into this yet. I first want to make sure Lustre has 
high output performance at a basic level. I tried writing a C program that uses 
simple POSIX calls (open and looping over writes) but I don't see much increase 
in performance (I've tried 8 and 19 OSTs, 1MB and 4MB chunks, writing a 6GB 
file).

Does anyone know if this should work? What is the simplest C program I could 
write to see an increase in output performance after I stripe? Do I need 
separate processes/threads with separate file handles? I am on Linux Red Hat 5. 
I'm not sure what version of Lustre this is. I have skimmed through a 450-page 
PDF of Lustre documentation; I saw references to destructive testing one does 
in the beginning, but I'm not sure what I can do now. I think this is the first 
work we've done to get high performance when writing a single file, so I'm 
worried there is something buried in the Lustre configuration that needs to be 
changed. I can run /usr/sbin/lctl; maybe there are certain parameters I should 
check?
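
For what it's worth, a few client-side parameters that are often checked when
chasing write bandwidth (names as they appear on 2.x clients; whether any of
them is the issue here is a separate question):

$ lctl get_param osc.*.max_rpcs_in_flight   # concurrent RPCs per OST from this client
$ lctl get_param osc.*.max_dirty_mb         # dirty cache allowed per OST
$ lctl get_param llite.*.max_cached_mb      # overall client cache for the mount
$ lctl get_param osc.*.checksums            # data checksumming costs some CPU per write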

best,

David Schneider
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] problem getting high performance output to single file

2015-05-19 Thread Schneider, David A.
Thanks. For the client, where I am running from, I have 

$ cat /proc/fs/lustre/version
lustre: 2.1.6
kernel: patchless_client
build:  jenkins--PRISTINE-2.6.18-348.4.1.el5


best,

David Schneider

From: Patrick Farrell [p...@cray.com]
Sent: Tuesday, May 19, 2015 9:03 AM
To: Schneider, David A.; John Bauer; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem getting high performance output to single 
file

For the clients, cat /proc/fs/lustre/version

For the servers, it's the same, but presumably you don't have access.

On 5/19/15, 11:01 AM, Schneider, David A. david...@slac.stanford.edu
wrote:

Hi,

My first test was just to run the for loop where I allocate a 4MB buffer,
initialize it, and delete it. That program ran at about 6GB/sec. Once I
write to a file, I drop down to 370MB/sec. Our top performance for I/O to
one file has been about 400MB/sec.

As for the question of which versions are running on the servers and clients:
I don't know what command determines this; I suspect it is older since
we are on Red Hat 5. I will ask.

best,

David Schneider

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf
of John Bauer [bau...@iodoctors.com]
Sent: Tuesday, May 19, 2015 8:52 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem getting high performance output to
single file

David

You note that you write a 6GB file.  I suspect that your Linux systems
have significantly more memory than 6GB, meaning your file will end up being
cached in the system buffers.  It won't matter how many OSTs you use, as
you are probably not measuring the speed to the OSTs but rather the
memory copy speed.
What transfer rate are you seeing?

John

On 5/19/2015 10:40 AM, Schneider, David A. wrote:
 I am trying to get good performance with parallel writing to one file
through MPI. Our cluster has high performance when I write to separate
files, but when I use one file - I see very little performance increase.

 As I understand it, our cluster defaults to using one OST per file. There
are many OSTs, though, which is how we get good performance when writing
to multiple files. I have been using the command

   lfs setstripe

 to change the stripe count and stripe size. I can see that this works:
when I do lfs getstripe, I see the output file is striped, but I get
very little I/O performance improvement with the striped file.

 When working with HDF5 and MPI, I have seen a number of references
to tuning parameters; I haven't dug into this yet. I first want to
make sure Lustre has high output performance at a basic level. I
tried writing a C program that uses simple POSIX calls (open and looping
over writes) but I don't see much increase in performance (I've tried 8
and 19 OSTs, 1MB and 4MB chunks, writing a 6GB file).

 Does anyone know if this should work? What is the simplest C program I
could write to see an increase in output performance after I stripe? Do
I need separate processes/threads with separate file handles? I am on
Linux Red Hat 5. I'm not sure what version of Lustre this is. I have
skimmed through a 450-page PDF of Lustre documentation; I saw references
to destructive testing one does in the beginning, but I'm not sure what
I can do now. I think this is the first work we've done to get high
performance when writing a single file, so I'm worried there is
something buried in the Lustre configuration that needs to be changed. I
can run /usr/sbin/lctl; maybe there are certain parameters I should
check?

 best,

 David Schneider

--
I/O Doctors, LLC
507-766-0378
bau...@iodoctors.com


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] problem getting high performance output to single file

2015-05-19 Thread Schneider, David A.
Hi Jeff,

I know we have InfiniBand; however, when I ran lctl, what I see (maybe I should 
not put our IP addresses on the internet, so I'll xxx them out) is

.xx.xx.xx@tcp2
.xx.xx.xx@tcp

Unfortunately, I'm not sure how to look at the interfaces for these types; maybe 
they are in turn connected to InfiniBand.

I don't know much about the OSTs. I know there is a RAID structure that allows 
for the 400MB/sec on each one. In one of my tests, I believe I wrote 44GB in 
100 separate files in under 10 seconds, so the system can support 4.4GB/sec.

best,

David Schneider

From: Jeff Johnson [jeff.john...@aeoncomputing.com]
Sent: Tuesday, May 19, 2015 9:11 AM
To: Schneider, David A.; Patrick Farrell; John Bauer; 
lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem getting high performance output to single 
file

David,

What interconnect are you using for Lustre? (IB/o2ib [FDR, QDR, DDR],
Ethernet/tcp [40GbE, 10GbE, 1GbE]). You can run 'lctl list_nids' to see
what protocol LNet is binding to, then look at that interface for the
specific type.
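
For example (the exact interface name is a guess):

$ lctl list_nids     # e.g. 10.0.0.5@o2ib means LNet over InfiniBand; 10.0.0.5@tcp means Ethernet or IPoIB
$ ibstat             # reports the IB link rate if an HCA is present
$ ethtool eth0       # reports the link speed behind a tcp NID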

Also, do you know anything about the server side of your Lustre FS? What
make/model of block devices are used in OSTs?

--Jeff


On 5/19/15 9:05 AM, Schneider, David A. wrote:
 Thanks. For the client, where I am running from, I have

 $ cat /proc/fs/lustre/version
 lustre: 2.1.6
 kernel: patchless_client
 build:  jenkins--PRISTINE-2.6.18-348.4.1.el5


 best,

 David Schneider
 
 From: Patrick Farrell [p...@cray.com]
 Sent: Tuesday, May 19, 2015 9:03 AM
 To: Schneider, David A.; John Bauer; lustre-discuss@lists.lustre.org
 Subject: Re: [lustre-discuss] problem getting high performance output to 
 single file

 For the clients, cat /proc/fs/lustre/version

 For the servers, it's the same, but presumably you don't have access.

 On 5/19/15, 11:01 AM, Schneider, David A. david...@slac.stanford.edu
 wrote:

 Hi,

 My first test was just to run the for loop where I allocate a 4MB buffer,
 initialize it, and delete it. That program ran at about 6GB/sec. Once I
 write to a file, I drop down to 370MB/sec. Our top performance for I/O to
 one file has been about 400MB/sec.

 As for the question of which versions are running on the servers and clients:
 I don't know what command determines this; I suspect it is older since
 we are on Red Hat 5. I will ask.

 best,

 David Schneider
 
 From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf
 of John Bauer [bau...@iodoctors.com]
 Sent: Tuesday, May 19, 2015 8:52 AM
 To: lustre-discuss@lists.lustre.org
 Subject: Re: [lustre-discuss] problem getting high performance output to
 single file

 David

 You note that you write a 6GB file.  I suspect that your Linux systems
 have significantly more memory than 6GB, meaning your file will end up being
 cached in the system buffers.  It won't matter how many OSTs you use, as
 you are probably not measuring the speed to the OSTs but rather the
 memory copy speed.
 What transfer rate are you seeing?

 John

 On 5/19/2015 10:40 AM, Schneider, David A. wrote:
 I am trying to get good performance with parallel writing to one file
 through MPI. Our cluster has high performance when I write to separate
 files, but when I use one file - I see very little performance increase.

 As I understand it, our cluster defaults to using one OST per file. There
 are many OSTs, though, which is how we get good performance when writing
 to multiple files. I have been using the command

lfs setstripe

 to change the stripe count and stripe size. I can see that this works:
 when I do lfs getstripe, I see the output file is striped, but I get
 very little I/O performance improvement with the striped file.

 When working with HDF5 and MPI, I have seen a number of references
 to tuning parameters; I haven't dug into this yet. I first want to
 make sure Lustre has high output performance at a basic level. I
 tried writing a C program that uses simple POSIX calls (open and looping
 over writes) but I don't see much increase in performance (I've tried 8
 and 19 OSTs, 1MB and 4MB chunks, writing a 6GB file).

 Does anyone know if this should work? What is the simplest C program I
 could write to see an increase in output performance after I stripe? Do
 I need separate processes/threads with separate file handles? I am on
 Linux Red Hat 5. I'm not sure what version of Lustre this is. I have
 skimmed through a 450-page PDF of Lustre documentation; I saw references
 to destructive testing one does in the beginning, but I'm not sure what
 I can do now. I think this is the first work we've done to get high
 performance when writing a single file, so I'm worried there is
 something buried in the Lustre configuration that needs to be changed. I
 can run /usr/sbin/lctl; maybe there are certain parameters I should
 check?

 best,

 David Schneider

Re: [lustre-discuss] problem getting high performance output to single file

2015-05-19 Thread Schneider, David A.
Thanks for the suggestion! When I had each rank run on a separate compute 
node/host, I saw parallel performance (4 seconds for the 6GB of writing). When 
I ran the MPI job on one host (the hosts have 12 cores, and by default we pack 
ranks onto as few hosts as possible), things happened serially: each rank 
finished about 2 seconds after the previous one. I'm told that the hosts can 
handle a lot of I/O, but it seems there are some issues with getting that to work 
well. I believe we get good performance with different ranks on one host 
reading from different files. I'll look into tuning the MPI/HDF5 parameters now, 
with an eye toward designing my application to write from different hosts. My 
initial tests with MPI showed degraded performance when I used different hosts 
for the writing, but maybe there are some parameters that will help. I can try 
the Open MPI forum at that point.
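
For reference, spreading ranks across hosts instead of packing them is just a
launch-time option in Open MPI (the flag depends on the release; the binary
name here is a placeholder):

$ mpirun -np 6 --map-by node ./write_test    # Open MPI 1.7 and newer
$ mpirun -np 6 -bynode ./write_test          # older Open MPI releases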

best,

David Schneider

From: Mohr Jr, Richard Frank (Rick Mohr) [rm...@utk.edu]
Sent: Tuesday, May 19, 2015 9:15 AM
To: Schneider, David A.
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem getting high performance output to single 
file

 On May 19, 2015, at 11:40 AM, Schneider, David A. 
 david...@slac.stanford.edu wrote:

 When working with HDF5 and MPI, I have seen a number of references to 
 tuning parameters; I haven't dug into this yet. I first want to make sure 
 Lustre has high output performance at a basic level. I tried writing a C 
 program that uses simple POSIX calls (open and looping over writes) but I don't 
 see much increase in performance (I've tried 8 and 19 OSTs, 1MB and 4MB 
 chunks, writing a 6GB file).

 Does anyone know if this should work? What is the simplest C program I could 
 write to see an increase in output performance after I stripe? Do I need 
 separate processes/threads with separate file handles?

If you are looking for a simple shared-file test, you could try something like 
this:

1) Create a file with a stripe size of 1 GB and a stripe count of 6.

2) Write an MPI program where each process writes 1 GB of sequential data.  
Each process should first seek to (mpi_rank)*(1GB) and then write 1 GB.  This 
will ensure that all processes are writing to non-overlapping parts of the file.

3) Start the program running on 6 nodes (1 process per node).

In a scenario like that, you should effectively be getting file-per-process 
speeds even though you are writing to a shared file because each process is 
writing to a different OST.
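
A minimal sketch of step 2 (the file name is a placeholder; the file is
assumed to already exist with the striping from step 1):

/* Each MPI rank seeks to rank * 1 GiB in the pre-striped shared file and
 * writes 1 GiB sequentially in 4 MiB chunks, so ranks land on different
 * stripes/OSTs.  Sketch only, not a tuned benchmark. */
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define GIB   (1024L * 1024L * 1024L)
#define CHUNK (4L * 1024L * 1024L)

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    char *buf = malloc(CHUNK);
    memset(buf, rank, CHUNK);

    /* file created beforehand with the desired striping (lfs setstripe) */
    int fd = open("/lustre/scratch/shared_test", O_WRONLY);
    if (fd < 0) { perror("open"); MPI_Abort(MPI_COMM_WORLD, 1); }
    lseek(fd, (off_t)rank * GIB, SEEK_SET);      /* non-overlapping 1 GiB region */

    MPI_Barrier(MPI_COMM_WORLD);                 /* start all ranks together */
    double t0 = MPI_Wtime();
    for (long done = 0; done < GIB; done += CHUNK)
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); MPI_Abort(MPI_COMM_WORLD, 1); }
    fsync(fd);                                   /* include the flush to the OSTs */
    close(fd);
    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d GiB in %.2f s (%.0f MiB/s aggregate)\n",
               nranks, t1 - t0, nranks * 1024.0 / (t1 - t0));

    free(buf);
    MPI_Finalize();
    return 0;
}

Launched on 6 nodes with one rank per node, the aggregate rate should approach
six separate-file streams, since each rank stays on its own stripe/OST.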

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org