Becky,

I've been looking into this problem and your suggestion. The users' apps that are causing problems are reading anywhere from 64KB per read all the way down to 1 byte per read, and they spend much of their run time reading input. Many are doing 4KB to 8KB per read. I have read the PVFS FAQ about tuning individual directories.

I confess I do not know what my options are regarding tuning of the directories to improve performance. What is the purpose of the four different types of distributions? What stripe size should I use? I realize that PVFS is not suited for reads this small, but *any* performance improvement that I can get for these particular problem jobs would be helpful.

Thanks!

-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------

From: Becky Ligon [mailto:[email protected]]
Sent: Monday, December 16, 2013 4:10 PM
To: Moye,Roger V
Cc: Kyle Schochenmaier; [email protected]
Subject: Re: [Pvfs2-users] how to troubleshoot performance problems

Roger:

In general, if your filesystem has x-number of servers and you have used the default 64K stripe size, then you would want to be reading or writing *at least* (64K * number of servers) bytes at a time (but preferably more) in order to take advantage of the parallelism. You also want to minimize the *number* of files that you create/delete in one job, since these operations require additional metadata accesses. These are guidelines, not rules. Looking at what kind of reads/writes and file accesses are being used is the best way to tune your filesystem for a particular purpose.

Keep in mind that directories and files can have different attributes than those specified in the config file as the defaults. So you can tune files, or files in a directory, to use a different number of servers, a different stripe size, etc.

Hope this little bit of information is helpful.

Becky

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
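With the four I/O servers described later in this thread and the default 64K stripe, that guideline works out to requests of at least 256KB per call. A minimal POSIX sketch of that access pattern, assuming 4 servers (the file path and chunk size are only placeholders):

/* Sketch: read a file in stripe-sized chunks instead of tiny requests.
 * Assumes 4 I/O servers * 64KB default stripe = 256KB per request;
 * adjust CHUNK to your own server count and stripe size. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (4 * 64 * 1024)   /* 4 servers * 64KB stripe = 256KB */

int main(void)
{
    char *buf = malloc(CHUNK);
    int fd = open("/pvfs2-mnt/somefile.dat", O_RDONLY);  /* placeholder path */
    if (fd < 0 || buf == NULL) {
        perror("setup");
        return 1;
    }

    ssize_t n;
    while ((n = read(fd, buf, CHUNK)) > 0) {
        /* process n bytes here; each read() can be served by all four servers */
    }

    free(buf);
    close(fd);
    return 0;
}

Roughly speaking, a request much smaller than one stripe unit lands on a single server, so the others sit idle for that call.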
On Mon, Dec 16, 2013 at 3:33 PM, Moye,Roger V <[email protected]> wrote:

Becky,

You nailed it:

read(7, "4", 1) = 1
read(7, "|", 1) = 1
read(7, "4", 1) = 1
read(7, "2", 1) = 1
read(7, "0", 1) = 1
read(7, "6", 1) = 1
read(7, "5", 1) = 1
read(7, "1", 1) = 1
read(7, "1", 1) = 1
read(7, "2", 1) = 1
read(7, "|", 1) = 1
read(7, "4", 1) = 1
read(7, "2", 1) = 1
read(7, "0", 1) = 1
read(7, "6", 1) = 1
read(7, "5", 1) = 1
read(7, "1", 1) = 1
read(7, "3", 1) = 1

He's doing this from multiple processes on multiple nodes.

Question for you: Is there a rule of thumb to follow for 'how small is too small'?

-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------

From: Becky Ligon [mailto:[email protected]]
Sent: Monday, December 16, 2013 1:25 PM
To: Kyle Schochenmaier
Cc: Moye,Roger V; [email protected]
Subject: Re: [Pvfs2-users] how to troubleshoot performance problems

Roger:

I have also seen some codes that read/write one byte at a time, which is not appropriate for a parallel filesystem.

Try this: while the user's process is running, attach to it with strace and see what kind of reads/writes are being issued.

Becky
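For a byte-at-a-time parser like the one traced above, the usual fix is application-side buffering, so that thousands of 1-byte read() calls collapse into one large request per buffer refill. A minimal sketch, assuming the code can be switched to stdio (the path and buffer size are only placeholders):

/* Sketch: wrap the input in a stdio stream with a large buffer so each
 * fgetc() is served from userspace instead of issuing a 1-byte read().
 * The 256KB buffer matches 4 servers * 64KB stripe; adjust as needed. */
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("/pvfs2-mnt/input.dat", "r");   /* placeholder path */
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    /* One 256KB read() refills the buffer for the next 262144 fgetc() calls. */
    setvbuf(fp, NULL, _IOFBF, 256 * 1024);

    int c;
    long fields = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '|')            /* same '|'-delimited records as the trace */
            fields++;
    }
    printf("fields: %ld\n", fields);

    fclose(fp);
    return 0;
}

The same idea applies in any language's buffered I/O layer; the point is simply that the request size PVFS sees should be the buffer size, not one byte.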
On Mon, Dec 16, 2013 at 1:57 PM, Becky Ligon <[email protected]> wrote:

Roger:

Are all of your filesystem servers ALSO metadata servers?

Becky

On Mon, Dec 16, 2013 at 1:18 PM, Kyle Schochenmaier <[email protected]> wrote:

There are some tuning params that you can look into here. By default there is round-robin loading on the servers, and that is done in chunks of FlowBufferSize (iirc?). You can set this in your config file, but by default the size is quite small (64k). I've pushed it up to 1-2MB and seen drastic improvements in bandwidth for larger requests; if you're doing tons of small requests, though, this obviously won't help.

Can you attach your config file so we can see how things are set up?

Kyle Schochenmaier

On Mon, Dec 16, 2013 at 11:57 AM, Moye,Roger V <[email protected]> wrote:

Over the past weekend one of my users reported that his compute jobs running on a server with local disks usually take about 5 hours. However, running the same jobs on our small Linux cluster using a PVFS filesystem exceeded 24 hours.

Here is the environment we are using:

1. RHEL 6.4 on PVFS servers and clients.
2. Computations are performed on any of 16 Linux clients, all running RHEL 6.4.
3. We are running OrangeFS 2.8.7.
4. We have 4 PVFS servers, each with an XFS filesystem on a ~35TB RAID 6. The total PVFS filesystem is 146TB.
5. All components are connected via a 10GigE network.

I started looking for the source of the problem. For the user(s) showing this poor performance, I found that pvfs2-client is using about 65% of the CPU while the compute jobs themselves are using only 4% each. Thus the compute nodes are very lightly loaded and the compute jobs are hardly doing anything. The pvfs2-server process on each PVFS server is using about 140% CPU. No time is being spent in the wait state (so I assume the speed of the disks is not an issue).

While the system was exhibiting poor performance I tried to read/write some 10GB files myself and found the performance to be normal for this system (around 450MB/s). I used 'iperf' to measure the network bandwidth between the affected nodes and the PVFS servers and found it normal at 9.38Gb/s. The directories that the users are reading/writing only have a few files in each.

iostat shows that the disk system is constantly being read by something, as shown by 'iostat -d 2' on the PVFS servers:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb              19.00      4864.00         0.00       9728          0
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0

This is what iostat has looked like over the last 48 hours (since Saturday).
I cannot find any documentation on how to get stats directly from pvfs2, so I tried this command:

pvfs2-statfs -m /pvfs2-mnt

I received these results:

I/O server statistics:
---------------------------------------
server: tcp://dqspvfs01:3334
    RAM bytes total   : 33619419136
    RAM bytes free    : 284790784
    uptime (seconds)  : 14499577
    load averages     : 0 0 0
    handles available : 2305843009213589192
    handles total     : 2305843009213693950
    bytes available   : 31456490479616
    bytes total       : 40000112558080
    mode: serving both metadata and I/O data

server: tcp://dqspvfs02:3334
    RAM bytes total   : 33619419136
    RAM bytes free    : 217452544
    uptime (seconds)  : 14499840
    load averages     : 0 0 0
    handles available : 2305843009213589104
    handles total     : 2305843009213693950
    bytes available   : 31456971476992
    bytes total       : 40000112558080
    mode: serving both metadata and I/O data

server: tcp://dqspvfs03:3334
    RAM bytes total   : 33619419136
    RAM bytes free    : 428965888
    uptime (seconds)  : 5437269
    load averages     : 320 192 0
    handles available : 2305843009213588929
    handles total     : 2305843009213693950
    bytes available   : 31439132123136
    bytes total       : 40000112558080
    mode: serving both metadata and I/O data

server: tcp://dqspvfs04:3334
    RAM bytes total   : 33619419136
    RAM bytes free    : 223281152
    uptime (seconds)  : 10089825
    load averages     : 1664 3072 0
    handles available : 2305843009213588989
    handles total     : 2305843009213693950
    bytes available   : 31452933193728
    bytes total       : 40000112558080
    mode: serving both metadata and I/O data

Notice that the 'load averages' are 0 for servers #1 and #2 but not for #3 and #4. Earlier this morning only #4 showed a non-zero load average; the other three were 0. What does this number mean?

My two theories about the source of the problem are:

1. Someone is doing 'a lot' of tiny reads.
2. Or, based on the load averages, the PVFS filesystem is somehow not balanced and all of the load is on a single server.

How can I prove either of these? Or what other types of diagnostics can I do?

Thank you!

-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------
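One way to test theory #1 from the client side is to time sequential reads of the same file at a few different request sizes on the PVFS mount and compare throughput; if tiny reads are the culprit, the small sizes should fall off sharply. A minimal sketch (the file path is only a placeholder and the sizes are just examples):

/* Sketch: time sequential reads of one file at several request sizes to
 * see how read() size affects throughput on the PVFS mount.
 * On older glibc, link with -lrt for clock_gettime. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/pvfs2-mnt/testfile";            /* placeholder */
    size_t sizes[] = { 1024, 65536, 262144, 4 * 1024 * 1024 };
    size_t nsizes = sizeof sizes / sizeof sizes[0];

    for (size_t i = 0; i < nsizes; i++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(sizes[i]);
        long long total = 0;
        ssize_t n;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        while ((n = read(fd, buf, sizes[i])) > 0)
            total += n;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%8zu-byte reads: %.1f MB/s\n", sizes[i], total / secs / 1e6);

        free(buf);
        close(fd);
    }
    return 0;
}

Comparing the numbers against the ~450MB/s seen for the 10GB test above should make the dependence on request size obvious.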
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
