Becky,

You nailed it:

read(7, "4", 1)                         = 1
read(7, "|", 1)                         = 1
read(7, "4", 1)                         = 1
read(7, "2", 1)                         = 1
read(7, "0", 1)                         = 1
read(7, "6", 1)                         = 1
read(7, "5", 1)                         = 1
read(7, "1", 1)                         = 1
read(7, "1", 1)                         = 1
read(7, "2", 1)                         = 1
read(7, "|", 1)                         = 1
read(7, "4", 1)                         = 1
read(7, "2", 1)                         = 1
read(7, "0", 1)                         = 1
read(7, "6", 1)                         = 1
read(7, "5", 1)                         = 1
read(7, "1", 1)                         = 1
read(7, "3", 1)                         = 1

He's doing this from multiple processes on multiple nodes.
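
In C terms, what strace is showing amounts to the first loop in the sketch below; the second loop is what a buffered version of the same read looks like.  This is only my reconstruction of the pattern, not his actual code:

/* Hypothetical reconstruction of the access pattern seen in strace. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* The pattern strace is showing: one read(2) syscall, and therefore
       one client/server round trip, per byte. */
    char c;
    long one_byte_calls = 0;
    while (read(fd, &c, 1) == 1)
        one_byte_calls++;

    /* The same data read in 64KB chunks.  The buffer size is arbitrary;
       anything from tens of KB on up cuts the syscall count drastically. */
    lseek(fd, 0, SEEK_SET);
    char buf[65536];
    ssize_t n;
    long chunked_calls = 0;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        chunked_calls++;

    printf("1-byte reads: %ld syscalls, 64KB reads: %ld syscalls\n",
           one_byte_calls, chunked_calls);
    close(fd);
    return 0;
}

As far as I understand, PVFS doesn't do client-side data caching the way a local filesystem does, so every one of those one-byte reads is a full network round trip.  That would line up with pvfs2-client burning 65% CPU while the job itself sits nearly idle.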

Question to you:  Is there a rule of thumb to follow for 'how small is too 
small'?

-Roger


-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------

From: Becky Ligon [mailto:[email protected]]
Sent: Monday, December 16, 2013 1:25 PM
To: Kyle Schochenmaier
Cc: Moye,Roger V; [email protected]
Subject: Re: [Pvfs2-users] how to troubleshoot performance problems

Roger:

I have also seen some codes that read/write one byte at a time, which is not 
appropriate for a parallel filesystem.  Try this: while the user's process is 
running, attach to it with strace and see what kinds of reads and writes are 
being issued.
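For example, something along these lines (these are standard strace options; fill in the PID of the user's process):

strace -p <pid> -e trace=read,write

or, if you only want counts rather than a full trace:

strace -c -e trace=read,write -p <pid>

Let it run for a bit and hit Ctrl-C to detach; with -c you get a per-syscall summary when strace exits.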
Becky

On Mon, Dec 16, 2013 at 1:57 PM, Becky Ligon <[email protected]> wrote:
Roger:
Are all of your filesystem servers ALSO metadata servers?
Becky

On Mon, Dec 16, 2013 at 1:18 PM, Kyle Schochenmaier <[email protected]> wrote:
There are some tuning params you can look into here.  By default there is 
round-robin loading across the servers, done in chunks of FlowBufferSize 
(iirc?).  You can set this in your config file, but by default the size is 
quite small (64k); I've pushed it up to 1-2MB and seen drastic improvements in 
bandwidth for larger requests.  If you're doing tons of small requests, though, 
this obviously won't help.
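
If I'm remembering the keyword right, it is FlowBufferSizeBytes in the <Defaults> section of the server config file, but please double-check the exact spelling against a freshly generated config from pvfs2-genconfig before trusting my memory.  The change would look roughly like this:

<Defaults>
    # ...existing settings...
    # Hypothetical example: bump the flow buffer to 1MB.  Verify the keyword
    # name for your OrangeFS version; I am writing this from memory.
    FlowBufferSizeBytes 1048576
</Defaults>

All of the servers need the new value, and I believe they have to be restarted to pick it up.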

Can you attach your config file so we can see how things are set up?



Kyle Schochenmaier

On Mon, Dec 16, 2013 at 11:57 AM, Moye,Roger V <[email protected]> wrote:

Over the past weekend one of my users reported that his compute jobs running on 
a server with local disks usually take about 5 hours.  However, running the 
same jobs on our small Linux cluster using a PVFS filesystem exceeded 24 hours.

Here is the environment we are using:

1.  RHEL 6.4 on PVFS servers and clients.
2.  Computations are performed on any of 16 Linux clients, all running RHEL 6.4.
3.  We are running OrangeFS 2.8.7.
4.  We have 4 PVFS servers, each with an XFS filesystem on a ~35TB RAID 6.  Total PVFS filesystem is 146TB.
5.  All components are connected via a 10GigE network.

I started looking for the source of the problem.  For the user(s) showing this 
poor performance, I found that pvfs2-client is using about 65% of the CPU while 
the compute jobs themselves are using only 4% each.  Thus the compute nodes are 
very lightly loaded and the compute jobs are hardly doing anything.  The 
pvfs2-server process on each PVFS server is using about 140% CPU.  No time is 
being spent in the wait state (so I assume the speed of the disks is not an 
issue).  While the system was exhibiting poor performance I tried to read/write 
some 10GB files myself and found the performance to be normal for this system 
(around 450MB/s).  I used 'iperf' to measure the network bandwidth between the 
affected nodes and the PVFS servers and found it normal at 9.38Gb/s.  The 
directories that the users are reading/writing only have a few files in each.

The disk system is being constantly read by something, as shown by 
'iostat -d 2' on the PVFS servers:
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb              19.00      4864.00         0.00       9728          0
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0

This is what iostat has looked like over the last 48 hours (since Saturday).

I cannot find any documentation on how to get stats directly from PVFS2, so I 
tried this command:
pvfs2-statfs -m /pvfs2-mnt

I received these results:
I/O server statistics:
---------------------------------------

server: tcp://dqspvfs01:3334
        RAM bytes total  : 33619419136
        RAM bytes free   : 284790784
        uptime (seconds) : 14499577
        load averages    : 0 0 0
        handles available: 2305843009213589192
        handles total    : 2305843009213693950
        bytes available  : 31456490479616
        bytes total      : 40000112558080
        mode: serving both metadata and I/O data

server: tcp://dqspvfs02:3334
        RAM bytes total  : 33619419136
        RAM bytes free   : 217452544
        uptime (seconds) : 14499840
        load averages    : 0 0 0
        handles available: 2305843009213589104
        handles total    : 2305843009213693950
        bytes available  : 31456971476992
        bytes total      : 40000112558080
        mode: serving both metadata and I/O data

server: tcp://dqspvfs03:3334
        RAM bytes total  : 33619419136
        RAM bytes free   : 428965888
        uptime (seconds) : 5437269
        load averages    : 320 192 0
        handles available: 2305843009213588929
        handles total    : 2305843009213693950
        bytes available  : 31439132123136
        bytes total      : 40000112558080
        mode: serving both metadata and I/O data

server: tcp://dqspvfs04:3334
        RAM bytes total  : 33619419136
        RAM bytes free   : 223281152
        uptime (seconds) : 10089825
        load averages    : 1664 3072 0
        handles available: 2305843009213588989
        handles total    : 2305843009213693950
        bytes available  : 31452933193728
        bytes total      : 40000112558080
        mode: serving both metadata and I/O data

Notice that the 'load averages' are 0 for servers #1 and #2 but not #3 and #4.  
 Earlier this morning only #4 showed a non-zero load average.  The other three 
were 0.  What does this number mean?

My two theories about the source of the problem are:

1.  Someone is doing 'a lot' of tiny reads.
2.  Or, based on the load averages, the PVFS filesystem is somehow not balanced and all of the load is on a single server.

How can I prove either of these?  Or what other types of diagnostics can I do?

Thank you!
-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------


--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina



_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
