Roger:

I have also seen some code that reads/writes one byte at a time, which is
not appropriate for a parallel filesystem.  Try this: while the user's
process is running, attach to it with strace and see what kind of
reads/writes are being issued.
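
Something along these lines usually tells the story (the PID and the output
path below are just placeholders):

  strace -f -p <PID> -e trace=read,write,pread64,pwrite64 -o /tmp/io.trace
  strace -c -f -p <PID> -e trace=read,write

The first command logs every call so you can see the request sizes; the
second just prints a count summary when you detach with Ctrl-C.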

Becky


On Mon, Dec 16, 2013 at 1:57 PM, Becky Ligon <[email protected]> wrote:

> Roger:
>
> Are all of your filesystem servers ALSO metadata servers?
>
> Becky
>
>
> On Mon, Dec 16, 2013 at 1:18 PM, Kyle Schochenmaier <[email protected]> wrote:
>
>> There are some tuning params that you can look into here.  By default there
>> is round-robin loading on the servers, done in chunks of FlowBufferSize
>> (IIRC).  You can set this in your config file, but by default the size is
>> quite small (64k); I've pushed it up over 1-2MB and seen drastic improvements
>> in bandwidth for larger requests.  If you're doing tons of small requests
>> this obviously won't help, though.
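>>
>> If memory serves, the knob is FlowBufferSizeBytes in the fs.conf (double-check
>> the exact name against the 2.8.7 docs), and bumping it up would look roughly
>> like this inside the <FileSystem> block, if I remember the layout right:
>>
>>     <FileSystem>
>>         ...
>>         FlowBufferSizeBytes 1048576
>>     </FileSystem>
>>
>> (1048576 = 1MB; the servers need a restart to pick up config changes, I
>> believe.)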
>>
>> Can you attach your config file so we can see how things are set up?
>>
>>
>>
>> Kyle Schochenmaier
>>
>>
>> On Mon, Dec 16, 2013 at 11:57 AM, Moye,Roger V <[email protected]> wrote:
>>
>>>
>>>
>>> Over the past weekend one of my users reported that his compute jobs
>>> running on a server with local disks usually take about 5 hours.  However,
>>> running the same jobs on our small Linux cluster using a PVFS filesystem
>>> took more than 24 hours.
>>>
>>>
>>>
>>> Here is the environment we are using:
>>>
>>> 1. RHEL 6.4 on PVFS servers and clients.
>>>
>>> 2. Computations are performed on any of 16 Linux clients, all
>>> running RHEL 6.4.
>>>
>>> 3. We are running OrangeFS 2.8.7.
>>>
>>> 4. We have 4 PVFS servers, each with an XFS filesystem on a ~35TB
>>> RAID 6.  Total PVFS filesystem is 146TB.
>>>
>>> 5. All components are connected via a 10GigE network.
>>>
>>>
>>>
>>> I started looking for the source of the problem.  For the user(s)
>>> showing this poor performance, I found that pvfs2-client is using about 65%
>>> of the CPU while the compute jobs themselves are using only 4% each.
>>> Thus the compute nodes are very lightly loaded and the compute jobs are
>>> hardly doing anything.  The pvfs2-server process on each PVFS server is
>>> using about 140% CPU.  No time is being spent in the wait state (so I
>>> assume the speed of the disks is not an issue).  While the system was
>>> exhibiting poor performance I tried to read/write some 10GB files myself
>>> and found the performance to be normal for this system (around 450MB/s).
>>> I used ‘iperf’ to measure the network bandwidth between the affected
>>> nodes and the PVFS servers and found it normal at 9.38Gb/s.  The directories
>>> that the users are reading/writing only have a few files in each.
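>>>
>>> (By "read/write some 10GB files" I mean plain sequential transfers, roughly
>>> along these lines, with throwaway file names:
>>>
>>>     dd if=/dev/zero of=/pvfs2-mnt/ddtest bs=1M count=10240
>>>     dd if=/pvfs2-mnt/ddtest of=/dev/null bs=1M
>>>
>>> and the iperf numbers came from the stock 'iperf -s' server / 'iperf -c
>>> dqspvfs01' client pair.)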
>>>
>>>
>>>
>>> Iostat shows that the disk system is constantly being read by something,
>>> as shown by ‘iostat -d 2’ on the PVFS servers:
>>>
>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>> sda               0.00         0.00         0.00          0          0
>>> sdb              19.00      4864.00         0.00       9728          0
>>> dm-0              0.00         0.00         0.00          0          0
>>> dm-1              0.00         0.00         0.00          0          0
>>>
>>>
>>>
>>> This is what iostat has looked like over the last 48 hours (since Saturday).
>>>
>>>
>>>
>>> I cannot find any documentation on how to get stats directly from pvfs2,
>>> so I tried this command:
>>>
>>> pvfs2-statfs -m /pvfs2-mnt
>>>
>>>
>>>
>>> I received these results:
>>>
>>> I/O server statistics:
>>> ---------------------------------------
>>>
>>> server: tcp://dqspvfs01:3334
>>>         RAM bytes total  : 33619419136
>>>         RAM bytes free   : 284790784
>>>         uptime (seconds) : 14499577
>>>         load averages    : 0 0 0
>>>         handles available: 2305843009213589192
>>>         handles total    : 2305843009213693950
>>>         bytes available  : 31456490479616
>>>         bytes total      : 40000112558080
>>>         mode: serving both metadata and I/O data
>>>
>>> server: tcp://dqspvfs02:3334
>>>         RAM bytes total  : 33619419136
>>>         RAM bytes free   : 217452544
>>>         uptime (seconds) : 14499840
>>>         load averages    : 0 0 0
>>>         handles available: 2305843009213589104
>>>         handles total    : 2305843009213693950
>>>         bytes available  : 31456971476992
>>>         bytes total      : 40000112558080
>>>         mode: serving both metadata and I/O data
>>>
>>> server: tcp://dqspvfs03:3334
>>>         RAM bytes total  : 33619419136
>>>         RAM bytes free   : 428965888
>>>         uptime (seconds) : 5437269
>>>         load averages    : 320 192 0
>>>         handles available: 2305843009213588929
>>>         handles total    : 2305843009213693950
>>>         bytes available  : 31439132123136
>>>         bytes total      : 40000112558080
>>>         mode: serving both metadata and I/O data
>>>
>>> server: tcp://dqspvfs04:3334
>>>         RAM bytes total  : 33619419136
>>>         RAM bytes free   : 223281152
>>>         uptime (seconds) : 10089825
>>>         load averages    : 1664 3072 0
>>>         handles available: 2305843009213588989
>>>         handles total    : 2305843009213693950
>>>         bytes available  : 31452933193728
>>>         bytes total      : 40000112558080
>>>         mode: serving both metadata and I/O data
>>>
>>>
>>>
>>> Notice that the ‘load averages’ are 0 for servers #1 and #2 but not #3
>>> and #4.  Earlier this morning only #4 showed a non-zero load average.  The
>>> other three were 0.  What does this number mean?
>>>
>>>
>>>
>>> My two theories about the source of the problem are:
>>>
>>> 1. Someone is doing ‘a lot’ of tiny reads.
>>>
>>> 2. Or, based on the load averages, the PVFS filesystem is somehow not
>>> balanced and all of the load is on a single server.
>>>
>>>
>>>
>>> How can I prove either of these?  And what other diagnostics can I run?
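>>>
>>> (For theory #2, is 'pvfs2-viewdist -f <one of the user's files>' the right
>>> way to see how a given file is striped across the four servers?  I'm going
>>> from the tool name in the admin utilities and haven't used it before.)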
>>>
>>>
>>>
>>> Thank you!
>>>
>>> -Roger
>>>
>>>
>>>
>>> -----------------------------------------------
>>>
>>> Roger V. Moye
>>>
>>> Systems Analyst III
>>>
>>> XSEDE Campus Champion
>>>
>>> University of Texas - MD Anderson Cancer Center
>>>
>>> Division of Quantitative Sciences
>>>
>>> Pickens Academic Tower - FCT4.6109
>>>
>>> Houston, Texas
>>>
>>> (713) 792-2134
>>>
>>> -----------------------------------------------------------
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Becky Ligon
> OrangeFS Support and Development
> Omnibond Systems
> Anderson, South Carolina
>
>


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
