Another question: we've expanded our pvfs2 disk storage (uniformly) twice now. Do I need to run some form of "defrag" or other optimizer?
--Jim

On Mon, Oct 3, 2011 at 11:38 AM, Jim Kusznir <[email protected]> wrote:
> All speeds were in Mbps, the default from iperf.
>
> Our files are multi-GB in size, so they do involve all three servers.
> It applies to all files on the system.
>
> Can I change the stripe size "on the fly"? I already have about 50TB
> of data in the system, and have no place big enough to back it up to
> rebuild the pvfs2 array and restore....
>
> --Jim
>
> On Fri, Sep 30, 2011 at 1:46 PM, Michael Moore <[email protected]> wrote:
>> See below for specific items. Can you run iostat on the servers while
>> writing a file that experiences the slow performance? If you could watch
>> iostat -dmx <device of pvfs storage space> and provide any salient
>> snippets (high utilization, low utilization, odd-looking output, etc.),
>> that could help.
>>
>> On Thu, Sep 29, 2011 at 11:42 AM, Jim Kusznir <[email protected]> wrote:
>>>
>>> 1) iperf (defaults) reported 873, 884, and 929 for connections from
>>> the three servers to the head node (a pvfs2 client).
>>
>> Just to be clear, those are Mbps, right?
>>
>>> 2) No errors showed up on any of the ports on the managed switch.
>>
>> Hmm, if those are Mbps, this doesn't look like a network-layer problem.
>>
>>> 3) I'm not sure what this will do, as the pvfs2 volume is comprised of
>>> 3 servers, so mounting it on a server still uses the network for the
>>> other two. I also don't understand the "single datafile per file"
>>> statement. In any case, I do not have the kernel module compiled on
>>> my servers; they ONLY have the pvfs2 server software installed.
>>
>> A logical file (e.g. foo.out) in a PVFS2 file system is made up of one
>> or more datafiles. Based on your config I would assume most are made up
>> of 3 datafiles with the default stripe size of 64k.
>>
>> You can run pvfs2-viewdist -f <file name> to see what the distribution
>> is and which servers a given file lives on. To see cumulative throughput
>> from multiple PVFS2 servers, the number of datafiles must be greater
>> than one. Check a couple of the problematic files to see what their
>> distribution is.
>>
>> For a quick test to see if the distribution is impacting performance,
>> set the following extended attribute on a directory and then check the
>> performance of writing a file into it:
>> setfattr -n user.pvfs2.num_dfiles -v "3" <some pvfs2 dir>
>>
>> Also, you can test whether a larger strip_size would help by doing
>> something similar to (for a 256k strip):
>> setfattr -n user.pvfs2.dist_name -v simple_stripe <some pvfs2 dir>
>> setfattr -n user.pvfs2.dist_params -v strip_size:262144 <some pvfs2 dir>
>>
>>> 4) I'm not sure; I used largely defaults. I've attached my config below.
>>>
>>> 5) The network bandwidth figure is from one of the servers (the one I
>>> checked; I believe them all to be similar).
>>>
>>> 6) Not sure. I created an XFS filesystem using LVM to combine the two
>>> hardware raid6 volumes and mounted that at /mnt/pvfs2 on the servers.
>>> I then let pvfs do its magic. Config files below.
>>>
>>> 7) (from second e-mail): Config file attached.
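A minimal sketch of the distribution check and stripe experiment suggested
above, assuming the pvfs2 userspace tools are available on a client; the
scratch directory /mnt/pvfs2/scratch and the file run42.out are placeholder
names, not Jim's actual paths:

    # show how many datafiles the file is striped across and which servers hold them
    pvfs2-viewdist -f /mnt/pvfs2/scratch/run42.out

    # hint that new files in this directory should use 3 datafiles and a 256k strip
    setfattr -n user.pvfs2.num_dfiles -v "3" /mnt/pvfs2/scratch
    setfattr -n user.pvfs2.dist_name -v simple_stripe /mnt/pvfs2/scratch
    setfattr -n user.pvfs2.dist_params -v strip_size:262144 /mnt/pvfs2/scratch

    # then time a large sequential write into that directory and compare
    dd if=/dev/zero of=/mnt/pvfs2/scratch/stripe_test bs=1M count=1024 conv=fsync

These attributes are hints for files created after they are set; existing
files keep their original distribution (copying a file into a directory that
carries the new attributes produces a copy with the new layout).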
>>>
>>> ----------
>>> /etc/pvfs2-fs.conf:
>>> ----------
>>> [root@pvfs2-io-0-2 mnt]# cat /etc/pvfs2-fs.conf
>>> <Defaults>
>>> UnexpectedRequests 50
>>> EventLogging none
>>> LogStamp datetime
>>> BMIModules bmi_tcp
>>> FlowModules flowproto_multiqueue
>>> PerfUpdateInterval 1000
>>> ServerJobBMITimeoutSecs 30
>>> ServerJobFlowTimeoutSecs 30
>>> ClientJobBMITimeoutSecs 300
>>> ClientJobFlowTimeoutSecs 300
>>> ClientRetryLimit 5
>>> ClientRetryDelayMilliSecs 2000
>>> StorageSpace /mnt/pvfs2
>>> LogFile /var/log/pvfs2-server.log
>>> </Defaults>
>>>
>>> <Aliases>
>>> Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>> Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>> Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>> </Aliases>
>>>
>>> <Filesystem>
>>> Name pvfs2-fs
>>> ID 62659950
>>> RootHandle 1048576
>>> <MetaHandleRanges>
>>> Range pvfs2-io-0-0 4-715827885
>>> Range pvfs2-io-0-1 715827886-1431655767
>>> Range pvfs2-io-0-2 1431655768-2147483649
>>> </MetaHandleRanges>
>>> <DataHandleRanges>
>>> Range pvfs2-io-0-0 2147483650-2863311531
>>> Range pvfs2-io-0-1 2863311532-3579139413
>>> Range pvfs2-io-0-2 3579139414-4294967295
>>> </DataHandleRanges>
>>> <StorageHints>
>>> TroveSyncMeta yes
>>> TroveSyncData no
>>> </StorageHints>
>>> </Filesystem>
>>>
>>> ---------------------
>>> /etc/pvfs2-server.conf-pvfs2-io-0-2
>>> ---------------------
>>> StorageSpace /mnt/pvfs2
>>> HostID "tcp://pvfs2-io-0-2:3334"
>>> LogFile /var/log/pvfs2-server.log
>>> ---------------------
>>>
>>> All the server config files are very similar.
>>>
>>> --Jim
>>>
>>> On Wed, Sep 28, 2011 at 4:45 PM, Michael Moore <[email protected]> wrote:
>>> > No doubt something is awry. Offhand I'm suspecting the network. A
>>> > couple of things that might help give a direction:
>>> > 1) Do an end-to-end TCP test between client/server. Something like
>>> > iperf or nuttcp should do the trick.
>>> > 2) Check the server and client ethernet ports on the switch for high
>>> > error counts (not familiar with that switch, not sure if it's managed
>>> > or not). Hardware (port/cable) errors should show up in the above test.
>>> > 3) Can you mount the PVFS2 file system on the server and run some I/O
>>> > tests (single datafile per file) to see if the network is in fact in
>>> > play?
>>> > 4) How many datafiles (by default) is each file you're writing to
>>> > using? 3?
>>> > 5) When you watch network bandwidth and see 10 MB/s, where is that?
>>> > On the server?
>>> > 6) What backend are you using for I/O, direct or alt-aio? Nothing
>>> > really wrong either way, just wondering.
>>> >
>>> > It sounds like, based on the dd output, the disks are capable of more
>>> > than you're seeing; we just need to narrow down where the performance
>>> > is getting squelched.
>>> >
>>> > Michael
>>> >
>>> > On Wed, Sep 28, 2011 at 6:10 PM, Jim Kusznir <[email protected]> wrote:
>>> >>
>>> >> Hi all:
>>> >>
>>> >> I've got a pvfs2 install on my cluster. I never felt it was
>>> >> performing up to snuff, but lately things have gone way, way down in
>>> >> total throughput and overall usability, to the point that jobs
>>> >> writing out 900MB take an extra 1-2 hours to complete due to disk
>>> >> I/O waits. A job that writes about 30GB over the course of a run
>>> >> (normally about 2 hrs long) takes up to 20 hrs; once the disk I/O is
>>> >> cut out, it completes in 1.5-2 hrs. I've also noticed that there's
>>> >> up to a 5 sec lag when I cd into /mnt/pvfs2 and do an ls.
>>> >> Note that all of our operations go through the kernel module /
>>> >> mount point. Our problems and code base do not support the use of
>>> >> other tools (such as the pvfs2-* utilities or the native MPI
>>> >> libraries); it's all done through the kernel module / filesystem
>>> >> mountpoint.
>>> >>
>>> >> My configuration is this: 3 pvfs2 servers (Dell PowerEdge 1950s
>>> >> with 1.6GHz quad-core CPUs, 4GB RAM, RAID-0 for metadata+OS on a
>>> >> Perc5i card), each with a Dell Perc6e card running hardware raid6
>>> >> in two volumes: one on a bunch of 750GB SATA drives, and the other,
>>> >> on its second SAS connector, on about 12 2TB WD drives. The two
>>> >> raid volumes are LVM'ed together in the OS and mounted as the pvfs2
>>> >> data store. Each server is connected via ethernet to a stack of
>>> >> LG-Ericsson gig-e switches (stack == 2 switches with 40Gbit
>>> >> stacking cables installed). PVFS 2.8.2 is used throughout the
>>> >> cluster on Rocks (using site-compiled pvfs, not the Rocks-supplied
>>> >> pvfs). OSes are CentOS 5.x-based (both clients and servers).
>>> >>
>>> >> As I said, I always felt something wasn't quite right, but a few
>>> >> months back I performed a series of upgrades and reconfigurations
>>> >> on the infrastructure and hardware. Specifically, I upgraded to the
>>> >> LG-Ericsson switches and replaced a full 12-bay drive shelf with a
>>> >> 24-bay one (moving all the disks through) and added some additional
>>> >> disks. All three pvfs2 servers are identical in this. At some point
>>> >> prior to these changes, my users were able to get acceptable
>>> >> performance from pvfs2; now they are not. I don't have any evidence
>>> >> pointing to the switch or to the disks.
>>> >>
>>> >> I can run dd if=/dev/zero of=testfile bs=1024k count=10000 and get
>>> >> 380+MB/s locally on the pvfs server, writing to the partition on
>>> >> the hardware raid6 card. From a compute node, doing that for a
>>> >> 100MB file, I get 47.7MB/s to my RAID-5 NFS server on the head
>>> >> node, and 36.5MB/s to my pvfs2-mounted share. When I watch the
>>> >> network bandwidth/throughput using bwm-ng, I rarely see more than
>>> >> 10MB/s, and often it's around 4MB/s with a 12-node I/O-bound job
>>> >> running.
>>> >>
>>> >> I originally had the pvfs2 servers connected to the switch with
>>> >> dual gig-e connections and using bonding (ALB) to make them better
>>> >> able to serve multiple nodes. I never saw anywhere close to the
>>> >> throughput I should have. In any case, to test whether that was the
>>> >> problem, I removed the bonding and am running through a single
>>> >> gig-e pipe now, but performance hasn't improved at all.
>>> >>
>>> >> I'm not sure how to troubleshoot this problem further. Presently,
>>> >> the cluster isn't usable for large I/O jobs, so I really have to
>>> >> fix this.
>>> >>
>>> >> --Jim
>>> >> _______________________________________________
>>> >> Pvfs2-users mailing list
>>> >> [email protected]
>>> >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
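For anyone working through the same checklist, here is a rough sketch of
the network and disk checks discussed in this thread. The server name
pvfs2-io-0-0 is taken from the config above; the device /dev/sdb backing
/mnt/pvfs2 and the test file paths are placeholders:

    # 1) end-to-end TCP throughput between a client and each server (iperf v2 syntax)
    iperf -s                        # run on the server
    iperf -c pvfs2-io-0-0 -t 30     # run on the client; ~900+ Mbits/sec is healthy gig-e

    # 2) raw sequential write speed of the backing RAID volume, run locally on the server
    #    (a throwaway file at the top of the backing XFS volume; remove it afterwards)
    dd if=/dev/zero of=/mnt/pvfs2/ddtest bs=1024k count=10000 conv=fsync

    # 3) while a slow client write is in flight, watch the backing device on the server
    iostat -dmx /dev/sdb 5

    # 4) and watch the wire on the server at the same time
    bwm-ng

If iperf shows near line rate and the local dd is fast, but iostat shows
the device near 100% utilization while moving only a few MB/s during PVFS2
writes, the servers are probably doing many small, scattered I/Os rather
than large sequential ones, which would fit the symptoms described above.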
