Oh, and while watching iostat on my 3 pvfs servers, I noticed that the
"middle" one had notably less I/O reported than the other two (which
both peaked in the 80% range during portions of the write; the highest
the middle one reached was about 60%).

--Jim

On Thu, Oct 13, 2011 at 9:45 AM, Jim Kusznir <[email protected]> wrote:
> The iostat was running while some IO jobs were running from at least
> some nodes.  I'm not sure exactly what portion of the job they were in
> (I don't run jobs, I run the system....), but I watched it for a
> while.  I did eventually see 50% I/O at times.
>
> I definitely saw a disproportionate amount of I/O to one of the two
> devices.  I did not specify any stripe size when I built the LVM
> (didn't know anything about it), so that's probably the problem with
> the disproportionate I/O.  Is there a way to correct that
> non-destructively?
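> (As an aside, a hedged sketch of how one could check the current layout
> without touching any data; the volume group / LV names below are just
> placeholders, not taken from this system:)
>
>   # show each LV segment, its type (linear vs. striped), and the PVs it sits on
>   lvs --segments -o +devices
>   # or, for one particular logical volume:
>   lvdisplay -m /dev/VolGroupPVFS/pvfs_lv
>
> A "linear" segment type would confirm that writes fill one RAID before
> spilling onto the other rather than being striped across both.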
>
> ping times:
> rtt min/avg/max/mdev = 0.114/0.777/3.079/1.014 ms
>
> Did some testing with the stripe size.  I think I did what was asked:
>
> [root@compute-0-2 ~]# cd /mnt/pvfs2/kusznir
> [root@compute-0-2 kusznir]# setfattr -n user.pvfs2.dist_name -v simple_stripe .
> [root@compute-0-2 kusznir]# setfattr -n user.pvfs2.dist_params -v strip_size:1048576 .
> [root@compute-0-2 kusznir]# dd if=/dev/zero of=testfile bs=1024k count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 11.2155 seconds, 95.7 MB/s
> [root@compute-0-2 kusznir]# pvfs2-viewdist -f testfile
> dist_name = simple_stripe
> dist_params:
> strip_size:65536
>
> Metadataserver: tcp://pvfs2-io-0-1:3334
> Number of datafiles/servers = 3
> Datafile 0 - tcp://pvfs2-io-0-1:3334, handle: 3571633946 (d4e2cf1a.bstream)
> Datafile 1 - tcp://pvfs2-io-0-2:3334, handle: 4288072941 (ff96cced.bstream)
> Datafile 2 - tcp://pvfs2-io-0-0:3334, handle: 2856061933 (aa3c0bed.bstream)
> [root@compute-0-2 kusznir]#
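> (Note: pvfs2-viewdist above still reports strip_size:65536, the default, so
> it looks like testfile may not have picked up the 1MB hint.  A hedged sanity
> check, assuming the hints really are readable back as extended attributes on
> this mount, and remembering that the distribution is fixed when a file is
> created, so a pre-existing testfile would keep its old layout:)
>
> getfattr -n user.pvfs2.dist_name .
> getfattr -n user.pvfs2.dist_params .
> rm -f testfile
> dd if=/dev/zero of=testfile bs=1024k count=1024
> pvfs2-viewdist -f testfile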
>
> --Jim
>
> On Tue, Oct 11, 2011 at 1:39 PM, Michael Moore <[email protected]> wrote:
>> To clarify, these utilization % numbers were taken during a job, running on
>> some number of clients, that was I/O bound? The server side was/is not CPU
>> bound, right?
>>
>> When you LVM'd the two RAIDs together, did you specify the number of stripes
>> and the stripe size of the logical volume? Specifically, did you use the
>> --stripes and --stripesize options to lvcreate, or neither? Based on the
>> behavior you're seeing, I would expect that you did not.
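>> As a hedged illustration (the VG name, device paths, and size below are
>> placeholders, not your actual layout), a striped logical volume across the
>> two RAID devices would be created with something like:
>>
>>   # 2 stripes, 1024KB (1MB) stripe size, spread across both PVs
>>   lvcreate --stripes 2 --stripesize 1024 -L 10T -n pvfs_lv vg_pvfs /dev/sdb1 /dev/sdc1
>>
>> Without --stripes, lvcreate allocates linearly, filling one physical volume
>> before moving on to the next, which would match the lopsided per-device
>> iostat numbers you're seeing.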
>>
>> I know you originally said you were getting 30MB/s when doing a dd with a
>> 1MB block size. Could you do that same test now in a directory with the
>> stripe size set to 1M, as I mentioned in previous e-mails?
>>
>> What's the network latency between a compute node and a PVFS server when
>> doing a ping? I would expect something in the ballpark of:
>> rtt min/avg/max/mdev = 0.126/0.159/0.178/0.019 ms
>>
>> Michael
>>
>> On Tue, Oct 11, 2011 at 2:33 PM, Jim Kusznir <[email protected]> wrote:
>>>
>>> I finally did manage to do this, and the results were a bit
>>> interesting.  First, the highest value I saw in the %utilization
>>> column was 16%, on one server, and it was only there for one
>>> measurement period.  Typical maximums were around 7%.
>>>
>>> The interesting part was that my second server was rarely over 1%,
>>> while my first server was at 4-7% and my third server at 5-9%.
>>>
>>> The other interesting part was where the I/O was principally
>>> happening.  Originally I had an 8TB array of 750GB SATA disks (in a
>>> hardware RAID-6), and then I added a second RAID-6 of 2TB disks, which
>>> holds the majority of the disk space.  The two are LVM'ed together.  So
>>> far, nearly all of the %utilization was showing up on the 750GB
>>> disks.
>>>
>>> I have been running xfs_fsr to get the fragmentation down.  My third
>>> node is still at 17% fragmentation; the first node is at 5%, and the
>>> second node is at 0.7%.  I've put in a cron job to run xfs_fsr for 4
>>> hours each Sunday night starting at midnight (when my cluster is
>>> usually idle anyway) to try to improve/manage that.  I'm not sure if
>>> there is actually a causal relationship here, but the load% seems to
>>> follow the frag% (higher frag, higher load).
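>>> (A cron entry for that would look roughly like the sketch below; the -t
>>> flag is xfs_fsr's run-time limit in seconds, so 4 hours is 14400, and
>>> with no filesystem argument it works through all mounted XFS
>>> filesystems.  The path is typical for CentOS but may differ:)
>>>
>>>   # root crontab: every Sunday at midnight, defragment for at most 4 hours
>>>   0 0 * * 0  /usr/sbin/xfs_fsr -t 14400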
>>>
>>> Still, the fact that it peaks out so low has me questioning what's going
>>> on...
>>>
>>> Watching it a bit longer into another workload, I do see %utilization
>>> spike up to 35%, but network I/O (as measured by bwm-ng) still peaks
>>> at 8MB/s on pure gig-e (which should be capable of ~90MB/s).
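>>> (For reference, something like the following shows per-interface
>>> throughput in bytes/s with bwm-ng; the interface name here is just a
>>> placeholder:)
>>>
>>>   bwm-ng -u bytes -I eth0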
>>>
>>> --Jim
>>>
>>> On Thu, Oct 6, 2011 at 1:36 PM, Emmanuel Florac <[email protected]>
>>> wrote:
>>> > On Wed, 5 Oct 2011 08:44:11 -0700, you wrote:
>>> >
>>> >>  I don't
>>> >> know how to watch actual IOPS or other more direct metrics.
>>> >
>>> > Use the iostat command, something like:
>>> >
>>> > iostat -mx 4
>>> >
>>> > You'll get a very detailed report on disk activity. The utilization
>>> > percentage (last column on the right) might be interesting. Let it run
>>> > for a while and see if there's a pattern.
>>> >
>>> > --
>>> > ------------------------------------------------------------------------
>>> > Emmanuel Florac     |   Direction technique
>>> >                    |   Intellique
>>> >                    |   <[email protected]>
>>> >                    |   +33 1 78 94 84 02
>>> > ------------------------------------------------------------------------
>>> >
>>>
>>
>>
>

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
