Oh, and while watching iostat on my 3 PVFS servers, I noticed that the "middle" one had notably less I/O reported than the other two (both peaked in the 80% range during portions of the write; the highest I saw on the middle one was about 60%).
--Jim

On Thu, Oct 13, 2011 at 9:45 AM, Jim Kusznir <[email protected]> wrote:
> The iostat was running while some I/O jobs were running from at least
> some nodes. I'm not sure exactly what portion of the job they were in
> (I don't run jobs, I run the system...), but I watched it for a while.
> I did eventually see 50% I/O at times.
>
> I definitely saw a disproportionate amount of I/O to one of the two
> devices. I did not specify any stripe size when I built the LVM
> (didn't know anything about it), so that's probably the cause of the
> disproportionate I/O. Is there a way to correct that non-destructively?
>
> Ping times:
> rtt min/avg/max/mdev = 0.114/0.777/3.079/1.014 ms
>
> Did some testing with the stripe size. I think I did what was asked:
>
> [root@compute-0-2 ~]# cd /mnt/pvfs2/kusznir
> [root@compute-0-2 kusznir]# setfattr -n user.pvfs2.dist_name -v simple_stripe .
> [root@compute-0-2 kusznir]# setfattr -n user.pvfs2.dist_params -v strip_size:1048576 .
> [root@compute-0-2 kusznir]# dd if=/dev/zero of=testfile bs=1024k count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 11.2155 seconds, 95.7 MB/s
> [root@compute-0-2 kusznir]# pvfs2-viewdist -f testfile
> dist_name = simple_stripe
> dist_params:
> strip_size:65536
>
> Metadataserver: tcp://pvfs2-io-0-1:3334
> Number of datafiles/servers = 3
> Datafile 0 - tcp://pvfs2-io-0-1:3334, handle: 3571633946 (d4e2cf1a.bstream)
> Datafile 1 - tcp://pvfs2-io-0-2:3334, handle: 4288072941 (ff96cced.bstream)
> Datafile 2 - tcp://pvfs2-io-0-0:3334, handle: 2856061933 (aa3c0bed.bstream)
> [root@compute-0-2 kusznir]#
>
> --Jim
>
> On Tue, Oct 11, 2011 at 1:39 PM, Michael Moore <[email protected]> wrote:
>> To clarify, these utilization percentages were measured during a job,
>> running on some number of clients, that was I/O bound? The server side
>> was/is not CPU bound, right?
>>
>> When you LVM'd the two RAIDs together, did you specify the number of
>> stripes and the stripe width of the logical volume? Specifically, did you
>> use the --stripes and --stripesize options to lvcreate, or neither? Based
>> on the behavior you're seeing, I would expect that you did not.
>>
>> I know you originally said you were getting 30 MB/s when doing a dd with
>> a 1MB block size. Could you run that same test again now, in a directory
>> with the stripe size set to 1M as I mentioned in previous e-mails?
>>
>> What's the network latency between a compute node and a PVFS server when
>> doing a ping? I would expect something in the ballpark of:
>> rtt min/avg/max/mdev = 0.126/0.159/0.178/0.019 ms
>>
>> Michael
>>
>> On Tue, Oct 11, 2011 at 2:33 PM, Jim Kusznir <[email protected]> wrote:
>>>
>>> I finally did manage to do this, and the results were a bit interesting.
>>> First, the highest number I saw in the %utilization column was 16% on
>>> one server, and that was only there for one measurement period. Typical
>>> maximums were 7%.
>>>
>>> The interesting part was that my second server was rarely over 1%, my
>>> first server was at 4-7%, and my third server was at 5-9%.
>>>
>>> The other interesting part was where the I/O was principally happening.
>>> Originally, I had 8TB of 750GB SATA disks (in a hardware RAID-6), and
>>> then I added a second RAID-6 of 2TB disks, which holds the majority of
>>> the disk space. The two are LVM'd together. So far, nearly all of the
>>> %utilization was showing up on the 750GB disks.
>>>
>>> I have been running xfs_fsr to get the fragmentation down.
>>> My 3rd node is still at 17%; the first node is at 5%, and the 2nd node
>>> is at 0.7%. I've put in a cron job to run xfs_fsr for 4 hours each
>>> Sunday night starting at midnight (when my cluster is usually idle
>>> anyway) to try to improve/manage that. I'm not sure if there is actually
>>> a causal relationship here, but the load % seems to follow the
>>> fragmentation % (higher fragmentation, higher load).
>>>
>>> Still, the fact that it peaks out so low has me questioning what's going
>>> on...
>>>
>>> Watching it a bit longer into another workload, I do see %util spike up
>>> to 35%, but network I/O (as measured by bwm-ng) still peaks at 8 MB/s on
>>> pure gig-e (which should be capable of 90 MB/s).
>>>
>>> --Jim
>>>
>>> On Thu, Oct 6, 2011 at 1:36 PM, Emmanuel Florac <[email protected]> wrote:
>>> > On Wed, 5 Oct 2011 08:44:11 -0700, you wrote:
>>> >
>>> >> I don't know how to watch actual IOPS or other more direct metrics.
>>> >
>>> > Use the iostat command, something like
>>> >
>>> > iostat -mx 4
>>> >
>>> > You'll have a very detailed report on disk activity. The percentage of
>>> > usage (last column to the right) might be interesting. Let it run for
>>> > a while and see if there's a pattern.
>>> >
>>> > --
>>> > ------------------------------------------------------------------------
>>> > Emmanuel Florac   |   Direction technique
>>> >                   |   Intellique
>>> >                   |   <[email protected]>
>>> >                   |   +33 1 78 94 84 02
>>> > ------------------------------------------------------------------------
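For reference, a minimal sketch of the striped lvcreate invocation Michael is
asking about above, assuming a volume group named vg_pvfs built from two
physical volumes /dev/sdb1 and /dev/sdc1 (all names here are hypothetical,
not taken from this thread):

  # -i/--stripes = number of PVs to stripe across,
  # -I/--stripesize = stripe size in KB (1024 KB = 1MB here)
  lvcreate --stripes 2 --stripesize 1024 -l 100%FREE -n lv_pvfs vg_pvfs /dev/sdb1 /dev/sdc1

Without these options, lvcreate allocates the volume linearly, filling extents
from one physical volume before moving to the next, which would be consistent
with the lopsided per-device utilization Jim reports.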

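Similarly, a minimal sketch of the weekly defrag window Jim describes,
assuming xfs_fsr lives in /usr/sbin and relying on its -t flag to cap the run
time in seconds (the exact schedule and path are assumptions, not quoted from
the thread):

  # crontab entry: every Sunday at midnight, defragment the mounted XFS
  # filesystems for at most 4 hours (14400 seconds)
  0 0 * * 0  /usr/sbin/xfs_fsr -t 14400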