iostat was running while some I/O-heavy jobs were running on at least some nodes. I'm not sure exactly what portion of the job they were in (I don't run jobs, I run the system...), but I watched it for a while, and I did eventually see utilization hit 50% at times.
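In case it matters, the iostat invocation can be narrowed to just the two RAID volumes; sdb and sdc below are only placeholders for however the arrays actually show up on each I/O server:

  # extended per-device stats (-x) in megabytes (-m), 4-second intervals,
  # restricted to the two RAID block devices
  iostat -mx sdb sdc 4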
I definitely saw a disproportionate amount of I/O to one of the two devices. I did not specify any stripe size when I built the LVM (I didn't know anything about it), so that's probably the cause of the disproportionate I/O. Is there a way to correct that non-destructively? (A rough sketch of the striped lvcreate it sounds like I should have used is at the bottom of this mail.)

ping times:
rtt min/avg/max/mdev = 0.114/0.777/3.079/1.014 ms

Did some testing with the stripe size. I think I did what was asked:

[root@compute-0-2 ~]# cd /mnt/pvfs2/kusznir
[root@compute-0-2 kusznir]# setfattr -n user.pvfs2.dist_name -v simple_stripe .
[root@compute-0-2 kusznir]# setfattr -n user.pvfs2.dist_params -v strip_size:1048576 .
[root@compute-0-2 kusznir]# dd if=/dev/zero of=testfile bs=1024k count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 11.2155 seconds, 95.7 MB/s
[root@compute-0-2 kusznir]# pvfs2-viewdist -f testfile
dist_name = simple_stripe
dist_params: strip_size:65536
Metadataserver: tcp://pvfs2-io-0-1:3334
Number of datafiles/servers = 3
Datafile 0 - tcp://pvfs2-io-0-1:3334, handle: 3571633946 (d4e2cf1a.bstream)
Datafile 1 - tcp://pvfs2-io-0-2:3334, handle: 4288072941 (ff96cced.bstream)
Datafile 2 - tcp://pvfs2-io-0-0:3334, handle: 2856061933 (aa3c0bed.bstream)
[root@compute-0-2 kusznir]#

--Jim

On Tue, Oct 11, 2011 at 1:39 PM, Michael Moore <[email protected]> wrote:
> To clarify, these utilization % numbers were during a job running on some
> number of clients that was I/O bound? The server side was/is not CPU bound,
> right?
>
> When you LVM'd the two RAIDs together, did you specify the number of stripes
> and the stripe width of the logical volume? Specifically, did you use the
> --stripes and --stripesize options to lvcreate? Or neither? I would expect
> that you did not, based on the behavior you're seeing.
>
> I know originally you said you were getting 30MBps when doing a 1MB block
> size dd. Could you do that same test now in a directory with the stripe size
> set to 1M, as I mentioned in previous e-mails?
>
> What's the network latency between a compute node and a PVFS server when
> doing a ping? I would expect something in the ballpark of:
> rtt min/avg/max/mdev = 0.126/0.159/0.178/0.019 ms
>
> Michael
>
> On Tue, Oct 11, 2011 at 2:33 PM, Jim Kusznir <[email protected]> wrote:
>>
>> I finally did manage to do this, and the results were a bit interesting.
>> First, the highest amount I saw in the %utilization column was 16% on one
>> server, and that was only there for one measurement period. Typical
>> maximums were 7%.
>>
>> The interesting part was that my second server was rarely over 1%, my
>> first server was at 4-7%, and my 3rd server was at 5-9%.
>>
>> The other interesting part was where the I/O was principally happening.
>> Originally, I had 8TB of 750GB SATA disks (in a hardware RAID-6), and then
>> I added a second RAID-6 of 2TB disks, which has the majority of the disk
>> space. The two are LVM'd together. So far, nearly all the %utilization
>> numbers were showing up on the 750GB disks.
>>
>> I have been running xfs_fsr to get the fragmentation down. My 3rd node is
>> still at 17%; the first node is at 5%, and the 2nd node is at 0.7%. I've
>> put in a cron job to run xfs_fsr for 4 hours each Sunday night starting at
>> midnight (when my cluster is usually idle anyway) to try and improve/manage
>> that. I'm not sure if there is actually a causal relationship here, but
>> the load % seems to follow the frag % (higher frag, higher load).
>>
>> Still, the fact that it peaks out so low has me questioning what's going
>> on...
>>
>> Watching it a bit longer into another workload, I do see %util spike up
>> to 35%, but network I/O (as measured by bwm-ng) still peaks at 8MB/s on
>> pure gig-e (which should be capable of 90MB/s).
>>
>> --Jim
>>
>> On Thu, Oct 6, 2011 at 1:36 PM, Emmanuel Florac <[email protected]>
>> wrote:
>> > On Wed, 5 Oct 2011 08:44:11 -0700, you wrote:
>> >
>> >> I don't know how to watch actual IOPS or other more direct metrics.
>> >
>> > Use the iostat command, something like
>> >
>> > iostat -mx 4
>> >
>> > You'll have a very detailed report on disk activity. The percentage of
>> > usage (last column to the right) might be interesting. Let it run for a
>> > while and see if there's a pattern.
>> >
>> > --
>> > ------------------------------------------------------------------------
>> > Emmanuel Florac    | Direction technique
>> >                    | Intellique
>> >                    | <[email protected]>
>> >                    | +33 1 78 94 84 02
>> > ------------------------------------------------------------------------
>> >
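P.S. For the archive, here's a rough sketch of the striped lvcreate it sounds like I should have used when I first built the volume. The VG/LV names and the size are only placeholders, not what's actually on my servers:

  # 2-way stripe across the two arrays, 1024KB (= 1MB) stripe size; size is illustrative
  # (a striped LV can only use as much of the larger array as the smaller one can match)
  lvcreate --stripes 2 --stripesize 1024 --size 8T --name pvfs2_data vg_pvfs2

  # without --stripes/--stripesize, lvcreate produces a plain linear (concatenated) LV:
  # lvcreate --size 8T --name pvfs2_data vg_pvfs2

As far as I understand it, with a linear LV everything written before the 2TB array was added stays on the 750GB array, which would line up with nearly all the %util showing up on those disks.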
