To clarify: these utilization percentages were measured while an I/O-bound job was running on some number of clients? And the server side was/is not CPU bound, right?
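If you haven't already, something along these lines (assuming sysstat is
installed on the servers; these are just illustrative commands) run while a
job is going should make it obvious whether the servers are CPU bound or
waiting on disk:

  # per-CPU utilization every 5 seconds; look for cores pegged near 100%
  mpstat -P ALL 5

  # user/system time vs. iowait at a glance
  vmstat 5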
When you LVM'd the two RAIDs together, did you specify the number of
stripes and the stripe width of the logical volumes? Specifically, did
you use the --stripes and --stripesize options to lvcreate, or neither?
Based on the behavior you're seeing, I would expect that you did not
(see the sketch at the bottom of this mail for what I mean).

I know you originally said you were getting 30 MB/s when doing a dd
with a 1MB block size. Could you do that same test now in a directory
with the stripe size set to 1M, as I mentioned in previous e-mails?

What's the network latency between a compute node and a PVFS server
when doing a ping? I would expect something in the ballpark of:

rtt min/avg/max/mdev = 0.126/0.159/0.178/0.019 ms

Michael

On Tue, Oct 11, 2011 at 2:33 PM, Jim Kusznir <[email protected]> wrote:
> I finally did manage to do this, and the results were a bit
> interesting. First, the highest amount I saw in the %utilization
> column was 16% on one server, and that was only there for 1
> measurement period. Typical maximums were 7%.
>
> The interesting part was that my second server was rarely over 1%, my
> first server was at 4-7%, and my 3rd server was at 5-9%.
>
> The other interesting part was where the I/O was principally
> happening. Originally, I had 8TB of 750GB SATA disks (in a hardware
> RAID-6), and then I added a second RAID-6 of 2TB disks which has the
> majority of the disk space. The two are LVM'd together. So far,
> nearly all the %utilization numbers were showing up on the 750GB
> disks.
>
> I have been running xfs_fsr to get the fragmentation down. My 3rd
> node is still at 17%; the first node is at 5%, and the 2nd node is at
> 0.7%. I've put in a cron job to run xfs_fsr for 4 hours each Sunday
> night starting at midnight (when my cluster is usually idle anyway) to
> try to improve/manage that. I'm not sure if there is actually a
> causal relationship here, but the load% seems to follow the frag%
> (higher frag, higher load).
>
> Still, the fact that it peaks out so low has me questioning what's
> going on...
>
> Watching it a bit longer into another workload, I do see %util spike
> up to 35%, but network I/O (as measured by bwm-ng) still peaks at
> 8 MB/s on pure gig-e (which should be capable of 90 MB/s).
>
> --Jim
>
> On Thu, Oct 6, 2011 at 1:36 PM, Emmanuel Florac <[email protected]>
> wrote:
> > On Wed, 5 Oct 2011 08:44:11 -0700, you wrote:
> >
> >> I don't
> >> know how to watch actual IOPS or other more direct metrics.
> >
> > Use the iostat command, something like
> >
> > iostat -mx 4
> >
> > You'll have a very detailed report on disk activity. The percentage
> > of usage (last column to the right) might be interesting. Let it run
> > for a while and see if there's a pattern.
> >
> > --
> > ------------------------------------------------------------------------
> > Emmanuel Florac  | Direction technique
> >                  | Intellique
> >                  | <[email protected]>
> >                  | +33 1 78 94 84 02
> > ------------------------------------------------------------------------
> >
>
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
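P.S. In case it's useful, here is roughly what the striping I'm asking
about looks like. This is only a sketch: the VG name, LV name, size,
and device paths below are placeholders for whatever your two RAID-6
PVs actually are, so adjust for your setup:

  # Stripe the LV across both RAID-6 physical volumes:
  #   -i 2    -> spread data across 2 PVs
  #   -I 1024 -> 1024KB (1MB) stripe size
  lvcreate -i 2 -I 1024 -L 10T -n lv_pvfs vg_pvfs /dev/sda1 /dev/sdb1

  # The dd test I mentioned, run in a PVFS directory that has a 1MB
  # stripe size set (path is just an example):
  dd if=/dev/zero of=/mnt/pvfs2/stripetest/bigfile bs=1M count=4096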
