On Sat, Oct 20, 2012 at 7:39 AM, Edward Ned Harvey
(opensolarisisdeadlongliveopensolaris) <
opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> > From: Timothy Coalson [mailto:tsc...@mst.edu]
> > Sent: Friday, October 19, 2012 9:43 PM
> >
> > A shot in the dark here, but perhaps one of the disks involved is taking
> > a long time to return from reads, but is returning eventually, so ZFS
> > doesn't notice the problem?  Watching 'iostat -x' for busy time while a
> > VM is hung might tell you something.
>
> Oh yeah - this is also bizarre.  I watched "zpool iostat" for a while.  It
> was showing me:
> Operations (read and write) consistently 0
> Bandwidth (read and write) consistently non-zero, but something small,
> like 1k-20k or so.
> Maybe that is normal to someone who uses zpool iostat more often than I
> do.  But to me, zero operations resulting in non-zero bandwidth defies
> logic.

The operations column might be operations per second, rounded down (I know
this happens in DTrace normalization, not sure about zpool/zfs).  Try an
interval of 1 (perhaps with -v) and see if you still get 0 operations.  I
haven't seen zero operations with nonzero bandwidth on my pools; I always see
lots of operations in bursts, so it sounds like you might be on to something.
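
To show what I mean by rounding, here is a minimal Python sketch.  The
numbers are made up, and truncating to a whole number is only my guess at
what zpool iostat might do, not something I have confirmed:

```python
def per_second(total, interval_s):
    """Average a counter over the interval, truncating to a whole number.

    Truncation here is an assumption used for illustration; it is not
    confirmed behaviour of zpool iostat.
    """
    return int(total / interval_s)


# Hypothetical numbers: 3 operations of 4 KiB each during a 10-second interval.
ops = 3
bytes_moved = ops * 4096
interval_s = 10

print(per_second(ops, interval_s))          # operations per second -> 0
print(per_second(bytes_moved, interval_s))  # bytes per second -> 1228
```

A handful of operations spread over a long interval averages to less than
one per second and shows as 0, while the bytes they moved still average to
about 1.2k per second.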

Also, iostat -x shows per-device busy time (%b), which zpool iostat does not;
when there is an imbalance, it is usually highest on the slowest disk.  So,
if a single device is at fault, iostat -nx has a better chance of finding it
(the n flag prints descriptive device names such as cXtYdZ rather than
instance names like sd0, so you can figure out which disk is the problem).
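
If you capture the iostat -nx output while a VM is hung, something like the
following Python sketch could pick out the busiest device.  The function name
and the sample text are hypothetical (the numbers are invented, not from a
real system), and it assumes a header row containing "%b" and "device"
columns:

```python
def busiest_device(iostat_text):
    """Return (device, busy_percent) for the row with the highest %b.

    Columns are located via the header row, so the exact column order
    does not matter as long as "%b" and "device" are present.
    """
    header = None
    best = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if "%b" in fields and "device" in fields:
            header = fields          # remember the column layout
            continue
        if header is None or len(fields) != len(header):
            continue                 # skip banners and blank lines
        row = dict(zip(header, fields))
        try:
            busy = float(row["%b"])
        except ValueError:
            continue                 # not a data row
        if best is None or busy > best[1]:
            best = (row["device"], busy)
    return best


# Invented sample for illustration only.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.0    2.0    8.0   16.0  0.0  0.0    0.0    5.1   0   1 c0t0d0
    1.2    2.1    9.0   16.5  0.0  0.9    0.0  310.4   0  97 c0t1d0
    0.9    2.0    7.5   16.0  0.0  0.0    0.0    4.8   0   1 c0t2d0
"""

print(busiest_device(SAMPLE))  # ('c0t1d0', 97.0)
```

In this made-up sample one disk sits at 97% busy with a long service time
while its neighbours are nearly idle, which is the kind of imbalance I would
look for.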

