Here's what I was thinking for disk stats:

Device (cXtYdZ) (not sure if there's any benefit to listing individual slices)
Description (Vendor, product, revision from scsi inquiry)

Read ops
Read bytes
Write ops
Write bytes

Size (in bytes)

Errors:
iostat has soft, hard, and transport errors -- would we want to use
these, or does anyone have an idea of a better breakdown?

Possibly useful, but needs more thought:

type (disk, tape, etc.) -- based on scsi types
paths -- Would we want to present data based on path and try to tie
things together somehow?  I can see this being useful for diagnosing
things like misconfigured or unbalanced IO paths.  It seems like
every place I've seen that uses Clariions (for example) always has
problems with LUN trespassing (which kills performance).  But how
should this be presented?  The above stats could be duplicated for
each path, though having some means (a common key) to tie together
the multiple paths of a single LUN would be useful.  This all assumes
one is using mpxio, of course; with other multipathing products I
would think you'd be on your own.
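
To make the common-key idea a bit more concrete, here is a rough
sketch of per-path stats tied together by a LUN key.  Everything in
it is hypothetical (struct path_stat, lun_key, lun_sum() and the
field sizes are made up for illustration and are not an existing
interface); it only shows that duplicating the stats per path and
summing them by key is cheap to do.

```c
#include <stdint.h>
#include <string.h>

#define LUN_KEY_LEN 64   /* assumed size, e.g. room for a LUN GUID string */

/* Hypothetical per-path record: the proposed stats plus a common key. */
struct path_stat {
    char     lun_key[LUN_KEY_LEN]; /* same value on every path to one LUN */
    char     path[32];             /* name of this particular path */
    uint64_t read_ops;
    uint64_t read_bytes;
    uint64_t write_ops;
    uint64_t write_bytes;
};

/*
 * Sum the counters of every path whose lun_key matches `key` into
 * `total`.  Returns the number of paths that matched.
 */
size_t
lun_sum(const struct path_stat *paths, size_t n, const char *key,
    struct path_stat *total)
{
    size_t i, matched = 0;

    memset(total, 0, sizeof (*total));
    strncpy(total->lun_key, key, LUN_KEY_LEN - 1);

    for (i = 0; i < n; i++) {
        if (strncmp(paths[i].lun_key, key, LUN_KEY_LEN) != 0)
            continue;
        total->read_ops    += paths[i].read_ops;
        total->read_bytes  += paths[i].read_bytes;
        total->write_ops   += paths[i].write_ops;
        total->write_bytes += paths[i].write_bytes;
        matched++;
    }
    return (matched);
}
```

Comparing each path's counters against the per-LUN total would then
show how the IO is split across the paths, which is where an
unbalanced configuration should become visible.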

Another one that's probably worthy of discussion in its own right is
the concept of average service time.  IIRC (it's been a while since
I've had to dig too deeply here, so my memory might be wrong), a low
IOP rate can often lead to apparently high service times, which might
cause undue focus there when tracking down a problem.  It's good to
be able to see relatively fast and slow disks show up, but do the
current metrics:
       hrtime_t   wtime;            /* cumulative wait (pre-service) time */
       hrtime_t   wlentime;         /* cumulative wait length*time product */
       hrtime_t   wlastupdate;      /* last time wait queue changed */
       hrtime_t   rtime;            /* cumulative run (service) time */
       hrtime_t   rlentime;         /* cumulative run length*time product */
       hrtime_t   rlastupdate;      /* last time run queue changed */
       uint_t     wcnt;             /* count of elements in wait state */
       uint_t     rcnt;             /* count of elements in run state */

give a good picture of that, or should we be looking for something else?
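
For reference, here is a sketch of how averages could be derived from
two snapshots of those counters, using Little's law (average time in
queue = average queue length / completion rate).  The struct io_snap,
struct io_rates and io_rates_compute() names are made up for this
example, and hrtime_ns is just a stand-in for hrtime_t so the code
compiles anywhere; the counters are assumed to be in nanoseconds and
to only ever increase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t hrtime_ns;      /* stand-in for hrtime_t (nanoseconds) */

/* Hypothetical snapshot of the cumulative counters at one instant. */
struct io_snap {
    hrtime_ns snaptime;         /* time the snapshot was taken */
    uint64_t  ops;              /* reads + writes completed so far */
    hrtime_ns wlentime;         /* cumulative wait length*time product */
    hrtime_ns rlentime;         /* cumulative run length*time product */
    hrtime_ns rtime;            /* cumulative run (service) time */
};

struct io_rates {
    uint64_t ops;               /* operations completed in the interval */
    double   iops;              /* operations per second */
    double   avg_wait_q;        /* average number of requests waiting */
    double   avg_run_q;         /* average number of requests in service */
    double   wait_ms;           /* average wait time per operation */
    double   svc_ms;            /* average service time per operation */
    double   busy_pct;          /* percent of interval with a request active */
};

/* Returns 0 on success, -1 if no time elapsed between the snapshots. */
int
io_rates_compute(const struct io_snap *prev, const struct io_snap *cur,
    struct io_rates *out)
{
    double elapsed, dw, dr;

    assert(prev != NULL && cur != NULL && out != NULL);

    elapsed = (double)(cur->snaptime - prev->snaptime);
    if (elapsed <= 0.0)
        return (-1);

    dw = (double)(cur->wlentime - prev->wlentime);
    dr = (double)(cur->rlentime - prev->rlentime);

    out->ops = cur->ops - prev->ops;
    out->iops = (double)out->ops / (elapsed / 1e9);
    out->avg_wait_q = dw / elapsed;
    out->avg_run_q = dr / elapsed;
    out->busy_pct = 100.0 * (double)(cur->rtime - prev->rtime) / elapsed;

    /* Little's law: queue length / rate == (len*time delta) / ops. */
    out->wait_ms = out->ops > 0 ? dw / (double)out->ops / 1e6 : 0.0;
    out->svc_ms = out->ops > 0 ? dr / (double)out->ops / 1e6 : 0.0;
    return (0);
}
```

Note that svc_ms is an average over only the operations completed in
the interval, so on a nearly idle disk a single slow request can
dominate it.  Reporting the ops count next to the average would at
least let the reader judge how much weight to give the number.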
