Here's what I was thinking for disk stats:

  Device (cXtYdZ) (not sure if there's any benefit to listing individual slices)
  Description (vendor, product, revision from SCSI inquiry)
  Read ops
  Read bytes
  Write ops
  Write bytes
  Size (in bytes)
  Errors: iostat has soft, hard, and transport errors -- would we want to use these, or does anyone have an idea for a better breakdown?

Possibly useful, but needs more thought:

  Type (disk, tape, etc.) -- based on SCSI device types
  Paths -- Would we want to present data based on path and try to tie things together somehow? I can see this being useful for diagnosing things like misconfigured or unbalanced I/O paths. It seems like every place I've seen that uses Clariions (for example) always has problems with LUN trespassing (which kills performance). But how should this be presented? The above stats could be duplicated for each path, though having some means (a common key) to tie together the multiple paths of a single LUN would be useful. This all assumes one is using MPxIO, of course; with other multipathing products I would think you'd be on your own.

Another one that's probably worthy of discussion in its own right is the concept of average service time. IIRC (it's been a while since I've had to dig too deeply here, so my memory might be wrong), a low IOP rate can often lead to apparently high service times, which might cause undue focus there when tracking down a problem. I guess what I'm wondering is this: it's good to be able to see relatively fast and slow disks show up, but do the current metrics:

  hrtime_t wtime;        /* cumulative wait (pre-service) time */
  hrtime_t wlentime;     /* cumulative wait length*time product */
  hrtime_t wlastupdate;  /* last time wait queue changed */
  hrtime_t rtime;        /* cumulative run (service) time */
  hrtime_t rlentime;     /* cumulative run length*time product */
  hrtime_t rlastupdate;  /* last time run queue changed */
  uint_t   wcnt;         /* count of elements in wait state */
  uint_t   rcnt;         /* count of elements in run state */

give a good picture of that, or should we be looking for something else?