On Tue, Feb 17, 2009 at 2:25 PM, Peter Tribble <peter.tribble at gmail.com> wrote:
> On Mon, Feb 16, 2009 at 4:17 PM, Jason King <jason at ansipunx.net> wrote:
>> Here's what I was thinking for disk stats:
>>
>> Device (cXtYdZ)
>
> In that form, or simply enumerated? (Like sd#.)

I'd go with the cXtYdZ form -- I find the sd form to be almost useless honestly.

>
>> (not sure if there's any benefit for listing individual slices)
>
> There is. (At least, those slices that are of non-zero size and therefore
> might actually be used for something. Filtering like that may be awkward,
> but necessary.)
> If a disk has two slices used for different purposes, then you need to know
> how much traffic goes to each slice so as to be able to work back to a
> filesystem or metadevice and hence what's using it.

But if you already have the filesystem stats, would that be enough?

Filtering on non-zero slices wouldn't be too difficult (a few ioctls
will give you that info, which could be cached) -- the trick would be
detecting whether the partitioning has changed.  I would suggest that
the key be the 'cXtYdZ[sN]' value -- I believe strings can be used as
keys in the table.
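
Roughly what I had in mind for the slice check (an untested sketch; it
assumes a VTOC label -- EFI-labeled disks would need DKIOCGEXTVTOC or
libefi instead -- and error handling is omitted):

/* Sketch: list the non-zero slices on a VTOC-labeled disk. */
#include <sys/types.h>
#include <sys/dkio.h>
#include <sys/vtoc.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static void
list_nonzero_slices(const char *rdsk)	/* e.g. "/dev/rdsk/c0t0d0s2" */
{
	struct vtoc vt;
	int fd, i;

	if ((fd = open(rdsk, O_RDONLY | O_NDELAY)) == -1)
		return;
	if (ioctl(fd, DKIOCGVTOC, &vt) != -1) {
		for (i = 0; i < V_NUMPAR; i++) {
			/* skip slices with no blocks allocated */
			if (vt.v_part[i].p_size > 0)
				(void) printf("slice %d: %ld sectors\n",
				    i, (long)vt.v_part[i].p_size);
		}
	}
	(void) close(fd);
}

The same pass would give us slice sizes to cache, and re-running it when
the open or ioctl fails would be one (crude) way of noticing a relabel.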

>
>> Description (Vendor, product, revision from scsi inquiry)
>
> I was assuming you would pull this from the device_error kstats that
> iostat -E uses.
>
>> Read ops
>> Read bytes
>> Write ops
>> Write bytes
>>
>> size (in bytes)
>
> Ah. Now that's where it gets easy for a device (disks export the size in the
> error kstats). I would love to be able to get slice sizes just as easily, but
> I'm not aware of a simple way of doing so. (That doesn't require privilege.)

net-snmp runs with full privs, so that wouldn't be an issue.  If we
had to, we could create a small helper program that could be invoked.
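
For the whole-device size (and the vendor/product strings), pulling the
values out of the device_error kstat with libkstat is simple enough.
Something along these lines (a sketch with no error handling; the
"sderr"/"sdN,err" names assume the sd driver -- other HBA drivers use
their own module names):

/* Sketch: read the Size field from a disk's device_error kstat. */
#include <kstat.h>
#include <stdio.h>

static uint64_t
disk_size_bytes(kstat_ctl_t *kc, int instance)
{
	char name[KSTAT_STRLEN];
	kstat_t *ksp;
	kstat_named_t *kn;

	(void) snprintf(name, sizeof (name), "sd%d,err", instance);
	if ((ksp = kstat_lookup(kc, "sderr", instance, name)) == NULL)
		return (0);
	if (kstat_read(kc, ksp, NULL) == -1)
		return (0);
	if ((kn = kstat_data_lookup(ksp, "Size")) == NULL)
		return (0);
	return (kn->value.ui64);
}

(kc comes from kstat_open(3KSTAT), and the agent module would link
against -lkstat.)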

>
>> Errors:
>> iostat has soft, hard, and transport errors -- would we want to use
>> these, or does anyone have an idea of a better breakdown?
>
> Why not all the kstats:

I suspect so, but are all the different error categories part of the
SCSI standard?  What about *ATA devices?  Are there different
categories?  I.e. are any of these likely to go away anytime soon?


>
> module: sderr                           instance: 10
> name:   sd10,err                        class:    device_error
>        Device Not Ready                0
>        Hard Errors                     0
>        Illegal Request                 0
>        Media Error                     0
>        No Device                       0
>        Predictive Failure Analysis     0
>        Product                         MAP3735NP       Revision
>        Recoverable                     0
>        Revision                        0108
>        Serial No
>        Size                            73508513280
>        Soft Errors                     0
>        Transport Errors                0
>        Vendor                          FUJITSU
>
>> Possibly useful, but needs more thought:
>>
>> type (disk, tape, etc.) -- based on scsi types
>
> What about removable media?

That becomes a bit more interesting -- do you clear/reset the data
when media is removed/replaced with different media?  If not, how is
it made clear what is being measured in that instance?

>
>> paths -- Would we want to present data based on path and try to tie
>> things together somehow?  I can see this being useful for diagnosing
>> things like misconfigured or misbalanced IO paths.  It seems like
>> every place I've seen that uses Clariions (for example) always has
>> problems with LUN trespassing (which kills performance).  But how
>> should this be presented?  The above stats could be duplicated for
>> each path, though having some means (a common key) to tie together the
>> multiple paths of a single LUN would be useful.  This all assumes one
>> is using mpxio, of course; with other products I would think you'd be
>> on your own.
>
> Well, actually if you're using mpxio then you just emulate iostat -Y, no?
>
> If not then you just show the individual paths and leave it at that. If the
> poor user needs more information then they can go talk to powerpath
> or whatever.

Would we want the stats to then be per controller, per physical
device/LUN, and per path?  I think having some information to tie them
all together would be useful.  Doing that by hand, or having to parse
the output of a bunch of commands that were never meant to be parsed by
machines, gets tedious.  That might mean the per-controller, per-LUN,
and per-path stats live in separate tables with virtually the same
fields.
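
Purely as a sketch of what I mean (the names here are made up, not an
existing MIB or API): each per-path and per-controller row would carry
the same key that indexes the per-LUN table, so a client could correlate
the tables without parsing anything:

/* Hypothetical row layouts sharing a common LUN key. */
#include <stdint.h>

struct lun_row {
	char		lun_key[64];	/* e.g. the cXtYdZ / GUID-based name */
	uint64_t	reads, writes;	/* operation counts */
	uint64_t	nread, nwritten;	/* byte counts */
};

struct path_row {
	char		path_name[64];	/* the physical path / initiator port */
	char		lun_key[64];	/* same key as the per-LUN row */
	uint64_t	reads, writes;
	uint64_t	nread, nwritten;
};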

>
>> Another one that's probably worthy of discussion in its own right is
>> the concept of average service time.  IIRC (it's been a while since
>> I've had to dig too deeply here, so my memory might be wrong), a low
>> IOP rate can often lead to apparently high service times, which might
>> cause undue focus there when tracking down a problem.  I guess what
>> I'm wondering is: it's good to be able to see relatively fast and slow
>> disks show up, but do the current metrics:
>>       hrtime_t   wtime;            /* cumulative wait (pre-service) time */
>>       hrtime_t   wlentime;         /* cumulative wait length*time product*/
>>       hrtime_t   wlastupdate;      /* last time wait queue changed */
>>       hrtime_t   rtime;            /* cumulative run (service) time */
>>       hrtime_t   rlentime;         /* cumulative run length*time product */
>>       hrtime_t   rlastupdate;      /* last time run queue changed */
>>       uint_t     wcnt;             /* count of elements in wait state */
>>       uint_t     rcnt;             /* count of elements in run state */
>>
>> give a good picture of that, or should we be looking for something else?
>
> They're what we have, and are thus reasonably well understood. (By
> understood, I mean that they can be turned into numbers, not that the
> meaningful interpretation of those numbers is well understood.)
>
> I think that exposing these as a first pass is reasonable. The question
> is whether it's easy for a client to make sense of them (the client has
> to massage the data into service times etc), but I would rather have the
> client try and interpret them than try to make up some other metric.
> (Besides, for snmp polling you tend to average over reasonably long
> timescales.)

With SNMP, it seems easier to add than to remove, so I'd like to go
with an incremental approach and just add the things that we feel
confident in, deferring the less certain metrics.  If the stats are hard
to comprehend, then, as Brendan suggested, are there better metrics we
could come up with?  They may not make it into revision 1, but could be
added later (after support is placed within the appropriate parts of the
kernel).
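
For what it's worth, the massaging a client has to do with the raw
counters is roughly what iostat already does -- Little's law over two
snapshots.  As a sketch (the hrtime_t fields are nanoseconds; prev and
cur are kstat_io_t snapshots taken dt nanoseconds apart):

/* Sketch: derive iostat-style numbers from two kstat_io_t snapshots. */
#include <sys/types.h>
#include <kstat.h>
#include <stdio.h>

static void
print_svc_times(const kstat_io_t *prev, const kstat_io_t *cur, hrtime_t dt)
{
	uint64_t ops = (cur->reads - prev->reads) +
	    (cur->writes - prev->writes);

	/* average queue lengths over the interval ("wait" and "actv") */
	double wait_q = (double)(cur->wlentime - prev->wlentime) / dt;
	double run_q  = (double)(cur->rlentime - prev->rlentime) / dt;

	/* per-I/O times in milliseconds, via Little's law */
	double wsvc_t = ops ? (cur->wlentime - prev->wlentime) /
	    (double)ops / 1e6 : 0.0;
	double asvc_t = ops ? (cur->rlentime - prev->rlentime) /
	    (double)ops / 1e6 : 0.0;

	/* fraction of the interval the device was busy ("%b") */
	double busy = 100.0 * (cur->rtime - prev->rtime) / dt;

	(void) printf("wait=%.1f actv=%.1f wsvc_t=%.1f asvc_t=%.1f %%b=%.0f\n",
	    wait_q, run_q, wsvc_t, asvc_t, busy);
}

That also shows the low-IOPS effect you mention: with only a handful of
ops in the interval, a single slow I/O dominates the averages.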
