Awesome, Richard. So between sg3_utils, sdparm, smartctl, iostat, kstat, and
zpool iostat, I should have more than enough at my disposal to get the
information I'm looking for.

Any other tools?

I might reply again later if things don't go right while I'm building
some of these tools.
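
One way to get a running total out of any of these tools is to sample a cumulative counter on a schedule and add up the growth between samples, treating a drop in the value as a counter restart (kernel statistics, for instance, start over at boot). A minimal sketch of that bookkeeping; the function name and the sample readings are made up for illustration:

```python
def accumulate(samples):
    """Sum the growth of a cumulative counter across successive samples.

    A sample lower than its predecessor is treated as a counter reset,
    and its full value is counted as new growth since the reset.
    """
    total = 0
    previous = None
    for value in samples:
        if previous is None:
            pass  # the first sample only establishes the baseline
        elif value >= previous:
            total += value - previous
        else:
            total += value  # counter restarted from zero
        previous = value
    return total


# Hypothetical byte counters; the counter restarts before the fourth sample.
readings = [1000, 1500, 2100, 300, 900]
print(accumulate(readings))
```

Anything written between the last sample before a reset and the reset itself is lost, so shorter sampling intervals give a tighter total.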


On Mon, Apr 7, 2014 at 6:25 PM, Richard Elling <
[email protected]> wrote:

>
> On Apr 7, 2014, at 10:20 AM, Jakob Borg <[email protected]> wrote:
>
> > 2014-04-07 18:13 GMT+02:00 Evan Rowley <[email protected]>:
> > >
> > > I'd like to measure the total amount of data written over time to the
> > > physical devices that make up zpool zils, l2arcs, and vdevs. Each one of
> > > these physical devices has a projected Mean Time Before Failure (MTBF),
> > > often measured in writes or data written to the device, which I'd like to
> > > compare with the total number of writes to the device.
> > >
> > > The "zpool iostat" command can do this for a given period of time, but
> > > unless I'm mistaken, it can't do this continually for an unpredictable
> > > amount of time.
> > >
> > > I'm imagining that something like this can be done using dtrace, but
> > > I'm not completely certain that is the best way to tackle the problem.
> > >
> > > Are there any other ways to do this? (before I go re-inventing the
> > > wheel)
> >
> > Many SSDs keep SMART-accessible counters for data read and written;
>
> Wow, that is revisionist history :-)
> SCSI devices reported this info in the read/write logs long before SMART
> (circa 2004) existed.
>
> >
> > [root@anto ~]# /opt/smartmontools/sbin/smartctl -d sat,12 -A /dev/rdsk/c3t1d0p0
> > ...
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> > ...
> > 241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       355603
> > 242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       64535
> >
> > Haven't seen the same on any spinning disks...
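
For tracking purposes, the raw value of attribute 241 above can be converted to bytes. A minimal sketch, assuming the attribute is named Host_Writes_32MiB as in this output (the name and unit vary by vendor) and that the raw value is the tenth whitespace-separated field of the row; the function name is mine, not part of smartmontools:

```python
MIB = 1024 ** 2


def host_writes_bytes(smartctl_output,
                      attr_name="Host_Writes_32MiB",
                      unit_bytes=32 * MIB):
    """Return bytes written according to a `smartctl -A` attribute row, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Row layout: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[1] == attr_name:
            return int(fields[9]) * unit_bytes
    return None


# The two attribute rows quoted above, with the wrapped lines rejoined.
sample = """\
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       355603
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       64535
"""

written = host_writes_bytes(sample)
print(f"{written} bytes, about {written / 1024**4:.2f} TiB")
```

Passing attr_name="Host_Reads_32MiB" gives the read total the same way.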
>
> Try something like:
> sg_logs -a /dev/rdsk/c0t5000C500476C5B37d0
>     SEAGATE   ST3300657SS       0008
> Supported log pages  (spc-2) [0x0]:
>     0x00        Supported log pages
>     0x02        Error counters (write)
>     0x03        Error counters (read)
>     0x05        Error counters (verify)
>     0x06        Non-medium errors
>     0x0d        Temperature
>     0x10        Self-test results
>     0x15        Background scan results (sbc-3)
>     0x18        Protocol specific port
>     0x37        Cache (Seagate), Miscellaneous (Hitachi)
>     0x38        [unknown vendor specific page code]
>     0x3e        Factory (Seagate/Hitachi)
> Write error counter page  (spc-3) [0x2]
>   Errors corrected with possible delays = 0
>   Total rewrites or rereads = 0
>   Total errors corrected = 0
>   Total times correction algorithm processed = 0
>   Total bytes processed = 60158093824
>   Total uncorrected errors = 0
>
> ^^^^ look for zeros in the error counters, especially errors corrected
> with possible delays
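
For write totals on SCSI/SAS drives, the "Total bytes processed" line of the write error counter page is the figure to record. A sketch that pulls the integer counters of one page out of `sg_logs -a` text, assuming page headings are unindented and counters are indented "name = value" lines as in this output (other sg3_utils versions may word or lay them out differently):

```python
def page_counters(sg_logs_output, page_heading):
    """Collect integer 'name = value' counters listed under one log page heading."""
    counters = {}
    in_page = False
    for line in sg_logs_output.splitlines():
        if not line.startswith((" ", "\t")):
            # Unindented lines start a new page (or end the current one).
            in_page = line.startswith(page_heading)
            continue
        if in_page and "=" in line:
            name, _, value = line.partition("=")
            try:
                counters[name.strip()] = int(value.strip())
            except ValueError:
                pass  # ignore values that are not plain integers
    return counters


# Excerpt of the output quoted in this message.
sample = """\
Write error counter page  (spc-3) [0x2]
  Errors corrected with possible delays = 0
  Total bytes processed = 60158093824
  Total uncorrected errors = 0
Read error counter page  (spc-3) [0x3]
  Total bytes processed = 22475677696
"""

writes = page_counters(sample, "Write error counter page")
print(writes["Total bytes processed"])
```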
>
> Read error counter page  (spc-3) [0x3]
>   Errors corrected without substantial delay = 258780
>   Errors corrected with possible delays = 0
>   Total rewrites or rereads = 0
>   Total errors corrected = 258780
>   Total times correction algorithm processed = 258780
>   Total bytes processed = 22475677696
>   Total uncorrected errors = 0
> Verify error counter page  (spc-3) [0x5]
>   Errors corrected without substantial delay = 0
>   Errors corrected with possible delays = 0
>   Total rewrites or rereads = 0
>   Total errors corrected = 0
>   Total times correction algorithm processed = 0
>   Total bytes processed = 0
>   Total uncorrected errors = 0
>
> ^^^^ rarely do we see verify enabled... too slow
>
> Non-medium error page  (spc-2) [0x6]
>   Non-medium error count = 1
>
> ^^^^ rare to discover what these actually reveal... usually need to get
> access to private data
>
> Temperature page  (spc-3) [0xd]
>   Current temperature = 39 C
>   Reference temperature = 68 C
> Self-test results page  (spc-3) [0x10]
> Background scan results page (sbc-3) [0x15]
>   Status parameters:
>     Accumulated power on minutes: 542377 [h:m  9039:37]
>
> ^^^^ here is the POH, useful for MTBF-related reliability calculations
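
Combining the power-on time with the write counter's "Total bytes processed" gives a rough lifetime average write rate. A sketch using the numbers from this output; it assumes both counters cover the same period, which may not hold if either was ever reset, and the function names are only illustrative:

```python
def poh_from_minutes(minutes):
    """Split accumulated power-on minutes into (hours, minutes)."""
    return divmod(minutes, 60)


def avg_write_rate(bytes_written, power_on_minutes):
    """Average bytes written per power-on hour."""
    return bytes_written / (power_on_minutes / 60)


hours, mins = poh_from_minutes(542377)      # from the background scan page
rate = avg_write_rate(60158093824, 542377)  # bytes from the write error counter page
print(f"{hours}:{mins:02d} power-on time, {rate / 1024**2:.1f} MiB written per power-on hour")
```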
>
>     Status: background scan enabled, none active (waiting for BMS interval timer to expire)
>     Number of background scans performed: 139
>     Background medium scan progress: 0.00%
>     Number of background medium scans performed: 1203
> Protocol Specific port page for SAS SSP  (sas-2) [0x18]
> relative target port id = 1
>   generation code = 0
>   number of phys = 1
>   phy identifier = 0
>     attached device type: expander device
>     attached reason: SMP phy control function
>     reason: power on
>     negotiated logical link rate: 6 Gbps
>     attached initiator port: ssp=0 stp=0 smp=0
>     attached target port: ssp=0 stp=0 smp=1
>     SAS address = 0x5000c500476c5b35
>     attached SAS address = 0x50030480009b7a3f
>     attached phy identifier = 21
>     Invalid DWORD count = 0
>     Running disparity error count = 0
>     Loss of DWORD synchronization = 0
>     Phy reset problem = 0
>
> ^^^^ zeros here are good, too... no running disparity errors means good
> cabling
>
>     Phy event descriptors:
>      Invalid word count: 0
>      Running disparity error count: 0
>      Loss of dword synchronization count: 0
>      Phy reset problem count: 0
> relative target port id = 2
>   generation code = 0
>   number of phys = 1
>   phy identifier = 1
>     attached device type: no device attached
>     attached reason: unknown
>     reason: unknown
>     negotiated logical link rate: 1.5 Gbps
>     attached initiator port: ssp=0 stp=0 smp=0
>     attached target port: ssp=0 stp=0 smp=0
>     SAS address = 0x5000c500476c5b36
>     attached SAS address = 0x0
>     attached phy identifier = 0
>     Invalid DWORD count = 0
>     Running disparity error count = 0
>     Loss of DWORD synchronization = 0
>     Phy reset problem = 0
>     Phy event descriptors:
>      Invalid word count: 0
>      Running disparity error count: 0
>      Loss of dword synchronization count: 0
>      Phy reset problem count: 0
> Seagate cache page [0x37]
>   Blocks sent to initiator = 43897808
>   Blocks received from initiator = 111026739
>   Blocks read from cache and sent to initiator = 4654807
>   Number of read and write commands whose size <= segment size = 1133705
>   Number of read and write commands whose size > segment size = 2
> No ascii information for page = 0x38, here is hex:
>  00     38 00 01 b4 00 00 03 d6  00 00 00 10 13 eb 00 08
>  10     46 a9 13 ae 00 00 00 00  13 fb 00 00 00 00 13 ee
>  20     00 08 46 a9 13 e9 00 07  d1 db 13 e9 00 07 d1 db
>  30     13 f3 00 00 00 00 13 da  00 00 00 00 13 e2 00 00
>  .....  [truncated after 64 of 440 bytes (use '-H' to see the rest)]
> Seagate/Hitachi factory page [0x3e]
>   number of hours powered up = 9039.62
>
> ^^^^ another counter for POH
>
>   number of minutes until next internal SMART test = 14
>
> sg3_utils compiles nicely on SmartOS, as do several of the other commonly
> used tools for decoding VPD, log, and mode pages. You might find sdparm to
> be more user friendly (actually, you might find a rock to be more user
> friendly than sg3_utils :) But if you really want user-unfriendly, you can
> get VPD pages from format(1m)
>  -- richard
>
> --
>
> [email protected]
> +1-760-896-4422
>
>
>
>
>
> -------------------------------------------
> smartos-discuss
> Archives: https://www.listbox.com/member/archive/184463/=now
> RSS Feed:
> https://www.listbox.com/member/archive/rss/184463/24484565-d47e1b4e
> Modify Your Subscription:
> https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com
>



-- 
 - EJR


