On Apr 7, 2014, at 10:20 AM, Jakob Borg <[email protected]> wrote:

> 2014-04-07 18:13 GMT+02:00 Evan Rowley <[email protected]>:
> >
> > I'd like to measure the total amount of data written over time to the 
> > physical devices that make up zpool zils, l2arcs, and vdevs. Each one of 
> > these physical devices has a projected Mean Time Before Failure (MTBF), 
> > often measured in writes or data written to the device, which I'd like to 
> > compare with the total number of writes to the device.
> >  
> > The "zpool iostat" command can do this for a given period of time, but 
> > unless I'm mistaken, it can't do this continually for an unpredictable 
> > amount of time.
> >  
> > I'm imagining that something like this can be done using dtrace, but I'm 
> > not completely certain if that is the best way to tackle the problem.
> >  
> > Are there any other ways to do this? (before I go re-inventing the wheel)
> 
> Many SSDs keep SMART-accessible counters for data read and written;

Wow, that is revisionist history :-)
SCSI devices reported this info in the read/write logs long before SMART (circa 
2004) existed.

> 
> [root@anto ~]# /opt/smartmontools/sbin/smartctl -d sat,12 -A /dev/rdsk/c3t1d0p0
> ...
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> ...
> 241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       355603
> 242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       64535
> 
> Haven't seen the same on any spinning disks...
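
For the SMART counters quoted above, the attribute names suggest that each
raw unit is 32 MiB. A minimal Python sketch of the conversion, assuming that
unit really applies to this drive (vendors differ, and the function name is
made up for illustration):

```python
# Assumption: raw values of attributes 241/242 count 32 MiB units, as the
# names Host_Writes_32MiB / Host_Reads_32MiB suggest. Check the vendor docs.
UNIT_BYTES = 32 * 1024 * 1024


def smart_raw_to_bytes(raw_value, unit_bytes=UNIT_BYTES):
    """Convert a SMART raw counter to bytes under the assumed unit size."""
    return raw_value * unit_bytes


if __name__ == "__main__":
    writes = smart_raw_to_bytes(355603)  # RAW_VALUE of attribute 241 above
    reads = smart_raw_to_bytes(64535)    # RAW_VALUE of attribute 242 above
    print("host writes: %.2f TiB" % (writes / 2**40))
    print("host reads:  %.2f TiB" % (reads / 2**40))
```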

Try something like:
sg_logs -a /dev/rdsk/c0t5000C500476C5B37d0
    SEAGATE   ST3300657SS       0008
Supported log pages  (spc-2) [0x0]:
    0x00        Supported log pages
    0x02        Error counters (write)
    0x03        Error counters (read)
    0x05        Error counters (verify)
    0x06        Non-medium errors
    0x0d        Temperature
    0x10        Self-test results
    0x15        Background scan results (sbc-3)
    0x18        Protocol specific port
    0x37        Cache (Seagate), Miscellaneous (Hitachi)
    0x38        [unknown vendor specific page code]
    0x3e        Factory (Seagate/Hitachi)
Write error counter page  (spc-3) [0x2]
  Errors corrected with possible delays = 0
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total times correction algorithm processed = 0
  Total bytes processed = 60158093824
  Total uncorrected errors = 0

^^^^ look for zeros in the error counters, especially errors corrected with 
possible delays

Read error counter page  (spc-3) [0x3]
  Errors corrected without substantial delay = 258780
  Errors corrected with possible delays = 0
  Total rewrites or rereads = 0
  Total errors corrected = 258780
  Total times correction algorithm processed = 258780
  Total bytes processed = 22475677696
  Total uncorrected errors = 0
Verify error counter page  (spc-3) [0x5]
  Errors corrected without substantial delay = 0
  Errors corrected with possible delays = 0
  Total rewrites or rereads = 0
  Total errors corrected = 0
  Total times correction algorithm processed = 0
  Total bytes processed = 0
  Total uncorrected errors = 0

^^^^ rarely do we see verify enabled... too slow

Non-medium error page  (spc-2) [0x6]
  Non-medium error count = 1

^^^^ rare to discover what these actually reveal... usually need to get access 
to private data

Temperature page  (spc-3) [0xd]
  Current temperature = 39 C
  Reference temperature = 68 C
Self-test results page  (spc-3) [0x10]
Background scan results page (sbc-3) [0x15]
  Status parameters:
    Accumulated power on minutes: 542377 [h:m  9039:37]

^^^^ here is the POH, useful for MTBF-related reliability calculations

    Status: background scan enabled, none active (waiting for BMS interval timer to expire)
    Number of background scans performed: 139
    Background medium scan progress: 0.00%
    Number of background medium scans performed: 1203
Protocol Specific port page for SAS SSP  (sas-2) [0x18]
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: expander device
    attached reason: SMP phy control function
    reason: power on
    negotiated logical link rate: 6 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=1
    SAS address = 0x5000c500476c5b35
    attached SAS address = 0x50030480009b7a3f
    attached phy identifier = 21
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0

^^^^ zeros here are good, too... no running disparity errors means good cabling

    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: 1.5 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c500476c5b36
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0
Seagate cache page [0x37]
  Blocks sent to initiator = 43897808
  Blocks received from initiator = 111026739
  Blocks read from cache and sent to initiator = 4654807
  Number of read and write commands whose size <= segment size = 1133705
  Number of read and write commands whose size > segment size = 2
No ascii information for page = 0x38, here is hex:
 00     38 00 01 b4 00 00 03 d6  00 00 00 10 13 eb 00 08
 10     46 a9 13 ae 00 00 00 00  13 fb 00 00 00 00 13 ee
 20     00 08 46 a9 13 e9 00 07  d1 db 13 e9 00 07 d1 db
 30     13 f3 00 00 00 00 13 da  00 00 00 00 13 e2 00 00
 .....  [truncated after 64 of 440 bytes (use '-H' to see the rest)]
Seagate/Hitachi factory page [0x3e]
  number of hours powered up = 9039.62

^^^^ another counter for POH

  number of minutes until next internal SMART test = 14
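
The POH counter and the byte counter on the write error counter page can be
combined into a rough average write rate. A Python sketch, assuming that
"Total bytes processed" on page 0x2 approximates bytes written over the
drive's life and that the power-on minutes cover the same period (both are
assumptions about this drive; the function names are illustrative):

```python
def poh_hours_minutes(power_on_minutes):
    """Split accumulated power-on minutes into (hours, minutes)."""
    return divmod(power_on_minutes, 60)


def avg_bytes_per_day(total_bytes, power_on_minutes):
    """Average bytes per powered-on day (1440 minutes).

    Assumes both counters span the same lifetime of the device.
    """
    if power_on_minutes <= 0:
        raise ValueError("power_on_minutes must be positive")
    return total_bytes * 1440 / power_on_minutes


if __name__ == "__main__":
    minutes = 542377        # "Accumulated power on minutes" from page 0x15
    written = 60158093824   # "Total bytes processed" from page 0x2
    hours, rem = poh_hours_minutes(minutes)
    print("POH: %d:%02d" % (hours, rem))
    print("avg written per day: %.1f MB" % (avg_bytes_per_day(written, minutes) / 1e6))
```

The h:m split should agree with the 9039:37 that sg_logs prints next to the
minutes counter.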

sg3_utils compiles nicely on SmartOS, as do several of the other commonly
used tools for decoding VPD, log, and mode pages. You might find sdparm to
be more user friendly (actually, you might find a rock to be more user
friendly than sg3_utils :) But if you really want user-unfriendly, you can
get VPD pages from format(1m).
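
To track writes continually, as the original question asks, one option is
to save the sg_logs text output at intervals and diff the counters. A
Python sketch, assuming the output layout shown in this message (other
sg3_utils versions or drives may print these pages differently; the
function names are hypothetical):

```python
import re

# A page header starts in column 0 and ends with the page code, e.g.
# "Write error counter page  (spc-3) [0x2]".
PAGE_HEADER = re.compile(r"^\S.*\[(0x[0-9a-fA-F]+)\]:?\s*$")
# Counter lines are indented under their page header.
BYTES_LINE = re.compile(r"^\s+Total bytes processed = (\d+)\s*$")


def total_bytes_by_page(sg_logs_text):
    """Map log page code (e.g. '0x2') to its 'Total bytes processed' value."""
    totals = {}
    page = None
    for line in sg_logs_text.splitlines():
        header = PAGE_HEADER.match(line)
        if header:
            page = header.group(1).lower()
            continue
        counter = BYTES_LINE.match(line)
        if counter and page is not None:
            totals[page] = int(counter.group(1))
    return totals


def bytes_written_between(older_text, newer_text, page="0x2"):
    """Difference of the page's byte counter between two saved samples.

    Page 0x2 is the write error counter page in the output above.
    """
    old = total_bytes_by_page(older_text)[page]
    new = total_bytes_by_page(newer_text)[page]
    if new < old:
        raise ValueError("counter went backwards (reset or wrapped?)")
    return new - old
```

Feeding it two captures of `sg_logs -a` for the same device, taken some
time apart, gives the bytes the drive reports as written in between.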
 -- richard

--

[email protected]
+1-760-896-4422




