Awesome, Richard. So between sg3_utils, sdparm, smartctl, iostat, kstat, and zpool iostat, I should have more than enough at my disposal to get the information I'm looking for.
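A minimal sketch of how the counters quoted below could be turned into byte totals, assuming the text layouts shown in this thread. The function names are hypothetical, and the 32 MiB unit is inferred only from the attribute name `Host_Writes_32MiB`; other vendors may use different attributes or units, so treat this as a starting point rather than a finished tool.

```python
import re

# Assumption: attribute 241 counts units of 32 MiB, as its name suggests.
SMART_UNIT_BYTES = 32 * 1024 * 1024


def smart_host_writes_bytes(smartctl_output):
    """Return bytes written according to SMART attribute 241, or None if absent.

    Expects text shaped like the `smartctl -A` table quoted in this thread.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "241" and fields[1] == "Host_Writes_32MiB":
            # The raw value is the last column of the attribute row.
            return int(fields[-1]) * SMART_UNIT_BYTES
    return None


def sg_logs_bytes_processed(sg_logs_output):
    """Map 'write' / 'read' / 'verify' to that page's 'Total bytes processed'.

    Expects text shaped like the `sg_logs -a` output quoted in this thread.
    """
    totals = {}
    page = None
    for line in sg_logs_output.splitlines():
        header = re.match(r"\s*(Write|Read|Verify) error counter page", line)
        if header:
            page = header.group(1).lower()
            continue
        counter = re.match(r"\s*Total bytes processed = (\d+)", line)
        if counter and page:
            totals[page] = int(counter.group(1))
    return totals
```

Fed the SSD row quoted below (raw value 355603), the first function works out to roughly 10.85 TiB written. Sampling either counter periodically and storing the values would give the "over an unpredictable amount of time" view without keeping dtrace running.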
Any other tools? I might reply back in the future while building some of these tools (if things don't go right).

On Mon, Apr 7, 2014 at 6:25 PM, Richard Elling <[email protected]> wrote:

>
> On Apr 7, 2014, at 10:20 AM, Jakob Borg <[email protected]> wrote:
>
> > 2014-04-07 18:13 GMT+02:00 Evan Rowley <[email protected]>:
> > >
> > > I'd like to measure the total amount of data written over time to the
> > > physical devices that make up zpool zils, l2arcs, and vdevs. Each one
> > > of these physical devices has a projected Mean Time Before Failure
> > > (MTBF), often measured in writes or data written to the device, which
> > > I'd like to compare with the total number of writes to the device.
> > >
> > > The "zpool iostat" command can do this for a given period of time, but
> > > unless I'm mistaken, it can't do this continually for an unpredictable
> > > amount of time.
> > >
> > > I'm imagining that something like this can be done using dtrace, but
> > > I'm not completely certain if that is the best way to tackle the
> > > problem.
> > >
> > > Are there any other ways to do this? (before I go re-inventing the
> > > wheel)
> >
> > Many SSDs keep SMART-accessible counters for data read and written;
>
> Wow, that is revisionist history :-)
> SCSI devices reported this info in the read/write logs long before SMART
> (circa 2004) existed.
>
> >
> > [root@anto ~]# /opt/smartmontools/sbin/smartctl -d sat,12 -A /dev/rdsk/c3t1d0p0
> > ...
> > ID# ATTRIBUTE_NAME     FLAG   VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
> > ...
> > 241 Host_Writes_32MiB  0x0032 100   100   000    Old_age Always  -           355603
> > 242 Host_Reads_32MiB   0x0032 100   100   000    Old_age Always  -           64535
> >
> > Haven't seen the same on any spinning disks...
>
> Try something like:
> sg_logs -a /dev/rdsk/c0t5000C500476C5B37d0
>     SEAGATE ST3300657SS 0008
> Supported log pages (spc-2) [0x0]:
>     0x00 Supported log pages
>     0x02 Error counters (write)
>     0x03 Error counters (read)
>     0x05 Error counters (verify)
>     0x06 Non-medium errors
>     0x0d Temperature
>     0x10 Self-test results
>     0x15 Background scan results (sbc-3)
>     0x18 Protocol specific port
>     0x37 Cache (Seagate), Miscellaneous (Hitachi)
>     0x38 [unknown vendor specific page code]
>     0x3e Factory (Seagate/Hitachi)
> Write error counter page (spc-3) [0x2]
>   Errors corrected with possible delays = 0
>   Total rewrites or rereads = 0
>   Total errors corrected = 0
>   Total times correction algorithm processed = 0
>   Total bytes processed = 60158093824
>   Total uncorrected errors = 0
>
> ^^^^ look for zeros in the error counters, especially errors corrected
> with possible delays
>
> Read error counter page (spc-3) [0x3]
>   Errors corrected without substantial delay = 258780
>   Errors corrected with possible delays = 0
>   Total rewrites or rereads = 0
>   Total errors corrected = 258780
>   Total times correction algorithm processed = 258780
>   Total bytes processed = 22475677696
>   Total uncorrected errors = 0
> Verify error counter page (spc-3) [0x5]
>   Errors corrected without substantial delay = 0
>   Errors corrected with possible delays = 0
>   Total rewrites or rereads = 0
>   Total errors corrected = 0
>   Total times correction algorithm processed = 0
>   Total bytes processed = 0
>   Total uncorrected errors = 0
>
> ^^^^ rarely do we see verify enabled... too slow
>
> Non-medium error page (spc-2) [0x6]
>   Non-medium error count = 1
>
> ^^^^ rare to discover what these actually reveal...
> usually need to get access to private data
>
> Temperature page (spc-3) [0xd]
>   Current temperature = 39 C
>   Reference temperature = 68 C
> Self-test results page (spc-3) [0x10]
> Background scan results page (sbc-3) [0x15]
>   Status parameters:
>     Accumulated power on minutes: 542377 [h:m 9039:37]
>
> ^^^^ here is the POH, useful for MTBF-related reliability calculations
>
>     Status: background scan enabled, none active (waiting for BMS interval timer to expire)
>     Number of background scans performed: 139
>     Background medium scan progress: 0.00%
>     Number of background medium scans performed: 1203
> Protocol Specific port page for SAS SSP (sas-2) [0x18]
>   relative target port id = 1
>     generation code = 0
>     number of phys = 1
>     phy identifier = 0
>       attached device type: expander device
>       attached reason: SMP phy control function
>       reason: power on
>       negotiated logical link rate: 6 Gbps
>       attached initiator port: ssp=0 stp=0 smp=0
>       attached target port: ssp=0 stp=0 smp=1
>       SAS address = 0x5000c500476c5b35
>       attached SAS address = 0x50030480009b7a3f
>       attached phy identifier = 21
>       Invalid DWORD count = 0
>       Running disparity error count = 0
>       Loss of DWORD synchronization = 0
>       Phy reset problem = 0
>
> ^^^^ zeros here are good, too...
> no running disparity errors means good cabling
>
>       Phy event descriptors:
>         Invalid word count: 0
>         Running disparity error count: 0
>         Loss of dword synchronization count: 0
>         Phy reset problem count: 0
>   relative target port id = 2
>     generation code = 0
>     number of phys = 1
>     phy identifier = 1
>       attached device type: no device attached
>       attached reason: unknown
>       reason: unknown
>       negotiated logical link rate: 1.5 Gbps
>       attached initiator port: ssp=0 stp=0 smp=0
>       attached target port: ssp=0 stp=0 smp=0
>       SAS address = 0x5000c500476c5b36
>       attached SAS address = 0x0
>       attached phy identifier = 0
>       Invalid DWORD count = 0
>       Running disparity error count = 0
>       Loss of DWORD synchronization = 0
>       Phy reset problem = 0
>       Phy event descriptors:
>         Invalid word count: 0
>         Running disparity error count: 0
>         Loss of dword synchronization count: 0
>         Phy reset problem count: 0
> Seagate cache page [0x37]
>   Blocks sent to initiator = 43897808
>   Blocks received from initiator = 111026739
>   Blocks read from cache and sent to initiator = 4654807
>   Number of read and write commands whose size <= segment size = 1133705
>   Number of read and write commands whose size > segment size = 2
> No ascii information for page = 0x38, here is hex:
>  00 38 00 01 b4 00 00 03 d6 00 00 00 10 13 eb 00 08
>  10 46 a9 13 ae 00 00 00 00 13 fb 00 00 00 00 13 ee
>  20 00 08 46 a9 13 e9 00 07 d1 db 13 e9 00 07 d1 db
>  30 13 f3 00 00 00 00 13 da 00 00 00 00 13 e2 00 00
>  ..... [truncated after 64 of 440 bytes (use '-H' to see the rest)]
> Seagate/Hitachi factory page [0x3e]
>   number of hours powered up = 9039.62
>
> ^^^^ another counter for POH
>
>   number of minutes until next internal SMART test = 14
>
> sg3_utils compiles nicely on SmartOS, as do several of the other commonly
> used tools for decoding VPD, log, and mode pages.
> You might find sdparm to be more user friendly (actually, you might find
> a rock to be more user friendly than sg3_utils :) But if you really want
> user-unfriendly, you can get VPD pages from format(1m)
> -- richard
>
> --
>
> [email protected]
> +1-760-896-4422
>
>
>
> -------------------------------------------
> smartos-discuss
> Archives: https://www.listbox.com/member/archive/184463/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/184463/24484565-d47e1b4e
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
>

--
- EJR
