Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-17 Thread Carson Gaspar
On 2/16/11 9:58 PM, Krunal Desai wrote: When I try to do a SMART status read (more than just a simple identify), looks like the 1068E drops the drive for a little bit. I bought the Intel-branded LSI SAS3081E: Current active firmware version is 0120 (1.32.00) Firmware image's version is

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-17 Thread Krunal Desai
On Thu, Feb 17, 2011 at 10:52 AM, Carson Gaspar car...@taltos.org wrote: Please give the _exact_ command you are running. I see the same thing, but only if I try and retrieve some of the extended info (-x...). I don't see it with -a. Sure, here it is (apologies in advance if GMail applies its

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-16 Thread Krunal Desai
On Wed, Feb 2, 2011 at 8:38 PM, Carson Gaspar car...@taltos.org wrote: Works For Me (TM). c7t0d0 is hanging off an LSI SAS3081E-R (SAS1068E chip) rev B3 MPT rev 105 Firmware rev 011d (1.29.00.00) (IT FW) This is a SATA disk - I don't have any SAS disks behind a LSI1068E to test. When I

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Richard Elling
On Feb 1, 2011, at 8:54 PM, Krunal Desai wrote: On Tue, Feb 1, 2011 at 11:34 PM, Richard Elling richard.ell...@gmail.com wrote: There is a failure going on here. It could be a cable or it could be a bad disk or firmware. The actual fault might not be in the disk reporting the errors (!)

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Oyvind Syljuasen
I agree that we need to get email updates for failing devices. If FMA discovers it, email can be sent, at least in Solaris Express 11; http://blogs.sun.com/robj/entry/fma_and_email_notifications br, syljua

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Carson Gaspar
On 2/1/11 5:52 PM, Krunal Desai wrote: SMART status was reported healthy as well (got smartctl kind of working), but I cannot read the SMART data of my disks behind the 1068E due to limitations of smartmontools I guess. (e.g. 'smartctl -d scsi -a /dev/rdsk/c10t0d0' gives me serial #, model, and

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Krunal Desai
This error code means the device is gone. The command got the bus, but could not access the target. Thanks for that! I updated firmware on both of my USAS-L8i (LSI1068E based), and while controller numbering has shifted around in Solaris (went from c10/c11 to c11/c12, not a big deal I think),

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Krunal Desai
# uname -a SunOS gandalf.taltos.org 5.11 snv_151a i86pc i386 i86pc movax@megatron:~# uname -a SunOS megatron 5.11 snv_151a i86pc i386 i86pc # /usr/local/sbin/smartctl -H -i -d sat /dev/rdsk/c7t0d0 smartctl 5.40 2010-10-16 r3189 [i386-pc-solaris2.11]

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Carson Gaspar
On 2/2/11 5:47 PM, Krunal Desai wrote: Fails for me, my version does not recognize the 'sat' option. I've been using -d scsi: movax@megatron:~# smartctl -h smartctl version 5.36 [i386-pc-solaris2.8] Copyright (C) 2002-6 Bruce Allen So build the current version of smartmontools. As you should
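
The version mismatch described above can be checked mechanically. What follows is a minimal sketch, not something posted in the thread: the helper names `smartctl_banner_version` and `version_at_least` are hypothetical, and the 5.40 threshold is only the version reported as working here, not a documented minimum for `-d sat`. It handles the two banner shapes quoted in this thread and assumes plain `major.minor` version strings.

```shell
#!/bin/sh
# Extract the version from a smartctl banner line.
# Handles both banner shapes quoted in the thread:
#   "smartctl version 5.36 [i386-pc-solaris2.8]"
#   "smartctl 5.40 2010-10-16 r3189 [i386-pc-solaris2.11]"
smartctl_banner_version() {
    printf '%s\n' "$1" | {
        # Split the banner into its first three words; ignore the rest.
        read -r _name _second _third _rest
        if [ "$_second" = "version" ]; then
            printf '%s\n' "$_third"
        else
            printf '%s\n' "$_second"
        fi
    }
}

# Succeed if version $1 >= version $2.
# Assumption: both arguments look like "major.minor" (e.g. 5.40).
version_at_least() {
    _have_major=${1%%.*}
    _have_minor=${1#*.}
    _have_minor=${_have_minor%%.*}
    _want_major=${2%%.*}
    _want_minor=${2#*.}
    _want_minor=${_want_minor%%.*}
    [ "$_have_major" -gt "$_want_major" ] && return 0
    [ "$_have_major" -lt "$_want_major" ] && return 1
    [ "$_have_minor" -ge "$_want_minor" ]
}

# Example with the 5.36 banner pasted in the thread.
banner='smartctl version 5.36 [i386-pc-solaris2.8]'
ver=$(smartctl_banner_version "$banner")
if version_at_least "$ver" 5.40; then
    echo "smartctl $ver: at least the version reported working here"
else
    echo "smartctl $ver: older than 5.40, consider building a current release"
fi
```

On a live system the banner would be the first line printed by smartctl itself, as in the `smartctl -h` paste above.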

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Krunal Desai
So build the current version of smartmontools. As you should have seen in my original response, I'm using 5.40. Bugs in 5.36 are unlikely to be interesting to the maintainers of the package ;-) Oops, missed that in your log. Will try compiling from source and see what happens. Also,

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Eric D. Mudama
On Wed, Feb 2 at 21:05, Krunal Desai wrote: So build the current version of smartmontools. As you should have seen in my original response, I'm using 5.40. Bugs in 5.36 are unlikely to be interesting to the maintainers of the package ;-) Oops, missed that in your log. Will try compiling

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Krunal Desai
If you search for 'lsiutil solaris' on lsi.com, it'll direct you to zipfile that includes a solaris binary for x86 solaris. Yep, that worked, grabbed it off some other adapter's page. Thanks!

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-02 Thread Richard Elling
On Feb 2, 2011, at 8:59 AM, Oyvind Syljuasen wrote: I agree that we need to get email updates for failing devices. If FMA discovers it, email can be sent, at least in Solaris Express 11; http://blogs.sun.com/robj/entry/fma_and_email_notifications For NexentaStor we have a slightly

[zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Krunal Desai
I recently discovered a drive failure (either that or a loose cable, I need to investigate further) on my home fileserver. 'fmadm faulty' returns no output, but I can clearly see a failure when I do zpool status -v: pool: tank state: DEGRADED status: One or more devices has been removed by the

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Cindy Swearingen
Hi Krunal, It looks to me like FMA thinks that you removed the disk so you'll need to confirm whether the cable dropped or something else. I agree that we need to get email updates for failing devices. See if fmdump generated an error report using the commands below. Thanks, Cindy # fmdump

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Krunal Desai
On Tue, Feb 1, 2011 at 1:29 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote: I agree that we need to get email updates for failing devices. Definitely! See if fmdump generated an error report using the commands below. Unfortunately not, see below: movax@megatron:/root# fmdump TIME

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Cindy Swearingen
I misspoke and should clarify: 1. fmdump identifies fault reports that explain system issues 2. fmdump -eV identifies errors or problem symptoms I'm unclear about your REMOVED status. I don't see it very often. The ZFS Admin Guide says: REMOVED The device was physically removed while the
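
The distinction drawn above maps onto three commands already named in this thread. The following is a minimal sketch, assuming a Solaris-family host with the FMA tools on PATH; the wrapper name `fma_triage` is hypothetical, and returning 127 when the tools are missing is a choice made for this sketch only.

```shell
#!/bin/sh
# Run the three FMA views discussed in the thread, in order.
# Assumption: fmdump and fmadm are on PATH (Solaris-family systems only).
fma_triage() {
    if ! command -v fmdump >/dev/null 2>&1; then
        echo "fmdump not found; the FMA tools are specific to Solaris-family systems" >&2
        return 127
    fi
    echo "== fault reports (fmdump) =="
    fmdump
    echo "== error reports, i.e. problem symptoms (fmdump -eV) =="
    fmdump -eV
    echo "== currently diagnosed faults (fmadm faulty) =="
    fmadm faulty
}
```

An empty `fmadm faulty` alongside non-empty `fmdump -eV` output would match what the original poster describes: symptoms were logged, but no fault was diagnosed.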

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Krunal Desai
On Tue, Feb 1, 2011 at 6:11 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote: I misspoke and should clarify: 1. fmdump identifies fault reports that explain system issues 2. fmdump -eV identifies errors or problem symptoms Gotcha; fmdump -eV gives me the information I need. It appears

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Richard Elling
On Feb 1, 2011, at 5:52 PM, Krunal Desai wrote: On Tue, Feb 1, 2011 at 6:11 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote: I misspoke and should clarify: 1. fmdump identifies fault reports that explain system issues 2. fmdump -eV identifies errors or problem symptoms Gotcha;

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Krunal Desai
The output of fmdump is explicit. I am interested to know if you saw aborts and timeouts or some other errors. I have the machine off atm while I install new disks (18x ST32000542AS), but IIRC they appeared as transport errors (scsi.something.transport, I can paste the exact errors in a

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Richard Elling
On Feb 1, 2011, at 6:49 PM, Krunal Desai wrote: The output of fmdump is explicit. I am interested to know if you saw aborts and timeouts or some other errors. I have the machine off atm while I install new disks (18x ST32000542AS), but IIRC they appeared as transport errors

Re: [zfs-discuss] fmadm faulty not showing faulty/offline disks?

2011-02-01 Thread Krunal Desai
On Tue, Feb 1, 2011 at 11:34 PM, Richard Elling richard.ell...@gmail.com wrote: There is a failure going on here. It could be a cable or it could be a bad disk or firmware. The actual fault might not be in the disk reporting the errors (!) It is not a media error. Errors were as follows: