Re: [zfs-discuss] Disk failure chokes all the disks attached to the failing disk HBA

2012-05-31 Thread Weber, Markus

Antonio S. Cofiño wrote:
 [...]
 The system is a Supermicro motherboard X8DTH-6F in a 4U chassis
 (SC847E1-R1400LPB) and an external SAS2 JBOD (SC847E16-RJBOD1).
 This makes a system with a total of 4 backplanes (2x SAS + 2x SAS2),
 each of them connected to a different HBA (2x LSI 3081E-R (1068
 chip) + 2x LSI SAS9200-8e (2008 chip)).
 This system has a total of 81 disks (2x SAS (SEAGATE ST3146356SS)
 + 34 SATA3 (Hitachi HDS722020ALA330) + 45 SATA6 (Hitachi HDS723020BLA642))

 The issue arises when one of the disks starts to fail, causing very long
 access times. After some time (minutes, but I'm not sure) all the disks
 connected to the same HBA start to report errors. This situation
 produces a general ZFS failure, making the whole pool unavailable.
 [...]


Have been there and gave up in the end [1]. Could reproduce it (even though
it took a bit longer) under most Linux versions (incl. using the latest LSI
drivers) with the LSI 3081E-R HBA.

Is it just mpt causing the errors or also mpt_sas?
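
In case it helps: on Solaris, something like the following should show which
driver each HBA instance is bound to - just an illustration, the exact output
will of course differ per box:

  # list device nodes together with the driver bound to them
  prtconf -D | grep -i mpt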

In a lab environment the LSI 9200 HBA behaved better - I/O only stalled
briefly and then continued on the other disks without generating errors.

Had a lengthy Oracle case on this, but none of the proposed workarounds
worked for me at all. They included (some also from other forums; see the
sketch below for how they are applied):

- disabling NCQ
- allow-bus-device-reset=0; in /kernel/drv/sd.conf
- set zfs:zfs_vdev_max_pending=1
- set mpt:mpt_enable_msi=0
- keeping usage below 90%
- no FMA services running, and temporarily did fmadm unload disk-transport
  and stopped other disk-accessing stuff (smartd?)
- tried changing retries/timeouts via sd.conf for the disks, without any
  success, and ended up doing it via mdb
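
For reference, this is roughly what those settings look like when applied
(standard Solaris locations, values as in the list above - a sketch, not a
recommendation):

  # /etc/system (takes effect after a reboot)
  set zfs:zfs_vdev_max_pending = 1
  set mpt:mpt_enable_msi = 0

  # /kernel/drv/sd.conf
  allow-bus-device-reset=0;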

In the end I knew the bad sector of the bad disk, and by simply dd'ing
this sector once or twice (output discarded to /dev/zero) I could easily
bring down the system/pool without any other load on the disk system.
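
In other words, something along these lines was enough (device path and sector
offset are placeholders, not the real ones):

  # read the known-bad sector once or twice, discarding the output
  dd if=/dev/rdsk/c8t3d0s0 of=/dev/null bs=512 skip=123456789 count=1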


General consensus from various people: don't use SATA drives on SAS
backplanes. Some SATA drives might work better, but there seems to be no
guarantee. And even for SAS-to-SAS, try to avoid SAS1 backplanes.

Markus



[1] Search for "What's wrong with LSI 3081 (1068) + expander + (bad) SATA
disk?"



[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Weber, Markus
Even though it's not directly ZFS related, I've seen some similar discussions
on this list and maybe someone has the final answer to this problem, as most
of the tips and possible fixes I have found so far have not fully solved
the problem.


We are struggling with the behaviour of the combination of an LSI 3081E-R and
SATA disks behind an expander.

One disk behind the expander is known to be bad. dd'ing from that disk causes
I/O to other (good) disks to fail sooner (Solaris) or later (Linux), but fail
it will, making the system unusable.

Under Linux, after some time (maybe when certain things come together), a few
LogInfo(0x31123000) entries briefly interrupt I/O to other disks, but then more
and more of these log entries show up, making any kind of I/O to disks behind
the expander impossible.

Under Solaris it doesn't even take that long: after reading once or twice from
the bad disk, I/O to the other disks mostly fails immediately (and it looks
like sometimes the HBA/bus(?) re-initializes completely).


The error code 0x31123000 is SAS(3) + PL(1) + PL_LOGINFO_CODE_ABORT(12) +
PL_LOGINFO_SUB_CODE_BREAK_ON_STUCK_LINK (3000). I guess this relates to the
HBA-to-expander link(s) not being up/re-established at that time, and the
HBA maybe just not waiting long enough (but how long is long enough and
not too long?).
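
For anyone who wants to double-check the decoding: assuming the usual MPI
LogInfo field layout (type / originator / code / sub-code), the value can be
pulled apart like this:

  # split an LSI LogInfo value into its fields
  LOGINFO=0x31123000
  printf 'type=%x origin=%x code=%x subcode=%x\n' \
      $(( (LOGINFO >> 28) & 0xf  )) \
      $(( (LOGINFO >> 24) & 0xf  )) \
      $(( (LOGINFO >> 16) & 0xff )) \
      $((  LOGINFO        & 0xffff ))
  # -> type=3 (SAS), origin=1 (PL), code=12 (abort), subcode=3000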


I'm trying to understand a bit better:

- why and what is triggering this (e.g. because mpt sends a bus/target reset)

- what exactly is going on (e.g. a reset is sent, the HBA-to-expander link takes
too long or sees problems and then gets caught in a reset loop)

- whether this is as-per-design (e.g. SATA disks behind expanders are always toxic)

- whether this problem can be pinpointed to one component (like the HBA's FW
or the expander's FW) or a combination of things (like WD drives acting strangely
and causing problems with the expander, or so)

- any ideas to pinpoint the problem and get a clearer picture of the issue.


I did some other quick, preliminary tests, which make me think it's most
likely a fatal LSI 3081-to-expander problem, but I could be wrong:

- Moving the bad disk away from the expander to another port on the same HBA:
when reading from the bad disk (not behind the expander), I/O to the other disks
(behind the expander) seems not to be affected at all.

- Replacing the 3081 with a 9211, keeping the bad disk behind the expander:
when reading from the bad disk, I/O to the other disks seems to stall briefly,
but continues quickly and no errors are seen for the good disks so far (at
least under Solaris 11 booted from a LiveCD) - still not perfect, but better.



I do have an Oracle case on this, but - even though I learned a few things -
with no real result (it's not Oracle HW). WD was so kind as to quickly provide
the latest FW for the drives, but not more so far, and LSI is ... well, they
take their time, and their first reply was "no, we are not aware of any issues
like this" (strange, as there are quite a bunch of postings about this out
there).


Many thanks for sharing your experience or ideas on this.
Markus



PS:

LSI 3081E-R (SAS1068E B3), running 01.33.00.00 / 6.36.00.00; expander backplane
is SC216-E1 (SASX36 A1 7015) and WD3000BLFS FW 04.04V06.

Solaris 10 & 11, OpenSolaris 134, OpenIndiana 151a, CentOS 6.2 with 3.04.19 MPT,
OpenSUSE 11.1 with 4.00.43.00-suse MPT and/or the latest LSI drivers 4.28.00.00 ...






Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Weber, Markus
Paul wrote:
 I have not seen any odd issues with the five J4400 configuration
 since we went into production.

I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like NetApp -
interposer cards and thus handling the SATA drives more or less like SAS ones?


Markus




