[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Weber, Markus
Even though it's not directly ZFS related, I've seen some similar discussions
on this list, and maybe someone has the final answer to this problem, as most
of the tips and possible fixes I have found so far have not fully solved
the problem.


We are struggling with the behaviour of the combination of an LSI 3081E-R and
SATA disks behind an expander.

One disk behind the expander is known to be bad. dd'ing from that disk causes
I/O to the other (good) disks to fail sooner (Solaris) or later (Linux), but it
will fail for sure and make the system unusable.

Under Linux, after some time (maybe when certain things come together), a few
LogInfo(0x31123000) entries briefly interrupt I/O to the other disks, but then
more and more of these logs show up, making any kind of I/O to the disks behind
the expander impossible.

Under Solaris it doesn't even take that long: after reading once or twice from
the bad disk, I/O to the other disks usually fails immediately (and it looks
like the HBA/bus(?) sometimes re-initializes completely).


The error code 0x31123000 decodes as SAS (3) + PL (1) + PL_LOGINFO_CODE_ABORT
(0x12) + PL_LOGINFO_SUB_CODE_BREAK_ON_STUCK_LINK (0x3000). I guess this relates
to the HBA -- expander link(s) not being up/re-established at that moment, and
the HBA maybe just not waiting long enough (but how long is long enough and not
too long?).
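
For what it's worth, this is how I read the 32-bit LogInfo value, following the
field layout in LSI's mpi_log_sas.h - a quick Python sketch; the masks are the
standard MPT ones, but double-check against your header version:

    # Decode an LSI MPT LogInfo value into its fields, following the layout
    # in LSI's mpi_log_sas.h (type / origin / code / sub-code).
    def decode_loginfo(value):
        log_type = (value >> 28) & 0xF     # 0x3 = SAS
        origin   = (value >> 24) & 0xF     # 0x1 = PL (protocol layer)
        code     = (value >> 16) & 0xFF    # 0x12 = PL_LOGINFO_CODE_ABORT
        subcode  = value & 0xFFFF          # 0x3000 = ..._BREAK_ON_STUCK_LINK
        return log_type, origin, code, subcode

    print([hex(f) for f in decode_loginfo(0x31123000)])
    # -> ['0x3', '0x1', '0x12', '0x3000']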


I'm trying to understand a bit better:

- why and by whom this is triggered (e.g. because mpt sends a bus/target reset)

- what exactly is going on (e.g. a reset is sent, the HBA -- expander link takes
too long or sees problems, and then gets caught in a reset loop)

- whether this is as-per-design (e.g. SATA disks behind expanders are always toxic)

- whether this problem can be pinpointed to a single component (like the HBA's FW
or the expander's FW) or to a combination of things (like the WD drives acting
strangely and causing problems with the expander).

- any ideas to pinpoint the problem and get a clearer picture of the issue
(one approach I have in mind is sketched below).
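
For that last point, something along these lines might help (a rough, untested
Python sketch - the device paths are placeholders for the good and bad disks):
read from all disks in parallel and timestamp every stalled or failed read, so
that stalls on the good disks can be correlated with reads from the bad one:

    #!/usr/bin/env python
    # Rough sketch: read raw disk devices in parallel and log slow or failed
    # reads, so stalls on the good disks can be matched against reads from
    # the bad one. Device paths below are placeholders.
    import os, time, threading

    DEVICES = ["/dev/rdsk/c9t0d0s2", "/dev/rdsk/c9t1d0s2"]   # placeholders
    CHUNK = 1024 * 1024         # read 1 MiB at a time
    SLOW = 2.0                  # log any read taking longer than this (s)

    def probe(dev):
        fd = os.open(dev, os.O_RDONLY)
        while True:
            t0 = time.time()
            try:
                data = os.read(fd, CHUNK)
            except OSError as e:
                print("%.3f %s I/O error: %s" % (t0, dev, e))
                time.sleep(1)
                continue
            took = time.time() - t0
            if took > SLOW:
                print("%.3f %s read stalled for %.1fs" % (t0, dev, took))
            if not data:                    # end of disk: start over
                os.lseek(fd, 0, os.SEEK_SET)

    for dev in DEVICES:
        threading.Thread(target=probe, args=(dev,)).start()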


I did some quick, preliminary further tests, which lead me to think that it's
most likely a fatal LSI3081--expander problem, but I could be wrong:

- Moving the bad disk away from the expander to another port on the same HBA:
when reading from the bad disk (now not behind the expander), I/O to the other
disks (behind the expander) seems not to be affected at all.

- Replacing the 3081 with a 9211, keeping the bad disk behind the expander:
when reading from the bad disk, I/O to the other disks stalls briefly, but
resumes quickly, and no errors have been seen for the good disks so far (at
least under Solaris 11 booted from a LiveCD) - still not perfect, but better.



I do have an Oracle case open on this, but - even though I learned a few
things - with no real result (it's not Oracle HW). WD was kind enough to
quickly provide the latest FW for the drives, but nothing more so far, and
LSI is ... well, they take their time, and their first reply was "no, we are
not aware of any issues like this" (strange, as there are quite a bunch of
postings about this out there).


Many thanks for sharing your experience or ideas on this.
Markus



PS:

LSI 3081E-R (SAS1068E B3), running 01.33.00.00 / 6.36.00.00; the expander
backplane is an SC216-E1 (SASX36 A1 7015) and the disks are WD3000BLFS with
FW 04.04V06.

Tested under Solaris 10 & 11, OpenSolaris 134, OpenIndiana 151a, CentOS 6.2
with 3.04.19 MPT, and OpenSuSE 11.1 with 4.00.43.00suse MPT and/or the latest
LSI drivers 4.28.00.00 ...


-- 
KPN International

Darmstädter Landstrasse 184 | 60598 Frankfurt | Germany
[T] +49 (0)69 96874-298 | [F] -289 | [M] +49 (0)178 5352346
[E] markus.we...@kpn.de | [W] www.kpn.de

KPN International is a registered trademark of KPN EuroRings B.V.

KPN Eurorings B.V. | Frankfurt am Main branch
Local Court Frankfurt HRB 56874 | VAT ID DE 225602449
Managing directors: Jacobus Snijder & Louis Rustenhoven


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Hung-Sheng Tsaio (Lao Tsao) Ph.D.


hi
I do not have an answer, but just want to mention:
the ZFS appliance moved away from SATA to SAS with expanders for a reason.
IIRC the SATA-based ZFS appliance had one big problem: a single SATA HDD
could take the whole array down.

regards

On 4/5/2012 4:48 AM, Weber, Markus wrote:

Even though it's not directly ZFS related, I've seen some similar discussions
on this list, and maybe someone has the final answer to this problem ...

snip


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] no valid replicas

2012-04-05 Thread Jim Klimov

2012-04-04 23:27, Jan-Aage Frydenbø-Bruvoll wrote:

Which OS and release?


This is OpenIndiana oi_148, ZFS pool version 28.


There was a bug in some releases circa 2010 that you might be hitting. It is
harmless, but annoying.


Ok - what bug is this, how do I verify whether I am facing it here and what 
remedies are there?


Well, if this machine can afford some downtime, you can try
to boot it from an oi_151a (or later if available by the time
you try) LiveCD/LiveUSB media, and import this pool, and do
your offlining attempt.

However, before that test, you should wait a few minutes after
the import and look at the zpool iostat 10 output - in my recent
experience, faulty pools (i.e. pools whose recent errors were
cleared since then, like your broken file from a deleted snapshot)
began some housekeeping (i.e. releasing the deferred-deleted
blocks). It is possible that the imported pool would cleanse
itself, and that might not be the moment when you want to
interfere with offlining - just in case.

If the pool is silent, then go on.
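
If you want to script that check, something like this crude sketch could work
(assuming the usual zpool iostat column layout - name, alloc, free, read/write
ops, read/write bandwidth - and with the pool name as a placeholder):

    # Crude check whether the pool has gone quiet: take a few zpool iostat
    # samples and print the read/write operations columns.
    import subprocess

    POOL = "tank"   # placeholder pool name

    # 6 samples, 10 seconds apart; the first row is the since-boot average.
    out = subprocess.check_output(["zpool", "iostat", POOL, "10", "6"])
    ops = []
    for line in out.decode().splitlines():
        fields = line.split()
        # data rows start with the pool name:
        # name  alloc  free  read-ops  write-ops  read-bw  write-bw
        if fields and fields[0] == POOL:
            ops.append((fields[3], fields[4]))
    print("ops per interval (read, write):", ops[1:])  # skip since-boot row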

You could also use zdb to print out pool usage statistics
(i.e. how many blocks there are in the deferred-delete list),
but IIRC on my test pools gathering those stats took zdb at
least 40 minutes.

HTH,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Paul Kraus
On Thu, Apr 5, 2012 at 4:48 AM, Weber, Markus f...@de.kpn-eurorings.net wrote:

 Even though it's not directly ZFS related, I've seen some similar discussion
 on this list and maybe someone has the final answer to this problem, as most
 tips and these things could help I have found so far have not fully solved
 the problem.


 We are struggling with the behaviour of the combination LSI 3081E-R and SATA
 disks behind an expander.

 One disk behind the expander is known to be bad. DDing from that disk causes
 I/O to other (good) disks to fail soon (Solaris) or later (Linux), but for 
 sure
 it will fail and make the system unusable.

snip

We have five J4400 loaded with SATA drives connected to two dual
port 1068E based controllers. A fully supported configuration as of
when we bought it. We also have two J4400 loaded with SATA drives behind a
single dual port 1068E based controller. We also have three instances
of a single J4400 behind a dual port 1068E controller. In all cases
when I say dual port I mean dual external SAS connector, each with 4
channels, so a total of 8 channels per controller. All J4400 are dual
attached and we are running Solaris 10U9 with multi-pathing enabled.

I have not seen any odd issues with the five J4400 configuration
since we went production. In pre-production testing we found a bug in
the MPT driver that would cause a dead drive to go undetected for
_hours_ while ZFS blindly trusted the FMD layer and kept issuing I/O
requests, waiting for responses that never came back. This
was fixed in an IDR (which we are running) and has since been fully
integrated in 10U10.

I have seen odd behavior of the single J4400 configurations when a
drive fails. I have not been able to really qualify the problem, just
very slow I/O and no logs to point at anything other than the single
failed drive. Sometimes reseating the failed drive will make it come
back to life - sometimes for a short while, sometimes (apparently)
permanently.

I have not seen any odd behavior due to the J4400 in the two J4400
configuration (we have had other issues with this system, but they
were not related to the J4400).

No data has been lost due to any of the failures or outages. Thank you ZFS.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, Troy Civic Theatre Company
- Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] no valid replicas

2012-04-05 Thread Jim Klimov

2012-04-05 16:04, Jim Klimov wrote:

snip

Well, if this machine can afford some downtime, you can try
to boot it from an oi_151a (or later if available by the time
you try) LiveCD/LiveUSB media, and import this pool, and do
your offlining attempt.


For the sake of completeness, your other options are:

* run pkg update or similar to upgrade your installation
  to the current published baseline (and reboot into it),
  but that will likely require more time and traffic to
  test than downloading a LiveCD;

* try zpool clear and zpool scrub again in order
  to, apparently, force processing of deferred-deleted
  blocks such as the leftover from your corrupt snapshot.
  Your current oi_148 may or may not clean it up, but it
  is more likely to allow offlining a disk when there are
  no recorded errors... (a rough sketch of this sequence
  follows below)
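
A rough sketch of that clear-and-scrub sequence (the "scrub in progress"
string matches what zpool status prints on builds of that era, but verify
on yours; the pool name is a placeholder):

    # Rough sketch: clear the pool errors, start a scrub, poll until done.
    import subprocess, time

    POOL = "tank"   # placeholder pool name

    subprocess.check_call(["zpool", "clear", POOL])
    subprocess.check_call(["zpool", "scrub", POOL])

    while True:
        status = subprocess.check_output(["zpool", "status", POOL]).decode()
        if "scrub in progress" not in status:
            break
        time.sleep(60)
    print(status)   # final status: scrub result and any remaining errors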

HTH,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Weber, Markus
Paul wrote:
I have not seen any odd issues with the five J4400 configuration
 since we went production.

I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like
NetApp - interposer cards, and thus handling the SATA drives more or less
like SAS ones?


Markus


-- 
KPN International

Darmstädter Landstrasse 184 | 60598 Frankfurt | Germany
[T] +49 (0)69 96874-298 | [F] -289 | [M] +49 (0)178 5352346
[E] markus.we...@kpn.de | [W] www.kpn.de

KPN International is a registered trademark of KPN EuroRings B.V.

KPN Eurorings B.V. | Frankfurt am Main branch
Local Court Frankfurt HRB 56874 | VAT ID DE 225602449
Managing directors: Jacobus Snijder & Louis Rustenhoven



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Paul Kraus
On Thu, Apr 5, 2012 at 10:04 AM, Weber, Markus f...@de.kpn-eurorings.net 
wrote:
 Paul wrote:
    I have not seen any odd issues with the five J4400 configuration
 since we went production.

 I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like
 NetApp - interposer cards and thus handling the SATA drives more or less like
 SAS ones?

I believe so, and Oracle has pulled the SATA configurations from
what you can buy today. I'm not sure it is really valid to say that a
SATA drive with an interposer is like a SAS drive; they do take a
different path through the MPT code, so there are differences.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, Troy Civic Theatre Company
- Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Hung-Sheng Tsaio (Lao Tsao) Ph.D.



On 4/5/2012 10:22 AM, Paul Kraus wrote:

On Thu, Apr 5, 2012 at 10:04 AM, Weber, Markus f...@de.kpn-eurorings.net
wrote:

Paul wrote:

I have not seen any odd issues with the five J4400 configuration
since we went production.

I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like
NetApp - interposer cards and thus handling the SATA drives more or less like
SAS ones?
The J4400 is the first-generation SAS-1 JBOD, using LSI-based SAS HBAs;
you can read about it in the 7xxx unified storage docs.

The current generation is SAS-2 based, and Oracle only sells it in the ZFS
appliance now, with all SAS-based 7200 rpm drives - please see the ZFS
appliance docs.

Just google "SAS vs SATA" and you will find the differences.
regards




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-05 Thread Rocky Shek
The J4400 uses LSI/SiliconStor SS1320 interposer cards to handle SATA HDs.

From past experience, we have had better luck with Hitachi SATA than with WD
SATA HDs. Sun was using Hitachi HDs, too.

If you ask me, 7200 RPM SAS HDs are still the best choice. Most of
our customers use pure SAS HDs on our JBODs.

Rocky 



-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Weber, Markus
Sent: Thursday, April 05, 2012 7:05 AM
To: Paul Kraus; ZFS Discussions
Subject: Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander +
(bad) SATA disk?

Paul wrote:
I have not seen any odd issues with the five J4400 configuration 
 since we went production.

I'm not familiar with the J4400 at all, but isn't Sun/Oracle using - like
NetApp - interposer cards and thus handling the SATA drives more or less like
SAS ones?


Markus


snip

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss