Well, if we want to get into theories about faulty hardware batches and such, we
can, though I think the likelihood is slim, if not impossible.
I did the best I could diagnostics-wise, given that I have no spare parts that
have never been part of this SAN. As I said, I still think two failed HBAs or
two failed cables just doesn't add up. The errors thrown between cards are
pretty consistent between cable swaps too, so nothing really points to a bad
cable, let alone two.
My vendor has more hardware on its way to me early this coming week, so I'll
be able to report back once I have new HBAs and cables.
On Dec 4, 2011, at 4:11 PM, James C. McPherson wrote:
> On 5/12/11 02:50 AM, Ryan Wehler wrote:
>> In an effort to solve this problem I did update my 3442E-R HBAs from a
>> 2009 firmware to "Phase 21" which came out earlier this year from LSI. The
>> replacement backplane I got from my VAR when they thought that was the
>> issue moved the backplane firmware from 7015 to 7017 per lsiutil's output.
>> You're right, it must be a physical issue, but it just seems highly unlikely
>> that BOTH HBAs failed and BOTH SAS cables failed (we'll take the expander
>> out of the equation since it was replaced).
> You need to look at the data available, rather than making
> assumptions. When I was part of CPRE (now PTS?) in Sun we
> referred to swapping hardware without investigation as
> practicing "swaptronics". Every escalation we got where this
> had happened took longer to resolve as a result.
> So yes, it certainly could be a hardware problem twice in a
> row. You'd want to examine the serial numbers and other identifying
> data such as manufacturing date codes to see how likely that is.
> In the past I've seen cases where replacement disks turned out to
> be duds across several different batches, with different factories
> involved. The true root cause was traced to a chip that was supplied
> to the manufacturer by a third party.
> Personally, I'd start looking at the cables first - in my
> experience they seem to incur more physical stress through the
> connect/disconnect operations than HBAs.
> James C. McPherson
zfs-discuss mailing list