Could the reason Sun's x4540 Thumper has 6 LSIs be some sort of "hidden" problem found by Sun where the HBA resets, and due to time-to-market pressure the "quick and dirty" solution was to spread the load over multiple HBAs instead of a software fix?

Just my 2 cents..


Bruno


Adam Cheal wrote:
Just submitted the bug yesterday, on James' advice, so I don't have a number I can 
refer you to yet...the "change request" number is 6894775 if that helps or is 
directly related to the future bugid.

From what I've seen/read, this problem has been around for a while but only rears 
its ugly head under heavy IO with large filesets, probably related to the large 
metadata sets you spoke of. We are using snv_118 x64, but from what I've read here 
it appears in snv_123 and snv_125 as well.

We've tried installing SSDs to act as a read cache for the pool to reduce the metadata 
hits on the physical disks, and as a last-ditch effort we even tried switching to the 
"latest" LSI-supplied itmpt driver from 2007 (after reading 
http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/) and disabling 
the mpt driver, but we ended up with the same timeout issues. In our case, the drives in 
the JBODs are all WD (model WD1002FBYS-18A6B0) 1TB 7200 RPM SATA drives.

In revisiting our architecture, we compared it to Sun's x4540 Thumper offering, which uses 
the same controller with similar (though apparently customized) firmware and 48 disks. 
The difference is that the Thumper uses 6 x LSI1068e controllers, each of which only has 
to deal with 8 disks...obviously better for performance, but that architecture could also 
be "hiding" the real IO issue by distributing the IO across so many controllers.



