On Oct 23, 2009, at 4:46 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal <ach...@pnimedia.com> wrote:

I don't think there was any intention on Sun's part to ignore the problem... obviously their target market wants a performance-oriented box and the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY channels = 1 channel per drive = no contention for channels. The x4540 is a monster and performs like a dream with snv_118 (we have a few ourselves).

My issue is that implementing an archival-type solution demands a dense, simple storage platform that performs at a reasonable level, nothing more. Our design has the same controller chip (8 SAS PHY channels) driving 46 disks, so there is bound to be contention, especially in high-load situations. I just need it to work and handle load gracefully, not time out and cause disk "failures"; at this point I can't even scrub the zpools to verify that the data we have on there is valid. From a hardware perspective, the 3801E card is spec'ed to handle our architecture; the OS just seems to fall over somewhere and proves unable to throttle itself in certain IO-intensive situations.

That said, I don't know whether to point the finger at LSI's firmware or at the mpt driver/ZFS. Sun obviously has a good relationship with LSI, as LSI's 1068E is the recommended SAS controller chip and is used in Sun's own products. At least we've got a bug filed now, and we can hopefully follow it through to find out where the system breaks down.


Have you checked in with LSI to verify the IOPS ability of the chip? Just because it supports having 46 drives attached to one ASIC doesn't mean it can actually service all 46 at once. You're talking (VERY conservatively) 2800 IOPS.
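(Back-of-envelope, assuming a conservative ~60 random IOPS per 7200RPM spindle: 46 x 60 = 2,760, which is roughly where that 2800 number lands.)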

Tim has a valid point. By default, ZFS will queue 35 commands per disk.
For 46 disks that is 1,610 concurrent I/Os. Historically, it has proven to be
relatively easy to crater performance or cause problems with very, very,
very expensive arrays that are easily overrun by Solaris. As a result, it is not uncommon to see references to setting throttles, especially in older docs.

Fortunately, this is simple to test by reducing the number of I/Os ZFS
will queue. See the Evil Tuning Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
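For reference, a minimal sketch of what that tuning looks like (the tunable name is from the guide; the value of 10 matches the suggestion below -- use whatever your testing supports):

    * /etc/system -- takes effect at the next boot
    set zfs:zfs_vdev_max_pending = 10

    # or poke the live kernel with mdb (0t10 = decimal 10)
    echo zfs_vdev_max_pending/W0t10 | mdb -kw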

The mpt source is not open, so the mpt driver's reaction to 1,610 concurrent I/Os can only be guessed at from afar -- public LSI docs mention a figure of 511 concurrent I/Os for the SAS1068, but it is not clear to me whether that is an explicit limit. If
you have success with zfs_vdev_max_pending set to 10, then the mystery
might be solved. Use iostat to observe the wait and actv columns, which
show the number of transactions in the queues.  JCMP?
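Something like the following is enough to watch them (interval is arbitrary):

    # iostat -xnz 1

In that output, wait is the number of transactions queued in the host and actv the number outstanding on the device; actv pinned near the per-vdev limit across many disks at once points at saturated queues.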

NB: sometimes a driver makes this limit configurable. For example, to get high performance out of a high-end array attached to a qlc card, I've set the execution-throttle in /kernel/drv/qlc.conf to more than two orders of magnitude above its default of 32. /kernel/drv/mpt*.conf does not appear
to have a similar throttle.
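For illustration only -- the property name is real, but 4096 is just a placeholder consistent with "two orders of magnitude greater than 32"; the right number depends entirely on the array behind it:

    # /kernel/drv/qlc.conf
    execution-throttle=4096;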
 -- richard

Even ignoring that, I know for a fact that the chip can't handle raw throughput numbers on 46 disks unless you've got some very severe raid overhead. That chip is good for roughly 2GB/sec each direction. 46 7200RPM drives can fairly easily push 4x that amount in streaming IO loads.
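(Again back-of-envelope, assuming ~100MB/sec or more of sequential throughput per 7200RPM drive: 46 x 100MB/sec is about 4.6GB/sec aggregate, already well beyond what the controller can move in one direction.)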

Long story short, it appears you've got a 50lbs load in a 5lbs bag...

--Tim

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
