On 04/15/10 12:19 AM, James C. McPherson wrote:
> From what I've observed, the consistent feature of this problem (where no SAS expanders are involved) is the use of WD disk drives.
If I run OpenSolaris (and ZFS) on a COMSTAR initiator, run a scrub there, and also run a scrub on the ZFS target, the IOPS get cranked up to a sustained 130 or so, and eventually something very similar to CR6894775 may occur. Setting zfs:zfs_vdev_max_pending = 4 in /etc/system (exact syntax at the end of this mail) definitely reduces how often it happens, but it still happens eventually if you do this often enough. This is on a simple whole-disk mirror of Seagate 7200 RPM drives, on SPARC running SXCE snv_125 with 4 GB of RAM. Both disks are on the same controller, with no expanders. So if this is the same problem (and I'm pretty certain it is), then it happens on (these) Seagates, too.

It seems to require a power cycle to reset; if you get it to do a warm reboot instead (e.g., by forcing a panic), the disks remain offline. What is the difference between a warm reboot and a power cycle in this context?

Just wondering: if the mpt driver could detect that every disk on a given controller has suddenly gone offline more or less at once, could it then somehow reset the controller itself? (Rough sketch of what I mean at the end of this mail.) Sorry if this is a naive question, but when this happens, however rarely, it is rather annoying, albeit relatively harmless, and a brute-force fix would be better than having to power cycle, which is always scary...

Thanks
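For anyone wanting to try the workaround: the tunable is a standard /etc/system "set" line (the value 4 is just what I happened to use, not a recommendation):

    set zfs:zfs_vdev_max_pending = 4

/etc/system changes only take effect after a reboot; if I remember the mdb syntax right, the same value can also be poked into a running kernel with something like

    echo "zfs_vdev_max_pending/W0t4" | mdb -kw

so you can experiment without rebooting first.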
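By "forcing a panic" above I mean the usual methods, e.g.

    # reboot -d        (force a crash dump, then reboot)

from a shell, or breaking to the OBP prompt with Stop-A and typing

    ok sync

Either way the box comes back via a warm reboot, and the disks stay offline.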
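And to make the brute-force idea a bit more concrete, here is a rough, self-contained sketch of the heuristic I have in mind. Every name in it is made up for illustration; it is plain C that compiles on its own and is not based on the actual mpt source, which I have not read:

    /*
     * Purely illustrative sketch -- hypothetical names, NOT mpt code.
     * Idea: if every target on one controller goes offline within a
     * short window, blame the controller rather than the disks, and
     * reset it once instead of leaving everything offline until a
     * power cycle.
     */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    #define MAX_TARGETS         16
    #define ALL_OFFLINE_WINDOW  5    /* seconds */

    struct target {
            bool   offline;
            time_t offline_at;
    };

    struct controller {
            struct target targets[MAX_TARGETS];
            int           ntargets;
            bool          reset_pending;
    };

    /* Hypothetical hook: called whenever a target drops offline. */
    static void
    target_went_offline(struct controller *c, int idx)
    {
            time_t now = time(NULL);
            int    recent = 0;

            c->targets[idx].offline = true;
            c->targets[idx].offline_at = now;

            /* Count targets that went offline within the window. */
            for (int i = 0; i < c->ntargets; i++)
                    if (c->targets[i].offline &&
                        now - c->targets[i].offline_at <= ALL_OFFLINE_WINDOW)
                            recent++;

            /* Every disk vanished at about the same time: reset the HBA. */
            if (recent == c->ntargets && !c->reset_pending) {
                    c->reset_pending = true;
                    printf("all %d targets offline within %ds -- "
                        "resetting controller\n", recent, ALL_OFFLINE_WINDOW);
                    /* a real driver would issue the controller reset here */
            }
    }

    int
    main(void)
    {
            struct controller c = { .ntargets = 2 };

            target_went_offline(&c, 0);
            target_went_offline(&c, 1);  /* triggers the reset heuristic */
            return 0;
    }

The one-shot reset_pending flag is there so a flapping controller doesn't get reset in a loop; presumably a real driver would also want to clear it once targets come back.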