On 17/04/10 08:34 AM, Frank Middleton wrote:
> On 04/15/10 12:19 AM, James C. McPherson wrote:
>> From what I've observed, the consistent feature of this
>> problem (where no SAS expanders are involved) is the
>> use of WD disk drives.
>
> If I run OpenSolaris (and ZFS) on a COMSTAR initiator, run a
> scrub, and also run a scrub on the ZFS target, the IOPS get
> cranked up to a sustained 130 or so, and eventually something
> very similar to CR6894775 may occur. Setting
> zfs:zfs_vdev_max_pending = 4 in /etc/system definitely reduces
> how often it happens, but it eventually does if you scrub often
> enough. This is on a simple whole-disk mirror of Seagate 7200 RPM
> drives on SPARC, running SXCE snv_125 with 4GB of RAM. Both disks
> are on the same controller, with no expanders. So if this is the
> same problem (and I'm pretty certain it is), then it happens on
> (these) Seagates, too.
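
For reference, that tunable is normally set in /etc/system along
these lines (a sketch only; 4 is simply the value Frank chose, and
the setting takes effect at the next boot):

    set zfs:zfs_vdev_max_pending = 4

If you're comfortable poking a live kernel, you can also change and
inspect the value with mdb:

    # echo zfs_vdev_max_pending/W0t4 | mdb -kw
    # echo zfs_vdev_max_pending/D | mdb -k
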
> It seems to require a power cycle to reset; if you can get it to
> do a warm reboot (e.g., by forcing a panic), the disks remain
> offline. What is the difference between a warm reboot and a power
> cycle in this context? I'm just wondering whether there is some
> way the mpt driver could detect that every disk on a given
> controller has gone offline more or less at once, and then somehow
> reset the controller. Sorry if this is a naive question, but when
> this happens, however rarely, it is rather annoying albeit
> relatively harmless, and a brute-force fix would be better than
> having to power cycle, which is always scary...
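When it does happen, it's worth confirming what the system itself
thinks of those disks before reaching for the power button. Roughly
the usual suspects (a starting point, not a guaranteed diagnosis):

    # zpool status -x     pools that are not healthy
    # fmadm faulty        faults FMA has diagnosed
    # cfgadm -al          state of the controller's attachment points
    # iostat -En          per-device error counters

If every disk behind the one controller shows up failed or unusable
at the same moment, that's useful data to attach to a bug report.
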
There are a bunch of fixes (mostly performance-related) that went
into mpt(7D) after snv_125 closed. There have also been a bunch of
changes in the fast reboot area since snv_125. You are doing
yourself a disservice by not running a newer build.
James C. McPherson
--
Senior Software Engineer, Solaris
Oracle
http://www.jmcp.homeunix.com/blog