I had a similar problem with a RAID shelf (switched to JBOD mode, with each 
physical disk presented as a LUN) connected via FC (qlc driver, but no MPIO).  
Running a scrub would eventually generate I/O errors and many messages like 
this:

Sep  6 15:12:53 imsfs scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,5...@e/pci1077,1...@0/f...@0,0/d...@w21000004d960cdec,e (sd4):
Sep  6 15:12:53 imsfs   Request Sense couldn't get sense data

and eventually one or more disks would get marked as faulted by ZFS. This was
under s10u6 (10/08, I think), but I imagine it still holds for u8. I did not
have these problems with just one or two LUNs presented from the array, but I 
prefer to run ZFS in the recommended configuration where it manages the disks.

My storage vendor (third-party, not Sun) recommended that in /etc/system I add
'set ssd:ssd_max_throttle = 23' (or less) and 'set ssd:ssd_io_time = 0x60' or
0x78. The default ssd_io_time of 0x20 (I'm not sure for which Solaris versions)
is apparently not enough in many cases.
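
For reference, that suggestion as /etc/system lines would look something like
this (the values are just what my vendor recommended; adjust for your setup):

* limit outstanding commands per LUN and lengthen the disk I/O timeout (ssd)
set ssd:ssd_max_throttle = 23
set ssd:ssd_io_time = 0x60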

In my case (x64) I discovered I needed sd:sd_max_throttle, etc. (not ssd, which
is apparently only for SPARC), and that the default sd_io_time on recent
Solaris 10 is already 0x60. Apparently the general rule of thumb for
max_throttle is 256 divided by the number of LUNs, but my vendor found that 23
was the maximum reliable setting for 16 LUNs.
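
So what I ended up with on x86 was the sd equivalent, roughly like this (the
sd_io_time line is redundant if your update level already defaults to 0x60,
but it does no harm):

* x86/x64 uses the sd driver rather than ssd
set sd:sd_max_throttle = 23
set sd:sd_io_time = 0x60

After a reboot you can check what the kernel actually picked up with mdb, e.g.:

# echo "sd_max_throttle/D" | mdb -k
# echo "sd_io_time/D" | mdb -k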

This may or may not help you, but it's something to try. Without the
max_throttle setting I would get errors somewhere between 30 minutes and 4
hours into a scrub; with it, scrubs run successfully.

-Andrew


>>> Demian Phillips <demianphill...@gmail.com> 5/23/2010 8:01 AM >>> 
On Sat, May 22, 2010 at 11:33 AM, Bob Friesenhahn
<bfrie...@simple.dallas.tx.us> wrote:
> On Fri, 21 May 2010, Demian Phillips wrote:
>
>> For years I have been running a zpool using a Fibre Channel array with
>> no problems. I would scrub every so often and dump huge amounts of
>> data (tens or hundreds of GB) around and it never had a problem
>> outside of one confirmed (by the array) disk failure.
>>
>> I upgraded to sol10x86 05/09 last year and since then I have
>> discovered that any sufficiently high I/O from ZFS starts causing timeouts
>> and off-lining disks. Long term this leads to failure (once rebooted and
>> cleaned, all is well) because you can no longer scrub reliably.
>
> The problem could be with the device driver, your FC card, or the array
> itself.  In my case, issues I had blamed on my motherboard or on Solaris
> turned out to be due to a defective FC card, and replacing the card resolved
> the problem.
>
> If the problem is that your storage array is becoming overloaded with
> requests, then try adding this to your /etc/system file:
>
> * Set device I/O maximum concurrency
> *
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
> set zfs:zfs_vdev_max_pending = 5
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>

I've gone back to Solaris 10 11/06.
It's working fine, but I notice some differences in performance that
I think are key to the problem.

With the latest Solaris 10 (u8), throughput according to zpool iostat
was hitting about 115 MB/sec, sometimes a little higher.

With 11/06 it maxes out at 40 MB/sec.

Both setups are using mpio devices as far as I can tell.

The next step is to go back to u8 and see if the tuning you suggested
helps. It really looks to me like the OS is asking too much of the FC
chain I have.

The really puzzling thing is that I was just told about a brand-new Dell
Solaris x86 production box, using current and supported FC devices and a
supported SAN, that gets the same kind of problems when a scrub is run. I'm
going to investigate that and see if we can get a fix from Oracle, as that
box does have a support contract. It may shed some light on the issue I am
seeing on the older hardware.