Re: [zfs-discuss] zpool replace - choke point

2008-12-05 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> Thanks for the tips.  I'm not sure if they will be relevant, though.  We
> don't talk directly with the AMS1000.  We are using a USP-VM to virtualize
> all of our storage and we didn't have to add anything to the drv
> configuration files to see the new disk (mpxio was already turned on).  We
> are using the Sun drivers and mpxio and we didn't require any tinkering to
> see the new LUNs.

Yes, the fact that the USP-VM was recognized automatically by Solaris drivers
is a good sign.  I suggest that you check to see what queue-depth and disksort
values you ended up with from the automatic settings:

  echo "*ssd_state::walk softstate |::print -t struct sd_lun un_throttle" \
   | mdb -k

The "ssd_state" would be "sd_state" on an x86 machine (Solaris-10).
The "un_throttle" above will show the current max_throttle (queue depth);
Replace it with "un_min_throttle" to see the min, and "un_f_disksort_disabled"
to see the current queue-sort setting.
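
For example, the same idiom should dump all three values in one pass, since
::print accepts a list of member names:

  echo "*ssd_state::walk softstate |::print -t struct sd_lun un_throttle un_min_throttle un_f_disksort_disabled" \
   | mdb -k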

The HDS docs for the 9500 series suggested 32 as the max_throttle to use, and
the default setting (Solaris-10) was 256 (hopefully with the USP-VM you get
something more reasonable).  And while 32 did work for us, i.e. no operations
were ever lost as far as I could tell, the array back end (the drives
themselves and the internal SATA shelf connections) has an actual queue
depth of four per array controller.  The AMS1000 has the same limitation
for SATA shelves, according to our HDS engineer.
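
If the check above shows you are still at the 256 default, the blunt
instrument is the global tunable in /etc/system, something like the sketch
below.  Note that it caps every ssd LUN on the host, not just the HDS ones,
so the per-device-type tuning from the wiki page referenced in my earlier
message is usually preferable where your patch level supports it; either way
a reboot is needed to apply it:

  * /etc/system: comment lines in this file start with "*"
  set ssd:ssd_max_throttle = 32
  * On an x86 box the driver is sd, so the equivalent would be:
  * set sd:sd_max_throttle = 32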

In short, Solaris, especially with ZFS, functions much better if it does
not try to send more FC operations to the array than the actual physical
devices can handle.  We were actually seeing NFS client operations hang
for minutes at a time when the SAN-hosted NFS server was making its ZFS
devices busy -- and this was true even if clients were using different
devices than the busy ones.  We do not see these hangs after making the
described changes, and I believe this is because the OS is no longer waiting
around for a response from devices that aren't going to respond in a
reasonable amount of time.

Yes, having the USP between the host and the AMS1000 will affect things;
there's probably some huge cache in there somewhere.  But unless that cache
is hundreds of GB in size, at some point a resilver operation is going to
end up running at the speed of the actual back-end device.

Regards,

Marion




Re: [zfs-discuss] zpool replace - choke point

2008-12-04 Thread Alan Rubin
Thanks for the tips.  I'm not sure if they will be relevant, though.  We don't 
talk directly with the AMS1000.  We are using a USP-VM to virtualize all of our 
storage and we didn't have to add anything to the drv configuration files to 
see the new disk (mpxio was already turned on).  We are using the Sun drivers 
and mpxio and we didn't require any tinkering to see the new LUNs.


Re: [zfs-discuss] zpool replace - choke point

2008-12-04 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> I think we found the choke point.  The silver lining is that it isn't the
> T2000 or ZFS.  We think it is the new SAN, an Hitachi AMS1000, which has
> 7200RPM SATA disks with the cache turned off.  This system has a very small
> cache, and when we did turn it on for one of the replacement LUNs we saw a
> 10x improvement - until the cache filled up about 1 minute later (was using
> zpool iostat).  Oh well. 

We have experience with a T2000 connected to the HDS 9520V, predecessor
to the AMS arrays, with SATA drives, and it's likely that your AMS1000 SATA
has similar characteristics.  I didn't see if you're using Sun's drivers to
talk to the SAN/array, but we are using Solaris-10 (and Sun drivers + MPXIO),
and since the Hitachi storage isn't automatically recognized (sd/ssd,
scsi_vhci), it took a fair amount of tinkering to get parameters adjusted
to work well with the HDS storage.

The combination that has given us best results with ZFS is:
 (a) Tell the array to ignore SYNCHRONIZE_CACHE requests from the host.
 (b) Balance drives within each AMS disk shelf across both array controllers.
 (c) Set the host's max queue depth to 4 for the SATA LUNs (sd/ssd driver).
 (d) Set the host's disable_disksort flag (sd/ssd driver) for HDS LUNs.

Here's the reference we used for setting the parameters in Solaris-10:
  http://wikis.sun.com/display/StorageDev/Parameter+Configuration
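
As an illustration of items (c) and (d), the per-device-type tuning described
on that page ends up looking something like the entry below in
/kernel/drv/ssd.conf (the property is sd-config-list in sd.conf on x86).
Treat it as a sketch only: the "HITACHI DF600F" vendor/product string is a
placeholder that has to match what your LUNs report in their SCSI inquiry
data, the name:value format needs a reasonably current Solaris-10 patch
level, and a reconfigure reboot is required to pick it up:

  # /kernel/drv/ssd.conf; the vendor ID is space-padded to 8 characters
  ssd-config-list = "HITACHI DF600F", "throttle-max:4, disksort:false";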

Note that the AMS uses read-after-write verification on SATA drives,
so you only get half the write IOPs that the drives are otherwise
capable of delivering.  We've found that small RAID volumes (e.g. a
two-drive mirror) are unbelievably slow, so you'd want to go toward
having more drives per RAID group, if possible.

Honestly, if I recall correctly what I saw in your "iostat" listings
earlier, your situation is not nearly as "bad" as with our older array.
You don't seem to be driving those HDS LUNs to the extreme busy states
that we have seen on our 9520V.  It was not unusual for us to see LUNs
at 100% busy, 100% wait, with 35 ops total in the "actv" and "wait" columns,
and I don't recall seeing any 100%-busy devices in your logs.

But getting the FC queue-depth (max-throttle) setting to match what the
array's back-end I/O can handle greatly reduced the long "zpool status"
and other I/O-related hangs that we were experiencing.  And disabling
the host-side FC queue-sorting greatly improved the overall latency of
the system when busy.  Maybe it'll help yours too.

Regards,

Marion




Re: [zfs-discuss] zpool replace - choke point

2008-12-03 Thread Alan Rubin
I think we found the choke point.  The silver lining is that it isn't the T2000 
or ZFS.  We think it is the new SAN, an Hitachi AMS1000, which has 7200RPM SATA 
disks with the cache turned off.  This system has a very small cache, and when 
we did turn it on for one of the replacement LUNs we saw a 10x improvement - 
until the cache filled up about 1 minute later (was using zpool iostat).  Oh 
well.


Re: [zfs-discuss] zpool replace - choke point

2008-12-02 Thread Alan Rubin
It's something we've considered here as well.


Re: [zfs-discuss] zpool replace - choke point

2008-12-02 Thread Matt Walburn
Would any of this have to do with the system being a T2000? Would ZFS
resilvering be affected by single-threadedness, the slowish UltraSPARC T1
clock speed, or the lack of strong FPU performance?

On 12/1/08, Alan Rubin <[EMAIL PROTECTED]> wrote:
> We will be considering it in the new year,  but that will not happen in time
> to affect our current SAN migration.


-- 
Matt Walburn
http://mattwalburn.com


Re: [zfs-discuss] zpool replace - choke point

2008-12-01 Thread Alan Rubin
We will be considering it in the new year,  but that will not happen in time to 
affect our current SAN migration.


Re: [zfs-discuss] zpool replace - choke point

2008-12-01 Thread Blake
Have you considered moving to Solaris 10 10/08?  ZFS resilver performance is
much improved in that release, and I suspect the newer code might help you.

You can easily test upgrading with Live Upgrade.  I did the transition
using LU and was very happy with the results.

For example, I added a disk to a mirror and resilvering the new disk
took about 6 min for almost 300GB, IIRC.
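
In case it helps, the LU sequence is roughly the sketch below.  The BE name
and the media path are placeholders; with a UFS root, lucreate also needs an
-m option to say where to put the new BE, and the luactivate output explains
how to fall back if the new BE misbehaves:

  lucreate -n s10u6                       # clone the current boot environment
  luupgrade -u -n s10u6 -s /cdrom/cdrom0  # upgrade the clone from the 10/08 media
  luactivate s10u6                        # select it for the next boot
  init 6                                  # use init/shutdown, not reboot, after luactivate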

Blake



On Mon, Dec 1, 2008 at 11:04 PM, Alan Rubin <[EMAIL PROTECTED]> wrote:
> I had posted at the Sun forums, but it was recommended to me to try here as 
> well.  For reference, please see 
> http://forums.sun.com/thread.jspa?threadID=5351916&tstart=0.
>
> In the process of a large SAN migration project we are moving many large 
> volumes from the old SAN to the new. We are making use of the 'replace' 
> function to replace the old volumes with similar or larger new volumes. This 
> process is moving very slowly, sometimes as slow as only moving one 
> percent of the data every 10 minutes. Is there any way to streamline this 
> method? The system is Solaris 10 08/07. How much is dependent on the activity 
> of the box? How about on the architecture of the box? The primary system in 
> question at this point is a T2000 with 8GB of RAM and a 4-core CPU. This 
> server has 6 4Gb fibre channel connections to our SAN environment. At times 
> this server is quite busy because it is our backup server, but performance 
> seems no better when backup operations have ceased their daily activities.
>
> Our pools are only stripes. Would we expect better performance from a mirror 
> or raidz pool? It is worrisome that if the environment were compromised by a 
> failed disk that it could take so long to replace and correct the usual 
> redundancies (if it was a mirror or raidz pool).
>
> I have previously applied the kernel change described here: 
> http://blogs.digitar.com/jjww/?itemid=52
>
> I just moved a 1TB volume which took approx. 27h.


[zfs-discuss] zpool replace - choke point

2008-12-01 Thread Alan Rubin
I had posted at the Sun forums, but it was recommended to me to try here as 
well.  For reference, please see 
http://forums.sun.com/thread.jspa?threadID=5351916&tstart=0.

In the process of a large SAN migration project we are moving many large
volumes from the old SAN to the new. We are making use of the 'replace'
function to replace the old volumes with similar or larger new volumes. This
process is moving very slowly, sometimes as slow as only moving one percent
of the data every 10 minutes. Is there any way to streamline this method? The
system is Solaris 10 08/07. How much is dependent on the activity of the box?
How about on the architecture of the box? The primary system in question at
this point is a T2000 with 8GB of RAM and a 4-core CPU. This server has six
4Gb Fibre Channel connections to our SAN environment. At times this server is
quite busy because it is our backup server, but performance seems no better
when backup operations have ceased their daily activities.
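
For reference, each volume is being moved with a straight one-for-one replace
and watched with zpool status/iostat, along the lines of the sketch below
(pool and device names are placeholders):

  zpool replace tank c4t<old-LUN>d0 c6t<new-LUN>d0
  zpool status tank      # shows the "replacing" vdev and resilver progress
  zpool iostat -v 30     # per-device throughput while the resilver runs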

Our pools are only stripes. Would we expect better performance from a mirror
or raidz pool? It is worrisome that, if the environment were compromised by a
failed disk, it could take this long to replace the disk and restore the
usual redundancy (if it were a mirror or raidz pool).

I have previously applied the kernel change described here: 
http://blogs.digitar.com/jjww/?itemid=52

I just moved a 1TB volume which took approx. 27h.