[zfs-discuss] Performance Testing

2010-08-11 Thread Paul Kraus
I know that performance has been discussed often here, but I
have just gone through some testing in preparation for deploying a
large configuration (120 drives is a large configuration for me) and I
wanted to share my results, both to share the results as well as to
see if anyone sees anything wrong in either my methodology or results.

First the hardware and requirements. We have an M4000
production server and a T2000 test server. The storage resides in five
J4400s, dual attached to the T2000 (and soon to be connected to the
M4000 as well). The drives are all 750 GB SATA disks. So we have 120
drives. The data is currently residing on other storage and will be
migrated to the new storage as soon as we are happy with the
configuration. There is about 20 TB of data today, and we need growth
to at least 40 TB. We also need a small set of drives for testing. My
plan is to use 80 to 100 drives for production and 20 drives for test.
The I/O pattern is a small number of large sequential writes (to load
the data) followed by lots of random reads and some random writes (5%
sequential writes, 10% random writes, 85% random reads). The files are
relatively small, as they are scans (TIFF) of documents, median size
is 23KB. The data is divided into projects, each of which varies in
size between almost nothing up to almost 50 million objects. We
currently have multiple zpools (based on department) and multiple
datasets in each (based on project). The plan for the new storage is
to go with one zpool, and then have a dataset per department, and
datasets within the departments for each project.
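
For illustration, the dataset hierarchy I have in mind would be built
roughly like this (the pool, department, and project names below are
just placeholders, not our real ones):

    zfs create tank/deptA
    zfs create tank/deptA/project001
    zfs create tank/deptA/project002
    zfs create tank/deptB
    zfs create tank/deptB/project001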

Based on recommendations from our local Sun / Oracle staff, we
are planning on using raidz2 for recoverability reasons over mirroring
(to get a comparable level of fault tolerance with mirrors would
require three-way mirrors, and that does not get us the capacity we
need). I have been testing various raidz2 configurations to confirm
the data I have found regarding performance vs. number of vdevs and
size of raidz2 vdevs. I used the same 40 disks out of the 120 for every
configuration (after culling out any that showed unusual asvc_t via
iostat). I used filebench for the testing, as it seemed to generate real
differences based on zpool configuration (the other tools I tried showed
no statistical difference between zpool configurations).
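
(For anyone curious, the asvc_t screening was nothing exotic; it was
along the lines of

    iostat -xn 30

run while the disks were under a light test load, watching the asvc_t
column for drives that consistently stood out from the rest.)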

See
https://spreadsheets.google.com/pub?key=0AtReWsGW-SB1dFB1cmw0QWNNd0RkR1ZnN0JEb2RsLXc&hl=en&output=html
for a summary of the results. The random read numbers agree with what
is expected (performance scales linearly with the number of vdevs).
The random write numbers also agree with the expected result, except
for the 4 vdevs of 10 disk raidz2, which showed higher performance than
expected. The sequential write performance actually was fairly
consistent and even showed a slight improvement with fewer vdevs of
more disks. Based on these results, and our capacity needs, I am
planning to go with 5 disk raidz2 vdevs. Since we have five J4400, I
am considering using one disk in each of the five arrays per vdev, so
that a complete failure of a J4400 does not cause any loss of data.
What is the general opinion of that approach and does anyone know how
to map the MPxIO device name back to a physical drive ?
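
To be concrete about what I mean by one disk per array: each raidz2
vdev would take one disk from each of the five trays, so the zpool
create would look something like this (the cXtYd0 names below are just
placeholders, one controller per J4400; the real MPxIO names will be
the long WWN-based ones):

    zpool create tank \
        raidz2 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
        raidz2 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
        raidz2 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0 \
        ...

and so on for 16 to 20 vdevs, depending on how many drives end up in
production. If I have the math right, 100 production drives gives 20
such vdevs with 60 data disks, or roughly 45 TB raw before ZFS
overhead, which covers the 40 TB target.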

Does anyone see any issues with either the results or the
tentative plan for the new storage layout ? Thanks, in advance, for
your feedback.

P.S. Let me know if you want me to post the filebench workloads I
used; they are the defaults with a few small tweaks (the random workload
ran 64 threads, for example).
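
The tweaks are mostly just variable overrides from the filebench
prompt, something along these lines (the test directory here is a
placeholder):

    filebench> load randomread
    filebench> set $dir=/testpool/fbtest
    filebench> set $nthreads=64
    filebench> run 60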

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players


Re: [zfs-discuss] Performance Testing

2010-08-11 Thread Marion Hakanson
p...@kraus-haus.org said:
  Based on these results, and our capacity needs, I am planning to go with 5
 disk raidz2 vdevs.

I did similar tests with a Thumper in 2008, with X4150/J4400 in 2009,
and more recently comparing X4170/J4400 and X4170/MD1200:
http://acc.ohsu.edu/~hakansom/thumper_bench.html
http://acc.ohsu.edu/~hakansom/j4400_bench.html
http://acc.ohsu.edu/~hakansom/md1200_loadbal_bench.html

On the Thumper, we went with 7x(4D+2P) raidz2, and as a general-purpose
NFS server, performance has been fantastic except (as expected without
any NV ZIL) for the very rare "lots of small synchronous I/O" workloads
(like extracting a tar archive via an NFS client).
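
(If that ever becomes a real problem, the fix would be adding a fast
non-volatile log device to the pool, something like

    zpool add tank log mirror c6t0d0 c6t1d0

with whatever pool name and SSD/NVRAM device names actually apply; so
far we have not needed it.)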

In fact, our experience with the above has led us to go with 6x(5D+2P)
on our new X4170/J4400 NFS server.  The difference between this config
and 7x(4D+2P) on the same hardware is pretty small, and both are faster
than the Thumper.


 Since we have five J4400, I am considering using one disk
 in each of the five arrays per vdev, so that a complete failure of a J4400
 does not cause any loss of data. What is the general opinion of that approach

We did something like this on the Thumper, with one disk on each of
the internal HBA's.  Since our new system has only two J4400's, we
didn't try to cover this type of failure.


 and does anyone know how to map the MPxIO device name back to a physical
 drive ?

You can use CAM (Sun's Common Array Manager) to view the mapping of
physical drives to device names
(with or without MPxIO enabled).  That's the most human-friendly way
that I've found.

If you're using Oracle/Sun LSI HBA's (mpt), a raidctl -l will list
out device names like 0.0.0, 0.1.0, and so on. That middle digit does
seem to correspond with the physical slot number in the J4400's, at least
initially.  Unfortunately (for this purpose), if you move drives around,
the raidctl names follow the drives to their new locations, as do the
Solaris device names (verified by dd if=/dev/dsk/... of=/dev/null and
watching the blinkenlights).  Also, with multiple paths, devices will
show up with two different names in raidctl -l, so it's a bit of a
pain to make sense of it all.
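
The other thing that can help cross-reference the MPxIO names is the
multipath tooling itself, e.g.

    stmsboot -L          # shows non-MPxIO to MPxIO device name mappings
    mpathadm list lu     # lists the multipath logical units

but as far as I know neither of those tells you which physical slot a
drive is actually sitting in.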

So, just use CAM.

Regards,

Marion

