Hey there,

Few things:
- Using /dev/zero is not necessarily a great test. I typically use /dev/urandom to create an initial block-o-stuff - something like a gig or so worth, in /tmp - then use dd to push that to my zpool. (/dev/zero will return dramatically different results depending on pool/dataset settings for compression etc.)
- Indeed - getting a total aggregate of 180MB/s seems pretty low on the face of it for the setup you have. What's the controller you are using? Any details on the driver, backplane, expander, array or other you might be using?
- Have you tried your dd on individual spindles? You might find that they behave differently.
- Does your controller have DRAM on it? Can you put it in passthrough mode rather than cache?
- I have done some testing trying to find odd behaviour like this before, and found on different occasions a number of different things:
    - Drives: Things like the WD 'green' drives getting in my way
    - Alignment for non-EFI labelled disks (hm - maybe even on EFI... that one was a while ago), particularly for 4K 'advanced format' (ha!) disks
    - The controller was unable to keep up. (In one case, I ended up tossing an HP P400 (IIRC) and using the on-motherboard chipset, as it was considerably faster when running four disks.)
    - Disks with wildly different performance characteristics were also bad (eg: Enterprise SATA mixed with 5400 RPM disks. ;)
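
To make the /dev/urandom suggestion concrete, here is a minimal sketch of the two steps (build the random file once, then push it at the pool). The function names, the /tmp/randomdata.file path and the /tank mount point are my own placeholders, not anything from your setup, so adjust to taste.

```shell
#!/bin/sh
# make_random_file PATH SIZE_MB
# Fill PATH with SIZE_MB MiB of incompressible data from /dev/urandom.
make_random_file() {
    dd if=/dev/urandom of="$1" bs=1024k count="$2" 2>/dev/null
}

# push_file SRC DEST
# Copy SRC to DEST in 1 MiB blocks; dd prints its own transfer summary
# on stderr when it finishes.
push_file() {
    dd if="$1" of="$2" bs=1024k
}

# Example (hypothetical paths; assumes the pool is mounted at /tank):
#   make_random_file /tmp/randomdata.file 1024
#   push_file /tmp/randomdata.file /tank/ddtest
```

Building the file first keeps the (fairly slow) random number generator out of the timed write, and because the data is incompressible the result should not depend much on the dataset's compression setting.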


I'd suggest that you spend a little time validating the basic assumptions around:
 - speed of individual disks,
 - speed of individual buses,
 - whether you are being limited by CPU (ie: if you have compression or dedupe turned on) (view with mpstat and friends).

- I'll also note that you are looking close to the number of IOPS I'd expect a consumer disk to supply, assuming a somewhat random distribution of IOPS.
- Consider that your 180MB/s is actually 360 (well - not quite - but it's a lot more than 180). Remember - in a mirror, you literally need to write the data twice.
          8.0 3857.8 64.0 337868.8 0.0 64.5 0.0 16.7 0 704 c5
          (Note above is your c5 controller - running at around 337 MB/s)

Incidentally - this seems awfully close to 3Gb/s... How did you say all of your external drives were attached? If I didn't know better, I'd be asking serious questions about how many lanes of a SAS connection SATA-attached drives are able to use... Actually - I don't know better, so I'd ask anyway... ;)
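
For what it's worth, the two back-of-the-envelope numbers above (doubling the dd figure for the mirror, and comparing the c5 kw/s against a 3Gb/s link) can be checked with a couple of lines of awk. This is only a rough sketch: the function names are mine, and it assumes iostat's "k" means 1024 bytes and ignores any link encoding overhead.

```shell
#!/bin/sh
# mirror_physical_mb MB_PER_S
# A 2-way mirror writes every block twice, so the disks see double
# the rate that dd reports.
mirror_physical_mb() {
    awk -v mb="$1" 'BEGIN { printf "%d\n", mb * 2 }'
}

# kb_per_s_to_gbit KB_PER_S
# Convert an iostat kw/s figure to Gb/s (assumes 1 KB = 1024 bytes).
kb_per_s_to_gbit() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb * 1024 * 8 / 1e9 }'
}

mirror_physical_mb 180       # prints 360
kb_per_s_to_gbit 337868.8    # prints 2.77
```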

I think this will likely go a long way to helping understand where the holdup is.

There is also a heap of great stuff on solarisinternals.com which I'd highly recommend taking a look at after you have validated the basics...

Were this one of my systems, (and especially if it's new, and you don't love your data and can re-create the pool) I'd be tempted to do something like a very destructive...

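for i in <all your disks>
do
    # bs=1024k: dd's default 512-byte blocks would throttle the test
    dd if=/tmp/randomdata.file.I.created.earlier of=/dev/rdsk/${i} bs=1024k &
done
wait    # let all the background dd processes finish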

and see how much you can stuff down the pipe.

Remember - this will kill whatever is on the disks, so do think twice before you do it. ;)

If you can't get at least 80-100MB/s on the outside of the platter, I'd suggest you should be looking at layers below ZFS. If you *can*, then you start looking further up the stack.
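
If you'd rather not wipe anything, a read-only pass over each spindle gives a rough per-disk number to hold against that 80-100MB/s mark. The sketch below is an assumption-laden stand-in for the destructive test, not a replacement for it: read speed is only a proxy for write speed, the function name is made up, and the device path in the example is a placeholder.

```shell
#!/bin/sh
# read_rate DEVICE COUNT_MB
# Read COUNT_MB MiB from DEVICE into /dev/null and print an approximate
# rate in whole MiB per second. Timing uses one-second resolution, so use
# a large COUNT_MB (thousands) for a meaningful figure.
read_rate() {
    start=$(date +%s)
    dd if="$1" of=/dev/null bs=1024k count="$2" 2>/dev/null
    end=$(date +%s)
    elapsed=$((end - start))
    # Clamp to one second so a very fast read cannot divide by zero.
    if [ "$elapsed" -lt 1 ]; then
        elapsed=1
    fi
    echo $(( $2 / elapsed ))
}

# Example (placeholder device name; substitute each of your disks in turn):
#   read_rate /dev/rdsk/<disk> 4000
```

Run it one disk at a time first, then on several at once: if the individual numbers look fine but the total stops growing, that points at the controller or the link rather than the drives.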

Hope this helps somewhat. Let us know how you go.

Cheers!

Nathan.

On 02/ 1/12 04:52 AM, Mohammed Naser wrote:
Hi list!

I have seen less-than-stellar ZFS performance on a setup of one main
head connected to a JBOD (using SAS, but drives are SATA).  There are
16 drives (8 mirrors) in this pool but I'm getting 180ish MB/s
sequential writes (using dd, I know it's not precise, but those
numbers should be higher).

With some help on IRC, it seems that part of the reason I'm slowing
down is some drives seem to be slower than the others.  Initially, I
had some drives running at 1.5 mode instead of 3.0 -- They are all
running at 3.0 now.  While running the following dd command, the
output of iostat reflects a much higher %b which seems to say that
those drives are slower (but could they really be slowing down
everything else that much? --- Or am I looking at the wrong spot
here?) -- The pool configuration is also included below

dd if=/dev/zero of=4g bs=1M count=4000

                     extended device statistics
     r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
     1.0    0.0    8.0      0.0  0.0  0.0    0.0    0.2   0   0 c1
     1.0    0.0    8.0      0.0  0.0  0.0    0.0    0.2   0   0 c1t2d0
     8.0 3857.8   64.0 337868.8  0.0 64.5    0.0   16.7   0 704 c5
     0.0  259.0    0.0  26386.2  0.0  3.6    0.0   14.0   0  37 c5t50014EE0ACE4AEEFd0
     1.0  266.0    8.0  27139.2  0.0  3.6    0.0   13.5   0  37 c5t50014EE056EB0356d0
     2.0  276.0   16.0  19315.1  0.0  3.7    0.0   13.3   0  40 c5t50014EE00239C976d0
     0.0  279.0    0.0  19699.0  0.0  3.6    0.0   13.0   0  37 c5t50014EE0577C459Cd0
     1.0  232.0    8.0  23061.9  0.0  3.6    0.0   15.4   0  37 c5t50014EE0578F60F5d0
     0.0  227.0    0.0  22677.9  0.0  3.6    0.0   15.8   0  37 c5t50014EE0AC407BAEd0
     0.0  205.0    0.0  24870.2  0.0  3.4    0.0   16.6   0  35 c5t50014EE0AC408605d0
     0.0  205.0    0.0  24870.2  0.0  3.4    0.0   16.6   0  35 c5t50014EE056EB0B94d0
     1.0  210.0    8.0  15954.2  0.0  4.4    0.0   20.8   0  68 c5t5000C50010C77647d0
     0.0  212.0    0.0  16082.2  0.0  4.1    0.0   19.2   0  42 c5t5000C50010C865DEd0
     0.0  207.0    0.0  20093.9  0.0  4.2    0.0   20.3   0  45 c5t5000C50010C77679d0
     0.0  208.0    0.0  19689.5  0.0  4.1    0.0   19.8   0  44 c5t5000C50010C7672Dd0
     0.0  259.0    0.0  14013.7  0.0  5.1    0.0   19.7   0  53 c5t5000C5000A11B600d0
     2.0  320.0   16.0  19942.9  0.0  6.9    0.0   21.5   0  84 c5t5000C50008315CE5d0
     1.0  259.0    8.0  23380.2  0.0  3.6    0.0   13.9   0  37 c5t50014EE001407113d0
     0.0  234.0    0.0  20692.4  0.0  3.6    0.0   15.4   0  38 c5t50014EE00194FB1Bd0

   pool: tank
  state: ONLINE
   scan: scrub canceled on Mon Jan 30 11:07:02 2012
config:

         NAME                       STATE     READ WRITE CKSUM
         tank                       ONLINE       0     0     0
           mirror-0                 ONLINE       0     0     0
             c5t50014EE0ACE4AEEFd0  ONLINE       0     0     0
             c5t50014EE056EB0356d0  ONLINE       0     0     0
           mirror-1                 ONLINE       0     0     0
             c5t50014EE00239C976d0  ONLINE       0     0     0
             c5t50014EE0577C459Cd0  ONLINE       0     0     0
           mirror-3                 ONLINE       0     0     0
             c5t50014EE0578F60F5d0  ONLINE       0     0     0
             c5t50014EE0AC407BAEd0  ONLINE       0     0     0
           mirror-4                 ONLINE       0     0     0
             c5t50014EE056EB0B94d0  ONLINE       0     0     0
             c5t50014EE0AC408605d0  ONLINE       0     0     0
           mirror-5                 ONLINE       0     0     0
             c5t5000C50010C77647d0  ONLINE       0     0     0
             c5t5000C50010C865DEd0  ONLINE       0     0     0
           mirror-6                 ONLINE       0     0     0
             c5t5000C50010C7672Dd0  ONLINE       0     0     0
             c5t5000C50010C77679d0  ONLINE       0     0     0
           mirror-7                 ONLINE       0     0     0
             c5t50014EE001407113d0  ONLINE       0     0     0
             c5t50014EE00194FB1Bd0  ONLINE       0     0     0
           mirror-8                 ONLINE       0     0     0
             c5t5000C50008315CE5d0  ONLINE       0     0     0
             c5t5000C5000A11B600d0  ONLINE       0     0     0
         cache
           c1t2d0                   ONLINE       0     0     0
           c1t3d0                   ONLINE       0     0     0
         spares
           c5t5000C5000D46F13Dd0    AVAIL

From c5t5000C50010C77647d0 to c5t5000C50008315CE5d0 are the 6 Seagate
drives; they are 2 ST31000340AS and 4 ST31000340NS.  The rest of the
drives are all WD RE3 (WD1002FBYS).

Could those Seagates really be slowing down the array that much, or
is there something else in here that I should be trying to look at?  I
did the same dd on the main OS pool (2 mirrors) and got 63MB/s ..
times 8 mirrors should give me 504MB/s reads?

tl;dr: My tank of 8 mirrors is giving 180MB writes, how to fix?!

--
Mohammed Naser — vexxhost
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
