Sorry to plug my own blog, but have you had a look at these?

        http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to (raidz)
        http://blogs.sun.com/roller/page/roch?entry=the_dynamics_of_zfs

Also, my thinking is that raid-z is probably more friendly
when the config contains (power-of-2 + 1) disks (or + 2 for
raid-z2), since the default 128K record then splits evenly
across the data disks.
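To make that rule of thumb concrete, here's a quick sketch of the
arithmetic: a default 128K record is striped across the N-1 data disks
of a raid-z group, and the per-disk chunk only comes out as a whole
number of 512-byte sectors when N-1 is a power of two.

```python
# Rule-of-thumb check: with the default 128K recordsize, a raid-z
# record is split across the N-1 data disks (one disk's worth of
# parity), and the per-disk chunk divides evenly into 512-byte
# sectors only when N-1 is a power of two.
RECORDSIZE = 128 * 1024   # default ZFS recordsize, bytes
SECTOR = 512              # bytes

for n_disks in range(3, 10):
    data_disks = n_disks - 1
    chunk = RECORDSIZE / data_disks        # data bytes per disk per record
    aligned = chunk % SECTOR == 0
    print(f"{n_disks} disks -> {chunk:8.1f} bytes/disk, aligned: {aligned}")
```

Only the 3-, 5-, and 9-disk configs come out sector-aligned, i.e. the
(power-of-2 + 1) cases.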

-r

Jonathan Wheeler writes:
 > Hi All,
 > 
 > I've just built an 8 disk zfs storage box, and I'm in the testing
 > phase before I put it into production. I've run into some unusual
 > results, and I was hoping the community could offer some
 > suggestions. I've basically made the switch to Solaris on the promises
 > of ZFS alone (yes I'm that excited about it!), so naturally I'm
 > looking forward to some great performance - but it appears I'm going
 > to need some help finding all of it. 
 > 
 > I was having even lower numbers with filebench, so I decided to dial
 > back to a really simple app for testing - bonnie. 
 > 
 > The system is a Nevada build 41 EM64T 3GHz Xeon, 1GB RAM, with 8x
 > Seagate SATA II 300GB disks on a Supermicro SAT2-MV8 8-port SATA
 > controller, running on a 133MHz 64-bit PCI-X bus.
 > The bottleneck here, by my thinking, should be the disks themselves.
 > It's not the disk interfaces (300MB/sec), the disk bus (300MB/sec
 > EACH), or the PCI-X bus (1.1GB/sec), and I'd hope a 64-bit 3GHz CPU
 > would be sufficient.
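[A quick sanity check of that bottleneck arithmetic. The 65MB/sec
per-disk figure is an assumption taken from the 60-70MB/sec numbers
later in this mail, not a measurement:]

```python
# Nominal limits from the post vs. aggregate disk throughput.
# (65 MB/s per disk is an assumed sequential rate, not a measurement.)
n_disks = 8
disk_seq = 65        # MB/s per disk, assumed
sata_link = 300      # MB/s per SATA II port (advertised)
pcix_bus = 1100     # MB/s, 64-bit/133MHz PCI-X (advertised)

aggregate = n_disks * disk_seq
print(f"aggregate disk rate: {aggregate} MB/s")
print(f"SATA headroom/port:  {sata_link - disk_seq} MB/s")
print(f"PCI-X headroom:      {pcix_bus - aggregate} MB/s")
```

[So on paper the platters saturate first, as argued: ~520MB/sec
aggregate against a ~1.1GB/sec bus.]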
 > 
 > Tests were run on a fresh clean zpool, on an idle system. Rogue
 > results were dropped, and as you can see below, all tests were run
 > more than once. 8GB is far more than the 1GB of RAM that the
 > system has, eliminating caching issues.
 > 
 > If I've still managed to overlook something in my testing setup,
 > please let me know - I sure did try! 
 > 
 > Sorry about the formatting - this is bound to end up ugly
 > 
 > Bonnie
 >               -------Sequential Output-------- ---Sequential Input-- --Random--
 >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 > raid0    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
 > 8 disk   8196 78636 93.0 261804 64.2 125585 25.6 72160 95.3 246172 19.1 286.0  2.0
 > 8 disk   8196 79452 93.9 286292 70.2 129163 26.0 72422 95.5 243628 18.9 302.9  2.1
 > 
 > so ~270MB/sec writes - awesome! 240MB/sec reads though - why would this be
 > LOWER than writes??
 > 
 >               -------Sequential Output-------- ---Sequential Input-- --Random--
 >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 > mirror   MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
 > 8 disk   8196 33285 38.6 46033  9.9 33077  6.8 67934 90.4  93445  7.7 230.5  1.3
 > 8 disk   8196 34821 41.4 46136  9.0 32445  6.6 67120 89.1  94403  6.9 210.4  1.8
 > 
 > 46MB/sec writes; each disk individually can do better, but I guess
 > keeping 8 disks in sync is hurting performance. The 94MB/sec block
 > reads are interesting. On the one hand, that's greater than 1 disk's
 > worth, so I'm getting striping performance out of a mirror - GO ZFS.
 > On the other, if I can get striping performance from mirrored reads,
 > why is it only 94MB/sec? Seemingly it's not CPU bound.
 > 
 > 
 > Now for the important test, raid-z
 > 
 >               -------Sequential Output-------- ---Sequential Input-- --Random--
 >               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 > raidz      MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
 > 8 disk   8196 61785 70.9 142797 29.3 89342 19.9 64197 85.7 320554 32.6 131.3  1.0
 > 8 disk   8196 62869 72.4 131801 26.7 90692 20.7 63986 85.7 306152 33.4 127.3  1.0
 > 8 disk   8196 63103 72.9 128164 25.9 86175 19.4 64126 85.7 320410 32.7 124.5  0.9
 > 7 disk   8196 51103 58.8  93815 19.1 74093 16.1 64705 86.5 331865 32.8 124.9  1.0
 > 7 disk   8196 49446 56.8  93946 18.7 73092 15.8 64708 86.7 331458 32.7 127.1  1.0
 > 7 disk   8196 49831 57.1  81305 16.2 78101 16.9 64698 86.4 331577 32.7 132.4  1.0
 > 6 disk   8196 62360 72.3 157280 33.4 99511 21.9 65360 87.3 288159 27.1 132.7  0.9
 > 6 disk   8196 63291 72.8 152598 29.1 97085 21.4 65546 87.2 292923 26.7 133.4  0.8
 > 4 disk   8196 57965 67.9 123268 27.6 78712 17.1 66635 89.3 189482 15.9 134.1  0.9
 > 
 > I'm getting distinctly non-linear scaling here.
 > 
 > Writes: 4 disks gives me 123MB/sec. Raid0 was giving me 270/8 =
 > 33MB/sec per disk with CPU to spare (roughly half of what each
 > individual disk should be capable of). Here I'm getting 123/4 =
 > 30MB/sec, or should that be 123/3 = 41MB/sec?
 > Using 30 as a baseline, I'd be expecting to see twice that with 8
 > disks (240ish?). What I end up with is ~135 - clearly not good
 > scaling at all.
 > The really interesting numbers happen at 7 disks - it's slower than
 > with 4, in all tests. I ran it 3x to be sure.
 > Note this was a native 7-disk raid-z, not an 8-disk pool running
 > degraded with 7. Something is really wrong with my write performance
 > here, across the board.
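[Re-dividing the quoted block-write figures per data disk (one run per
config, assuming raid-z spends one disk's worth of space on parity)
makes the 7-disk anomaly stand out:]

```python
# Block-write K/sec from the bonnie runs quoted above, one run each.
writes = {4: 123268, 6: 157280, 7: 93815, 8: 142797}

for disks, kps in sorted(writes.items()):
    per_disk = kps / 1024 / (disks - 1)   # MB/s per data disk
    print(f"{disks}-disk raidz: {per_disk:4.1f} MB/s per data disk")
```

[7 disks lands around 15MB/s per data disk, well under half the
4-disk figure.]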
 > 
 > Reads: 4 disks gives me 190MB/sec. WOAH! I'm very happy with that. 8
 > disks should then scale to 380; well, 320 isn't all that far off - no
 > biggie.
 > Looking at the 6-disk raidz is interesting though: 290MB/sec. The
 > disks are good for 60+MB/sec individually. 290 is 48/disk - note also
 > that this is better than my raid0 performance?!
 > Adding another 2 disks to my raidz gives me a mere 30MB/sec extra
 > performance? Something is going very wrong here too.
 > 
 > The 7-disk raidz read test is about what I'd expect (330/7 = 47/disk),
 > but it shows that the 8-disk config is actually going backwards.
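[Same arithmetic for the block reads, per-data-disk throughput
against the 60-70MB/sec each drive manages on its own:]

```python
# Block-read K/sec from the bonnie runs quoted above, one run each.
reads = {4: 189482, 6: 288159, 7: 331865, 8: 320554}

for disks, kps in sorted(reads.items()):
    per_disk = kps / 1024 / (disks - 1)   # MB/s per data disk
    print(f"{disks}-disk raidz: {per_disk:4.1f} MB/s per data disk")
```

[Per-disk efficiency falls from ~62MB/s at 4 disks to ~45MB/s at 8,
so reads scale, but not linearly.]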
 > 
 > hmm...
 > 
 > 
 > I understand that going for an 8-disk-wide raidz isn't optimal in
 > terms of redundancy and IOPS - but my workload shouldn't involve
 > large amounts of sustained random IO, so I'm happy to take the loss
 > in favour of absolute capacity.
 > My issue here is the scaling on sequential block transfers, not
 > optimal design.
 > 
 > All three raid levels have had unexpected results, and I'd really
 > appreciate some suggestions on how I can troubleshoot this. I know
 > how to run iostat while bonnie is running, but that's about it.
 > Incidentally, iostat is telling me that the disks are at best
 > hitting around 70% busy (%b). With the 8-disk tests, it was often
 > below 50%...
 > 
 > Is my issue perhaps with the SATA card I'm using? Maybe it's just
 > not able to handle that much throughput, despite being advertised to
 > do so. With raid0 (aka dynamic striping), I know that each disk can
 > read at 60-70MB/sec. Why am I not getting 65*8 (500MB/sec+)
 > performance? Maybe it's the marvell driver at fault here?
 > 
 > My thinking is that I need to get raid0 performing as expected before 
 > looking at raidz, but I'm afraid I really don't know where to begin.
 > 
 > All thoughts & suggestions welcome. I'm not using the disks yet, so I can 
 > blow the zpool away as needed.
 > 
 > Many thanks,
 > Jonathan Wheeler
 >  
 >  
 > This message posted from opensolaris.org
 > _______________________________________________
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
