Sorry to plug my own blog, but have you had a look at these?
http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to (raidz)
http://blogs.sun.com/roller/page/roch?entry=the_dynamics_of_zfs
Also, my thinking is that raid-z is probably more friendly when the config contains (power-of-2 + 1) disks (or + 2 for raid-z2).

-r

Jonathan Wheeler writes:
 > Hi All,
 >
 > I've just built an 8-disk ZFS storage box, and I'm in the testing
 > phase before I put it into production. I've run into some unusual
 > results, and I was hoping the community could offer some
 > suggestions. I've basically made the switch to Solaris on the promises
 > of ZFS alone (yes, I'm that excited about it!), so naturally I'm
 > looking forward to some great performance - but it appears I'm going
 > to need some help finding all of it.
 >
 > I was getting even lower numbers with filebench, so I decided to dial
 > back to a really simple app for testing - bonnie.
 >
 > The system is a nevada_41 EM64T 3GHz Xeon, 1GB RAM, with 8x Seagate
 > SATA II 300GB disks on a Supermicro SAT2-MV8 8-port SATA controller,
 > running on a 133MHz 64-bit PCI-X bus.
 > The bottleneck here, by my thinking, should be the disks themselves.
 > It's not the disk interfaces (300MB/s), the disk bus (300MB/s each), or the
 > PCI-X bus (1.1GB/s), and I'd hope a 64-bit 3GHz CPU would be sufficient.
 >
 > Tests were run on a fresh, clean zpool on an idle system. Rogue
 > results were dropped, and as you can see below, all tests were run
 > more than once. 8GB should be far more than the 1GB of RAM that the
 > system has, eliminating caching issues.
 >
 > If I've still managed to overlook something in my testing setup,
 > please let me know - I sure did try!
 >
 > Sorry about the formatting - this is bound to end up ugly.
 >
 > Bonnie
 >              -------Sequential Output-------- ---Sequential Input--  --Random--
 >              -Per Char- --Block---  -Rewrite-- -Per Char- --Block---  --Seeks---
 > raid0    MB  K/sec %CPU  K/sec %CPU  K/sec %CPU K/sec %CPU  K/sec %CPU  /sec %CPU
 > 8 disk 8196  78636 93.0 261804 64.2 125585 25.6 72160 95.3 246172 19.1 286.0  2.0
 > 8 disk 8196  79452 93.9 286292 70.2 129163 26.0 72422 95.5 243628 18.9 302.9  2.1
 >
 > So ~270MB/sec writes - awesome!
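To put some numbers on the (power-of-2 + 1) comment above - a quick sketch, assuming the default 128K recordsize. A record is striped across the data disks of a raidz vdev (width minus one for parity), so the per-disk chunk only divides evenly when the data-disk count is a power of two:

```shell
# How a 128K record splits across the data disks of various raidz widths.
# Widths of (power-of-2 + 1) give each data disk an even chunk.
record=131072                        # default 128K ZFS recordsize, in bytes
for width in 3 4 5 7 8 9; do
  data=$((width - 1))                # one disk's worth goes to parity
  if [ $((record % data)) -eq 0 ]; then
    echo "$width-disk raidz: $((record / data)) bytes per data disk (even split)"
  else
    echo "$width-disk raidz: ~$((record / data)) bytes per data disk (uneven split)"
  fi
done
```

So 5- and 9-disk raidz divide the record cleanly, while an 8-disk raidz leaves each record unevenly spread over 7 data disks.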
 > 240MB/sec reads, though - why would this be LOWER than writes??
 >
 >              -------Sequential Output-------- ---Sequential Input--  --Random--
 >              -Per Char- --Block---  -Rewrite-- -Per Char- --Block---  --Seeks---
 > mirror   MB  K/sec %CPU  K/sec %CPU  K/sec %CPU K/sec %CPU  K/sec %CPU  /sec %CPU
 > 8 disk 8196  33285 38.6  46033  9.9  33077  6.8 67934 90.4  93445  7.7 230.5  1.3
 > 8 disk 8196  34821 41.4  46136  9.0  32445  6.6 67120 89.1  94403  6.9 210.4  1.8
 >
 > 46MB/sec writes: each disk individually can do better, but I guess
 > keeping 8 disks in sync is hurting performance. The 94MB/sec read figure is
 > interesting. On the one hand, that's greater than 1 disk's worth, so
 > I'm getting striping performance out of a mirror - GO ZFS. On the other,
 > if I can get striping performance from mirrored reads, why is it only
 > 94MB/sec? Seemingly it's not CPU bound.
 >
 > Now for the important test, raid-z:
 >
 >              -------Sequential Output-------- ---Sequential Input--  --Random--
 >              -Per Char- --Block---  -Rewrite-- -Per Char- --Block---  --Seeks---
 > raidz    MB  K/sec %CPU  K/sec %CPU  K/sec %CPU K/sec %CPU  K/sec %CPU  /sec %CPU
 > 8 disk 8196  61785 70.9 142797 29.3  89342 19.9 64197 85.7 320554 32.6 131.3  1.0
 > 8 disk 8196  62869 72.4 131801 26.7  90692 20.7 63986 85.7 306152 33.4 127.3  1.0
 > 8 disk 8196  63103 72.9 128164 25.9  86175 19.4 64126 85.7 320410 32.7 124.5  0.9
 > 7 disk 8196  51103 58.8  93815 19.1  74093 16.1 64705 86.5 331865 32.8 124.9  1.0
 > 7 disk 8196  49446 56.8  93946 18.7  73092 15.8 64708 86.7 331458 32.7 127.1  1.0
 > 7 disk 8196  49831 57.1  81305 16.2  78101 16.9 64698 86.4 331577 32.7 132.4  1.0
 > 6 disk 8196  62360 72.3 157280 33.4  99511 21.9 65360 87.3 288159 27.1 132.7  0.9
 > 6 disk 8196  63291 72.8 152598 29.1  97085 21.4 65546 87.2 292923 26.7 133.4  0.8
 > 4 disk 8196  57965 67.9 123268 27.6  78712 17.1 66635 89.3 189482 15.9 134.1  0.9
 >
 > I'm getting distinctly non-linear scaling here.
 >
 > Writes: 4 disks gives me 123MB/sec.
 > Raid0 was giving me 270/8 = 33MB/sec per disk with CPU to spare
 > (roughly half of what each individual disk should be capable of).
 > Here I'm getting 123/4 = 30MB/sec - or should that be 123/3 = 41MB/sec?
 > Using 30 as a baseline, I'd expect to see twice that with 8 disks
 > (240ish?). What I end up with is ~135 - clearly not good scaling at all.
 > The really interesting numbers happen at 7 disks - it's slower than
 > with 4, in all tests.
 > I ran it 3x to be sure.
 > Note this was a native 7-disk raid-z, not 8 disks running in degraded
 > mode with 7. Something is really wrong with my write performance here,
 > across the board.
 >
 > Reads: 4 disks gives me 190MB/sec. WOAH! I'm very happy with that. 8
 > disks should then scale to 380; well, 320 isn't all that far off - no
 > biggie.
 > Looking at the 6-disk raidz is interesting, though: 290MB/sec. The
 > disks are good for 60+MB/sec individually; 290 is 48/disk - note also
 > that this is better than my raid0 performance?!
 > Adding another 2 disks to my raidz gives me a mere 30MB/sec of extra
 > performance? Something is going very wrong here too.
 >
 > The 7-disk raidz read test is about what I'd expect (330/7 = 47/disk),
 > but it shows that the 8-disk config is actually going backwards.
 >
 > Hmm...
 >
 > I understand that going for an 8-disk-wide raidz isn't optimal in terms
 > of redundancy and IOPS - but my workload shouldn't involve large amounts
 > of sustained random I/O, so I'm happy to take the loss in favour of
 > absolute capacity.
 > My issue here is the scaling on sequential block transfers, not optimal
 > design.
 >
 > All three raid levels have had unexpected results, and I'd really
 > appreciate some suggestions on how I can troubleshoot this. I know how
 > to run iostat while bonnie is running, but that's about it. Incidentally,
 > iostat is telling me that the disks are at best hitting around 70% busy.
 > With the 8-disk tests, it was often below 50%...
 >
 > Is my issue perhaps with the SATA card that I'm using?
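The write-scaling arithmetic above can be laid out in a few lines (the MB/s values are rough averages of the block-write columns in your tables - a sanity check, not a diagnosis):

```shell
# Per-data-disk write rate at 4-wide raidz (3 data disks, ~123 MB/s total).
per_disk=$((123 / 3))                 # 41 MB/s per data disk
echo "per data disk at 4-wide: $per_disk MB/s"

# If that rate scaled linearly, an 8-wide raidz (7 data disks) would do:
predicted=$((per_disk * 7))           # 287 MB/s predicted
echo "predicted 8-wide: $predicted MB/s, observed: ~134 MB/s"
```

That gap between ~287 predicted and ~134 observed is the scaling problem in one number, and the 7-wide result (~90 MB/s, below the 4-wide total) makes it look like something other than raw disk bandwidth is the limit.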
 > Maybe it's just not able to handle that much throughput, despite being
 > advertised to do so. With raid0 (aka dynamic stripes), I know that each
 > disk can read at 60-70MB/sec. Why am I not getting 65*8 (500MB/sec+)
 > performance? Maybe it's the Marvell driver at fault here?
 >
 > My thinking is that I need to get raid0 performing as expected before
 > looking at raidz, but I'm afraid I really don't know where to begin.
 >
 > All thoughts & suggestions welcome. I'm not using the disks yet, so I
 > can blow the zpool away as needed.
 >
 > Many thanks,
 > Jonathan Wheeler
 >
 > This message posted from opensolaris.org
 > _______________________________________________
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
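On where to begin: I'd first take bonnie out of the picture and run a plain dd streaming test against the pool, then compare that with what the raw disks can do. A minimal sketch - DIR is a placeholder to point at the pool's mountpoint, and you'd time each dd (e.g. with ptime) to get MB/s; for honest read numbers use a file larger than RAM so the cache doesn't flatter the result:

```shell
# Plain sequential write-then-read with dd, independent of bonnie.
# DIR is a placeholder; point it at a directory on the pool under test.
DIR=${DIR:-/tmp}
F="$DIR/ddtest.$$"

# Write a 256 MB file in 1 MB chunks (time this for the write rate).
dd if=/dev/zero of="$F" bs=1024k count=256 2>/dev/null
sync

# Read it back (time this for the read rate; with only 256 MB the file
# fits in your 1 GB of RAM, so size up for a real measurement).
dd if="$F" bs=1024k 2>/dev/null > /dev/null

SIZE=$(wc -c < "$F" | tr -d ' ')
echo "transferred $SIZE bytes each way"
rm -f "$F"
```

If dd on the pool tops out around the same ~270MB/sec, repeat against the raw devices (dd from /dev/rdsk/... on each disk in parallel while watching iostat -xnz): if the aggregate raw rate is also well under 8 x ~65MB/sec, the bottleneck is below ZFS - controller, driver, or bus - rather than in the raid layout.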