[zfs-discuss] raidz2 read performance
We are using the following equipment:

- 12 x WD RE3 1TB SATA
- 1 x LSI 1068E HBA
- Supermicro expander
- Xeon 5520 / 12GB memory

We're having very slow read performance on our SAN/NAS. We have one raidz2 pool of 12 devices. We use the pool for iSCSI (XenServer virtual machines) plus a CIFS share. The pool feels very unresponsive, and the CIFS share in particular is slow.

Bonnie benchmark:

Version 1.03b    ------Sequential Output------  --Sequential Input-  --Random-
                 -Per Chr- --Block--  -Rewrite- -Per Chr- --Block--  --Seeks--
Machine     Size K/sec %CP K/sec  %CP K/sec %CP K/sec %CP K/sec  %CP  /sec %CP
san       25000M           335044  49 71691  16           128556  13 204.9   1

Is it true that a raidz2 pool has a read capacity equal to the IOPS of its slowest disk?

A 128KB block in a 12-wide raidz2 vdev will be split into 128/(12-2) = 12.8KB chunks per data disk; does this affect performance?

What bothers me is iostat: asvc_t shows values below 10 most of the time, but then jumps above 10 even when the disks are relatively idle.

                    extended device statistics
 r/s   w/s  kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
 4.1  11.2  32.0  21.7  0.0  0.0    0.0    2.4   0   3 c0t10d0
 4.1  12.4  33.7  22.9  0.0  0.0    0.0    2.6   0   3 c0t11d0
 3.8  12.7  32.8  23.5  0.0  0.0    0.0    2.2   0   3 c0t12d0
 4.0  12.0  33.7  22.6  0.0  0.0    0.0    2.6   0   3 c0t13d0
 4.2  11.0  32.3  21.9  0.0  0.0    0.0    2.5   0   3 c0t14d0
 4.2  11.6  32.8  23.3  0.0  0.0    0.0    2.2   0   3 c0t15d0
 3.8  11.9  32.8  22.7  0.0  0.0    0.0    2.3   0   3 c0t16d0
 4.1  11.7  33.9  23.3  0.0  0.0    0.0    2.1   0   2 c0t18d0
 4.1  12.3  32.7  22.7  0.0  0.0    0.0    2.2   0   3 c0t19d0
 3.8  11.1  32.2  21.6  0.0  0.0    0.0    2.3   0   3 c0t17d0
 3.8  10.6  32.6  20.9  0.0  0.0    0.0    2.5   0   3 c0t21d0
 4.1  11.2  34.0  21.9  0.0  0.0    0.0    2.2   0   3 c0t20d0
 0.0   6.3   0.0  26.5  0.0  0.0    0.0    0.2   0   0 c1t1d0
                    extended device statistics
 r/s   w/s  kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.7   0.4   0.7   2.6  0.0  0.0    0.0   10.3   0   1 c0t10d0
 0.7   0.4   0.6   2.6  0.0  0.0    0.0   12.4   0   1 c0t11d0
 0.6   0.4   0.4   2.6  0.0  0.0    0.0   10.2   0   1 c0t12d0
 0.4   0.4   0.3   2.6  0.0  0.0    0.0    6.5   0   0 c0t13d0
 0.5   0.4   0.3   2.6  0.0  0.0    0.0    3.1   0   0 c0t14d0
 0.5   0.4   0.4   2.6  0.0  0.0    0.0   10.7   0   1 c0t15d0
 0.5   0.4   0.4   2.6  0.0  0.0    0.0   10.6   0   1 c0t16d0
 0.5   0.4   0.5   2.6  0.0  0.0    0.0    9.1   0   0 c0t18d0
 0.5   0.4   0.5   2.6  0.0  0.0    0.0   10.7   0   1 c0t19d0
 0.7   0.4   0.6   2.6  0.0  0.0    0.0   12.3   0   1 c0t17d0
 0.7   0.4   0.5   2.5  0.0  0.0    0.0   11.6   0   1 c0t21d0
 0.7   0.4   0.6   2.6  0.0  0.0    0.0   10.2   0   1 c0t20d0
                    extended device statistics
 r/s   w/s  kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.4   0.0   0.2   0.0  0.0  0.0    0.0   18.0   0   0 c0t10d0
 0.6   0.0   0.5   0.0  0.0  0.0    0.0   11.4   0   1 c0t11d0
 0.6   0.0   0.6   0.0  0.0  0.0    0.0   13.4   0   0 c0t12d0
 0.6   0.0   0.6   0.0  0.0  0.0    0.0   20.9   0   1 c0t13d0
 0.7   0.0   0.6   0.0  0.0  0.0    0.0   16.4   0   1 c0t14d0
 0.7   0.0   0.5   0.0  0.0  0.0    0.0   13.5   0   1 c0t15d0
 0.7   0.0   0.5   0.0  0.0  0.0    0.0   11.1   0   1 c0t16d0
 0.6   0.0   0.3   0.0  0.0  0.0    0.0   10.8   0   1 c0t18d0
 0.3   0.0   0.2   0.0  0.0  0.0    0.0   16.7   0   0 c0t19d0
 0.7   0.0   0.4   0.0  0.0  0.0    0.0   17.1   0   1 c0t17d0
 0.6   0.0   0.5   0.0  0.0  0.0    0.0   13.5   0   1 c0t21d0
 0.5   0.0   0.4   0.0  0.0  0.0    0.0   14.4   0   1 c0t20d0
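For reference, the arithmetic behind that chunk-size question is easy to check (a minimal sketch; the recordsize values are only examples, and it assumes the usual raidz2 split across stripe width minus two parity disks):

    # per-data-disk chunk size for a 12-wide raidz2 (10 data disks)
    for rs in 16 32 64 128; do
        echo "recordsize ${rs}K -> $(echo "scale=1; $rs / 10" | bc)K per data disk"
    done

If those chunks fall well below a disk's efficient transfer size, every record read touches all ten data disks for a small transfer each, which is where the single-disk-like random-read behavior comes from.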
[zfs-discuss] Troubleshooting help on ZFS
I have a home media server set up using OpenSolaris. All my experience with OpenSolaris has been through setting up and maintaining this server, so it is rather limited. I have run into some problems recently and I am not sure of the best way to troubleshoot them. I was hoping to get some feedback on possible fixes.

I am running SunOS 5.11 snv_134. It is running on a tower with 6 HDDs configured as a raidz2 array. Motherboard: ECS 945GCD-M(1.0) Intel Atom 330 Intel 945GC Micro ATX Motherboard/CPU Combo. Memory: 4GB.

I set this up about a year ago and have had very few problems. I was streaming a movie off the server a few days ago and all of a sudden lost connectivity with the server. When I checked the server, there was no output on the display, but the power supply seemed to be running and the fans were going. The next day it started working again and I was able to log in. The SMB and NFS file server was connecting without problems.

Now I am able to connect remotely via SSH. I am able to bring up a zpool status screen that shows no problems; it reports no known data errors. I am able to go to the top-level data directories, but when I cd into the sub-directories the SSH connection freezes. I have tried to do a ZFS scrub on the pool and it only gets to 0.02% and never gets beyond that, but does not report any errors. Now I am also unable to stop the scrub. I use the zpool scrub -s command, but this freezes the SSH connection. When I reboot, it is still trying to scrub but not making progress.

I have the system set up on a battery backup with surge protection and I'm not aware of any spikes in electricity recently. I have not made any modifications to the system. All the drives were run through SpinRite less than a couple of months ago without any data errors. I can't figure out how this happened all of a sudden and how best to troubleshoot it. If you have any help or technical wisdom to offer, I'd appreciate it, as this has been frustrating. Thanks!
Re: [zfs-discuss] Troubleshooting help on ZFS
On Thu, Jan 20, 2011 at 01:47, Steve Kellam opensolaris-sjksn...@sneakemail.com wrote:
[description of the unresponsive raidz2 media server and the stuck scrub snipped]

Look in /var/adm/messages (and the rotated copies, messages.*) to see whether there's anything interesting around the time you saw the loss of connectivity, and also since; then take it from there.

HTH
Michael
--
regards/mit freundlichen Grüssen
Michael Schuster
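A minimal sketch of that check (the grep patterns are only examples; adjust them, and the files searched, to the date of the incident):

    # current log plus the rotated copies
    ls -l /var/adm/messages*

    # anything alarming around the hang?
    grep -i -e error -e warn -e panic -e retryable /var/adm/messages*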
Re: [zfs-discuss] Troubleshooting help on ZFS
Hi Steve,

Anything in:

  cat /var/adm/messages
  fmdump -ev

?

..Remco

On 1/20/11 1:47 AM, Steve Kellam wrote:
[original problem report snipped]
Re: [zfs-discuss] Is my bottleneck RAM?
On Tue, Jan 18, 2011 at 07:07:50AM -0800, Richard Elling wrote:
> > I'd expect more than 105290K/s on a sequential read as a peak for a single drive, let alone a striped set. The system has a relatively decent CPU, however only 2GB memory; do you think increasing this to 4GB would noticeably affect performance of my zpool? The memory is only DDR1.
>
> 2GB or 4GB of RAM + dedup is a recipe for pain. Do yourself a favor, turn off dedup and enable compression.

Assuming 4x 3 TByte drives and 8 GByte RAM, and a lowly dual-core 1.3 GHz AMD Neo, should I do the same? Or should I even not bother with compression? The data set is a lot of scanned documents, already compressed (TIF and PDF). I presume the incidence of identical blocks will be very low under such circumstances.

Oh, and with 4x 3 TByte SATA drives, a mirrored pool is pretty much the only sensible option, right?

--
Eugen* Leitl  http://leitl.org
ICBM: 48.07100, 11.36820
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
Re: [zfs-discuss] Is my bottleneck RAM?
On Jan 20, 2011, at 11:18 AM, Eugen Leitl wrote:
> [...] The data set is a lot of scanned documents, already compressed (TIF and PDF). I presume the incidence of identical blocks will be very low under such circumstances.

This would seem very unlikely to benefit from dedup (unless you cp the individual files to multiple directories). If you are just keeping lots of scans, the odds of a given block being identical to many other blocks seem low.

The thing about compression is that it is easy to test (whereas dedup can be painful to back out when it doesn't work out). So you might as well try compression, but dedup seems like a waste of time here and might well cause a lot of headaches.

Good luck,
Ware
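If you want to act on that advice, a minimal sketch (assuming a pool named tank; note that compression only applies to blocks written after the property is set):

    # turn dedup off and cheap (lzjb) compression on
    zfs set dedup=off tank
    zfs set compression=on tank

    # after copying some representative scans in, see what it actually buys
    zfs get compressratio tank

On already-compressed TIF/PDF data the compressratio will likely stay close to 1.00x, in which case compression costs little but buys little too.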
[zfs-discuss] sbdadm: unknown error (Solaris 11 Express)
Hi all, I've been following the Oracle Solaris ZFS Administration guide here:
http://download.oracle.com/docs/cd/E19963-01/821-1448/ftyxh/index.html

I am able to create my ZFS volume, but am having trouble when I get to the step of creating a LUN using sbdadm; it inevitably returns the "sbdadm: unknown error" message:

SAN:~# zfs create -V 128k CaviarBlue/Fraps
SAN:~# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
CaviarBlue          1.07T   294G  53.1K  /CaviarBlue
CaviarBlue/Fraps     384K   294G  23.9K  -
CaviarBlue/Storage  1.06T   294G  1.06T  /CaviarBlue/Storage
...
SAN:~# sbdadm create-lu /dev/zvol/rdsk/CaviarBlue/Fraps
sbdadm: unknown error

Could someone please advise on what I'm doing wrong?
Re: [zfs-discuss] sbdadm: unknown error (Solaris 11 Express)
My first inclination is that 128k is too small for a pool component. You might try something more reasonable, like 1G, if you're just testing:

# zfs create -V 2g sanpool/vol1
# stmfadm create-lu /dev/zvol/rdsk/sanpool/vol1
Logical unit created: 600144F0C49A05004CC84BE20001

Thanks,
Cindy

On 01/20/11 09:45, Benjamin Cheng wrote:
[zvol creation transcript and sbdadm error snipped]
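Applied to the original poster's pool, that would look something like the following (a sketch; the 10g size is an arbitrary example, and the destroy removes any data on the test volume):

    # recreate the zvol at a usable size, then retry the LU creation
    zfs destroy CaviarBlue/Fraps
    zfs create -V 10g CaviarBlue/Fraps
    sbdadm create-lu /dev/zvol/rdsk/CaviarBlue/Fraps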
Re: [zfs-discuss] Is my bottleneck RAM?
On Thu, Jan 20, 2011 at 8:18 AM, Eugen Leitl eu...@leitl.org wrote:
> Oh, and with 4x 3 TByte SATA drives, a mirrored pool is pretty much the only sensible option, right?

You can also use raidz2, which will have a little more resiliency. With mirroring, you can lose one disk without data loss, but losing a second disk might destroy your data. With raidz2, you can lose any 2 disks, but you pay for it with somewhat lower performance.

-B
--
Brandon High : bh...@freaks.com
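For comparison, the two four-disk layouts would be created like this (a sketch; pool and device names are placeholders, and both layouts yield roughly two disks' worth of usable space):

    # striped mirrors: fast, but a second failure in the same pair loses the pool
    zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0

    # raidz2: any two of the four disks may fail, at some cost in IOPS
    zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0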
Re: [zfs-discuss] Troubleshooting help on ZFS
On Wed, Jan 19, 2011 at 4:47 PM, Steve Kellam opensolaris-sjksn...@sneakemail.com wrote:
> I was streaming a movie off the server a few days ago and all of a sudden lost connectivity with the server. When I checked the server, there was no output on the display, but the power supply seemed to be running and the fans were going. The next day it started working again and I was able to log in.

What NIC are you using? This sounds exactly like the problem that I had with the Realtek controller on a D945GCLF2. Look into using the gani drivers instead of the shipped Realtek drivers.

-B
--
Brandon High : bh...@freaks.com
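To check what the box is actually running (a sketch; on OpenSolaris the bundled Realtek gigabit driver is typically rge, but verify against your own hardware):

    # which links exist on the system
    dladm show-link

    # is the bundled Realtek driver loaded?
    modinfo | grep -i rge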
[zfs-discuss] NFS slow for small files: idle disks
The discussion is really old: writing many small files to an NFS-mounted ZFS filesystem is slow without an SSD ZIL, due to the sync nature of the NFS protocol itself. But there is something I don't really understand.

My tests on an old Opteron box with 2 small U160 SCSI arrays and a zpool with 4 mirrored vdevs built from 146GB disks show mostly idle disks when untarring an archive with many small files over NFS. Any source package can be used for this test. I'm on zpool version 22 (still SXCE b130, the client is OpenSolaris b130), NFS mount options are all default, NFSD_SERVERS=128.

Configuration of the pool is like this:

# zpool status ib1
  pool: ib1
 state: ONLINE
 scrub: scrub completed after 0h52m with 0 errors on Sat Jan 15 14:19:02 2011
config:

        NAME        STATE     READ WRITE CKSUM
        ib1         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0

zpool iostat -v shows:

              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ib1          268G   276G      0    180      0   723K
  mirror    95.4G  40.6G      0     44      0   180K
    c1t4d0      -      -      0     44      0   180K
    c3t0d0      -      -      0     44      0   180K
  mirror    95.2G  40.8G      0     44      0   180K
    c1t6d0      -      -      0     44      0   180K
    c4t0d0      -      -      0     44      0   180K
  mirror    39.0G  97.0G      0     45      0   184K
    c3t3d0      -      -      0     45      0   184K
    c4t3d0      -      -      0     45      0   184K
  mirror    38.5G  97.5G      0     44      0   180K
    c3t4d0      -      -      0     44      0   180K
    c4t4d0      -      -      0     44      0   180K
----------  -----  -----  -----  -----  -----  -----

So each disk gets 40-50 IOPS, about 180 ops on the whole pool (mirrored). Note that these U320 SCSI disks should be able to handle about 150 IOPS per disk, so there's no IOPS aggregation happening.

The strange thing is the following iostat -MindexC output:

                            extended device statistics       ---- errors ----
 r/s    w/s   Mr/s  Mw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0  14   0  14 c0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0  14   0  14 c0t0d0
 0.0  186.0    0.0   0.4  0.0  0.0    0.0    0.1   0   2   0   0   0   0 c1
 0.0   93.0    0.0   0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c1t4d0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c1t5d0
 0.0   93.0    0.0   0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c1t6d0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t0d0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t1d0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t2d0
 0.0  279.5    0.0   0.5  0.0  0.0    0.0    0.1   0   3   0   0   0   0 c3
 0.0   93.0    0.0   0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t0d0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t1d0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c3t2d0
 0.0   93.0    0.0   0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t3d0
 0.0   93.5    0.0   0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c3t4d0
 0.0  279.0    0.0   0.5  0.0  0.0    0.0    0.2   0   5   0   0   0   0 c4
 0.0   93.0    0.0   0.2  0.0  0.0    0.0    0.3   0   3   0   0   0   0 c4t0d0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c4t2d0
 0.0   93.0    0.0   0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c4t4d0
 0.0    0.0    0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c4t1d0
 0.0   93.0    0.0   0.2  0.0  0.0    0.0    0.1   0   1   0   0   0   0 c4t3d0

Service times for the involved disks are around 0.1-0.3 msec, which I attribute to the sequential-write nature of ZFS. The disks are at most 3% busy. When writing synchronously I'd expect 100% busy disks. And when reading or writing locally the disks really do get busy, about 50 MB/sec per disk due to the 160 MB/sec limit per SCSI channel (there are 2 U160 channels with 3 disks each, and 1 with 2 disks).
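To make the effect concrete, here is a minimal sketch of the kind of test being described (the path and file count are placeholders; run it once against the NFS mount on the client and once against the pool locally on the server):

    # create 1000 small files; over NFS each create/close is committed
    # synchronously, while the local run is not
    time ksh -c 'i=0; while [ $i -lt 1000 ]; do
        echo data > /mnt/ib1/test/f$i
        i=$((i+1))
    done'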
Re: [zfs-discuss] raidz2 read performance
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bueno
>
> Is it true that a raidz2 pool has a read capacity equal to the IOPS of its slowest disk?

No, but there's a grain of truth there.

Random reads:

* If you have a single process issuing random reads, and waiting for the result of each read before issuing the next one, then your performance will be even worse than a single disk; possibly as bad as 50% of one. This is the situation you are asking about, so yes, it's possible for this to happen. It might happen if you decide to tar up or copy a whole directory tree, or something like that.

* If you have several processes, each issuing random reads, then each disk will have a bunch of commands queued up, and each disk will service them as fast as it can. Each read request is satisfied only when all (or enough) of the disks have returned valid data. In my benchmarking, a 5-6 disk raidzN was able to do random reads approx 2x faster than a single disk.

Sequential reads:

* The raidzN will far outperform a single disk.
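A minimal sketch of the difference between the two random-read cases (bash/ksh; the test file, block size, and counts are arbitrary, and note the ARC will cache blocks on repeat runs):

    # randread: 100 pseudo-random 8K reads from a large test file
    # (on Solaris dd, iseek is measured in bs-sized blocks)
    randread() {
        seed=$1; n=0
        while [ $n -lt 100 ]; do
            off=$(( (seed * 7919 + n * 9973) % 100000 ))
            dd if=/tank/bigfile of=/dev/null bs=8k count=1 iseek=$off 2>/dev/null
            n=$((n+1))
        done
    }

    time randread 0                          # one outstanding read at a time

    time ( for i in 1 2 3 4 5 6 7 8; do      # eight concurrent readers keep
               randread $i &                 # a queue on every disk
           done; wait )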