Re: slow raid5 performance
Does anyone have any insights here? How do I interpret the seemingly competing system and iowait numbers... is my system both CPU and PCI bus bound?

----- Original Message -----
From: nefilim
To: linux-raid@vger.kernel.org
Sent: Thursday, October 18, 2007 4:45:20 PM
Subject: slow raid5 performance

Hi

Pretty new to software raid, I have the following setup in a file server:

/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Oct 10 11:05:46 2007
     Raid Level : raid5
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Oct 18 15:02:16 2007
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 9dcbd480:c5ca0550:ca45cdab:f7c9f29d
         Events : 0.9

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1

3 x 500GB WD RE2 hard drives
AMD Athlon XP 2400 (2.0GHz), 1GB RAM
/dev/sd[ab] are connected to a Sil 3112 controller on the PCI bus
/dev/sd[cde] are connected to a Sil 3114 controller on the PCI bus

Transferring large media files from /dev/sdb to /dev/md0 I see the following with iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00   55.56   40.40    0.00    3.03

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             261.62        31.09         0.00         30          0
sdc             148.48         0.15        16.40          0         16
sdd             102.02         0.41        16.14          0         15
sde             113.13         0.29        16.18          0         16
md0            8263.64         0.00        32.28          0         31

which is pretty much what I see with hdparm etc. 32MB/s seems pretty slow for drives that can easily do 50MB/s each. Read performance is better, around 85MB/s (although I expected somewhat higher). So it doesn't seem that the PCI bus is the limiting factor here (127MB/s theoretical throughput... 100MB/s real world?) quite yet. I see a lot of time being spent in the kernel, and significant iowait time. The CPU is pretty old, but where exactly is the bottleneck?

Any thoughts, insights or recommendations welcome!

Cheers
Peter
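For anyone wanting to reproduce the numbers above: a quick way to see how the CPU time splits between kernel work and IO wait while the copy runs is something like the following (a sketch; the md worker thread name may differ by kernel version):

    # overall CPU split every 5s: 'sy' = kernel time, 'wa' = iowait
    $ vmstat 5

    # see which kernel threads are burning CPU during the transfer
    # (the raid5 worker for this array should appear as md0_raid5)
    $ top -d 5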
Re: slow raid5 performance
----- Original Message -----
From: Peter Grandi [EMAIL PROTECTED]

Thank you for your insightful response Peter (Yahoo spam filter hid it from me until now).

> Most 500GB drives can do 60-80MB/s on the outer tracks (30-40MB/s on
> the inner ones), and 3 together can easily swamp the PCI bus. While
> you see the write rates of two disks, the OS is really writing to all
> three disks at the same time, and it will do read-modify-write unless
> the writes are exactly stripe aligned. When RMW happens the write
> speed is lower than writing to a single disk.

I can understand that if an RMW happens it will effectively lower the write throughput substantially, but I'm not entirely sure why this would happen while writing new content; I don't know enough about RAID internals. Would this be the case the majority of the time?

> The system time is because the Linux page cache etc. is CPU bound
> (never mind the RAID5 XOR computation, which is not that big). The IO
> wait is because IO is taking place.
>
>   http://www.sabi.co.uk/blog/anno05-4th.html#051114
>
> Almost all kernel developers of note have been hired by wealthy
> corporations who sell to people buying large servers. So the typical
> systems that these developers have, and also target, are high-end 2-4
> CPU workstations and servers, with CPUs many times faster than your
> PC, and on those systems the CPU overhead of the page cache at speeds
> like yours is less than 5%. My impression is that something that
> takes less than 5% on a developer's system does not get looked at,
> even if it takes 50% on your system. The Linux kernel was very
> efficient when most developers were using old cheap PCs themselves;
> "scratch your itch" rules.

This is a rather unfortunate situation; it seems that some of the roots have been forgotten, especially in a case like this where one would think a modest CPU should be enough to run a file server. I was waiting for Phenom and AM2+ motherboards to become available before relegating this X2 4600+ to file server duty, so I guess I'll need to stay with the slow performance for a few more months.

> Anyhow, try to bypass the page cache with 'O_DIRECT' or test with 'dd
> oflag=direct' and similar for an alternative code path.

I'll give this a try, thanks.

> Misaligned writes and page cache CPU time most likely.

What influence would adding more hard drives to this RAID have? I know that in terms of a NetApp filer they always talk about spindle count for performance.
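For reference, a minimal way to test the O_DIRECT path with stripe-aligned writes (a sketch; the mount point and file name are made up, and with a 64K chunk and two data disks a full stripe is 128K):

    # write 1GB bypassing the page cache, in full-stripe (128K) blocks
    $ dd if=/dev/zero of=/mnt/md0/ddtest bs=128k count=8192 oflag=direct

    # compare against the page-cache path (conv=fdatasync needs a
    # reasonably recent GNU coreutils)
    $ dd if=/dev/zero of=/mnt/md0/ddtest bs=128k count=8192 conv=fdatasync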
Re: slow raid5 performance
Thanks Justin, good to hear about some real world experience.

----- Original Message -----
From: Justin Piszcz [EMAIL PROTECTED]
To: Peter [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Monday, October 22, 2007 9:58:16 AM
Subject: Re: slow raid5 performance

With SW RAID 5 on the PCI bus you are not going to see faster than 38-42 MiB/s, and with only three drives it may be slower than that. Forget the PCI bus if you expect high transfer rates. For writes you will get 38-42 MiB/s with sw raid5; for reads you will get close to 120-122 MiB/s. This is from a lot of testing, going up to 400GB x 10 drives using PCI cards on a regular PCI bus. Then I went PCI-e and used faster disks to get 0.5 gigabytes/sec SW raid5.

Justin.

On Mon, 22 Oct 2007, Peter wrote:

[ ... original message quoted in full above ... ]
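The arithmetic behind that ceiling is worth spelling out (rough figures, assuming both Sil controllers and the source disk share one 32-bit/33MHz PCI bus, ~133MB/s theoretical and ~100-110MB/s usable). Adding up the iostat numbers from the original message:

    read from /dev/sdb:          ~32 MB/s
    writes to sdc+sdd+sde:   3 x ~16 MB/s = ~48 MB/s  (data + parity)
    RMW reads on members:        ~1 MB/s
                                 --------
    total over one shared bus:   ~81 MB/s of a ~100-110 MB/s usable ceiling

So the bus is much closer to saturation than the 32MB/s array write rate alone suggests.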
Re: slow raid5 performance
Peter wrote:

> Thanks Justin, good to hear about some real world experience.

Hi Peter,

I recently built a 3 drive RAID5 using the onboard SATA controllers on an MCP55 based board and get around 115MB/s write and 141MB/s read. A fourth drive was added some time later and, after growing the array and filesystem (XFS), I saw 160MB/s write and 178MB/s read, with the array 60% full.

Regards,

Richard
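For anyone wanting to repeat the grow step, the sequence is roughly as follows (a sketch; the device name /dev/sdf1 and mount point are examples, and the reshape itself takes many hours on drives this size):

    # add the new disk, then reshape from 3 to 4 raid devices
    $ mdadm --add /dev/md0 /dev/sdf1
    $ mdadm --grow /dev/md0 --raid-devices=4

    # once the reshape completes, grow XFS (run against the mounted fs)
    $ xfs_growfs /mnt/md0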
Re: slow raid5 performance
On Tue, 23 Oct 2007, Richard Scobie wrote:

> I recently built a 3 drive RAID5 using the onboard SATA controllers on
> an MCP55 based board and get around 115MB/s write and 141MB/s read.
> [ ... ]

Yes, your chipset must be PCI-e based and not PCI.

Justin.
Re: slow raid5 performance
On Mon, 22 Oct 2007 15:33:09 -0400 (EDT), Justin Piszcz [EMAIL PROTECTED] said:

[ ... speed difference between PCI and PCIe RAID HAs ... ]

> I recently built a 3 drive RAID5 using the onboard SATA controllers
> on an MCP55 based board and get around 115MB/s write and 141MB/s
> read. [ ... ]

jpiszcz> Yes, your chipset must be PCI-e based and not PCI.

Broadly speaking yes (the MCP55 is a PCIe chipset), but it is more complicated than that. The south bridge chipset host adapters often have a rather faster link to memory and the CPU interconnect than the PCI or PCIe buses can provide, even when they are externally 'PCI'. Also, when the RAID HA is not in-chipset, it matters a fair bit how many lanes the PCIe slot it is plugged into has (or whether it is PCI-X at 64 bits and 66MHz); most PCIe RAID HAs can use 4 or 8 lanes (or the equivalent for PCI-X).
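For a sense of scale, the theoretical bus figures look roughly like this (real-world throughput is noticeably lower in every case):

    PCI   32-bit/33MHz:   ~133 MB/s, shared by every device on the bus
    PCI-X 64-bit/66MHz:   ~533 MB/s
    PCIe 1.x x1:          ~250 MB/s per direction
    PCIe 1.x x4:          ~1 GB/s per direction
    PCIe 1.x x8:          ~2 GB/s per direction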
Re: slow raid5 performance
On Thu, 18 Oct 2007 16:45:20 -0700 (PDT), nefilim [EMAIL PROTECTED] said:

[ ... ]

> 3 x 500GB WD RE2 hard drives
> AMD Athlon XP 2400 (2.0GHz), 1GB RAM

[ ... ]

> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.01    0.00   55.56   40.40    0.00    3.03

[ ... ]

> which is pretty much what I see with hdparm etc. 32MB/s seems pretty
> slow for drives that can easily do 50MB/s each. Read performance is
> better, around 85MB/s (although I expected somewhat higher). So it
> doesn't seem that the PCI bus is the limiting factor here

Most 500GB drives can do 60-80MB/s on the outer tracks (30-40MB/s on the inner ones), and 3 together can easily swamp the PCI bus. While you see the write rates of two disks, the OS is really writing to all three disks at the same time, and it will do read-modify-write unless the writes are exactly stripe aligned. When RMW happens the write speed is lower than writing to a single disk.

> I see a lot of time being spent in the kernel.. and a significant
> iowait time.

The system time is because the Linux page cache etc. is CPU bound (never mind the RAID5 XOR computation, which is not that big). The IO wait is because IO is taking place.

  http://www.sabi.co.uk/blog/anno05-4th.html#051114

Almost all kernel developers of note have been hired by wealthy corporations who sell to people buying large servers. So the typical systems that these developers have, and also target, are high-end 2-4 CPU workstations and servers, with CPUs many times faster than your PC, and on those systems the CPU overhead of the page cache at speeds like yours is less than 5%. My impression is that something that takes less than 5% on a developer's system does not get looked at, even if it takes 50% on your system. The Linux kernel was very efficient when most developers were using old cheap PCs themselves; "scratch your itch" rules.

Anyhow, try to bypass the page cache with 'O_DIRECT' or test with 'dd oflag=direct' and similar for an alternative code path.

> The CPU is pretty old but where exactly is the bottleneck?

Misaligned writes and page cache CPU time most likely.
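A quick way to check for RMW in practice (a sketch): run a pure write workload against the array and watch the member disks. Any sustained nonzero read rate on them, like the 0.15-0.41 MB/s visible on sdc/sdd/sde in the iostat output above, is the raid5 code reading chunks back in to recompute parity.

    # while a large sequential write to the array is running:
    $ iostat -m 5 sdc sdd sde
    # MB_read/s > 0 on the members during a write-only workload => RMW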