Re: slow raid5 performance

2007-10-22 Thread Peter
Does anyone have any insights here? How do I interpret the seemingly competing
system and iowait numbers... is my system both CPU and PCI bus bound?

- Original Message 
From: nefilim
To: linux-raid@vger.kernel.org
Sent: Thursday, October 18, 2007 4:45:20 PM
Subject: slow raid5 performance



Hi

Pretty new to software raid, I have the following setup in a file server:

/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Oct 10 11:05:46 2007
     Raid Level : raid5
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Oct 18 15:02:16 2007
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 9dcbd480:c5ca0550:ca45cdab:f7c9f29d
         Events : 0.9

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1

3 x 500GB WD RE2 hard drives
AMD Athlon XP 2400 (2.0GHz), 1GB RAM
/dev/sd[ab] are connected to Sil 3112 controller on PCI bus
/dev/sd[cde] are connected to Sil 3114 controller on PCI bus

Transferring large media files from /dev/sdb to /dev/md0 I see the following
with iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00   55.56   40.40    0.00    3.03

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             261.62        31.09         0.00         30          0
sdc             148.48         0.15        16.40          0         16
sdd             102.02         0.41        16.14          0         15
sde             113.13         0.29        16.18          0         16
md0            8263.64         0.00        32.28          0         31
 
which is pretty much what I see with hdparm etc. 32MB/s seems pretty slow
for drives that can easily do 50MB/s each. Read performance is better, around
85MB/s (although I expected somewhat higher). So it doesn't seem that the PCI
bus is the limiting factor here (127MB/s theoretical throughput... 100MB/s real
world?) quite yet... I see a lot of time being spent in the kernel, and a
significant iowait time. The CPU is pretty old, but where exactly is the
bottleneck?
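
(Adding up the per-device iostat figures above as a rough check, and assuming
all five drives share the same 32-bit/33MHz PCI bus: about 31MB/s read from sdb
plus 16.4 + 16.1 + 16.2 = 48.7MB/s written to sdc/sdd/sde comes to roughly
80MB/s crossing the bus, which is already close to what a shared 133MB/s PCI
bus sustains in practice, so the bus may be nearer its limit than the md0
number alone suggests.)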

Any thoughts, insights or recommendations welcome!

Cheers
Peter
-- 
View this message in context:
 http://www.nabble.com/slow-raid5-performance-tf4650085.html#a13284909
Sent from the linux-raid mailing list archive at Nabble.com.

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slow raid5 performance

2007-10-22 Thread Peter

- Original Message 
From: Peter Grandi [EMAIL PROTECTED]

Thank you for your insightful response, Peter (Yahoo's spam filter hid it from
me until now).

 Most 500GB drives can do 60-80MB/s on the outer tracks
 (30-40MB/s on the inner ones), and 3 together can easily swamp
 the PCI bus. While you see the write rates of two disks, the OS
 is really writing to all three disks at the same time, and it
 will do read-modify-write unless the writes are exactly stripe
 aligned. When RMW happens write speed is lower than writing to a
 single disk.

I can understand that if an RMW happens it will effectively lower the write
throughput substantially, but I'm not entirely sure why this would happen while
writing new content; I don't know enough about RAID internals. Would this be
the case the majority of the time?

 The system time is because the Linux page cache etc. is CPU
 bound (never mind RAID5 XOR computation, which is not that
 big). The IO wait is because IO is taking place.

  http://www.sabi.co.uk/blog/anno05-4th.html#051114

 Almost all kernel developers of note have been hired by wealthy
 corporations who sell to people buying large servers. Then the
 typical systems that these developers may have and also target
 are high-end 2-4 CPU workstations and servers, with CPUs many
 times faster than your PC, and on those systems the CPU overhead
 of the page cache at speeds like yours is less than 5%.

 My impression is that something that takes less than 5% on a
 developer's system does not get looked at, even if it takes 50%
 on your system. The Linux kernel was very efficient when most
 developers were using old cheap PCs themselves; 'scratch your
 itch' rules.

This is a rather unfortunate situation; it seems that some of the roots have
been forgotten, especially in a case like this where one would think a modest
CPU should be enough to run a file server. I was waiting for Phenom and AM2+
motherboards to become available before relegating this X2 4600+ to file
server duty, so I guess I'll have to live with the slow performance for a few
more months.

 Anyhow, try to bypass the page cache with 'O_DIRECT' or test
 with 'dd oflag=direct' and similar for an alternative code path.

I'll give this a try, thanks.
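
Something along these lines, I assume (file name and sizes are just an example;
bs=128k to match one full 2 x 64K data stripe):

  dd if=/dev/zero of=/mnt/md0/ddtest bs=128k count=8192 oflag=direct
  dd if=/mnt/md0/ddtest of=/dev/null bs=128k iflag=direct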



 Misaligned writes and page cache CPU time most likely.

What influence would adding more hard drives to this RAID have? I know that in
terms of a NetApp filer they always talk about spindle count for performance.



-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slow raid5 performance

2007-10-22 Thread Peter
Thanks Justin, good to hear about some real world experience. 

- Original Message 
From: Justin Piszcz [EMAIL PROTECTED]
To: Peter [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Monday, October 22, 2007 9:58:16 AM
Subject: Re: slow raid5 performance


With SW RAID5 on the PCI bus you are not going to see faster than 38-42
MiB/s. Especially with only three drives it may be slower than that.
Forget the PCI bus / stop using it if you expect high transfer rates.

For writes: 38-42 MiB/s with SW RAID5.
For reads: you will get close to 120-122 MiB/s with SW RAID5.

This is from a lot of testing going up to 400GB x 10 drives using PCI
cards on a regular PCI bus.

Then I went PCI-e and used faster disks to get 0.5 gigabytes/sec with SW
RAID5.

Justin.

On Mon, 22 Oct 2007, Peter wrote:

 Does anyone have any insights here? How do I interpret the seemingly
 competing system and iowait numbers... is my system both CPU and PCI bus
 bound?

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slow raid5 performance

2007-10-22 Thread Richard Scobie

Peter wrote:
Thanks Justin, good to hear about some real world experience. 


Hi Peter,

I recently built a 3-drive RAID5 using the onboard SATA controllers on
an MCP55-based board and get around 115MB/s write and 141MB/s read.

A fourth drive was added some time later, and after growing the array and
filesystem (XFS), I saw 160MB/s write and 178MB/s read with the array 60%
full.


Regards,

Richard
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slow raid5 performance

2007-10-22 Thread Justin Piszcz



On Tue, 23 Oct 2007, Richard Scobie wrote:


Peter wrote:
Thanks Justin, good to hear about some real world experience. 


Hi Peter,

I recently built a 3-drive RAID5 using the onboard SATA controllers on an
MCP55-based board and get around 115MB/s write and 141MB/s read.

A fourth drive was added some time later, and after growing the array and
filesystem (XFS), I saw 160MB/s write and 178MB/s read with the array 60%
full.


Regards,

Richard

Yes, your chipset must be PCI-e based and not PCI.
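
(Rough arithmetic, assuming standard RAID5 overhead: 115MB/s of file data on a
3-drive RAID5 means about 115 x 3/2 = 172MB/s of data plus parity going to the
disks, which is already more than a shared 133MB/s PCI bus can carry, so those
numbers would not be reachable on plain PCI.)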

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slow raid5 performance

2007-10-22 Thread Peter Grandi
 On Mon, 22 Oct 2007 15:33:09 -0400 (EDT), Justin Piszcz
 [EMAIL PROTECTED] said:

[ ... speed difference between PCI and PCIe RAID HAs ... ]

 I recently built a 3 drive RAID5 using the onboard SATA
 controllers on an MCP55 based board and get around 115MB/s
 write and 141MB/s read.  A fourth drive was added some time
 later and after growing the array and filesystem (XFS), saw
 160MB/s write and 178MB/s read, with the array 60% full.

jpiszcz> Yes, your chipset must be PCI-e based and not PCI.

Broadly speaking yes (the MCP55 is a PCIe chipset), but it is
more complicated than that. The south bridge chipset host
adapters often have a rather faster link to memory and the CPU
interconnect than the PCI or PCIe buses can provide, even when
they are externally ''PCI''.

Also, when the RAID HA is not in-chipset it also matters a fair
bit how many lanes the PCIe slot it is plugged into has (or
whether it is PCI-X at 64-bit/66MHz) -- most PCIe RAID HAs can
use 4 or 8 lanes (or the equivalent for PCI-X).
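
(For scale: plain 32-bit/33MHz PCI tops out at about 133MB/s shared by every
device on the bus, PCI-X at 64-bit/66MHz at roughly 533MB/s, and each PCIe 1.x
lane carries about 250MB/s per direction, so a 4- or 8-lane slot has far more
headroom than any flavour of parallel PCI.)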

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slow raid5 performance

2007-10-20 Thread Peter Grandi
 On Thu, 18 Oct 2007 16:45:20 -0700 (PDT), nefilim
 [EMAIL PROTECTED] said:

[ ... ]

 3 x 500GB WD RE2 hard drives
 AMD Athlon XP 2400 (2.0GHz), 1GB RAM
[ ... ]
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            1.01    0.00   55.56   40.40    0.00    3.03
[ ... ]
 which is pretty much what I see with hdparm etc. 32MB/s seems
 pretty slow for drives that can easily do 50MB/s each. Read
 performance is better, around 85MB/s (although I expected
 somewhat higher).

 So it doesn't seem that the PCI bus is the limiting factor here

Most 500GB drives can do 60-80MB/s on the outer tracks
(30-40MB/s on the inner ones), and 3 together can easily swamp
the PCI bus. While you see the write rates of two disks, the OS
is really writing to all three disks at the same time, and it
will do read-modify-write unless the writes are exactly stripe
aligned. When RMW happens write speed is lower than writing to a
single disk.
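
(To put numbers on that for this array: with a 64K chunk and 3 drives, one
full stripe holds 2 x 64K = 128K of data plus 64K of parity, so only writes
that cover a whole aligned 128K of data can be done without first reading
back old data or parity to recompute the parity block.)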

 I see a lot of time being spent in the kernel.. and a
 significant iowait time.

The system time is because the Linux page cache etc. is CPU
bound (never mind RAID5 XOR computation, which is not that
big). The IO wait is because IO is taking place.

  http://www.sabi.co.uk/blog/anno05-4th.html#051114

Almost all kernel developers of note have been hired by wealthy
corporations who sell to people buying large servers. Then the
typical systems that these developers may have and also target
are high-end 2-4 CPU workstations and servers, with CPUs many
times faster than your PC, and on those systems the CPU overhead
of the page cache at speeds like yours is less than 5%.

My impression is that something that takes less than 5% on a
developer's system does not get looked at, even if it takes 50%
on your system. The Linux kernel was very efficient when most
developers were using old cheap PCs themselves; 'scratch your
itch' rules.

Anyhow, try to bypass the page cache with 'O_DIRECT' or test
with 'dd oflag=direct' and similar for an alternative code path.

 The CPU is pretty old but where exactly is the bottleneck?

Misaligned writes and page cache CPU time most likely.

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html