Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Forrest Taylor
On Wed, 2005-08-31 at 15:34, David M. Strang wrote: > - Original Message - > From: Michael Tokarev > To: David M. Strang > Cc: Linux RAID > Sent: Wednesday, August 31, 2005 5:29 PM > Subject: Re: raid5: Disk failure on sdm, disabling device > > Michael Tokarev wrote: > > David M. Strang w

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl
On Thu, 1 Sep 2005, Nick Piggin wrote: Holger Kiehl wrote: meminfo.dump: MemTotal: 8124172 kB MemFree: 23564 kB Buffers: 7825944 kB Cached: 19216 kB SwapCached: 0 kB Active: 25708 kB Inactive: 7835548 kB HighTotal:

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread David M. Strang
- Original Message - From: Michael Tokarev To: David M. Strang Cc: Linux RAID Sent: Wednesday, August 31, 2005 5:29 PM Subject: Re: raid5: Disk failure on sdm, disabling device Michael Tokarev wrote: David M. Strang wrote: [] > Is there something a little deeper to this error message

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Michael Tokarev
David M. Strang wrote: [] Is there something a little deeper to this error message? Aug 31 04:48:15 abyss kernel: scsi2 (12:0): rejecting I/O to offline device Aug 31 04:48:15 abyss kernel: raid5: Disk failure on sdm, disabling device. If you reread my message, I hope you will find a bit of c

Re: Where is the performance bottleneck?

2005-08-31 Thread Dr. David Alan Gilbert
* Holger Kiehl ([EMAIL PROTECTED]) wrote: > There is however one difference, here I had set > /sys/block/sd?/queue/nr_requests to 4096. Well from that it looks like none of the queues get about 255 (hmm that's a round number) > avg-cpu: %user %nice %sys %iowait %idle >0.1

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread David M. Strang
Is there something a little deeper to this error message? Aug 31 04:48:15 abyss kernel: scsi2 (12:0): rejecting I/O to offline device Aug 31 04:48:15 abyss kernel: raid5: Disk failure on sdm, disabling device. This error would make me think that the disk is bad. Have you tested the disk, or

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Forrest Taylor
On Wed, 2005-08-31 at 15:01, David M. Strang wrote: > Is there something a little deeper to this error message? > > Aug 31 04:48:15 abyss kernel: scsi2 (12:0): rejecting I/O to offline device > Aug 31 04:48:15 abyss kernel: raid5: Disk failure on sdm, disabling device. This error would make me th

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Forrest Taylor
On Wed, 2005-08-31 at 14:53, David M. Strang wrote: > > The output is odd... It only sees 7 of the 28 devices. It looks like the script only checks for devices 0-7 (idsearch=`seq 0 7`). It looks like you can pass it a value for the number of devices: ./rescan-scsi-bus.sh -ids=27 See if that fi

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread David M. Strang
On Wed, 2005-08-31 at 14:55, Forrest Taylor wrote: On Wed, 2005-08-31 at 14:45, David M. Strang wrote: > On Wed, 2005-08-31 at 14:39, Forrest Taylor wrote: > If you are running 2.4 kernel, did you try running rescan-scsi-bus.sh > (http://www.garloff.de/kurt/linux/rescan-scsi-bus.sh)? uname -ar

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl
On Wed, 31 Aug 2005, Dr. David Alan Gilbert wrote: * Holger Kiehl ([EMAIL PROTECTED]) wrote: On Wed, 31 Aug 2005, Jens Axboe wrote: Full vmstat session can be found under: Have you got iostat? iostat -x 10 might be interesting to see for a period while it is going. The following is the re

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Forrest Taylor
On Wed, 2005-08-31 at 14:45, David M. Strang wrote: > On Wed, 2005-08-31 at 14:39, Forrest Taylor wrote: > > If you are running 2.4 kernel, did you try running rescan-scsi-bus.sh > > (http://www.garloff.de/kurt/linux/rescan-scsi-bus.sh)? > > uname -ar > Linux abyss 2.6.11.12 #2 SMP Thu Jul 21 07:4

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread David M. Strang
On Wed, 2005-08-31 at 14:45, David M. Strang wrote: On Wed, 2005-08-31 at 14:39, Forrest Taylor wrote: If you are running 2.4 kernel, did you try running rescan-scsi-bus.sh (http://www.garloff.de/kurt/linux/rescan-scsi-bus.sh)? uname -ar Linux abyss 2.6.11.12 #2 SMP Thu Jul 21 07:49:40 UTC 20

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread David M. Strang
On Wed, 2005-08-31 at 14:39, Forrest Taylor wrote: If you are running 2.4 kernel, did you try running rescan-scsi-bus.sh (http://www.garloff.de/kurt/linux/rescan-scsi-bus.sh)? uname -ar Linux abyss 2.6.11.12 #2 SMP Thu Jul 21 07:49:40 UTC 2005 i686 unknown unknown GNU/Linux Will it work on 2

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Forrest Taylor
On Wed, 2005-08-31 at 14:28, David M. Strang wrote: > It's a SCSI drive; it's in a Dell 220F enclosure connected via a QLA2200 > adapter. I've pulled the bad disk (tho not 'yellow' at the hardware level) > and re-inserted it -- but to no avail. It did cause a reset; but the device > remains 'disabl

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Michael Tokarev
David M. Strang wrote: [] It's a SCSI drive; it's in a Dell 220F enclosure connected via a QLA2200 adapter. I've pulled the bad disk (tho not 'yellow' at the hardware level) and re-inserted it -- but to no avail. It did cause a reset; but the device remains 'disabled'. Aug 31 11:48:05 abyss kern

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread David M. Strang
Michael Tokarev wrote: Please don't top-post. My apologies; Outlook Express (/gasp) isn't the most conducive to mailing list replies. Well, you didn't mention that it's disappeared from the system (as opposed to the raid array only). And no, I for one don't know how to re-add it - there a

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Michael Tokarev
David M. Strang wrote: mdadm --remove /dev/md0 /dev/sdm mdadm: hot removed /dev/sdm mdadm --add /dev/md0 /dev/sdm mdadm: Cannot open /dev/sdm: No such device or address The device is disabled at a 'hardware?' level -- I can't even cfdisk /dev/sdm Please don't top-post. Well, you didn't menti

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread David M. Strang
mdadm --remove /dev/md0 /dev/sdm mdadm: hot removed /dev/sdm mdadm --add /dev/md0 /dev/sdm mdadm: Cannot open /dev/sdm: No such device or address The device is disabled at a 'hardware?' level -- I can't even cfdisk /dev/sdm - Original Message - From: Michael Tokarev To: David M. Stra

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread David M. Strang
Aug 31 04:48:15 abyss kernel: md: excessive errors occurred during superblock update, exiting Aug 31 04:48:15 abyss kernel: scsi2 (12:0): rejecting I/O to offline device Aug 31 04:48:15 abyss kernel: raid5: Disk failure on sdm, disabling device. Operation continuing on 27 devices However, in t

Re: 2 partition kicked from 6 raid5

2005-08-31 Thread Michael Tokarev
Deak Krisztian wrote: Hi, I have a big problem. I had a sw raid5 array with 6 partitions. The array has been damaged because of problems with the power connectors. 2 partitions/disks don't work, because for a time the SATA power connections didn't work. Some write/read operations were running. Now

Re: raid5: Disk failure on sdm, disabling device

2005-08-31 Thread Michael Tokarev
David M. Strang wrote: Okay, my array is degraded -- and the device /dev/sdm is disabled. Short of a reboot, is there a way to re-enable the device? It's already flagged as faulty in mdadm. man mdadm mdadm --remove /dev/mdN /dev/sdm mdadm --add /dev/mdN /dev/sdm /mjt

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl
On Wed, 31 Aug 2005, Jens Axboe wrote: On Wed, Aug 31 2005, Holger Kiehl wrote: # ./oread /dev/sdX and it will read 128k chunks direct from that device. Run on the same drives as above, reply with the vmstat info again. Using kernel 2.6.12.5 again, here the results: [snip] Ok, reads as ex

Re: Where is the performance bottleneck?

2005-08-31 Thread Ming Zhang
forgot to attach lspci output. it is a 133MHz PCI-X card but only runs at 66MHz. quick question, where can I check if it is running at 64bit? 66MHz * 32bit / 8 * 80% bus utilization ~= 211MB/s, which matches the top speed I see now... Ming 02:01.0 SCSI storage controller: Marvell MV88SX5081 8-por
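Ming's estimate can be reproduced as a small calculation. This is a minimal sketch and not part of the original message: the function name `pci_bandwidth_mb_s` is hypothetical, and the 80% utilization figure is Ming's own assumption rather than a measured value.

```python
def pci_bandwidth_mb_s(clock_mhz: float, bus_width_bits: int,
                       utilization: float = 1.0) -> float:
    """Return bus throughput in MB/s (10**6 bytes per second).

    clock_mhz * bus_width_bits is megabits per second; dividing by 8
    converts to megabytes; utilization scales the theoretical peak.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return clock_mhz * bus_width_bits / 8 * utilization


if __name__ == "__main__":
    # Compare the 32-bit case from the message with a 64-bit bus
    # at the same 66 MHz clock and the same assumed 80% utilization.
    for width in (32, 64):
        rate = pci_bandwidth_mb_s(66, width, utilization=0.8)
        print(f"66 MHz x {width} bit at 80%: {rate:.1f} MB/s")
```

With a 32-bit bus this gives about 211 MB/s, the figure quoted in the message; a 64-bit bus at the same clock would give about 422 MB/s, so the observed ceiling is at least consistent with the card running in 32-bit mode.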

Re: Where is the performance bottleneck?

2005-08-31 Thread Ming Zhang
join the party. ;) 8 400GB SATA disks on the same Marvell 8 port PCIX-133 card. P4 CPU. Supermicro SCT board. # cat /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10] [faulty] md0 : active raid0 sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1] sda[0] 31256

Re: Where is the performance bottleneck?

2005-08-31 Thread Michael Tokarev
Holger Kiehl wrote: > On Wed, 31 Aug 2005, Jens Axboe wrote: > >> On Wed, Aug 31 2005, Holger Kiehl wrote: >> [] >>> I used the following command reading from all 8 disks in parallel: >>> >>>dd if=/dev/sd?1 of=/dev/null bs=256k count=78125 >>> >>> Here vmstat output (I just cut something out i
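As a sanity check on the dd invocation quoted above, the amount of data each dd reads per disk can be worked out from `bs` and `count`. This is a minimal sketch: the helper names are hypothetical, only the binary suffixes k, M and G (as GNU dd interprets them) are handled, and the elapsed time passed to `throughput_mb_s` is whatever the reader measures, not a figure from the thread.

```python
# Binary multipliers as GNU dd interprets the bare suffixes k, M, G.
_SUFFIXES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a dd-style size such as '256k' or '4M' to bytes."""
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def dd_bytes(bs: str, count: int) -> int:
    """Total bytes transferred by one dd run with the given bs and count."""
    return parse_size(bs) * count


def throughput_mb_s(total_bytes: int, seconds: float) -> float:
    """Average rate in MB/s (10**6 bytes per second)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    total = dd_bytes("256k", 78125)
    print(f"bs=256k count=78125 reads {total} bytes per disk")
```

For bs=256k and count=78125 the total is 20,480,000,000 bytes per disk, so eight parallel readers move roughly 164 GB in one test run.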

Re: Where is the performance bottleneck?

2005-08-31 Thread Jens Axboe
On Wed, Aug 31 2005, Holger Kiehl wrote: > ># ./oread /dev/sdX > > > >and it will read 128k chunks direct from that device. Run on the same > >drives as above, reply with the vmstat info again. > > > Using kernel 2.6.12.5 again, here the results: [snip] Ok, reads as expected, like the buffered io

Re: Where is the performance bottleneck?

2005-08-31 Thread Jens Axboe
On Wed, Aug 31 2005, jmerkey wrote: > > 512 is not enough. It has to be larger. I just tried 512 and it still > limits the data rates. Please don't top post. 512 wasn't the point, setting it properly is the point. If you need more than 512, go ahead. This isn't Holger's problem, though, the rea

Re: Where is the performance bottleneck?

2005-08-31 Thread Nick Piggin
Holger Kiehl wrote: meminfo.dump: MemTotal: 8124172 kB MemFree: 23564 kB Buffers: 7825944 kB Cached: 19216 kB SwapCached: 0 kB Active: 25708 kB Inactive: 7835548 kB HighTotal: 0 kB HighFree:0 kB

Re: Where is the performance bottleneck?

2005-08-31 Thread jmerkey
512 is not enough. It has to be larger. I just tried 512 and it still limits the data rates. Jeff Jens Axboe wrote: On Wed, Aug 31 2005, jmerkey wrote: I have seen an 80MB/sec limitation in the kernel unless this value is changed in the SCSI I/O layer for 3Ware and other controllers d

Re: Where is the performance bottleneck?

2005-08-31 Thread Jens Axboe
On Wed, Aug 31 2005, jmerkey wrote: > > > I have seen an 80MB/sec limitation in the kernel unless this value is > changed in the SCSI I/O layer > for 3Ware and other controllers during testing of 2.6.X series kernels. > > Change these values in include/linux/blkdev.h and performance goes from

Re: Where is the performance bottleneck?

2005-08-31 Thread jmerkey
I'll try this approach as well. On 2.4.X kernels, I had to change nr_requests to achieve performance, but I noticed it didn't seem to work as well on 2.6.X. I'll retry the change with nr_requests on 2.6.X. Thanks Jeff Tom Callahan wrote: From linux-kernel mailing list. Don't do th

Re: Where is the performance bottleneck?

2005-08-31 Thread Tom Callahan
From linux-kernel mailing list. Don't do this. BLKDEV_MIN_RQ sets the size of the mempool reserved requests and will only get slightly used in low memory conditions, so most memory will probably be wasted. Change /sys/block/xxx/queue/nr_requests Tom Callahan TESSCO Technologies (443)-50
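The advice here is to adjust the queue depth at runtime through sysfs instead of patching BLKDEV_MIN_RQ in the kernel source. A minimal sketch of reading and setting that value follows. The helper names and the `sysfs_root` parameter are hypothetical (the parameter exists only so the code can be tried against a scratch directory), the device name is whatever appears under /sys/block on the machine in question, and writing to the real file requires root.

```python
from pathlib import Path


def nr_requests_path(device: str, sysfs_root: str = "/sys") -> Path:
    """Build the sysfs path of a block device's nr_requests attribute."""
    return Path(sysfs_root) / "block" / device / "queue" / "nr_requests"


def read_nr_requests(device: str, sysfs_root: str = "/sys") -> int:
    """Return the current request queue depth for the device."""
    return int(nr_requests_path(device, sysfs_root).read_text().strip())


def write_nr_requests(device: str, value: int,
                      sysfs_root: str = "/sys") -> None:
    """Set the request queue depth; on a real system this needs root."""
    if value < 1:
        raise ValueError("nr_requests must be a positive integer")
    nr_requests_path(device, sysfs_root).write_text(f"{value}\n")
```

The change takes effect immediately and does not survive a reboot, which makes it easy to try several values (512, 4096, and so on, as discussed in this thread) while watching throughput.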

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl
On Wed, 31 Aug 2005, Jens Axboe wrote: On Wed, Aug 31 2005, Holger Kiehl wrote: On Wed, 31 Aug 2005, Jens Axboe wrote: Nothing sticks out here either. There's plenty of idle time. It smells like a driver issue. Can you try the same dd test, but read from the drives instead? Use a bigger block

Re: Where is the performance bottleneck?

2005-08-31 Thread jmerkey
I have seen an 80MB/sec limitation in the kernel unless this value is changed in the SCSI I/O layer for 3Ware and other controllers during testing of 2.6.X series kernels. Change these values in include/linux/blkdev.h and performance goes from 80MB/s to over 670MB/s on the 3Ware controller.

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl
On Wed, 31 Aug 2005, Nick Piggin wrote: Holger Kiehl wrote: 3236497 total 1.4547 2507913 default_idle 52248.1875 158752 shrink_zone 43.3275 121584 copy_user_generic_c 3199.5789

Re: Where is the performance bottleneck?

2005-08-31 Thread Jens Axboe
On Wed, Aug 31 2005, Holger Kiehl wrote: > On Wed, 31 Aug 2005, Jens Axboe wrote: > > >Nothing sticks out here either. There's plenty of idle time. It smells > >like a driver issue. Can you try the same dd test, but read from the > >drives instead? Use a bigger blocksize here, 128 or 256k. > > > I

Re: Where is the performance bottleneck?

2005-08-31 Thread Dr. David Alan Gilbert
* Holger Kiehl ([EMAIL PROTECTED]) wrote: > On Wed, 31 Aug 2005, Jens Axboe wrote: > > Full vmstat session can be found under: Have you got iostat? iostat -x 10 might be interesting to see for a period while it is going. Dave -- -Open up your eyes, open up your mind, open up your code

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl
On Wed, 31 Aug 2005, Jens Axboe wrote: Nothing sticks out here either. There's plenty of idle time. It smells like a driver issue. Can you try the same dd test, but read from the drives instead? Use a bigger blocksize here, 128 or 256k. I used the following command reading from all 8 disks in

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl
On Wed, 31 Aug 2005, Vojtech Pavlik wrote: On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote: How does one determine the PCI-X bus speed? Usually only the card (in your case the Symbios SCSI controller) can tell. If it does, it'll be most likely in 'dmesg'. There is nothing in dm

Re: Where is the performance bottleneck?

2005-08-31 Thread Nick Piggin
Holger Kiehl wrote: 3236497 total 1.4547 2507913 default_idle 52248.1875 158752 shrink_zone 43.3275 121584 copy_user_generic_c 3199.5789 34271 __wake_up_bit

Re: Where is the performance bottleneck?

2005-08-31 Thread Jens Axboe
On Wed, Aug 31 2005, Holger Kiehl wrote: > >>>Ok, I did run the following dd command in different combinations: > >>> > >>> dd if=/dev/zero of=/dev/sd?1 bs=4k count=500 > >> > >>I think a bs of 4k is way too small and will cause huge CPU overhead. > >>Can you try with something like 4M? Also,

Re: Where is the performance bottleneck?

2005-08-31 Thread Holger Kiehl
On Wed, 31 Aug 2005, Jens Axboe wrote: On Wed, Aug 31 2005, Vojtech Pavlik wrote: On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote: How does one determine the PCI-X bus speed? Usually only the card (in your case the Symbios SCSI controller) can tell. If it does, it'll be most lik

Re: Where is the performance bottleneck?

2005-08-31 Thread Jens Axboe
On Wed, Aug 31 2005, Vojtech Pavlik wrote: > On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote: > > >>How does one determine the PCI-X bus speed? > > > > > >Usually only the card (in your case the Symbios SCSI controller) can > > >tell. If it does, it'll be most likely in 'dmesg'. > > >

Re: Where is the performance bottleneck?

2005-08-31 Thread Vojtech Pavlik
On Tue, Aug 30, 2005 at 08:06:21PM +, Holger Kiehl wrote: > >>How does one determine the PCI-X bus speed? > > > >Usually only the card (in your case the Symbios SCSI controller) can > >tell. If it does, it'll be most likely in 'dmesg'. > > > There is nothing in dmesg: > >Fusion MPT base dr