Re: Trying to start dirty, degraded RAID6 array

2006-04-26 Thread Christopher Smith

Neil Brown wrote:

The '-f' is meant to make this work.  However it seems there is a bug.

Could you please test this patch?  It isn't exactly the right fix, but
it definitely won't hurt.


Thanks, Neil, I'll give this a go when I get home tonight.

Is there any way to start an array without kicking off a rebuild ?

CS


Trying to start dirty, degraded RAID6 array

2006-04-26 Thread Christopher Smith

The short version:

I have a 12-disk RAID6 array that has lost a device and now whenever I 
try to start it with:


mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1

I get:

mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

And in dmesg:

md: bind<...>   (eleven of these lines, one per device; the device names were lost to the archive's HTML stripping)
md: md0: raid array is not clean -- starting background reconstruction
raid6: device sdl1 operational as raid disk 0
raid6: device sdc1 operational as raid disk 11
raid6: device sda1 operational as raid disk 10
raid6: device sdd1 operational as raid disk 9
raid6: device sdb1 operational as raid disk 8
raid6: device sdg1 operational as raid disk 6
raid6: device sdf1 operational as raid disk 5
raid6: device sde1 operational as raid disk 4
raid6: device sdj1 operational as raid disk 3
raid6: device sdi1 operational as raid disk 2
raid6: device sdk1 operational as raid disk 1
raid6: cannot start dirty degraded array for md0
RAID6 conf printout:
 --- rd:12 wd:11 fd:1
 disk 0, o:1, dev:sdl1
 disk 1, o:1, dev:sdk1
 disk 2, o:1, dev:sdi1
 disk 3, o:1, dev:sdj1
 disk 4, o:1, dev:sde1
 disk 5, o:1, dev:sdf1
 disk 6, o:1, dev:sdg1
 disk 8, o:1, dev:sdb1
 disk 9, o:1, dev:sdd1
 disk 10, o:1, dev:sda1
 disk 11, o:1, dev:sdc1
raid6: failed to run raid set md0
md: pers->run() failed ...


I'm 99% sure the data is ok and I'd like to know how to force the array 
online.




Longer version:

A couple of days ago I started having troubles with my fileserver 
mysteriously hanging during boot (I was messing with trying to get Xen 
running at the time, so lots of reboots were involved).  I finally 
nailed it down to the autostarting of the RAID array.


After several hours of pulling CPUs, SATA cards, RAM (not to mention 
some scary problems with memtest86+ that turned out to be because "USB 
Legacy" was enabled) I finally managed to figure out that one of my 
drives would simply stop transferring data after about the first gig 
(tested with dd, monitoring with iostat).  About 30 seconds after the 
drive "stops", the rest of the machine also hangs.


Interestingly, there are no error messages anywhere I could find 
indicating the drive was having problems.  Even its SMART test (smartctl 
-t long) says it's ok.  This made the problem substantially more 
difficult to figure out.


I then tried to start the array without the broken disk and had the 
problem mentioned in the short version above - the array wouldn't start, 
presumably because its rebuild had been started and (uncleanly) stopped 
about a dozen times since it last succeeded.  I finally managed to get 
the array online by starting it with all the disks, then immediately 
knocking the one I knew to be bad offline with 'mdadm /dev/md0 -f 
/dev/sdh1' before it hit the point where it would hang.  After that the 
rebuild completed without error (I didn't touch the machine at all while 
it was rebuilding).
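
(Spelled out, the sequence was roughly this - reconstructed from memory,
so the device list is indicative rather than exact:)

mdadm -A /dev/md0 /dev/sd[a-l]1    # assemble with all twelve disks, including the flaky sdh
mdadm /dev/md0 -f /dev/sdh1        # fail the bad disk straight away, before it wedges the machine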


However, a few hours after the rebuild completed, a power failure killed 
the machine again and now I can't start the array, as outlined in the 
"short version" above.  I must admit I find it a bit weird that the 
array is "dirty and degraded" after it had successfully completed a rebuild.


Unfortunately the original failed drive (/dev/sdh) is no longer 
available, so I can't do my original trick again.  I'm pretty sure - 
based on the rebuild completing previously - that the data will be fine 
if I can just get the array back online.  Is there some sort of 
--really-force switch to mdadm ?  Can the array be brought back online 
*without* triggering a rebuild, so I can get as much data as possible 
off and then start from scratch again ?
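
(For what it's worth, the two leads I've turned up so far - completely
untested on my side, so corrections are very welcome - are the
md-mod.start_dirty_degraded=1 kernel/module parameter, which newer
kernels apparently accept to let a dirty, degraded raid5/6 start, and
flipping the array to read-only as soon as it is assembled so nothing
can kick off a resync.  Roughly:)

# on the kernel command line (or as an md-mod module option), if the
# running kernel is new enough to know about it:
#   md-mod.start_dirty_degraded=1

# then force assembly and immediately make the array read-only before
# mounting it read-only to copy data off (/mnt/rescue is just an example,
# and this assumes the mdadm in use supports --readonly):
mdadm --assemble --force /dev/md0 /dev/sd[abcdefgijkl]1
mdadm --readonly /dev/md0
mount -o ro /dev/md0 /mnt/rescue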


CS

Here is the 'mdadm --examine /dev/sdX' output for each of the remaining 
drives, if it is helpful:


/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
  Creation Time : Wed Feb  1 01:09:11 2006
     Raid Level : raid6
    Device Size : 244195904 (232.88 GiB 250.06 GB)
     Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
   Raid Devices : 12
  Total Devices : 11
Preferred Minor : 0

    Update Time : Wed Apr 26 22:30:01 2006
          State : active
 Active Devices : 11
Working Devices : 11
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 1685ebfc - correct
         Events : 0.11176511


      Number   Major   Minor   RaidDevice State
this    10       8        1       10      active sync   /dev/sda1

   0     0       8      177        0      active sync   /dev/sdl1
   1     1       8      161        1      active sync   /dev/sdk1
   2     2       8      129        2      active sync   /dev/sdi1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       0        0        7      faulty removed
   8     8       8       17        8

Re: Recommendations for supported 4-port SATA PCI card ?

2006-04-01 Thread Christopher Smith

Brad Campbell wrote:
I've been running 3 together in one box for about 18 months, and four in 
another for a year now... the on-board BIOS will only pick up 8 drives, 
but they work just fine under Linux and recognise all connected drives.


What distro and kernel ?

I tried this about 2 - 3 months ago and had problems whenever more than 
two cards were in my system, even if only a single drive was installed. 
 I posted about it here (search for "multiple promise sata150 tx4 
cards" back in January).


The symptoms were ata timeouts:

ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x0c { DriveStatusError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x0c { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x0c { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x0c { DriveStatusError }

This would happen consistently with three controllers in, never with 
two.  I tried every possible combination of controllers and drives I 
could think of, to eliminate any potential of broken hardware being the 
cause.


I should probably give it another go, given there have been a couple of 
minor kernel versions since then, but I'm surprised to hear you've had 
it working for so long - no-one was able to give me a solution to my 
problem at the time (I ended up getting a pair of two-channel SATA 
cards) and I assumed it was a driver bug of some description.  Promise, 
of course, were useless, saying more than a single controller was an 
unsupported configuration.


I have 11 drives in one box (on promise.. 2 on the on-board VIA and 1 on 
PATA) and 15 drives in another (all across 4 promise cards).. all on 
SATA150-TX4 cards..


Performance sucks.. but then when you put 15 drives on a single PCI 
33Mhz bus what do you expect ?
Great for streaming media though.. (and cheap, and very reliable). The 
only media errors I get are on the VIA controller. The promise 
controllers have not had a single media error since they were installed.


I have no complaints about the performance (relatively speaking, of 
course), but I've got the cards in a machine with multiple PCI-X busses, 
so it's not really bottlenecked there.


CS


Re: Problems with multiple Promise SATA150 TX4 cards

2006-01-25 Thread Christopher Smith

Erik Slagter wrote:

On Tue, 2006-01-24 at 17:40 +, David Greaves wrote:

sounds like a spinup time on marginal power to me.



No, it's a limitation of the Promise BIOS on the cards, it will only
detect a maximum of 8 drives. I had a quick convo with tech support
from Promise over this and they told me they don't support more than
one card in a machine in any case. (Which is odd given they advertise
the ability to RAID-5 across 2 cards!)






I'd really consider the PSU. I had all sorts of weird problems with my
promise SATA150 TX2plus until I replaced the PSU. Apparently it doesn't
suffice to supply _enough_ power.


I've since set up the machine so only the motherboard and boot drive are 
powered from the system PSU and the 12 SATA drives are powered from a 
separate PSU.


Since the machine has previously been running with 8 drives fine on just 
the system PSU, I feel confident saying power supply has nothing to do 
with my problems.


CS


Re: Problems with multiple Promise SATA150 TX4 cards

2006-01-24 Thread Christopher Smith

Christopher Smith wrote:

Brad Campbell wrote:
Can you send an lspci -vv please? I did have some strange problems 
with the BIOS setting up weird timing modes on some of the cards. This 
did not present a reliability problem for me, just performance however.


I attached lspci output to my original post.  I have also included it on 
the end of this one (with a slight difference regarding which slots the 
cards were in, but that makes no difference to the problem).


Oops, forgot to attach lspci output - that's what you get for posting 
just before bed :).


[EMAIL PROTECTED] ~]# lspci -vv
00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
Subsystem: Intel Corporation E7501 Memory Controller Hub
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
Latency: 0
Capabilities: [40] Vendor Specific Information

00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B 
PCI-to-PCI Bridge (rev 01) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
Latency: 64
Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
I/O behind bridge: a000-bfff
Memory behind bridge: fc40-fc6f
Prefetchable memory behind bridge: ff30-ff5f
Secondary status: 66Mhz+ FastB2B+ ParErr- DEVSEL=medium 
>TAbort- 
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-

00:03.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface C 
PCI-to-PCI Bridge (rev 01) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
Latency: 64
Bus: primary=00, secondary=04, subordinate=06, sec-latency=0
I/O behind bridge: c000-cfff
Memory behind bridge: fc70-fc9f
Prefetchable memory behind bridge: ff60-ff8f
Secondary status: 66Mhz+ FastB2B+ ParErr- DEVSEL=medium 
>TAbort- 
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-

00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 
02) (prog-if 00 [UHCI])

Subsystem: Gateway 2000: Unknown device 891f
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium 
>TAbort- SERR- 
Latency: 0
Interrupt: pin A routed to IRQ 209
Region 4: I/O ports at ec00 [size=32]

00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 
02) (prog-if 00 [UHCI])

Subsystem: Gateway 2000: Unknown device 891f
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium 
>TAbort- SERR- 
Latency: 0
Interrupt: pin B routed to IRQ 217
Region 4: I/O ports at e880 [size=32]

00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 
02) (prog-if 00 [UHCI])

Subsystem: Gateway 2000: Unknown device 891f
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium 
>TAbort- SERR- 
Latency: 0
Interrupt: pin C routed to IRQ 169
Region 4: I/O ports at e800 [size=32]

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42) (prog-if 
00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
Latency: 0
Bus: primary=00, secondary=07, subordinate=07, sec-latency=32
I/O behind bridge: d000-dfff
Memory behind bridge: fca0-feaf
Prefetchable memory behind bridge: ff90-ff9f
Secondary status: 66Mhz- FastB2B+ ParErr- DEVSEL=medium 
>TAbort- 
BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-

00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller 
(rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium 
>TAbort- SERR- 
Latency: 0

00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage 
Controller (rev 02) (prog-if 8a [Master SecP PriP])

Subsystem: Gateway 2000: Unknown device 891f
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
Status: 

Re: Problems with multiple Promise SATA150 TX4 cards

2006-01-24 Thread Christopher Smith

Brad Campbell wrote:


I have 3 cards with 12 drives in one box, and 4 cards with 15 drives in 
another.
They work just dandy. They are not the fastest machines in the world, 
and the PCI bus sometimes groans under the strain, but it's reliable and 
error-free.


Are these the same cards I have ?

Can you send an lspci -vv please? I did have some strange problems with 
the BIOS setting up weird timing modes on some of the cards. This did 
not present a reliability problem for me, just performance however.


I attached lspci output to my original post.  I have also included it on 
the end of this one (with a slight difference regarding which slots the 
cards were in, but that makes no difference to the problem).


My 1st quick and dirty test would be to boot with a UP kernel. (Only 
because that is all I have also) And to try a vanilla kernel.org kernel 
rather than the Redhat one. (I have one machine on 2.6.10 and one on 
2.6.15-git11. Both are solid)


I have tried the latest Fedora Core 4 kernel, both SMP and UP.  I have 
also tried their 2.6.11 UP kernel (2.6.11-1.1369_FC4).


All exhibit the problem, although it appears that the 2.6.11 kernel 
takes slightly longer for it to appear (maybe 10 vs 5 seconds).


I have not tested with a vanilla kernel.  I'll try to do it tomorrow 
(although I suspect it won't help).



bklaptop:~>ssh storage1 uname -a
Linux storage1 2.6.15-git11 #1 Sun Jan 15 22:25:19 GST 2006 i686 GNU/Linux
bklaptop:~>ssh srv uname -a
Linux srv 2.6.10 #4 Mon Feb 14 23:10:38 GST 2005 i686 GNU/Linux

Are you using the cards in standard PCI 33Mhz Slots? I recall an issue a 
while ago where someone had a big problem with the cards in 66Mhz Slots.


The cards were all in PCI-X slots ranging from 64/66 to 64/133.

I tried placing one of the cards in the only regular 32/33 PCI slot my 
motherboard has and it does not help (this is the configuration where 
the attached lspci was taken).


Another test I'd like you to try if you would, is place one or two 
drives on each controller, so you only have 3 in the system.. and then 
try to reproduce the error.


This configuration also produces the error.

Something else I tried was some crappy dual-port SIL-based SATA card 
with two of the Promise TX4s, and that worked without a problem.  While 
I'm waiting to find out what the problem is, I might buy another one and 
use the two of them temporarily so I can at least build my RAID array.


Thanks for your help.

CS


Re: Problems with multiple Promise SATA150 TX4 cards

2006-01-23 Thread Christopher Smith

Mark Hahn wrote:
I have shuffled the cards, cables and physical drives around to determine 
that this is not a problem with any of them individually - no matter the 
combination, it only ever happens to drives that are at sd[abcd] (ie: if 
I rejig the hardware so the drive at /dev/sdh, which was working fine, 
is on a different cable and controller, but appears at /dev/sdb, it will 
produce the errors).


did you test the case where all disks had power, but only 8 were plugged 
into controllers? 

no individual card, cable or drive was responsible.  The errors _only_ 
occur with three cards in the system, _only_ with whichever drives are 
attached to the "first" controller (ie: sd[abcd]) and _regardless_ of 
other system activity.


the "first" card would correspond to position on the PCI bus (slot),
so perhaps that card is getting iffy power.  but did you actually move
around which power cables are supplying which disks?


Power supply was also one of my suspicions, so I tried powering up half 
a dozen of the drives off another ATX power supply I had and the 
remainder off the system PSU.  The same problems occurred, which I think 
rules out the possibility of insufficient power (I did try with all 
drives powered, but only two cards installed and that worked fine - but 
I haven't moved around the power plugs of individual drives).


I'll try again tonight with all the drives powered off their own PSU, 
just to be sure.


CS


Problems with multiple Promise SATA150 TX4 cards

2006-01-23 Thread Christopher Smith
This is probably not entirely the right list for this query, but I 
figure there are enough people here who have experience in the right 
places and have probably tried this sort of thing themselves.  Apologies 
to anyone whose time I waste.


Or, at the very least, I'm sure someone here can point me in the right 
direction :).


I currently have a machine with two Promise SATA150 TX4 cards (PDC20318, 
http://www.promise.com/product/product_detail_eng.asp?segment=undefined&product_id=98#) 
and a 4-drive RAID5 array attached to each.  Since I was nearly out of 
space, I decided it was time to add another 4 drives and expand again. 
So, I bought another TX4 card and another 4 SATA drives and plonked them 
in the machine, thinking it would be as easy as the last time I did it 
(going from 4 to 8 drives).


The first problem is that the Promise cards' onboard BIOS(es) only 
recognise(s) (or, at least, list) 8 of the 12 drives in the machine at 
boot.  However, once Linux has booted it detects all three cards and all 
twelve drives, so this is a relatively insignificant issue.


The second (major) problem is that whenever I try to access drives attached 
to the "first" controller (ie: /dev/sd[abcd]), I get ATA timeout errors 
like these:


ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x0c { DriveStatusError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x0c { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x0c { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x0c { DriveStatusError }

This _only_ happens when accessing drives attached to the "first" 
controller.  I can have 8 simultaneous 'dd if=/dev/sdX of=/dev/null' 
happening to the other 8 drives for minutes at a time without a problem, 
but as soon as I fire up a dd to /dev/sd[abcd], I get the errors listed 
above within seconds.


Additionally, a dd to only /dev/sd[abcd] with no other system activity 
also produces the errors - again within seconds.
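
(For reference, the test is nothing more elaborate than this -
illustrative command lines rather than a verbatim transcript:)

for d in /dev/sd[abcd]; do
    dd if=$d of=/dev/null bs=1M &   # one sequential reader per drive on the "first" controller
done
wait
dmesg | tail -n 20                  # the ata timeouts above appear within seconds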


I have shuffled the cards, cables and physical drives around to determine 
that this is not a problem with any of them individually - no matter the 
combination, it only ever happens to drives that are at sd[abcd] (ie: if 
I rejig the hardware so the drive at /dev/sdh, which was working fine, 
is on a different cable and controller, but appears at /dev/sdb, it will 
produce the errors).


Removing cards so there are only one or two in the system results in no 
errors.  Similarly, I tried shuffling the hardware around to verify that 
no individual card, cable or drive was responsible.  The errors _only_ 
occur with three cards in the system, _only_ with whichever drives are 
attached to the "first" controller (ie: sd[abcd]) and _regardless_ of 
other system activity.


I'm trying to locate where the problem is so I can suggest to the right 
people that they fix it :).  Any suggestions that might work around the 
problem (PCI timings ?), or pointers to anyone else I should contact, 
will be gratefully accepted and tried.


Cheers,
CS

PS: Here are the outputs of 'dmesg' and 'lspci -vvv' for my system:

dmesg:
[EMAIL PROTECTED] ~]# dmesg
Linux version 2.6.14-1.1656_FC4smp ([EMAIL PROTECTED]) 
(gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 SMP Thu Jan 5 22:24:06 
EST 2006

BIOS-provided physical RAM map:
 BIOS-e820:  - 0009d800 (usable)
 BIOS-e820: 0009d800 - 0009f800 (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 7fff (usable)
 BIOS-e820: 7fff - 7000 (ACPI data)
 BIOS-e820: 7000 - 8000 (ACPI NVS)
 BIOS-e820: fec0 - fed0 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: fff8 - 0001 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 524272
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 294896 pages, LIFO batch:31
DMI 2.3 present.
Using APIC driver default
ACPI: RSDP (v000 ACPIAM) @ 0x000f62f0
ACPI: RSDT (v001 A M I  OEMRSDT  0x01000412 MSFT 0x0097) @ 0x7fff
ACPI: FADT (v001 A M I  OEMFACP  0x01000412 MSFT 0x0097) @ 0x7fff0200
ACPI: MADT (v001 A M I  OEMAPIC  0x01000412 MSFT 0x0097) @ 0x7fff0300
ACPI: OEMB (v001 A M I  OEMBIOS  0x01000412 MSFT 0x0097) @ 0x7040
ACPI: DSDT (v001  0AAYB 0AAYB007 0x0007 MSFT 0x010d) @ 0x
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x04] l

Re: Raid sync observations

2005-12-20 Thread Christopher Smith

Sebastian Kuzminsky wrote:

I just created a RAID array (4-disk RAID-6).  When "mdadm -C" returned,
/proc/mdstat showed it syncing the new array at about 17 MB/s.  "vmstat 1"
showed hardly any blocks in or out, and an almost completely idle cpu.


This isn't really relevant to your questions but...

Why would you use RAID6 and not RAID10 with four disks ?

CS


Re: split RAID1 during backups?

2005-10-26 Thread Christopher Smith

Jeff Breidenbach wrote:

Hi all,



[...]


So - I'm thinking of the following backup scenario.  First, remount
/dev/md0 readonly just to be safe. Then mount the two component
partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
component partition, and tell the backup process to work from the
other component partition. Once the backup is complete, point the
webserver back at /dev/md0, unmount the component partitions, then
switch read-write mode back on.


Isn't this just the sort of scenario LVM snapshots are meant for ?  It 
might not help with the duration aspect, but it will mean your services 
aren't down/non-redundant for the entire time the backup takes.
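
(Roughly like this, assuming the filesystem sat on an LVM logical volume 
rather than directly on /dev/md0 - the vg0/data names below are made up:)

lvcreate --snapshot --size 10G --name data-snap /dev/vg0/data   # copy-on-write point-in-time snapshot
mount -o ro /dev/vg0/data-snap /mnt/backup-snap
# ... run the backup against /mnt/backup-snap while the webserver keeps using /dev/vg0/data ...
umount /mnt/backup-snap
lvremove -f /dev/vg0/data-snap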



Everything on this system seems bottlenecked by disk I/O. That
includes the rate web pages are served as well as the backup process
described above. While I'm always hungry for performance tips, faster
backups are the current focus. For those interested in gory details
such as drive types, NCQ settings, kernel version and whatnot, I
dumped a copy of dmesg output here: http://www.jab.org/dmesg


I think this might be one of those situations where SCSI really does 
offer a significant performance advantage, although if you're actually 
filling up that 500G, it'll be quite a bit more expensive.


See if you can get hold of a reasonably sized array using SCSI drives 
and do some comparative benchmarking.


You might also want to experiment with different filesystems, although 
that may not be feasible...


CS


Re: 3ware RAID (was Re: RAID resync stalled at 99.7% ?)

2005-09-02 Thread Christopher Smith

Daniel Pittman wrote:

Christopher Smith <[EMAIL PROTECTED]> writes:

[...]



The components are 12x400GB drives attached to a 3ware 9500s-12
controller.  They are configured as "single disks" on the controller,
ie: no hardware RAID is involved.



A quick question for you, because I have a client looking at 3ware RAID
hardware at the moment:

Why are you running this as software RAID, rather than using the
hardware on the 3ware card?


Because after doing some preliminary benchmarks, I've found Linux's 
software RAID to be significantly faster than 3ware's hardware RAID (at 
the sacrifice of higher CPU usage, but since the machine has a fairly 
fast CPU and doesn't do anything else, that's a sacrifice I'm happy to 
make).


I have some iozone and bonnie++ results, but they're at work and I'm at 
home - I'll post them tomorrow.
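
(The runs were nothing exotic - along these lines, with the parameters 
from memory so treat them as indicative only; /mnt/array is just the 
array's mount point:)

bonnie++ -d /mnt/array -s 8g -u nobody                   # file set larger than RAM to defeat caching
iozone -a -g 2g -f /mnt/array/iozone.tmp -b results.xls  # automatic mode up to 2GB files, spreadsheet output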


CS


RAID resync stalled at 99.7% ?

2005-09-02 Thread Christopher Smith
In doing some benchmarking, I've found a curious problem - after
creating an array the resync has stalled at 99.7%:

[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6]
md0 : active raid6 sdm1[11] sdl1[10] sdk1[9] sdj1[8] sdi1[7] sdh1[6]
sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      4963200 blocks level 6, 32k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
      [===================>.]  resync = 99.7% (496320/496349) finish=0.0min speed=628K/sec

unused devices: <none>
[EMAIL PROTECTED] ~]#

It's been sitting like this for some time now, and since the resync up
until this point progressed at about 15M/sec, I can't see any reason to
think it will suddenly finish.

mdadm -S /dev/md0 simply hangs.

This problem is reproducible as well - if I reboot the machine the
resync will complete successfully, then if I delete it and try to create
another array, exactly the same thing will happen.

It's also not a problem with, for example, bad sectors on one of the
components, as creating a larger array stalls right near the end as well
(the exact percentage varies, but it's always around the 99% mark).

Does anyone have any ideas ?



Some relevant info:

The command used to create the array was:

mdadm -C /dev/md0 -l6 -n12 -c 32 -z 496349 /dev/sd[b-m]1

It's a Fedora Core 4 box:

[EMAIL PROTECTED] ~]# uname -a
Linux justinstalled.syd.nighthawkrad.net 2.6.12-1.1398_FC4smp #1 SMP Fri
Jul 15 01:30:13 EDT 2005 i686 i686 i386 GNU/Linux
[EMAIL PROTECTED] ~]#

The components are 12x400GB drives attached to a 3ware 9500s-12
controller.  They are configured as "single disks" on the controller,
ie: no hardware RAID is involved.

Regards,
Chris Smith



Re: Migrating from non-RAID to RAID-1

2005-08-09 Thread Christopher Smith

Shaun Jackman wrote:

I have a single non-RAID SATA drive with Debian (Sarge) installed and
data on it. I also have a duplicate blank drive. I would like to
migrate from my non-RAID system to a RAID-1 (mirrored) system for
redundancy, with each disk being an exact duplicate of the other. Is
it possible to do this without having to wipe the first drive clean?
I'd appreciate a pointer to a HOWTO or recipe if it answers this
specific question.


This should get you started:

http://xtronics.com/reference/SATA-RAID-debian-for-2.6.html
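
For what it's worth, the usual recipe for this kind of migration is to 
build a degraded mirror on the blank drive, copy everything across, then 
add the original drive back in.  Very roughly - the device names below 
are illustrative and the fstab/bootloader steps are glossed over:

mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1   # degraded RAID-1 on the blank drive only
mkfs.ext3 /dev/md0
mount /dev/md0 /mnt/newroot
rsync -aHx / /mnt/newroot/            # copy the running system across
# update /etc/fstab and the bootloader, reboot onto /dev/md0, then:
mdadm /dev/md0 --add /dev/sda1        # the original drive resyncs into the mirror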