Re-building an array

2007-07-13 Thread mail
Hi List,

I am very new to RAID, and I am having a problem.

I made a RAID10 array, but I only used 2 disks.  Since then, one has failed,
and my system crashes with a kernel panic.

I copied all the data, and I would like to start over.  How can I start
from scratch? I need to get rid of my /dev/md0, fully test the disks,
and rebuild them as RAID1.

Thanks!
Rick
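A minimal sketch of the usual mdadm sequence for this, assuming the old members are partitions such as /dev/sda1 and /dev/sdb1 (hypothetical names; check /proc/mdstat for the real ones). The helper names run and rebuild_md_as_raid1 are made up for illustration. By default the script only prints the commands; set DRY_RUN=0 to execute them. Note that badblocks -w overwrites every block of the partition it tests.

```shell
#!/bin/sh
# Sketch only: device names and the helpers run/rebuild_md_as_raid1 are
# hypothetical. Commands are printed, and executed only when DRY_RUN=0.

# Print a command prefixed with "+", then run it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@" || { echo "failed: $*" >&2; return 1; }
    fi
}

# Usage: rebuild_md_as_raid1 MD_DEVICE MEMBER...
rebuild_md_as_raid1() {
    md=$1
    shift
    # Stop the old array so the kernel releases its member partitions.
    run mdadm --stop "$md" || return 1
    for p in "$@"; do
        # Remove the old md superblock so the RAID10 is not auto-assembled
        # again; an error here (no superblock left) is not treated as fatal.
        run mdadm --zero-superblock "$p"
        # Destructive surface test: -w write patterns, -s progress, -v verbose.
        run badblocks -wsv "$p" || return 1
    done
    # Create a fresh two-way mirror from the tested partitions.
    run mdadm --create "$md" --level=1 --raid-devices="$#" "$@"
}
```

For example, rebuild_md_as_raid1 /dev/md0 /dev/sda1 /dev/sdb1 prints six commands, each prefixed with "+", ending with the mdadm --create line for a 2-device RAID1.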


-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux Software RAID is really RAID?

2007-07-12 Thread Johny Mail list

2007/7/4, Mark Lord [EMAIL PROTECTED]:

Tejun Heo wrote:
 Mark Lord wrote:
 I believe he said it was ICH5 (different post/thread).

 My observation on ICH5 is that if one unplugs a drive,
 then the chipset/cpu locks up hard when toggling SRST
 in the EH code.

 Specifically, it locks up at the instruction
 which restores SRST back to the non-asserted state,
 which likely corresponds to the chipset finally actually
 sending a FIS to the drive.

 A hard(ware) lockup, not software.
 That's why Intel says ICH5 doesn't do hotplug.

 OIC.  I don't think there's much left to do from the driver side then.
 Or is there any workaround?

The workaround I have, for 2.6.18.8, is to provide an offline() method
for ICH5 that polls for device present before attempting SRST.

I hope to eventually clean this up and submit it for you,
after your existing polling-hp code goes upstream.

Here's my present hack (below).  Feel free to use/ignore.

***

Implement ICH5 chipset handling for drive hot insertion/removal.
This cannot go upstream, as it conflicts with a more generic
polled-hotplug framework that is currently in development.

Hot-inserted drives are automatically detected within a second or two,
and are ready-to-use within 30 seconds or so.

Hot-removed drives are *not* noticed by the kernel until the next
time they are accessed.  If you want this to happen quickly,
then just launch a script like this from /etc/inittab at boot time:

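   #!/bin/bash
   # Query every SATA disk's power mode (hdparm -C) every 5 seconds, so that a
   # hot-removed drive is accessed, and thus noticed, promptly. Output is
   # discarded and the loop runs in the background.
   ( while /bin/true ; do /sbin/hdparm -C /dev/sd[a-z] ; sleep 5 ; done ) >/dev/null &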

Signed-off-by: Mark Lord [EMAIL PROTECTED]
---



Hello,
Thanks, this patch works in my case when I unplug my hard drive.
But the situation is strange.
The first time, it worked fine and I got these messages:
[  290.452296] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[  290.452378] ata2.00: tag 0 cmd 0xea Emask 0x4 stat 0x40 err 0x0 (timeout)
[  290.452635] ata2 (port 1): status=d0 pcs=0x0013 offline=1 delay=100 usecs
[  290.452697] ata2: soft resetting port
[  290.452787] ata2: SATA link down (SStatus 0 SControl 0)
[  290.463065] ATA: abnormal status 0x7F on port 0xCCA7
[  290.463154] ata2: EH complete
[  290.463224] sd 1:0:0:0: SCSI error: return code = 0x0004
[  290.463286] end_request: I/O error, dev sdb, sector 156248058
[  290.463362] raid1: Disk failure on sdb2, disabling device.
[  290.463365]  Operation continuing on 1 devices
[  290.465590] RAID1 conf printout:
[  290.465651]  --- wd:1 rd:2
[  290.465710]  disk 0, wo:0, o:1, dev:sda3
[  290.465767]  disk 1, wo:1, o:0, dev:sdb2
[  290.480225] RAID1 conf printout:
[  290.480281]  --- wd:1 rd:2
[  290.480370]  disk 0, wo:0, o:1, dev:sda3
[  290.619960] ata2: pcs_hotplug_poll: old=0033 new=0013
[  290.620045] ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen
[  290.620114] ata2: (hotplug event)
[  290.620178] ata2: soft resetting port
[  290.620242] ata2: SATA link down (SStatus 0 SControl 0)
[  290.630518] ATA: abnormal status 0x7F on port 0xCCA7
[  290.630588] ata2: EH complete
[  290.630652] ata2.00: detaching (SCSI 1:0:0:0)

But on a second try, when I unplugged the disk (with the while loop running
as a background task), the unplug handling was not triggered; I had to change
the partition table with fdisk to get the disk taken offline:
[  397.764666] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[  397.770229] ata2.00: (BMDMA stat 0x21)
[  397.775771] ata2.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
[  397.781502] ata2 (port 1): status=d0 pcs=0x0013 offline=1 delay=100 usecs
[  397.787046] ata2: soft resetting port
[  397.792614] ata2: SATA link up unknown (SStatus 0 SControl 0)
[  397.808327] ATA: abnormal status 0x7F on port 0xCCA7
[  397.813910] ata2: EH complete
[  397.819501] sd 1:0:0:0: SCSI error: return code = 0x0004
[  397.825049] end_request: I/O error, dev sdb, sector 32
[  397.830613] Buffer I/O error on device sdb, logical block 4
[  397.836177] Buffer I/O error on device sdb, logical block 5
[  397.841748] Buffer I/O error on device sdb, logical block 6
[  397.847315] Buffer I/O error on device sdb, logical block 7
[  397.852874] Buffer I/O error on device sdb, logical block 8
[  397.858440] Buffer I/O error on device sdb, logical block 9
[  397.864021] Buffer I/O error on device sdb, logical block 10
[  397.869579] Buffer I/O error on device sdb, logical block 11
[  397.875177] sd 1:0:0:0: SCSI error: return code = 0x0004
[  397.880734] end_request: I/O error, dev sdb, sector 0
[  397.886312] Buffer I/O error on device sdb, logical block 0
[  397.891889] lost page write due to I/O error on sdb
[  398.283654] ata2: pcs_hotplug_poll: old=0033 new=0013
[  398.289250] ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x2 frozen
[  398.294843] ata2: (hotplug event)
[  398.300433] ata2: soft resetting port
[  398.305997] ata2: SATA link up unknown (SStatus 0 SControl 0)
[  398.321732] ATA: abnormal status 0x7F on port 0xCCA7
[  398.327329] ata2: EH 

Re: Linux Software RAID is really RAID?

2007-06-27 Thread Johny Mail list

2007/6/26, Brad Campbell [EMAIL PROTECTED]:

Johny Mail list wrote:
 Hello list,
 I have a little question about software RAID on Linux.
 I have installed software RAID on all my Dell SC1425 servers,
 believing that md RAID was a robust driver.
 Recently I ran some tests on one server to check whether the RAID
 survives a hard drive power failure: I powered up the server and, after
 it booted and the prompt appeared, I disconnected the power cable of
 one of my SATA hard drives. Normally md should eliminate the failed drive
 from the logical drive it builds, and the server should keep working
 as if nothing had happened. Instead, the server stopped responding and
 I got these messages:
 ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 ata4: port is slow to respond, please be patient (Status 0xd0)
 ata4: port failed to respond (30sec, Status 0xd0)
 ata4: soft resetting port

 After that my system is frozen. Normally, in a basic RAID, the device is
 disabled in the logical RAID device (md0) and it only uses the remaining
 disk.

cc to linux-ide added.

Unfortunately this is not an artifact of the Linux RAID driver; rather, it
appears to be an issue with the SATA driver and related error recovery.
Some information about the kernel, configuration, drives, controller cards
and other relevant system details would be good.

See the information at this URL for the sort of extra information that
would be handy.

http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html
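A rough sketch for collecting that kind of information in one pass. The choice of commands is an assumption based on the items listed above (kernel, controller, md state, ATA messages), and the names section and collect_report are hypothetical. Tools that are missing are reported rather than treated as errors.

```shell
#!/bin/sh
# Sketch: gather basic facts for a SATA/md bug report in one pass.
# The command list is an assumption; section/collect_report are hypothetical.

# Print a titled section with a command's output, or a note if the command
# is not installed. stderr is folded in so errors show up in the report.
section() {
    title=$1
    shift
    echo "== $title =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(not available: $1)"
    fi
    echo
}

collect_report() {
    section "kernel" uname -a
    section "PCI devices" lspci
    section "md status" cat /proc/mdstat
    # dmesg may need root on some systems; its error is then shown instead.
    section "ATA and md messages" sh -c 'dmesg | grep -E "ata[0-9]|md[0-9]|raid"'
}
```

Running collect_report > report.txt produces a plain-text file that can be pasted into a reply to the list.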


Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams



OK, no problem.
I have a Dell SC1425 with no controller card, only the chipset for SATA
and FakeRAID (Intel chipset, plus Adaptec for FakeRAID, but I have
disabled this function):
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
The server's specs are here:
http://www.dell.com/downloads/global/products/pedge/en/sc1425_specs.pdf

The hard drives are two SATA disks, Model=WDC WD800JD-75LSA0.

In my kernel configuration (2.6.21.5) I use:
- ATA device support (CONFIG_ATA)
-- Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support Driver (CONFIG_ATA_PIIX)
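To confirm how those two options are actually set in the running kernel, a small check along these lines may help. It assumes the configuration is available as a plain file such as /boot/config-$(uname -r) (common on distribution kernels, but not guaranteed); check_options is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show how selected options are set in a kernel .config file.
# check_options is a hypothetical helper name.

# Usage: check_options CONFIG_FILE OPTION...
# Prints e.g. OPTION=y (built in), OPTION=m (module) or OPTION=unset.
check_options() {
    cfg=$1
    shift
    for opt in "$@"; do
        # Match "OPTION=" exactly so CONFIG_ATA does not match CONFIG_ATA_PIIX.
        val=$(grep "^$opt=" "$cfg" | cut -d= -f2)
        echo "$opt=${val:-unset}"
    done
}
```

For example, check_options /boot/config-$(uname -r) CONFIG_ATA CONFIG_ATA_PIIX prints one line per option. If the kernel only exposes /proc/config.gz, decompress it first (zcat /proc/config.gz > /tmp/running.config) and pass that file instead.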

And I get these messages during boot:
[   30.961007] libata version 2.20 loaded.
[   31.867977] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[   31.868037] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[   31.868369] Probing IDE interface ide0...
[   32.432715] Probing IDE interface ide1...
[   32.995829] ata_piix 0000:00:1f.1: version 2.10ac1
[   32.995843] PCI: Enabling device 0000:00:1f.1 (0000 -> 0003)
[   32.995905] ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
[   32.996027] PCI: Setting latency timer of device 0000:00:1f.1 to 64
[   32.996094] ata1: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001fc00 irq 14
[   32.996213] ata2: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001fc08 irq 15
[   32.996305] scsi0 : ata_piix
[   32.996443] ata1: port disabled. ignoring.
[   32.996467] scsi1 : ata_piix
[   32.996582] ata2: port disabled. ignoring.
[   32.996613] ata_piix 0000:00:1f.2: MAP [ P0 -- P1 -- ]
[   32.996858] ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 18 (level, low) -> IRQ 18
[   32.996977] PCI: Setting latency timer of device 0000:00:1f.2 to 64
[   32.997014] ata3: SATA max UDMA/133 cmd 0x0001ccb8 ctl 0x0001ccb2 bmdma 0x0001cc80 irq 18
[   32.997125] ata4: SATA max UDMA/133 cmd 0x0001cca0 ctl 0x0001cc9a bmdma 0x0001cc88 irq 18
[   32.997210] scsi2 : ata_piix
[   33.167491] ata3.00: ATA-7: WDC WD800JD-75LSA0, 09.01D09, max UDMA/133
[   33.167551] ata3.00: 156250000 sectors, multi 8: LBA48
[   33.179467] ata3.00: configured for UDMA/133
[   33.179527] scsi3 : ata_piix
[   33.351130] ata4.00: ATA-7: WDC WD800JD-75LSA0, 09.01D09, max UDMA/133
[   33.351190] ata4.00: 156250000 sectors, multi 8: LBA48
[   33.363112] ata4.00: configured for UDMA/133
[   33.363277] scsi 2:0:0:0: Direct-Access ATA  WDC WD800JD-75LS 09.0 PQ: 0 ANSI: 5
[   33.363476] SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
[   33.363550] sda: Write Protect is off
[   33.363606] sda: Mode Sense: 00 3a 00 00
[   33.363635] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.363770] SCSI device sda: 156250000 512-byte hdwr sectors (80000 MB)
[   33.363843] sda: Write Protect is off
[   33.363899] sda: Mode Sense: 00 3a 00 00
[   33.363928] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.364002]  sda: sda1

Linux Software RAID is really RAID?

2007-06-26 Thread Johny Mail list

Hello list,
I have a little question about software RAID on Linux.
I have installed software RAID on all my Dell SC1425 servers,
believing that md RAID was a robust driver.
Recently I ran some tests on one server to check whether the RAID
survives a hard drive power failure: I powered up the server and, after
it booted and the prompt appeared, I disconnected the power cable of
one of my SATA hard drives. Normally md should eliminate the failed drive
from the logical drive it builds, and the server should keep working
as if nothing had happened. Instead, the server stopped responding and
I got these messages:
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: port failed to respond (30sec, Status 0xd0)
ata4: soft resetting port

After that my system is frozen. Normally, in a basic RAID, the device is
disabled in the logical RAID device (md0) and it only uses the remaining
disk.
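When md does drop a failed member as expected, /proc/mdstat shows it: a degraded two-disk mirror reports a member count such as [2/1] (configured/working) and a status such as [U_]. A minimal sketch that flags such arrays, assuming that /proc/mdstat layout; degraded_arrays is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list md arrays that are running with fewer working members than
# configured. Assumes the usual /proc/mdstat layout; degraded_arrays is a
# hypothetical helper name.

# Reads mdstat-format text on stdin and prints "mdX configured/working"
# for every degraded array.
degraded_arrays() {
    awk '
        # Remember which array the following status lines belong to.
        /^md[0-9]+ :/ { dev = $1 }
        # The "[n/m]" field: n configured members, m working members.
        match($0, /\[[0-9]+\/[0-9]+]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), c, "/")
            if (c[2] + 0 < c[1] + 0) print dev, c[1] "/" c[2]
        }
    '
}
```

Run it as degraded_arrays < /proc/mdstat; it prints nothing when every array has all its members, and a line such as "md0 2/1" for a mirror that has lost one disk.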

I ran the same test on Windows with the FakeRAID in the BIOS/SATA chip,
and with one disk disconnected the system stayed up with a working
logical device.

I use Linux RAID + LVM + ext3, a basic combination. I ran this test
with the Debian Etch default kernel and with the latest version of the
kernel. Does anyone have an idea why the driver doesn't handle an
electrical failure of a hard drive?

Thanks