Re: [PATCH] enable auto=yes by default when using udev

2006-07-03 Thread Frank Blendinger
On Mon, Jul 03, 2006 at 09:14:38AM +1000, Neil Brown wrote:
 I'm worried that this test is not very robust.
 On my Debian/unstable system running udev, there is no
  /dev/.udevdb
 though there is a
  /dev/.udev/db
 
 I guess I could test for both, but then udev might change
 again. I'd really like a more robust check.
 
 Maybe I could test if /dev was a mount point?
 
 Any other ideas?

Maybe checking for a running 'udevd' process?
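
A rough sketch of both ideas from this thread (the /dev mount-point test
and the udevd process check), written in Python purely for illustration.
It assumes a Linux-style /proc with one numeric directory per process and
a process named exactly 'udevd'; other systems may use a different name.
The function names are made up for this example:

```python
import os


def dev_is_mount_point(dev_dir="/dev"):
    """Return True if dev_dir is its own mount (e.g. a tmpfs populated by udev)."""
    return os.path.ismount(dev_dir)


def udevd_running(proc_dir="/proc", wanted="udevd"):
    """Return True if any process under proc_dir is named `wanted`.

    Assumption: the name is read from /proc/<pid>/stat, where it appears
    in parentheses as the second field.
    """
    try:
        entries = os.listdir(proc_dir)
    except OSError:
        return False
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_dir, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process went away or is not readable
        start = stat.find("(")
        end = stat.rfind(")")
        if start == -1 or end == -1:
            continue
        if stat[start + 1:end] == wanted:
            return True
    return False


if __name__ == "__main__":
    print("/dev is a mount point:", dev_is_mount_point())
    print("udevd is running:", udevd_running())
```

Neither test is conclusive on its own: /dev can be a mount point without
udev, and udevd might simply not have started yet when the check runs.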


Frank

-- 
Frank Blendinger | fb(at)intoxicatedmind.net | GPG: 0x0BF2FE7A
Fingerprint: BB64 F2B8 DFD8 BF90 0F2E 892B 72CF 7A41 0BF2 FE7A
   Just because I don't care doesn't mean I don't understand.
   (Homer Simpson)




Re: Two disk failure in RAID5 during resync, wrong superblocks

2006-03-16 Thread Frank Blendinger
Hi again,

I've just noticed that I still had "wrong superblocks" in the subject of
my mail. Please just ignore that; I fixed it while writing the last mail
and forgot to remove it. :)


Greets,
Frank




Re: will mdadm work with a raid created using raidtools

2006-02-16 Thread Frank Blendinger
On Thu, Feb 16, 2006 at 09:15:27AM -0600, Andrew Nelson wrote:
 Feb 14 21:58:36 localhost kernel: hde: dma_timer_expiry: dma status == 0x21
 Feb 14 21:58:46 localhost kernel: hde: DMA timeout error
 Feb 14 21:58:46 localhost kernel: hde: dma timeout error: status=0x51 { DriveReady SeekComplete Error }
 Feb 14 21:58:46 localhost kernel: hde: dma timeout error: error=0x40 { UncorrectableError }, LBAsect=1594001, high=0, low=1594001, sector=1593535
 Feb 14 21:58:46 localhost kernel: end_request: I/O error, dev hde, sector 1593535
 
 Does anyone have any idea what I'm doing wrong?

It's probably not your fault - blame /dev/hde! This sounds like a bad
error on the disk - you should really get a new one, and try to copy
/dev/hde to the new disk (with dd_rescue for example). This _might_ save
the data.

Then you can try to create the array with the new disk and hope that
it will work.


Greetings,
Frank




Re: will mdadm work with a raid created using raidtools

2006-02-16 Thread Frank Blendinger
On Thu, Feb 16, 2006 at 11:31:06AM -0600, Andrew Nelson wrote:
  It's probably not your fault - blame /dev/hde! This sounds like a bad
  error on the disk - you should really get a new one, and try to copy
  /dev/hde to the new disk (with dd_rescue for example). This _might_ save
  the data.
  
  Then you can try to create the array with the new disk and hope that
  it will work.
 
 
 I thought the whole idea of a raid 1 was that if one drive went bad I could
 just plug a new drive in and the raid would rebuild without problems.

That is certainly right. I just wanted to tell you that your /dev/hde
probably has a serious hardware error, and that you should replace it!

Of course you can just throw it out, put in a new drive, rebuild the
array with your other drive and the new one, and then resync. This
should work just fine.

I guess my first answer was quite confusing, sorry. It's absolutely not
necessary to copy the old hde with dd_rescue.


Greetings,
Frank




Failed RAID-5 with 4 disks

2005-07-26 Thread Frank Blendinger
Hi,

I have a RAID-5 set up with the following raidtab:

raiddev /dev/md0
raid-level  5
nr-raid-disks   4
nr-spare-disks  0
persistent-superblock   1
parity-algorithm    left-symmetric
chunk-size  256
device  /dev/hde
raid-disk   0
device  /dev/hdg
raid-disk   1
device  /dev/hdi
raid-disk   2
device  /dev/hdk
raid-disk   3

My hde failed some time ago, leaving messages like these in the syslog:

hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hde: dma_intr: error=0x84 { DriveStatusError BadCRC }

I wanted to make sure it really was damaged, so I did a badblocks
(read-only) scan on /dev/hde. It actually found a bad sector on the
disk.


I wanted to take the disk out and get a new one, but unfortunately my
hdg now seems to have run into trouble as well. I have some
SeekComplete/BadCRC errors in my log for that disk, too.

Furthermore, I got this:

Jul 25 10:35:49 blackbox kernel: ide: failed opcode was: unknown
Jul 25 10:35:49 blackbox kernel: hdg: DMA disabled
Jul 25 10:35:49 blackbox kernel: PDC202XX: Secondary channel reset.
Jul 25 10:35:49 blackbox kernel: PDC202XX: Primary channel reset.
Jul 25 10:35:49 blackbox kernel: hde: lost interrupt
Jul 25 10:35:49 blackbox kernel: ide3: reset: master: error (0x00?)
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 488396928
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 159368976
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 159368984
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 159368992
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 159369000
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 159369008
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 159369016
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 159369024
Jul 25 10:35:49 blackbox kernel: end_request: I/O error, dev hdg, sector 159369032
Jul 25 10:35:49 blackbox kernel: md: write_disk_sb failed for device hdg
Jul 25 10:35:49 blackbox kernel: lost page write due to I/O error on md0
Jul 25 10:35:49 blackbox kernel: lost page write due to I/O error on md0
Jul 25 10:35:49 blackbox kernel: RAID5 conf printout:
Jul 25 10:35:49 blackbox kernel:  --- rd:4 wd:2 fd:2
Jul 25 10:35:49 blackbox kernel:  disk 0, o:1, dev:hdk
Jul 25 10:35:49 blackbox kernel:  disk 1, o:1, dev:hdi
Jul 25 10:35:49 blackbox kernel:  disk 2, o:0, dev:hdg
Jul 25 10:35:49 blackbox kernel: RAID5 conf printout:
Jul 25 10:35:49 blackbox kernel:  --- rd:4 wd:2 fd:2
Jul 25 10:35:49 blackbox kernel:  disk 0, o:1, dev:hdk
Jul 25 10:35:49 blackbox kernel:  disk 1, o:1, dev:hdi
Jul 25 10:35:49 blackbox kernel: lost page write due to I/O error on md0


Well, now it seems I have two failed disks in my RAID-5, which of course
would be fatal. I am still hoping to rescue the data on the array
somehow, but I am not sure what the best approach would be. I don't
want to cause any more damage.
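
To make the "rd:4 wd:2 fd:2" line above concrete, here is a small Python
sketch that pulls the three counters out of a conf printout line and
checks them against what RAID-5 can tolerate. It assumes rd, wd and fd
stand for raid disks, working disks and failed disks; the function names
are invented for this illustration:

```python
import re

# Matches the summary line of the kernel's "RAID5 conf printout".
SUMMARY_RE = re.compile(r"rd:(\d+)\s+wd:(\d+)\s+fd:(\d+)")


def parse_raid5_summary(line):
    """Return (raid_disks, working_disks, failed_disks), or None if absent."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def raid5_can_run(line):
    """RAID-5 holds one disk's worth of parity, so it survives one failure at most."""
    counts = parse_raid5_summary(line)
    if counts is None:
        raise ValueError("no rd/wd/fd summary found in line")
    raid_disks, working_disks, failed_disks = counts
    return failed_disks <= 1 and working_disks >= raid_disks - 1


if __name__ == "__main__":
    line = "Jul 25 10:35:49 blackbox kernel:  --- rd:4 wd:2 fd:2"
    print(parse_raid5_summary(line), raid5_can_run(line))
```

With two of four members marked failed, the check comes out negative,
which matches the array refusing to start.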

When I boot the system with all four disks connected, hde and hdg are,
as expected, not added:

Jul 26 18:07:59 blackbox kernel: md: hdg has invalid sb, not importing!
Jul 26 18:07:59 blackbox kernel: md: autorun ...
Jul 26 18:07:59 blackbox kernel: md: considering hdi ...
Jul 26 18:07:59 blackbox kernel: md:  adding hdi ...
Jul 26 18:07:59 blackbox kernel: md:  adding hdk ...
Jul 26 18:07:59 blackbox kernel: md:  adding hde ...
Jul 26 18:07:59 blackbox kernel: md: created md0
Jul 26 18:07:59 blackbox kernel: md: bind<hde>
Jul 26 18:07:59 blackbox kernel: md: bind<hdk>
Jul 26 18:07:59 blackbox kernel: md: bind<hdi>
Jul 26 18:07:59 blackbox kernel: md: running: <hdi><hdk><hde>
Jul 26 18:07:59 blackbox kernel: md: kicking non-fresh hde from array!
Jul 26 18:07:59 blackbox kernel: md: unbind<hde>
Jul 26 18:07:59 blackbox kernel: md: export_rdev(hde)
Jul 26 18:07:59 blackbox kernel: raid5: device hdi operational as raid disk 1
Jul 26 18:07:59 blackbox kernel: raid5: device hdk operational as raid disk 0
Jul 26 18:07:59 blackbox kernel: RAID5 conf printout:
Jul 26 18:07:59 blackbox kernel:  --- rd:4 wd:2 fd:2
Jul 26 18:07:59 blackbox kernel:  disk 0, o:1, dev:hdk
Jul 26 18:07:59 blackbox kernel:  disk 1, o:1, dev:hdi
Jul 26 18:07:59 blackbox kernel: md: do_md_run() returned -22
Jul 26 18:07:59 blackbox kernel: md: md0 stopped.
Jul 26 18:07:59 blackbox kernel: md: unbind<hdi>
Jul 26 18:07:59 blackbox kernel: md: export_rdev(hdi)
Jul 26 18:07:59 blackbox kernel: md: unbind<hdk>
Jul 26 18:07:59 blackbox kernel: md: export_rdev(hdk)
Jul 26 18:07:59 blackbox kernel: md: ... autorun DONE.
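
The "do_md_run() returned -22" line is a negated errno value. As a small
aid for reading such lines, this Python sketch looks up the symbolic name
and message for a kernel-style return code; on Linux, 22 is EINVAL. The
helper name is made up for this example:

```python
import errno
import os


def describe_kernel_return(code):
    """Map a negative kernel return code such as -22 to (errno name, message)."""
    err = abs(code)
    name = errno.errorcode.get(err, "unknown")
    return name, os.strerror(err)


if __name__ == "__main__":
    print(describe_kernel_return(-22))
```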

So hde is not fresh (it has been removed from the array for quite some
time now) and hdg has an