Thanks! Was:[Re: strange RAID5 problem]

2006-05-10 Thread Maurice Hilarius
Thanks to Neil, Luca, and CaT, who were all a big help.



-- 

With our best regards,


Maurice W. Hilarius        Telephone: 01-780-456-9771
Hard Data Ltd.             FAX:       01-780-456-9772
11060 - 166 Avenue         email:     [EMAIL PROTECTED]
Edmonton, AB, Canada       http://www.harddata.com/
   T5X 1Y3

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange RAID5 problem

2006-05-09 Thread Luca Berra

On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:

[EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1

But, I get this error message:
mdadm: hot add failed for /dev/sdw1: No such device

What? We just made the partition on sdw a moment ago in fdisk. It IS there!


I don't believe you, prove it (/proc/partitions)


So, we look around a bit:
# cat /proc/mdstat

md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11]
sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2]
sdr1[1]
 5860631040 blocks

Yup, that looks correct, missing sdw1[6]


no, it does not, it is 'inactive'


[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid1] [raid5]

...

md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11]
sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2]
sdr1[1]
 5860631040 blocks

...

[EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
mdadm: hot add failed for /dev/sdw1: No such device

OK, let's mount the degraded RAID and try to copy the files to somewhere
else, so we can make it from scratch:

[EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
/dev/md3: Invalid argument
mount: /dev/md3: can't read superblock


it is still inactive, no wonder you cannot access it.

try running the array, or really stop it before assembling.
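A minimal sketch of the check behind this advice, assuming a POSIX shell and awk. The helper name md_state is hypothetical (it is not part of mdadm); it only reads the state word that follows the array name in /proc/mdstat-style text.

```shell
#!/bin/sh
# md_state ARRAY [MDSTAT_FILE]   (hypothetical helper, not an mdadm command)
# Prints the state word ("active" or "inactive") that follows "ARRAY :" in
# mdstat-format text. MDSTAT_FILE defaults to /proc/mdstat.
# Returns non-zero if ARRAY is not listed at all.
md_state() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" { print $3; found = 1; exit }
        END { if (!found) exit 1 }
    ' "${2:-/proc/mdstat}"
}

# Possible use, as root, following the advice above:
#   if [ "$(md_state md3)" = inactive ]; then
#       mdadm --run /dev/md3     # try to start the partially assembled array
#       # or: mdadm --stop /dev/md3, then assemble again from scratch
#   fi
```

An inactive array has no usable block device behind it, which is why the mount attempt above fails before the filesystem is ever read.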

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media  Services S.r.l.
/\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \


Re: strange RAID5 problem

2006-05-09 Thread CaT
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
 [EMAIL PROTECTED] ~]# mdadm
 --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1
 /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1
 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
 mdadm: superblock on /dev/sdw1 doesn't match others - assembly aborted

Have you tried zeroing the superblock with

mdadm --misc --zero-superblock /dev/sdw1

and then adding it in?

 [EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
 /dev/md3: Invalid argument
 mount: /dev/md3: can't read superblock

Wow, that looks messy. Ummm. About the only thing I can think of is
failing /dev/sdw1 and removing it (I know it says it's not there,
but...)

Also, I'm not the biggest expert on RAID around here. ;)

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby


Re: strange RAID5 problem

2006-05-09 Thread Maurice Hilarius
Luca Berra wrote:
 On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
 [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1

 But, I get this error message:
 mdadm: hot add failed for /dev/sdw1: No such device

 What? We just made the partition on sdw a moment ago in fdisk. It IS
 there!

 I don't believe you, prove it (/proc/partitions)


I understand. Here we go then. Devices in question bracketed with **:

[EMAIL PROTECTED] ~]# cat /proc/partitions
major minor  #blocks  name

   3     0  117220824 hda
   3     1     104391 hda1
   3     2    2008125 hda2
   3     3  115105725 hda3
   3    64  117220824 hdb
   3    65     104391 hdb1
   3    66    2008125 hdb2
   3    67  115105725 hdb3
   8     0  390711384 sda
   8     1  390708801 sda1
   8    16  390711384 sdb
   8    17  390708801 sdb1
   8    32  390711384 sdc
   8    33  390708801 sdc1
   8    48  390711384 sdd
   8    49  390708801 sdd1
   8    64  390711384 sde
   8    65  390708801 sde1
   8    80  390711384 sdf
   8    81  390708801 sdf1
   8    96  390711384 sdg
   8    97  390708801 sdg1
   8   112  390711384 sdh
   8   113  390708801 sdh1
   8   128  390711384 sdi
   8   129  390708801 sdi1
   8   144  390711384 sdj
   8   145  390708801 sdj1
   8   160  390711384 sdk
   8   161  390708801 sdk1
   8   176  390711384 sdl
   8   177  390708801 sdl1
   8   192  390711384 sdm
   8   193  390708801 sdm1
   8   208  390711384 sdn
   8   209  390708801 sdn1
   8   224  390711384 sdo
   8   225  390708801 sdo1
   8   240  390711384 sdp
   8   241  390708801 sdp1
  65     0  390711384 sdq
  65     1  390708801 sdq1
  65    16  390711384 sdr
  65    17  390708801 sdr1
  65    32  390711384 sds
  65    33  390708801 sds1
  65    48  390711384 sdt
  65    49  390708801 sdt1
  65    64  390711384 sdu
  65    65  390708801 sdu1
  65    80  390711384 sdv
  65    81  390708801 sdv1
**
  65    96  390711384 sdw
  65    97  390708801 sdw1
**
  65   112  390711384 sdx
  65   113  390708801 sdx1
  65   128  390711384 sdy
  65   129  390708801 sdy1
  65   144  390711384 sdz
  65   145  390708801 sdz1
  65   160  390711384 sdaa
  65   161  390708801 sdaa1
  65   176  390711384 sdab
  65   177  390708801 sdab1
  65   192  390711384 sdac
  65   193  390708801 sdac1
  65   208  390711384 sdad
  65   209  390708801 sdad1
  65   224  390711384 sdae
  65   225  390708801 sdae1
  65   240  390711384 sdaf
  65   241  390708801 sdaf1
**
   9     0     104320 md0
**
   9     2 5860631040 md2
   9     1  115105600 md1



-- 

Regards,
Maurice



Re: strange RAID5 problem

2006-05-09 Thread Luca Berra

On Tue, May 09, 2006 at 10:16:25AM -0600, Maurice Hilarius wrote:

Luca Berra wrote:

On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:

[EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1

But, I get this error message:
mdadm: hot add failed for /dev/sdw1: No such device

What? We just made the partition on sdw a moment ago in fdisk. It IS
there!


I don't believe you, prove it (/proc/partitions)



I understand. Here we go then. Devices in question bracketed with **:


ok, now i do.
is the /dev/sdw1 device file correctly created?
you could try straceing mdadm to see what happens
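One way to act on both questions, sketched under assumptions: the helper part_devnum is hypothetical and only looks up the major and minor numbers the kernel lists for a partition, so they can be compared by eye with what ls -l shows for the device node. The trace file path in the usage comment is also just an example.

```shell
#!/bin/sh
# part_devnum NAME [PARTITIONS_FILE]   (hypothetical helper)
# Prints "major minor" for NAME as listed in /proc/partitions-format text.
# PARTITIONS_FILE defaults to /proc/partitions.
# Returns non-zero if NAME is not listed.
part_devnum() {
    awk -v name="$1" '
        $4 == name { print $1, $2; found = 1; exit }
        END { if (!found) exit 1 }
    ' "${2:-/proc/partitions}"
}

# Possible use:
#   part_devnum sdw1        # numbers the kernel knows for the partition
#   ls -l /dev/sdw1         # node should be a block device with the same pair
#   strace -o /tmp/mdadm.trace mdadm /dev/md3 -a /dev/sdw1
#                           # then read the trace for the call that fails
```

If the node is missing or carries different numbers, the "No such device" error would point at the device file rather than at the disk itself.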

what about the other suggestion? trying to stop the array and restart
it, since it is marked as inactive.
L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media  Services S.r.l.
/\
\ / ASCII RIBBON CAMPAIGN
 XAGAINST HTML MAIL
/ \


Re: strange RAID5 problem

2006-05-09 Thread Maurice Hilarius
Luca Berra wrote:
 ..
 I don't believe you, prove it (/proc/partitions)

 I understand. Here we go then. Devices in question bracketed with **:

 ok, now i do.
 is the /dev/sdw1 device file correctly created?
 you could try straceing mdadm to see what happens

 what about the other suggestion? trying to stop the array and restart
 it, since it is marked as inactive.
 L.

Here is what we ended up doing that fixed it.
Thanks to Neil for the --force suggestion; however, even with that,
ALL parameters were needed on the mdadm -C command line or it still refused.
We used EVMS to rebuild, as that is what originally created the RAID.

mdadm -C /dev/md3 --chunk=256 --level=5 --parity=ls --raid-devices=16
--force /dev/evms/.nodes/sdq1 /dev/evms/.nodes/sdr1
/dev/evms/.nodes/sds1 /dev/evms/.nodes/sdt1 /dev/evms/.nodes/sdu1
/dev/evms/.nodes/sdv1 missing /dev/evms/.nodes/sdx1
/dev/evms/.nodes/sdy1 /dev/evms/.nodes/sdz1 /dev/evms/.nodes/sdaa1
/dev/evms/.nodes/sdab1 /dev/evms/.nodes/sdac1 /dev/evms/.nodes/sdad1
/dev/evms/.nodes/sdae1 /dev/evms/.nodes/sdaf1

Notice we are assembling an array with a missing member, and the
devices are in the order given by: mdadm -D /dev/md3
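The ordering rule can be sketched as a small helper that builds the member list and puts the literal word "missing" in the failed slot. The function name member_list is hypothetical; the prefix and device names are the ones from the command above. Re-creating an array over existing data is destructive if the chunk size, layout, or order is wrong, so treat this as an illustration rather than a recipe.

```shell
#!/bin/sh
# member_list PREFIX FAILED NAME...   (hypothetical helper)
# Prints PREFIX/NAME for each NAME in the order given, but prints the
# literal word "missing" in the slot held by FAILED.
member_list() {
    prefix=$1
    failed=$2
    shift 2
    out=
    for name in "$@"; do
        if [ "$name" = "$failed" ]; then
            out="$out missing"
        else
            out="$out $prefix/$name"
        fi
    done
    # Drop the leading space left by the first append.
    echo "${out# }"
}

# Possible use: print the list first and compare it with mdadm -D output
# before passing it to mdadm -C.
#   member_list /dev/evms/.nodes sdw1 sdq1 sdr1 sds1 sdt1 sdu1 sdv1 sdw1 sdx1
```

Keeping the failed slot as "missing" leaves the array degraded, so no resync runs over the surviving data until a disk is added back.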

This was the *only* way that it would come up. It was mountable, and the
data seems intact.
We started the rebuild with no errors by simply adding the device
as I mentioned before with -a.

Then sped it up via:

echo 10 > /proc/sys/dev/raid/speed_limit_min

Because frankly we have the resources to do so and need it going as fast
as possible.
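A small sketch of the same tuning step with a read-back, assuming the usual /proc path shown above. The helper name raid_speed_min and the value 50000 in the usage comment are illustrative only. As I understand the kernel md documentation, the figure is in KiB per second per device.

```shell
#!/bin/sh
# raid_speed_min [VALUE] [FILE]   (hypothetical helper)
# With a non-empty VALUE, writes it to FILE, then prints the current setting.
# FILE defaults to /proc/sys/dev/raid/speed_limit_min; writing needs root.
raid_speed_min() {
    file=${2:-/proc/sys/dev/raid/speed_limit_min}
    if [ -n "$1" ]; then
        echo "$1" > "$file" || return 1
    fi
    cat "$file"
}

# Possible use:
#   raid_speed_min ""       # show the current minimum
#   raid_speed_min 50000    # raise it (illustrative value), then show it
#   cat /proc/mdstat        # watch the rebuild progress
```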

-- 

Regards,
Maurice



Re: strange RAID5 problem

2006-05-08 Thread Neil Brown
On Monday May 8, [EMAIL PROTECTED] wrote:
 Good evening.
 
 I am having a bit of a problem with a largish RAID5 set.
 Now it is looking more and more like I am about to lose all the data on
 it, so I am asking (begging?) to see if anyone can help me sort this out.
 

Very thorough description, but you omitted the 'dmesg' output
corresponding to :

 
 [EMAIL PROTECTED] ~]# mdadm
 --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1
 /dev/sdv1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1 /dev/sdac1
 /dev/sdad1 /dev/sdae1 /dev/sdaf1
 mdadm: failed to RUN_ARRAY /dev/md3: Invalid argument


Also, you don't seem to have tried '--force' with '--assemble'.  It
might help.
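Both suggestions can be sketched together: a forced assembly followed by a look at the end of the kernel log. The wrapper run and the function force_assemble are hypothetical names; by default the sketch only prints the commands, and it executes them only when DRY_RUN is set to 0.

```shell
#!/bin/sh
# run CMD...   (hypothetical wrapper)
# Prints the command; executes it only when DRY_RUN is set to 0.
# DRY_RUN defaults to 1 so that pasting this sketch changes nothing.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# force_assemble ARRAY MEMBER...   (hypothetical helper)
# Forced assembly, then the last lines of the kernel log, where md
# reports why it refused to run the array.
force_assemble() {
    array=$1
    shift
    run mdadm --assemble --force "$array" "$@"
    run sh -c 'dmesg | tail -n 30'
}

# Possible use (prints the commands only):
#   force_assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1
```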

NeilBrown