Re: strange RAID5 problem

2006-05-09 Thread Luca Berra

On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:

> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>
> But, I get this error message:
> mdadm: hot add failed for /dev/sdw1: No such device
>
> What? We just made the partition on sdw a moment ago in fdisk. It IS there!


I don't believe you, prove it (/proc/partitions)


> So, we look around a bit:
> # cat /proc/mdstat
>
> md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11]
> sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2]
> sdr1[1]
>  5860631040 blocks
>
> Yup, that looks correct, missing sdw1[6]


no, it does not, it is 'inactive'


> [EMAIL PROTECTED] ~]# cat /proc/mdstat
> Personalities : [raid1] [raid5]
>
> ...
>
> md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11]
> sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2]
> sdr1[1]
>  5860631040 blocks
>
> ...
>
> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
> mdadm: hot add failed for /dev/sdw1: No such device
>
> OK, let's mount the degraded RAID and try to copy the files to somewhere
> else, so we can make it from scratch:
>
> [EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
> /dev/md3: Invalid argument
> mount: /dev/md3: can't read superblock


it is still inactive, no wonder you cannot access it.

try running the array, or really stop it before assembling.
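
A minimal shell sketch of what "running the array, or really stopping it before assembling" could look like. The helper name md_state and its arguments are illustrative and not part of mdadm. --run, --stop and --assemble are standard mdadm options, but whether starting the array as it stands is safe depends on the state of the member superblocks, so the mdadm commands are left as comments to run by hand.

```shell
#!/bin/sh
# md_state FILE ARRAY: print the state word that follows "ARRAY :" in an
# mdstat-style file ("active" or "inactive"), or "absent" if the array is
# not listed. Hypothetical helper for illustration only.
md_state() {
    awk -v md="$2" '
        $1 == md && $2 == ":" { print $3; found = 1; exit }
        END { if (!found) print "absent" }
    ' "$1"
}

# Commands to try by hand once `md_state /proc/mdstat md3` reports "inactive":
#   mdadm --run /dev/md3        # try to start the array as it stands
# or, to start over cleanly:
#   mdadm --stop /dev/md3       # release the member devices first
#   mdadm --assemble /dev/md3 /dev/sd[q-v]1 /dev/sd[x-z]1 /dev/sda[a-f]1
```

The point of checking the state first is that hot-add and mount both need a running array; an "inactive" entry in /proc/mdstat only means the members are claimed.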

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media  Services S.r.l.
 /\
 \ / ASCII RIBBON CAMPAIGN
  X  AGAINST HTML MAIL
 / \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange RAID5 problem

2006-05-09 Thread CaT
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
> [EMAIL PROTECTED] ~]# mdadm
> --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1
> /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1
> /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
> mdadm: superblock on /dev/sdw1 doesn't match others - assembly aborted

Have you tried zeroing the superblock with

mdadm --misc --zero-superblock /dev/sdw1

and then adding it in?
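
A hedged sketch of that sequence in shell. The helper examine_field is hypothetical and simply pulls one "Name : value" line out of `mdadm -E` output so a suspect member can be compared with a good one before anything is wiped. Treating a differing UUID as the mismatch mdadm complains about is an assumption; --zero-superblock destroys the md metadata on that partition, so the mdadm commands stay in comments.

```shell
#!/bin/sh
# examine_field NAME: read `mdadm -E` output on stdin and print the value of
# the "NAME : value" line (for example UUID or Events). Hypothetical helper.
examine_field() {
    awk -v key="$1" -F' : ' '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key { print $2; exit }
    '
}

# By hand, compare the suspect member against a known-good one:
#   mdadm -E /dev/sdw1 | examine_field UUID
#   mdadm -E /dev/sdq1 | examine_field UUID
# If a stale superblock is the problem, wipe it and re-add the partition:
#   mdadm --zero-superblock /dev/sdw1
#   mdadm /dev/md3 -a /dev/sdw1
```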

> [EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
> /dev/md3: Invalid argument
> mount: /dev/md3: can't read superblock

Wow, that looks messy. Ummm, about the only thing I can think of is
failing /dev/sdw1 and removing it (I know it says it's not there,
but...)

Also, I'm not the biggest expert on RAID around here. ;)

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby


Re: slackware -current softraid5 boot problem - additional info

2006-05-09 Thread Dexter Filmore
On Tuesday, 9 May 2006 07:50, Luca Berra wrote:
> you don't give a lot of information about your setup,

You're certainly right there; I was a bit off track yesterday from tinkering
late into the night. Info below.

> in any case it could be something like udev and the /dev/sdd device node
> not being available at boot?

Ok:
Slackware-current with kernel 2.6.14.6, *no* udev, plain old hotplug.
I had to put the raid start script in a reasonable place myself (it is not
preconfigured in Slack), so I still have to figure out whether it sees
/etc/mdadm.conf when the script is called. (If the presence of mdadm.conf is
totally uninteresting, let me know, I just started on raid.)
The other disks are seen fine, and since they are all the same type on the 
same controller there's no reason why it is not seen then.
(Unless for some reason mdadm talks to the *last* disk first and then stops - 
else it should complain about sda rather.)


* mdadm -E info *


# mdadm -E /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : db7e5b65:e35c69dc:7c267a5a:e676c929
  Creation Time : Mon May  8 00:05:16 2006
     Raid Level : raid5
    Device Size : 244198464 (232.89 GiB 250.06 GB)
     Array Size : 732595392 (698.66 GiB 750.18 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Tue May  9 00:43:46 2006
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 61f0ffd6 - correct
         Events : 0.24796

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     3       8       48        3      active sync   /dev/sdd

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd

* mdstat *

Once I started the array manually (which works fine) mdstats look like:

# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
  732563712 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C+++() UL+ P+++ L+++ E-- W++ N o? K-
w--(---) !O M+ V- PS++(+) PE(-) Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D G++ e* h++ r%* y?
--END GEEK CODE BLOCK--

http://www.stop1984.com
http://www.againsttcpa.com


Re: slackware -current softraid5 boot problem - additional info

2006-05-09 Thread Mike Hardy

Something fishy here

Dexter Filmore wrote:

> # mdadm -E /dev/sdd

Device /dev/sdd

> # cat /proc/mdstat
> Personalities : [raid5]
> md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
>   732563712 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

Components that are all the first partition.

Are you using the whole disk, or the first partition?

It appears that to some extent, you are using both.

Perhaps some confusion on that point between your boot scripts and your
manual run explains things?
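
One way to make that mismatch visible is to classify each device name that shows up in the boot script, in mdadm.conf and in /proc/mdstat. A minimal sketch follows, assuming sdX-style names where a trailing digit means a partition; the helper name dev_kind is hypothetical, and the heuristic does not hold for other naming schemes.

```shell
#!/bin/sh
# dev_kind NAME: classify an sdX-style device name as "disk" or "partition"
# purely from a trailing digit. Naming heuristic only; hypothetical helper.
dev_kind() {
    case "$1" in
        *[0-9]) echo partition ;;
        *)      echo disk ;;
    esac
}

# Example: report which kind each candidate member is.
#   for d in /dev/sdd /dev/sdd1; do echo "$d: $(dev_kind "$d")"; done
# Whether a superblock is visible on each can then be checked by hand:
#   mdadm -E /dev/sdd
#   mdadm -E /dev/sdd1
```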


-Mike


Re: strange RAID5 problem

2006-05-09 Thread Maurice Hilarius
Luca Berra wrote:
> On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>>
>> But, I get this error message:
>> mdadm: hot add failed for /dev/sdw1: No such device
>>
>> What? We just made the partition on sdw a moment ago in fdisk. It IS
>> there!
>
> I don't believe you, prove it (/proc/partitions)


I understand. Here we go then. Devices in question bracketed with **:

[EMAIL PROTECTED] ~]# cat /proc/partitions
major minor  #blocks  name

   3     0  117220824 hda
   3     1     104391 hda1
   3     2    2008125 hda2
   3     3  115105725 hda3
   3    64  117220824 hdb
   3    65     104391 hdb1
   3    66    2008125 hdb2
   3    67  115105725 hdb3
   8     0  390711384 sda
   8     1  390708801 sda1
   8    16  390711384 sdb
   8    17  390708801 sdb1
   8    32  390711384 sdc
   8    33  390708801 sdc1
   8    48  390711384 sdd
   8    49  390708801 sdd1
   8    64  390711384 sde
   8    65  390708801 sde1
   8    80  390711384 sdf
   8    81  390708801 sdf1
   8    96  390711384 sdg
   8    97  390708801 sdg1
   8   112  390711384 sdh
   8   113  390708801 sdh1
   8   128  390711384 sdi
   8   129  390708801 sdi1
   8   144  390711384 sdj
   8   145  390708801 sdj1
   8   160  390711384 sdk
   8   161  390708801 sdk1
   8   176  390711384 sdl
   8   177  390708801 sdl1
   8   192  390711384 sdm
   8   193  390708801 sdm1
   8   208  390711384 sdn
   8   209  390708801 sdn1
   8   224  390711384 sdo
   8   225  390708801 sdo1
   8   240  390711384 sdp
   8   241  390708801 sdp1
  65     0  390711384 sdq
  65     1  390708801 sdq1
  65    16  390711384 sdr
  65    17  390708801 sdr1
  65    32  390711384 sds
  65    33  390708801 sds1
  65    48  390711384 sdt
  65    49  390708801 sdt1
  65    64  390711384 sdu
  65    65  390708801 sdu1
  65    80  390711384 sdv
  65    81  390708801 sdv1
**
  65    96  390711384 sdw
  65    97  390708801 sdw1
**
  65   112  390711384 sdx
  65   113  390708801 sdx1
  65   128  390711384 sdy
  65   129  390708801 sdy1
  65   144  390711384 sdz
  65   145  390708801 sdz1
  65   160  390711384 sdaa
  65   161  390708801 sdaa1
  65   176  390711384 sdab
  65   177  390708801 sdab1
  65   192  390711384 sdac
  65   193  390708801 sdac1
  65   208  390711384 sdad
  65   209  390708801 sdad1
  65   224  390711384 sdae
  65   225  390708801 sdae1
  65   240  390711384 sdaf
  65   241  390708801 sdaf1
**
   9     0     104320 md0
**
   9     2 5860631040 md2
   9     1  115105600 md1



-- 

Regards,
Maurice



Re: softraid5 boot problem - partly my fault, solved

2006-05-09 Thread Dexter Filmore
Mystery solved: had to probe another module. 
Wait, wait, I can defend myself :)

What led me to believe the controller was autoprobed during boot is that mdadm 
complained about *sdd*, but not about sd[abc], hence I assumed [abc] were all 
fine.
Plus, I didn't have to probe the module manually after boot was completed 
(appears that at that point some other module inserted it as a dependency).
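
A boot script can guard against this ordering problem by checking that every member device node exists before it starts the array. The sketch below is only an illustration: missing_members is a hypothetical helper, and CONTROLLER_MODULE is a placeholder variable because the module that had to be loaded is not named here.

```shell
#!/bin/sh
# missing_members DEV...: print every listed path that is not a block device.
# Prints nothing when all members are present. Hypothetical helper.
missing_members() {
    for dev in "$@"; do
        [ -b "$dev" ] || echo "$dev"
    done
}

# In a boot script, with CONTROLLER_MODULE standing in for the real driver:
#   modprobe "$CONTROLLER_MODULE"
#   gone=$(missing_members /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1)
#   if [ -z "$gone" ]; then
#       mdadm --assemble --scan
#   else
#       echo "raid members not ready: $gone" >&2
#   fi
```

Listing every missing member, rather than stopping at the first failure, avoids guessing from a single error message which disks were actually seen.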

So - is that how mdadm (or the kernel?) handles raid? Is the last disk checked
first by design?

Dex

-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C+++() UL+ P+++ L+++ E-- W++ N o? K-
w--(---) !O M+ V- PS++(+) PE(-) Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D G++ e* h++ r%* y?
--END GEEK CODE BLOCK--

http://www.stop1984.com
http://www.againsttcpa.com


Re: strange RAID5 problem

2006-05-09 Thread Luca Berra

On Tue, May 09, 2006 at 10:16:25AM -0600, Maurice Hilarius wrote:

> Luca Berra wrote:
>> On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>>> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>>>
>>> But, I get this error message:
>>> mdadm: hot add failed for /dev/sdw1: No such device
>>>
>>> What? We just made the partition on sdw a moment ago in fdisk. It IS
>>> there!
>>
>> I don't believe you, prove it (/proc/partitions)
>
> I understand. Here we go then. Devices in question bracketed with **:


ok, now i do.
is the /dev/sdw1 device file correctly created?
you could try straceing mdadm to see what happens

what about the other suggestion? trying to stop the array and restart
it, since it is marked as inactive.
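
A rough sketch of the device-file check in shell: look up the major and minor numbers the kernel reports in /proc/partitions and compare them by eye with the numbers on the /dev node. The helper part_numbers is hypothetical; the strace invocation uses standard strace options, and the trace file path is an arbitrary choice.

```shell
#!/bin/sh
# part_numbers FILE NAME: print "major minor" for NAME from a
# /proc/partitions-style file. Hypothetical helper.
part_numbers() {
    awk -v name="$2" '$4 == name { print $1, $2; exit }' "$1"
}

# By hand: the kernel's numbers should match those of the device node.
#   part_numbers /proc/partitions sdw1
#   ls -l /dev/sdw1     # a block device node lists "major, minor"
# Trace the failing hot-add to see which call returns "No such device":
#   strace -f -o /tmp/mdadm.trace mdadm /dev/md3 -a /dev/sdw1
```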
L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media  Services S.r.l.
 /\
 \ / ASCII RIBBON CAMPAIGN
  X  AGAINST HTML MAIL
 / \


Re: strange RAID5 problem

2006-05-09 Thread Maurice Hilarius
Luca Berra wrote:
> ..
>>> I don't believe you, prove it (/proc/partitions)
>>
>> I understand. Here we go then. Devices in question bracketed with **:
>
> ok, now i do.
> is the /dev/sdw1 device file correctly created?
> you could try straceing mdadm to see what happens
>
> what about the other suggestion? trying to stop the array and restart
> it, since it is marked as inactive.
> L.

Here is what we ended up doing that fixed it.
Thanks to Neil for the --force hint; however, even with that,
ALL parameters were needed on the mdadm -C command line or it still refused.
We used EVMS to rebuild, as that is what originally created the RAID.

mdadm -C /dev/md3 --chunk=256 --level=5 --parity=ls --raid-devices=16 \
  --force /dev/evms/.nodes/sdq1 /dev/evms/.nodes/sdr1 \
  /dev/evms/.nodes/sds1 /dev/evms/.nodes/sdt1 /dev/evms/.nodes/sdu1 \
  /dev/evms/.nodes/sdv1 missing /dev/evms/.nodes/sdx1 \
  /dev/evms/.nodes/sdy1 /dev/evms/.nodes/sdz1 /dev/evms/.nodes/sdaa1 \
  /dev/evms/.nodes/sdab1 /dev/evms/.nodes/sdac1 /dev/evms/.nodes/sdad1 \
  /dev/evms/.nodes/sdae1 /dev/evms/.nodes/sdaf1

Notice we are assembling a device with a missing member, and the
devices are in order per: mdadm -D /dev/md3
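
Because the order of the members and the position of the word "missing" both matter when an array is re-created this way, it can help to generate the member list rather than type it. The following is a sketch under assumptions: member_args and NODE_DIR are hypothetical names, and the function only prints the arguments so they can be reviewed before being passed to mdadm -C.

```shell
#!/bin/sh
# Directory holding the member nodes; taken from the command in the message.
NODE_DIR=${NODE_DIR:-/dev/evms/.nodes}

# member_args MISSING NAME...: print the members in the order given, with
# the literal word "missing" in place of the member called MISSING.
# Hypothetical helper; it runs nothing itself.
member_args() {
    missing="$1"; shift
    out=""
    for name in "$@"; do
        if [ "$name" = "$missing" ]; then
            out="$out missing"
        else
            out="$out $NODE_DIR/$name"
        fi
    done
    echo "${out# }"
}

# Example: review the list first, then paste it after the mdadm -C options.
#   member_args sdw1 sdq1 sdr1 sds1 sdt1 sdu1 sdv1 sdw1 sdx1 sdy1 sdz1 \
#       sdaa1 sdab1 sdac1 sdad1 sdae1 sdaf1
```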

This was the *only* way that it would come up. It was mountable, and the data
seems intact.
We started the rebuild with no errors by simply adding the device
as I mentioned before with -a.

Then sped it up via:

echo 10 > /proc/sys/dev/raid/speed_limit_min

Because frankly we have the resources to do so and need it going as fast
as possible.
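
A small, hedged wrapper around that write, for anyone who wants the value validated before it reaches the kernel. set_min_speed is a hypothetical helper; the optional second argument exists only so it can be tried on a scratch file, and the 50000 in the example is an arbitrary illustration, not the figure used above.

```shell
#!/bin/sh
# set_min_speed KBPS [FILE]: write a minimum resync speed in KB/s.
# FILE defaults to the kernel's speed_limit_min. Hypothetical helper.
set_min_speed() {
    case "$1" in
        ''|*[!0-9]*)
            echo "set_min_speed: not a number: $1" >&2
            return 1
            ;;
    esac
    echo "$1" > "${2:-/proc/sys/dev/raid/speed_limit_min}"
}

# Example (as root), with a value chosen for illustration only:
#   set_min_speed 50000
#   cat /proc/mdstat    # shows rebuild progress and the current speed
```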

-- 

Regards,
Maurice
