Re: mdadm: bitmaps not supported by this kernel?

2006-10-25 Thread Paul Clements

Michael Tokarev wrote:

Another 32/64-bit issue, it seems.
Running a 2.6.18.1 x86-64 kernel and mdadm 2.5.3 (32-bit).

# mdadm -G /dev/md1 --bitmap=internal
mdadm: bitmaps not supported by this kernel.

# mdadm -G /dev/md1 --bitmap=none
mdadm: bitmaps not supported by this kernel.

etc.

Recompiling mdadm in 64-bit mode eliminates the problem.



I think this is due to the bug I reported a month or so ago. We were 
missing a COMPATIBLE_IOCTL entry for the GET_BITMAP_FILE ioctl. Neil has 
sent in the patch.
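
For anyone else seeing the same message, a quick way to confirm the
32-bit-mdadm-on-64-bit-kernel combination is (illustrative session; the
binary path and exact "file" output will differ per distro):

$ uname -m
x86_64
$ file /sbin/mdadm
/sbin/mdadm: ELF 32-bit LSB executable, Intel 80386, ...

Until you are running a kernel with that compat-ioctl fix, a 64-bit build
of mdadm (as Michael found) works around it.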




So far, only bitmap manipulation is broken this way.
I dunno if other things are broken too - at least
--assemble, --create, --stop, and --detail work.



Yeah, I think everything else works.

--
Paul


Re: RAID5 refuses to accept replacement drive.

2006-10-25 Thread Neil Brown
On Wednesday October 25, [EMAIL PROTECTED] wrote:
> Good morning to everyone, hope everyone's day is going well.
> 
> Neil, I sent this to your SUSE address a week ago but it may have
> gotten trapped in a SPAM filter or lost in the shuffle.

Yes, resending is always a good idea if I seem to be ignoring you.

(people who are really on-the-ball will probably start telling me it is a
resend the first time they mail me. I probably wouldn't notice.. :-)

> 
> I've used MD based RAID since it first existed.  First time I've run
> into a situation like this.
> 
> Environment:
>   Kernel: 2.4.33.3
>   MDADM:  2.4.1/2.5.3
>   MD: Three drive RAID5 (md3)

Old kernel, new mdadm.  Not a tested combination unfortunately.  I
guess I should try booting 2.4 somewhere and try it out...

> 
> A 'silent' disk failure was experienced in a SCSI hot-swap chassis
> during a yearly system upgrade.  Machine failed to boot until 'nobd'
> directive was given to LILO.  Drive was mechanically dead but
> electrically alive.
> 
> Drives were shuffled to get the machine operational.  The machine came
> up with md3 degraded.  The md3 device refuses to accept a replacement
> partition using the following syntax:
> 
> mdadm --manage /dev/md3 -a /dev/sde1
> 
> No output from mdadm, nothing in the logfiles.  Tail end of strace is
> as follows:
> 
> open("/dev/md3", O_RDWR)= 3
> fstat64(0x3, 0xb8fc)= 0
> ioctl(3, 0x800c0910, 0xb9f8)= 0

Those last two lines are a call to md_get_version.
Probably the one in open_mddev.

> _exit(0)                      = ?

But I can see no way that it would exit...

Are you comfortable with gdb?
Would you be interested in single stepping around and seeing what path
leads to the exit?
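
If so, a minimal session along those lines might look roughly like this
(just a sketch - build mdadm from source with debugging symbols, e.g. -g
added to CFLAGS, and adjust the breakpoints to taste; open_mddev above is
where I would start, with exit/_exit added to catch the bail-out):

$ gdb --args ./mdadm --manage /dev/md3 -a /dev/sde1
(gdb) break open_mddev
(gdb) break exit
(gdb) break _exit
(gdb) run

Then 'bt' at whichever breakpoint fires should show the call path that
leads to the exit.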

Another option is to use mdadm-1.9.0.  That is likely to be more
reliable.

NeilBrown


mdadm: bitmaps not supported by this kernel?

2006-10-25 Thread Michael Tokarev
Another 32/64-bit issue, it seems.
Running a 2.6.18.1 x86-64 kernel and mdadm 2.5.3 (32-bit).

# mdadm -G /dev/md1 --bitmap=internal
mdadm: bitmaps not supported by this kernel.

# mdadm -G /dev/md1 --bitmap=none
mdadm: bitmaps not supported by this kernel.

etc.

Recompiling mdadm in 64-bit mode eliminates the problem.

So far, only bitmap manipulation is broken this way.
I dunno if other things are broken too - at least
--assemble, --create, --stop, and --detail work.

Thanks.

/mjt


Re: RAID5 refuses to accept replacement drive.

2006-10-25 Thread Eli Stair



A tangentially-related suggestion:

If you layer dm-multipath on top of the raw block (SCSI, FC) layer, you
add some complexity but also gain periodic readsector0 path checks... so
if your spindle powers down unexpectedly but the controller thinks it's
still alive, you will still get a drive disconnect issued from below MD:
device-mapper will fail the path automatically and MD will see the drive
as faulty.
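
Roughly, the relevant bits of /etc/multipath.conf would be something like
this (a sketch only - option names and defaults vary between
multipath-tools versions, so check the documentation for yours):

defaults {
        # seconds between path-checker runs
        polling_interval   10
        # probe each path by reading sector 0
        path_checker       readsector0
        # return I/O errors as soon as the (only) path is gone
        no_path_retry      fail
}

With a single path per spindle this is really just using multipathd as a
disk health checker: once the checker fails, I/O on the dm device errors
out and MD marks the member faulty.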


Sorry, no useful suggestion on the recovery task...


/eli


[EMAIL PROTECTED] wrote:

Good morning to everyone, hope everyone's day is going well.

Neil, I sent this to your SUSE address a week ago but it may have
gotten trapped in a SPAM filter or lost in the shuffle.

I've used MD based RAID since it first existed.  First time I've run
into a situation like this.

Environment:
Kernel: 2.4.33.3
MDADM:  2.4.1/2.5.3
MD: Three drive RAID5 (md3)

A 'silent' disk failure was experienced in a SCSI hot-swap chassis
during a yearly system upgrade.  Machine failed to boot until 'nobd'
directive was given to LILO.  Drive was mechanically dead but
electrically alive.

Drives were shuffled to get the machine operational.  The machine came
up with md3 degraded.  The md3 device refuses to accept a replacement
partition using the following syntax:

mdadm --manage /dev/md3 -a /dev/sde1

No output from mdadm, nothing in the logfiles.  Tail end of strace is
as follows:

open("/dev/md3", O_RDWR)= 3
fstat64(0x3, 0xb8fc)= 0
ioctl(3, 0x800c0910, 0xb9f8)= 0
_exit(0)= ?

I 'zeroed' the superblock on /dev/sde1 to make sure there was nothing
to interfere.  No change in behavior.

I know the 2.4 kernels are not in vogue but this is from a group of
machines which are expected to run a year at a time.  Stability and
known behavior are the foremost goals.

Details on the MD device and component drives are included below.

We've handled a lot of MD failures; this is the first time anything like
this has happened.  I feel like there is probably a 'brown paper bag'
solution to this but I can't see it.

Thoughts?

Greg

---
/dev/md3:
        Version : 00.90.00
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
     Array Size : 5269120 (5.03 GiB 5.40 GB)
    Device Size : 2634560 (2.51 GiB 2.70 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Wed Oct 11 04:33:06 2006
          State : active, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
         Events : 0.25

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       0        0        1      removed
       2       8       33        2      active sync   /dev/sdc1
---


Details for raid device 0:

---
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
    Device Size : 2634560 (2.51 GiB 2.70 GB)
     Array Size : 5269120 (5.03 GiB 5.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3

    Update Time : Wed Oct 11 04:33:06 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 52b602d5 - correct
         Events : 0.25

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       49        0      active sync   /dev/sdd1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
---


Details for RAID device 2:

---
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
    Device Size : 2634560 (2.51 GiB 2.70 GB)
     Array Size : 5269120 (5.03 GiB 5.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3

    Update Time : Wed Oct 11 04:33:06 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 52b602c9 - correct
         Events : 0.25

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

RAID5 refuses to accept replacement drive.

2006-10-25 Thread greg
Good morning to everyone, hope everyone's day is going well.

Neil, I sent this to your SUSE address a week ago but it may have
gotten trapped in a SPAM filter or lost in the shuffle.

I've used MD based RAID since it first existed.  First time I've run
into a situation like this.

Environment:
Kernel: 2.4.33.3
MDADM:  2.4.1/2.5.3
MD: Three drive RAID5 (md3)

A 'silent' disk failure was experienced in a SCSI hot-swap chassis
during a yearly system upgrade.  Machine failed to boot until 'nobd'
directive was given to LILO.  Drive was mechanically dead but
electrically alive.

Drives were shuffled to get the machine operational.  The machine came
up with md3 degraded.  The md3 device refuses to accept a replacement
partition using the following syntax:

mdadm --manage /dev/md3 -a /dev/sde1

No output from mdadm, nothing in the logfiles.  Tail end of strace is
as follows:

open("/dev/md3", O_RDWR)= 3
fstat64(0x3, 0xb8fc)= 0
ioctl(3, 0x800c0910, 0xb9f8)= 0
_exit(0)= ?

I 'zeroed' the superblock on /dev/sde1 to make sure there was nothing
to interfere.  No change in behavior.

I know the 2.4 kernels are not in vogue but this is from a group of
machines which are expected to run a year at a time.  Stability and
known behavior are the foremost goals.

Details on the MD device and component drives are included below.

We've handled a lot of MD failures; this is the first time anything like
this has happened.  I feel like there is probably a 'brown paper bag'
solution to this but I can't see it.

Thoughts?

Greg

---
/dev/md3:
        Version : 00.90.00
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
     Array Size : 5269120 (5.03 GiB 5.40 GB)
    Device Size : 2634560 (2.51 GiB 2.70 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Wed Oct 11 04:33:06 2006
          State : active, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
         Events : 0.25

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       0        0        1      removed
       2       8       33        2      active sync   /dev/sdc1
---


Details for raid device 0:

---
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
    Device Size : 2634560 (2.51 GiB 2.70 GB)
     Array Size : 5269120 (5.03 GiB 5.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3

    Update Time : Wed Oct 11 04:33:06 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 52b602d5 - correct
         Events : 0.25

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       49        0      active sync   /dev/sdd1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
---


Details for RAID device 2:

---
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
    Device Size : 2634560 (2.51 GiB 2.70 GB)
     Array Size : 5269120 (5.03 GiB 5.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3

    Update Time : Wed Oct 11 04:33:06 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 52b602c9 - correct
         Events : 0.25

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
---

As always,
Dr. G.W. Wettstein, Ph.D.       Enjellic Systems Development, LLC.
4206 N. 19th Ave.               Specializing in information infra-structure
Fargo, ND  58102                development.
PH: 701-281-1686
FAX: 701-281-3949

Re: Bug with RAID1 hot spares?

2006-10-25 Thread Mario 'BitKoenig' Holbe
Chase Venters <[EMAIL PROTECTED]> wrote:
> The main idea is to not exercise the spare as much as the other disks. All 

Btw, you can also keep the spare disk spun down most of the time.
You should probably just make sure to spin it up from time to time to
see if it's still okay - I spin up my spares for one hour per night when
smartd issues short self-tests, and a few more hours when smartd issues
long self-tests.
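
For reference, the smartd.conf entry for that kind of schedule looks
roughly like this (a sketch - adjust the device name and the T/MM/DD/d/HH
time regexps to your setup, see smartd.conf(5)):

# short self-test nightly at 02:00, long self-test Saturdays at 03:00
/dev/sdc -a -s (S/../.././02|L/../../6/03)

The spin-down itself is separate - e.g. hdparm -S on the spare (or the
equivalent for your controller), so the disk only wakes up for the
self-tests and the occasional rebuild.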


regards
   Mario
