Re: mdadm: bitmaps not supported by this kernel?
Michael Tokarev wrote:
> Another 32/64 bits issue, it seems. Running a 2.6.18.1 x86-64 kernel
> and mdadm 2.5.3 (32 bit).
>
>   # mdadm -G /dev/md1 --bitmap=internal
>   mdadm: bitmaps not supported by this kernel.
>   # mdadm -G /dev/md1 --bitmap=none
>   mdadm: bitmaps not supported by this kernel.
>
> etc. Recompiling mdadm in 64-bit mode eliminates the problem.

I think this is due to the bug I reported a month or so ago: we were
missing a COMPATIBLE_IOCTL entry for the GET_BITMAP_FILE ioctl. Neil
has sent in the patch.

> So far, only bitmap manipulation is broken this way. I dunno if other
> things are broken too - at least --assemble, --create, --stop and
> --detail work.

Yeah, I think everything else works.

--
Paul
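For readers unfamiliar with the mechanism Paul mentions: in 2.6-era
kernels, ioctls whose argument layout is identical for 32-bit and
64-bit callers are whitelisted in the kernel's compat-ioctl table
(fs/compat_ioctl.c, or the compat_ioctl.h list it includes, depending
on the kernel version). A hedged sketch of what the md entries look
like there - the exact list is from memory, and the last line shows the
assumed shape of Neil's fix, not the literal patch:

  /* compat-ioctl whitelist (sketch): these md ioctls need no 32->64
   * argument translation, so they are passed through unchanged. */
  COMPATIBLE_IOCTL(RAID_VERSION)
  COMPATIBLE_IOCTL(GET_ARRAY_INFO)
  COMPATIBLE_IOCTL(GET_DISK_INFO)
  COMPATIBLE_IOCTL(GET_BITMAP_FILE)  /* the missing entry behind this bug */

Without that last entry, a 32-bit mdadm's GET_BITMAP_FILE is rejected
by a 64-bit kernel, and mdadm reports "bitmaps not supported by this
kernel".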
Re: RAID5 refuses to accept replacement drive.
On Wednesday October 25, [EMAIL PROTECTED] wrote:
> Good morning to everyone, hope everyone's day is going well.
>
> Neil, I sent this to your SUSE address a week ago but it may have
> gotten trapped in a SPAM filter or lost in the shuffle.

Yes, resending is always a good idea if I seem to be ignoring you.
(People who are really on the ball will probably start telling me it is
a resend the first time they mail me. I probably wouldn't notice.. :-)

> I've used MD based RAID since it first existed. This is the first
> time I've run into a situation like this.
>
> Environment:
>   Kernel: 2.4.33.3
>   MDADM:  2.4.1/2.5.3
>   MD:     Three-drive RAID5 (md3)

Old kernel, new mdadm. Not a tested combination, unfortunately. I guess
I should try booting 2.4 somewhere and try it out...

> A 'silent' disk failure was experienced in a SCSI hot-swap chassis
> during a yearly system upgrade. The machine failed to boot until the
> 'nobd' directive was given to LILO. The drive was mechanically dead
> but electrically alive.
>
> Drives were shuffled to get the machine operational. The machine came
> up with md3 degraded. The md3 device refuses to accept a replacement
> partition using the following syntax:
>
>   mdadm --manage /dev/md3 -a /dev/sde1
>
> No output from mdadm, nothing in the logfiles. The tail end of strace
> is as follows:
>
>   open("/dev/md3", O_RDWR)     = 3
>   fstat64(0x3, 0xb8fc)         = 0
>   ioctl(3, 0x800c0910, 0xb9f8) = 0

Those last two lines are a call to md_get_version - probably the one in
open_mddev.

>   _exit(0)                     = ?

But I can see no way that it would exit... Are you comfortable with
gdb? Would you be interested in single-stepping around and seeing what
path leads to the exit?

Another option is to use mdadm-1.9.0. That is likely to be more
reliable.

NeilBrown
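For anyone wanting to act on Neil's gdb suggestion, here is a sketch of
the sort of session it implies. The gdb commands are standard; the
choice of breakpoints is an assumption, and a build with debug symbols
makes the backtrace far more readable:

  $ gdb --args ./mdadm --manage /dev/md3 -a /dev/sde1
  (gdb) break exit        # catch a normal exit()
  (gdb) break _exit       # and the raw _exit(0) seen in the strace
  (gdb) run
  ...                     # one of the breakpoints fires
  (gdb) backtrace         # shows which code path decided to exit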
mdadm: bitmaps not supported by this kernel?
Another 32/64 bits issue, it seems. Running a 2.6.18.1 x86-64 kernel
and mdadm 2.5.3 (32 bit).

  # mdadm -G /dev/md1 --bitmap=internal
  mdadm: bitmaps not supported by this kernel.
  # mdadm -G /dev/md1 --bitmap=none
  mdadm: bitmaps not supported by this kernel.

etc. Recompiling mdadm in 64-bit mode eliminates the problem.

So far, only bitmap manipulation is broken this way. I dunno if other
things are broken too - at least --assemble, --create, --stop and
--detail work.

Thanks.

/mjt
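A hedged way to confirm this at the syscall level (strace usage is
standard; the failing ioctl is the one identified elsewhere in this
thread, and the exact errno - likely EINVAL from the compat layer on
2.6.18-era kernels - is an assumption, not captured output):

  # run the failing command under strace and watch the md ioctls fail
  strace -f -e trace=ioctl mdadm -G /dev/md1 --bitmap=internal

A 64-bit mdadm issuing the same GET_BITMAP_FILE ioctl succeeds, which
is consistent with the 64-bit rebuild making the problem disappear.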
Re: RAID5 refuses to accept replacement drive.
A tangentially-related suggestion: if you layer dm-multipath on top of
the raw block (SCSI, FC) layer, you add some complexity but gain
periodic readsector0() path checks. So if your spindle powers down
unexpectedly while the controller thinks it is still alive, you will
still get a drive disconnect issued from below MD: device-mapper will
fail the path automatically and MD will see the drive as faulty.

Sorry, no useful suggestion on the recovery task...

/eli

[EMAIL PROTECTED] wrote:
> Good morning to everyone, hope everyone's day is going well.
> [... rest of the message and mdadm output snipped; it is quoted in
> full in the original post below ...]
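A minimal sketch of the setup Eli describes, assuming a multipath-tools
release that accepts these key names in the defaults section (early
2006-era versions spelled some of them default_*):

  # /etc/multipath.conf (sketch): multipathd runs the path checker
  # periodically even on an idle path, so a dead-but-present spindle
  # is failed below MD rather than left hanging
  defaults {
          polling_interval  5             # seconds between path checks
          path_checker      readsector0   # probe by reading sector 0
  }

This works even for single-path devices; the point is not redundancy
but getting an active health probe underneath MD.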
RAID5 refuses to accept replacement drive.
Good morning to everyone, hope everyone's day is going well.

Neil, I sent this to your SUSE address a week ago but it may have
gotten trapped in a SPAM filter or lost in the shuffle.

I've used MD based RAID since it first existed. This is the first time
I've run into a situation like this.

Environment:
  Kernel: 2.4.33.3
  MDADM:  2.4.1/2.5.3
  MD:     Three-drive RAID5 (md3)

A 'silent' disk failure was experienced in a SCSI hot-swap chassis
during a yearly system upgrade. The machine failed to boot until the
'nobd' directive was given to LILO. The drive was mechanically dead but
electrically alive.

Drives were shuffled to get the machine operational. The machine came
up with md3 degraded. The md3 device refuses to accept a replacement
partition using the following syntax:

  mdadm --manage /dev/md3 -a /dev/sde1

No output from mdadm, nothing in the logfiles. The tail end of strace
is as follows:

  open("/dev/md3", O_RDWR)     = 3
  fstat64(0x3, 0xb8fc)         = 0
  ioctl(3, 0x800c0910, 0xb9f8) = 0
  _exit(0)                     = ?

I 'zeroed' the superblock on /dev/sde1 to make sure there was nothing
to interfere. No change in behavior.

I know the 2.4 kernels are not in vogue, but this is from a group of
machines which are expected to run a year at a time. Stability and
known behavior are the foremost goals.

Details on the MD device and component drives are included below. We've
handled a lot of MD failures; this is the first time anything like this
has happened. I feel like there is probably a 'brown paper bag'
solution to this, but I can't see it.

Thoughts?

Greg

---
/dev/md3:
        Version : 00.90.00
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
     Array Size : 5269120 (5.03 GiB 5.40 GB)
    Device Size : 2634560 (2.51 GiB 2.70 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Wed Oct 11 04:33:06 2006
          State : active, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
         Events : 0.25

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       0        0        1      removed
       2       8       33        2      active sync   /dev/sdc1
---
Details for raid device 0:
---
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
    Device Size : 2634560 (2.51 GiB 2.70 GB)
     Array Size : 5269120 (5.03 GiB 5.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3

    Update Time : Wed Oct 11 04:33:06 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 52b602d5 - correct
         Events : 0.25

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       49        0      active sync   /dev/sdd1
   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
---
Details for RAID device 2:
---
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : cdd418a1:4bc3da6b:1ec17a15:e73ecadd
  Creation Time : Fri Jun 23 19:51:43 2006
     Raid Level : raid5
    Device Size : 2634560 (2.51 GiB 2.70 GB)
     Array Size : 5269120 (5.03 GiB 5.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3

    Update Time : Wed Oct 11 04:33:06 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 52b602c9 - correct
         Events : 0.25

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1
   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
---
As always,
Dr. G.W. Wettstein, Ph.D.
Enjellic Systems Development, LLC.                 4206 N. 19th Ave.
Specializing in information infra-structure        Fargo, ND 58102
development.                                       PH: 701-281-1686
                                                   FAX: 701-281-3949
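For what it's worth, the ioctl number 0x800c0910 in the strace above
decodes to RAID_VERSION from <linux/raid/md_u.h> - the call Neil
attributes to md_get_version() in his reply earlier in this digest. A
minimal sketch that reproduces just that probe, assuming the kernel
headers export md_u.h (the device path is the one from this report):

  /* raidver.c - issue the same RAID_VERSION ioctl that mdadm's
   * md_get_version() uses, to confirm the md driver answers. */
  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/raid/md_u.h>    /* RAID_VERSION, mdu_version_t */

  int main(void)
  {
          mdu_version_t v;
          int fd = open("/dev/md3", O_RDWR);

          if (fd < 0) {
                  perror("open /dev/md3");
                  return 1;
          }
          if (ioctl(fd, RAID_VERSION, &v) < 0) {
                  perror("RAID_VERSION ioctl");
                  close(fd);
                  return 1;
          }
          printf("md driver version %d.%d.%d\n",
                 v.major, v.minor, v.patchlevel);
          close(fd);
          return 0;
  }

Build with "gcc -o raidver raidver.c". In the strace above the ioctl
returns 0, so this probe should print a version line; the mystery is
the silent _exit(0) that follows it inside mdadm.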
Re: Bug with RAID1 hot spares?
Chase Venters <[EMAIL PROTECTED]> wrote:
> The main idea is to not exercise the spare as much as the other
> disks. [...]

Btw, you can also keep the spare disk spun down most of the time. You
should probably just make sure to spin it up from time to time to see
if it's still okay - I spin up my spares one hour per night when smartd
issues short selftests, and a few more hours when smartd issues long
selftests.

regards
Mario

--
Oh well, config one actually wonders what force in the universe is
holding it and makes it work - chances and accidents :)
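A sketch of how Mario's regimen can be wired up with hdparm and
smartmontools. The device name and test times are assumptions; the -s
schedule regex and -n standby are real smartd.conf directives:

  # let the (assumed) spare spin down after 30 minutes idle
  hdparm -S 241 /dev/sdd

  # /etc/smartd.conf: monitor the spare, but don't wake a spun-down
  # disk for routine polls (-n standby); run a short selftest nightly
  # at 02:00 and a long selftest Saturdays at 03:00 - the selftests
  # spin the disk up for their duration
  /dev/sdd -a -n standby -s (S/../.././02|L/../../6/03)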