Re: [PATCH] Use new sb type
On Feb 10 2008 10:34, David Greaves wrote:
> Jan Engelhardt wrote:
>> On Jan 29 2008 18:08, Bill Davidsen wrote:
>>>> IIRC there was a discussion a while back on renaming mdadm options
>>>> (google "Time to deprecate old RAID formats?") and the superblocks
>>>> to emphasise the location and data structure. Would it be good to
>>>> introduce the new names at the same time as changing the default
>>>> format/on-disk-location?
>>>
>>> Yes, I suggested some layout names, as did a few other people, and
>>> a few changes to separate metadata type and position were
>>> discussed. BUT, changing the default layout, no matter how much
>>> better it seems, is trumped by "breaks existing setups and user
>>> practice".
>>
>> Layout names are a different matter from what the default sb type
>> should be.
>
> Indeed they are. Or rather, should be. However, the current default
> sb includes a layout element. If the default sb is changed, then it
> seems like an opportunity to detach the data format from the on-disk
> location.

I do not see anything wrong with specifying the SB location as a
metadata version. Why should location not be an element of the raid
type? It's fine the way it is, IMHO. (Just the default is not :)

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Use new sb type
On Feb 10 2008 12:27, David Greaves wrote:
>> I do not see anything wrong with specifying the SB location as a
>> metadata version. Why should location not be an element of the raid
>> type? It's fine the way it is, IMHO. (Just the default is not :)
>
> There was quite a discussion about it. For me, the main argument is
> that most people seeing superblock versions (even the manpage
> terminology is "version" and "subversion") will correlate
> incremental versions with improvement. They will therefore see v1.2
> as "the latest and best". Feel free to argue that the manpage is
> clear on this - but as we know, not everyone reads the manpages in
> depth...

That is indeed suboptimal (but I would not care, since I know the
implications of an SB at the front); naming it [EMAIL PROTECTED] /
[EMAIL PROTECTED] / [EMAIL PROTECTED] or so would address this.
Re: [PATCH] Use new sb type
On Jan 29 2008 18:08, Bill Davidsen wrote:
>> IIRC there was a discussion a while back on renaming mdadm options
>> (google "Time to deprecate old RAID formats?") and the superblocks
>> to emphasise the location and data structure. Would it be good to
>> introduce the new names at the same time as changing the default
>> format/on-disk-location?
>
> Yes, I suggested some layout names, as did a few other people, and a
> few changes to separate metadata type and position were discussed.
> BUT, changing the default layout, no matter how much better it
> seems, is trumped by "breaks existing setups and user practice".

Layout names are a different matter from what the default sb type
should be.
[PATCH] Use new sb type
This makes 1.0 the default sb type for new arrays.

Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]
---
 Create.c |    6 ------
 super0.c |    4 +---
 super1.c |    2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

Index: mdadm-2.6.4/Create.c
===================================================================
--- mdadm-2.6.4.orig/Create.c
+++ mdadm-2.6.4/Create.c
@@ -241,12 +241,6 @@ int Create(struct supertype *st, char *m
 			fprintf(stderr, Name ": internal error - no default metadata style\n");
 			exit(2);
 		}
-		if (st->ss->major != 0 ||
-		    st->minor_version != 90)
-			fprintf(stderr, Name ": Defaulting to version"
-				" %d.%d metadata\n",
-				st->ss->major,
-				st->minor_version);
 	}
 	freesize = st->ss->avail_size(st, ldsize >> 9);
 	if (freesize == 0) {

Index: mdadm-2.6.4/super0.c
===================================================================
--- mdadm-2.6.4.orig/super0.c
+++ mdadm-2.6.4/super0.c
@@ -820,9 +820,7 @@ static struct supertype *match_metadata_
 	st->minor_version = 90;
 	st->max_devs = MD_SB_DISKS;
 	if (strcmp(arg, "0") == 0 ||
-	    strcmp(arg, "0.90") == 0 ||
-	    strcmp(arg, "default") == 0
-	    )
+	    strcmp(arg, "0.90") == 0)
 		return st;

 	st->minor_version = 9; /* flag for 'byte-swapped' */

Index: mdadm-2.6.4/super1.c
===================================================================
--- mdadm-2.6.4.orig/super1.c
+++ mdadm-2.6.4/super1.c
@@ -1143,7 +1143,7 @@ static struct supertype *match_metadata_
 	st->ss = &super1;
 	st->max_devs = 384;

-	if (strcmp(arg, "1.0") == 0) {
+	if (strcmp(arg, "1.0") == 0 || strcmp(arg, "default") == 0) {
 		st->minor_version = 0;
 		return st;
 	}
Re: [PATCH] Use new sb type
On Jan 28 2008 18:19, David Greaves wrote:
> Jan Engelhardt wrote:
>> This makes 1.0 the default sb type for new arrays.
>
> IIRC there was a discussion a while back on renaming mdadm options
> (google "Time to deprecate old RAID formats?") and the superblocks
> to emphasise the location and data structure. Would it be good to
> introduce the new names at the same time as changing the default
> format/on-disk-location?

The -e 1.0/1.1/1.2 is sufficient for me; I would not need
--metadata 1 --metadata-layout XXX. So renaming options should
definitely be a separate patch.
[PATCH] md: constify function pointer tables
Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]
---
 drivers/md/md.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index cef9ebd..6295b90 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5033,7 +5033,7 @@ static int md_seq_show(struct seq_file *seq, void *v)
 	return 0;
 }

-static struct seq_operations md_seq_ops = {
+static const struct seq_operations md_seq_ops = {
 	.start = md_seq_start,
 	.next  = md_seq_next,
 	.stop  = md_seq_stop,
--
1.5.3.4
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Dec 7 2007 07:30, Nix wrote:
> On 6 Dec 2007, Jan Engelhardt verbalised:
>> On Dec 5 2007 19:29, Nix wrote:
>>>>> On Dec 1 2007 06:19, Justin Piszcz wrote:
>>>>>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO;
>>>>>> if you use 1.x superblocks with LILO you can't boot)
>>>>>
>>>>> Says who? (Don't use LILO ;-)
>>>>
>>>> Well, your kernels must be on a 0.90-superblocked RAID-0 or RAID-1
>>>> device. It can't handle booting off 1.x superblocks nor RAID-[56]
>>>> (not that I could really hope for the latter).
>>>
>>> If the superblock is at the end (which is the case for 0.90 and
>>> 1.0), then the offsets for a specific block on /dev/mdX match the
>>> ones for /dev/sda, so it should be easy to use lilo on 1.0 too, no?
>>
>> (nesting trimmed)
>
> Sure, but you may have to hack /sbin/lilo to convince it to create
> the superblock there at all. It's likely to recognise that this is
> an md device without a v0.90 superblock and refuse to continue. (But
> I haven't tested it.)

In that case, see above - move to a different bootloader.
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Dec 5 2007 19:29, Nix wrote:
>> On Dec 1 2007 06:19, Justin Piszcz wrote:
>>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO;
>>> if you use 1.x superblocks with LILO you can't boot)
>>
>> Says who? (Don't use LILO ;-)
>
> Well, your kernels must be on a 0.90-superblocked RAID-0 or RAID-1
> device. It can't handle booting off 1.x superblocks nor RAID-[56]
> (not that I could really hope for the latter).

If the superblock is at the end (which is the case for 0.90 and 1.0),
then the offsets for a specific block on /dev/mdX match the ones for
/dev/sda, so it should be easy to use lilo on 1.0 too, no?
(Yes, it will not work with 1.1 or 1.2.)
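For reference, the end-of-device placement being relied on here can be sketched in a few lines of C. The constants mirror the well-known MD_RESERVED_BYTES / MD_NEW_SIZE_SECTORS() arithmetic from the kernel's v0.90 superblock header; verify them against your own tree before depending on them.

```c
/* Sketch: where a v0.90 md superblock lives on a component device.
 * Constants follow include/linux/raid/md_p.h (an assumption to check
 * against your kernel headers); 1 sector = 512 bytes throughout. */
#include <assert.h>
#include <stdint.h>

#define MD_RESERVED_BYTES   (64 * 1024)
#define MD_RESERVED_SECTORS (MD_RESERVED_BYTES / 512)

/* Round the device size down to a 64 KiB boundary, then step back one
 * more 64 KiB block: the superblock sits in the last 64-128 KiB, so
 * every block before it maps 1:1 between /dev/mdX and the raw disk,
 * which is what keeps LILO-style loaders working. */
static uint64_t sb_0_90_offset(uint64_t dev_sectors)
{
	return (dev_sectors & ~(uint64_t)(MD_RESERVED_SECTORS - 1))
	       - MD_RESERVED_SECTORS;
}
```

For a hypothetical 1 GiB component (2097152 sectors), this puts the superblock at sector 2097024, exactly 64 KiB before the end; everything below that offset is plain data.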
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Dec 1 2007 06:19, Justin Piszcz wrote:
> RAID1, 0.90.03 superblocks (in order to be compatible with LILO; if
> you use 1.x superblocks with LILO you can't boot)

Says who? (Don't use LILO ;-)

> , and then:
>
> /dev/sda1+sdb1 -> /dev/md0 -> swap
> /dev/sda2+sdb2 -> /dev/md1 -> /boot (ext3)
> /dev/sda3+sdb3 -> /dev/md2 -> / (xfs)
>
> All works fine, no issues... Quick question though: I turned off the
> machine, disconnected /dev/sda from the machine, booted from
> /dev/sdb, no problems, shows as degraded RAID1. Turned the machine
> off. Re-attached the first drive. When I booted, my first partition
> either re-synced by itself or it was not degraded - why is this?

If md0 was not touched (written to) after you disconnected sda, it
also should not be in a degraded state.

> So two questions:
>
> 1) If it rebuilt by itself, how come it only rebuilt /dev/md0? So
>    md1/md2 were NOT rebuilt?
> 2) If it did not rebuild, is it because the kernel knows it does not
>    need to re-calculate parity etc. for swap?

The kernel does not know what's inside an md, usually. And it should
not try to be smart.

> I had to:
>
> mdadm /dev/md1 -a /dev/sda2
> and
> mdadm /dev/md2 -a /dev/sda3
>
> to rebuild the /boot and /, which worked fine. I am just curious
> though why it works like this; I figured it would be all or nothing.

Devices are not automatically re-added. Who knows, maybe you inserted
a different disk into sda which you don't want to be overwritten.

> More info: Not using ANY initramfs/initrd images, everything is
> compiled into 1 kernel image (makes things MUCH simpler, and the
> expected device layout etc. is always the same, unlike initrd/etc).

My expected device layout is also always the same, _with_ initrd.
Why? Simply because mdadm.conf is copied to the initrd, and mdadm
will use your defined order.
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Dec 1 2007 07:12, Justin Piszcz wrote:
> On Sat, 1 Dec 2007, Jan Engelhardt wrote:
>> On Dec 1 2007 06:19, Justin Piszcz wrote:
>>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO;
>>> if you use 1.x superblocks with LILO you can't boot)
>>
>> Says who? (Don't use LILO ;-)
>
> I like LILO :)

LILO cares much less about disk layout / filesystems than GRUB does,
so I would have expected LILO to cope with all sorts of superblocks.
OTOH, I would suspect GRUB to only handle 0.90 and 1.0, where the MD
SB is at the end of the disk => the filesystem SB is at the very
beginning.

>> So two questions:
>> 1) If it rebuilt by itself, how come it only rebuilt /dev/md0? So
>>    md1/md2 were NOT rebuilt?
>
> Correct.

Well, they should, after they are re-added using -a. If they still
don't, then perhaps another resync is in progress.
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Dec 1 2007 06:26, Justin Piszcz wrote:
> I ran the following:
>
> dd if=/dev/zero of=/dev/sdc
> dd if=/dev/zero of=/dev/sdd
> dd if=/dev/zero of=/dev/sde
>
> (as it is always a very good idea to do this with any new disk)

Why would you care about what's on the disk? fdisk, mkfs and the
day-to-day operation will overwrite it _anyway_. (If you think the
disk is not empty, you should look at it and copy off all usable
warez beforehand :-)
PAGE_SIZE=8K and bitmap
Hi,

a while back I reported a bug for 2.6.21 where creating an MD raid
array with an internal bitmap on a sparc64 system does not work. I
have not yet heard back (or I forgot); has this been addressed yet?

(mdadm -C /dev/md0 -l 1 -n 2 -e 1.0 -b internal /dev/ram[01])

thanks,
	Jan
Re: [patch v3 1/1] md: Software Raid autodetect dev list not array
On Aug 28 2007 06:08, Michael Evans wrote:
> Oh, I see. I forgot about the changelogs. I'd send out version 5
> now, but I'm not sure what kernel version to make the patch against.
> 2.6.23-rc4 is on kernel.org and I don't see any git snapshots.

2.6.23-rc4 is a snapshot in itself, a tagged one at that. Just use
"git pull" to get the latest, which is always good. Or "git fetch;
git checkout 2.6.23-rc4;" if you need that particular one.

> Additionally, I never could tell which git tree was the 'mainline',
> as it isn't labeled with such a keyword (at least in the list of git
> trees I saw).

/torvalds/linux-2.6.git or so; yes, it's not clearly marked. Then
again, why? Mainline is tarballs for most people, and they don't want
to go deeper ;-)

	Jan
Re: [patch v2 1/1] md: Software Raid autodetect dev list not array
On Aug 26 2007 04:51, Michael J. Evans wrote:
> 	{
>-	if (dev_cnt >= 0 && dev_cnt < 127)
>-		detected_devices[dev_cnt++] = dev;
>+	struct detected_devices_node *node_detected_dev;
>+	node_detected_dev = kzalloc(sizeof(*node_detected_dev), GFP_KERNEL);\

What's the \ good for, besides escaping the newline that is ignored
as whitespace anyway? :-)

> @@ -5772,3 +5790,8 @@ static void autostart_arrays(int part)
>-	for (i = 0; i < dev_cnt; i++) {
>-		dev_t dev = detected_devices[i];
>-
>+	/* FIXME: max 'int' #DEFINEd somewhere? not 0x7FFF ? */
>+	while (!list_empty(&all_detected_devices) && i_scanned < 0x7FFF) {

I doubt someone really has _that_ many devices. Of course, to be on
the safer side, make it an unsigned int. That way, people could put
in about 0xFFFE devs (which is even less likely than 0x7FFF).

>+		i_scanned++;
>+		node_detected_dev = list_entry(all_detected_devices.next,
>+			struct detected_devices_node, list);
>+		list_del(&node_detected_dev->list);
>+		dev = node_detected_dev->dev;
>+		kfree(node_detected_dev);

	Jan
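The pattern the patch switches to (a fixed array replaced by an intrusive linked list that is drained at autostart time) can be sketched in userspace. The type and field names follow the patch; the minimal list primitives and calloc() stand in for the kernel's <linux/list.h> and kzalloc(), so this is illustrative only, not the kernel code.

```c
/* Userspace sketch of the patch's detected-devices list.  Call
 * list_init(&all_detected_devices) once before use (the kernel
 * version uses a static LIST_HEAD initializer instead). */
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int dev_t_;              /* stand-in for kernel dev_t */

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int  list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* container_of, as used by the kernel's list_entry() */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct detected_devices_node {
	struct list_head list;
	dev_t_ dev;
};

static struct list_head all_detected_devices;

/* md_autodetect_dev() analogue: no hard 127-entry cap any more,
 * allocation failure is the only limit. */
static void autodetect_dev(dev_t_ dev)
{
	struct detected_devices_node *node = calloc(1, sizeof(*node));
	if (!node)
		return;
	node->dev = dev;
	list_add_tail(&node->list, &all_detected_devices);
}

/* autostart_arrays() analogue: drain the list in detection order,
 * mirroring the while loop in the patch (minus the i_scanned bound). */
static int drain_detected(dev_t_ *out, int max)
{
	int n = 0;
	while (!list_empty(&all_detected_devices) && n < max) {
		struct detected_devices_node *node =
			list_entry(all_detected_devices.next,
			           struct detected_devices_node, list);
		list_del(&node->list);
		out[n++] = node->dev;
		free(node);
	}
	return n;
}
```

The design point under discussion: the list makes the device count unbounded, so the loop bound becomes a mere sanity limit rather than the array-capacity check it used to be.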
Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
On Aug 12 2007 20:21, [EMAIL PROTECTED] wrote:
> per the message below, MD (or DM) would need to be modified to work
> reasonably well with one of the disk components being over an
> unreliable link (like a network link)

Does not dm-multipath do something like that?

> are the MD/DM maintainers interested in extending their code in this
> direction? or would they prefer to keep it simpler by being able to
> continue to assume that the raid components are connected over a
> highly reliable connection?

	Jan
Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
On Aug 12 2007 09:39, [EMAIL PROTECTED] wrote:
> now, I am not an expert on either option, but there are a couple of
> things that I would question about the DRBD+MD option
>
> 1. when the remote machine is down, how does MD deal with it for
>    reads and writes?

I suppose it kicks the drive and you'd have to re-add it by hand,
unless done by a cronjob.

> 2. MD over a local drive will alternate reads between mirrors (or so
>    I've been told); doing so over the network is wrong.

Certainly. In which case you set write_mostly (or even write_only,
not sure of its name) on the raid component that is nbd.

> 3. when writing, will MD wait for the network I/O to get the data
>    saved on the backup before returning from the syscall? or can it
>    sync the data out lazily

Can't answer this one - ask Neil :)

	Jan
Re: [RFH] Partion table recovery
On Jul 20 2007 07:35, Willy Tarreau wrote:
> On Fri, Jul 20, 2007 at 08:13:03AM +0300, Al Boldi wrote:
>> As always, a good friend of mine managed to scratch my partition
>> table by cat'ing /dev/full into /dev/sda. I was able to push him
>> out of the way, but at least the first 100MB are gone. I can
>> probably live without the first partition, but there are many
>> partitions after that, which I hope should easily be recoverable.

Go use GPT, it's got a backup copy of the ptable at the end of the
disk ;-)

>> I tried parted, but it's not working out for me. Does anybody know
>> of a simple partition recovery tool that would just scan the disk
>> for lost partitions?
>
> The best one is simply fdisk, because you can manually enter your
> cylinder numbers. You have to find by hand the beginning of each
> partition, and for this, you have to remember what filesystems you
> used and see how to identify them (using a magic). Then with a hex
> editor, you scan the disk to find such entries and note the possible
> sectors on paper. Then comes fdisk. You create the part, exit and
> try to mount it. If it fails, fdisk again and try other values.

Pretty easy: XFSB (offset +0), ReIsErFs2 (offset +0x10034),
SWAPSPACE2 (offset +0xff6), FAT32 (offset +0x52, maybe harder) /
mkdosfs (+0x3). ext2/3 is TOUGH. (sequence 0x53ef at +0x438 - quite
ambiguous!)

> I've saved many disks that way; it may sound harder than it really
> is. It should not take you more than half an hour to get the first
> part. Knowing your approximate partition sizes will help too.
>
> Good luck!
> Willy

	Jan
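The hex-editor scan Willy describes can be mechanized. A sketch in C, probing an in-memory image for the magic/offset pairs listed above (XFS, swap, ext2/3); the table is a starting point taken from the mail, not a complete signature database, and a real scan would try every sector or cylinder boundary as the candidate partition start.

```c
/* Sketch: probe a raw disk image for filesystem magics at the offsets
 * mentioned in the thread.  Not a recovery tool, just the core test. */
#include <stddef.h>
#include <string.h>

struct fs_magic {
	const char *name;
	const char *magic;
	size_t      len;
	size_t      offset;   /* from the would-be partition start */
};

static const struct fs_magic magics[] = {
	{ "xfs",    "XFSB",       4,  0x0   },
	{ "swap",   "SWAPSPACE2", 10, 0xff6 },
	/* ext2/3 s_magic 0xEF53, little-endian on disk; ambiguous, as
	 * the mail notes, so expect false positives */
	{ "ext2/3", "\x53\xef",   2,  0x438 },
};

/* Return the name of the first filesystem whose magic matches at
 * position pos of the image, or NULL if nothing matches there. */
static const char *probe_at(const unsigned char *img, size_t size,
                            size_t pos)
{
	for (size_t i = 0; i < sizeof(magics) / sizeof(*magics); ++i) {
		const struct fs_magic *m = &magics[i];
		if (pos + m->offset + m->len <= size &&
		    memcmp(img + pos + m->offset, m->magic, m->len) == 0)
			return m->name;
	}
	return NULL;
}
```

Each hit gives a candidate start sector to feed back into fdisk, exactly the note-on-paper step from the mail.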
Some RAID levels do not support bitmap
Hi,

RAID levels 0 and 4 do not seem to like -b internal. Is this
intentional? Runs 2.6.20.2 on i586. (BTW, do you already have a
PAGE_SIZE=8K fix?)

14:47 ichi:/dev # mdadm -C /dev/md0 -l 4 -e 1.0 -b internal -n 2 /dev/ram[01]
mdadm: RUN_ARRAY failed: Input/output error
mdadm: stopped /dev/md0
14:47 ichi:/dev # mdadm -C /dev/md0 -l 0 -e 1.0 -b internal -n 2 /dev/ram[01]
mdadm: RUN_ARRAY failed: Cannot allocate memory
mdadm: stopped /dev/md0

Right...

md: bitmaps not supported for this level.

Thanks,
	Jan
Re: RAID SB 1.x autodetection
On May 31 2007 09:00, Bill Davidsen wrote:
>> Hardly, with all the Fedora specific cruft. Anyway, there was a
>> simple patch posted in RH bugzilla, so I've gone with that.
>
> I'm not sure what Fedora has to do with it,

I like highly modularized systems. And that requires an initramfs to
load all the required modules.

	Jan
Creating RAID1 with bitmap fails
Hi,

the following command strangely gives -EIO ...

12:27 sun:~ # mdadm -C /dev/md4 -l 1 -n 2 -e 1.0 -b internal /dev/ram0 missing
md: md4: raid array is not clean -- starting background reconstruction
md4: failed to create bitmap (-5)
md: pers->run() failed ...
mdadm: RUN_ARRAY failed: Input/output error
mdadm: stopped /dev/md4

Leaving out -b internal creates the array. /dev/ram0 or /dev/sda5 -
EIO happens on both. (But the disk is fine, like ram0.) Where could I
start looking?

Linux sun 2.6.21-1.3149.al3.8smp #3 SMP Wed May 30 09:43:00 CEST 2007 sparc64 sparc64 sparc64 GNU/Linux
mdadm 2.5.4

Thanks,
	Jan
Re: Creating RAID1 with bitmap fails
On May 30 2007 22:05, Neil Brown wrote:
>> the following command strangely gives -EIO ...
>>
>> 12:27 sun:~ # mdadm -C /dev/md4 -l 1 -n 2 -e 1.0 -b internal /dev/ram0 missing
>> md: md4: raid array is not clean -- starting background reconstruction
>> md4: failed to create bitmap (-5)
>> md: pers->run() failed ...
>> mdadm: RUN_ARRAY failed: Input/output error
>> mdadm: stopped /dev/md4
>>
>> Leaving out -b internal creates the array. /dev/ram0 or /dev/sda5 -
>> EIO happens on both. (But the disk is fine, like ram0.) Where could
>> I start looking?
>>
>> Linux sun 2.6.21-1.3149.al3.8smp #3 SMP Wed May 30 09:43:00 CEST 2007 sparc64 sparc64 sparc64 GNU/Linux
>> mdadm 2.5.4
>
> I'm fairly sure this is fixed in 2.6.2. It is certainly worth a try.

The same command works on a x86_64 with mdadm 2.5.3...

	Jan
Re: RAID SB 1.x autodetection
On May 30 2007 16:35, Bill Davidsen wrote:
>> On 29 May 2007, Jan Engelhardt uttered the following:
>>> from your post at
>>> http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07384.html
>>> I read that autodetecting arrays with a 1.x superblock is currently
>>> impossible. Does it at least work to force the kernel to always
>>> assume a 1.x sb? There are some 'broken' distros out there that
>>> still don't use mdadm in initramfs, and recreating the initramfs
>>> each time is a bit cumbersome...
>>
>> The kernel build system should be able to do that for you,
>> shouldn't it?
>
> That would be an improvement, yes.

Hardly, with all the Fedora specific cruft. Anyway, there was a
simple patch posted in RH bugzilla, so I've gone with that.

	Jan
Re: Creating RAID1 with bitmap fails
On May 31 2007 09:09, Neil Brown wrote:
>>>> the following command strangely gives -EIO ...
>>>>
>>>> 12:27 sun:~ # mdadm -C /dev/md4 -l 1 -n 2 -e 1.0 -b internal /dev/ram0 missing
>>>>
>>>> Where could I start looking?
>>>>
>>>> Linux sun 2.6.21-1.3149.al3.8smp #3 SMP Wed May 30 09:43:00 CEST 2007 sparc64 sparc64 sparc64 GNU/Linux
>>>> mdadm 2.5.4
>>>
>>> I'm fairly sure this is fixed in 2.6.2. It is certainly worth a
>>> try.
>>
>> The same command works on a x86_64 with mdadm 2.5.3...
>> [ with 2.6.18.8 ]
>
> Are you sure? I suspect that the difference is more in the kernel
> version. mdadm used to create some arrays with the bitmap positioned
> so that it overlapped the data. Recent kernels check for that and
> reject the array if there is an overlap. mdadm-2.6.2 makes sure not
> to create any overlap.

Regarding the above array created with x86_64/2.5.3/2.6.18.8: is
there a way to check whether it overlaps?

	Jan
RAID SB 1.x autodetection
Hi,

from your post at
http://www.mail-archive.com/linux-raid@vger.kernel.org/msg07384.html
I read that autodetecting arrays with a 1.x superblock is currently
impossible. Does it at least work to force the kernel to always
assume a 1.x sb? There are some 'broken' distros out there that still
don't use mdadm in initramfs, and recreating the initramfs each time
is a bit cumbersome...

	Jan
Re: [PATCH 002 of 2] md: Improve the is_mddev_idle test
On May 10 2007 16:22, NeilBrown wrote:
> diff .prev/drivers/md/md.c ./drivers/md/md.c
> --- .prev/drivers/md/md.c	2007-05-10 15:51:54.000000000 +1000
> +++ ./drivers/md/md.c	2007-05-10 16:05:10.000000000 +1000
> @@ -5095,7 +5095,7 @@ static int is_mddev_idle(mddev_t *mddev)
>  		 *
>  		 * Note: the following is an unsigned comparison.
>  		 */
> -		if ((curr_events - rdev->last_events + 4096) > 8192) {
> +		if ((long)curr_events - (long)rdev->last_events > 4096) {
>  			rdev->last_events = curr_events;
>  			idle = 0;
>  		}

What did really change? Unless I am seriously mistaken,

	curr_events - last_events + 4096 > 8192

is mathematically equivalent to

	curr_events - last_events > 4096

The casting to (long) may however force a signed comparison, which
turns things quite upside down, and the comment does not apply
anymore.

	Jan
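The two forms are equivalent over the integers but not in C, and that signed/unsigned difference is the whole change. A standalone sketch (plain C, nothing md-specific; the values are made up, only the comparison semantics matter):

```c
/* Illustration of the comparison change discussed above: with
 * unsigned arithmetic, curr falling *below* last wraps around to a
 * huge value and still trips the threshold; the (long) casts make a
 * large negative difference compare as "not exceeded" instead. */

static int exceeded_unsigned(unsigned long curr, unsigned long last)
{
	/* old test: an unsigned comparison, as the original comment said;
	 * fires for deltas outside [-4096, 4096] */
	return (curr - last + 4096) > 8192;
}

static int exceeded_signed(unsigned long curr, unsigned long last)
{
	/* new test: signed difference; only large *positive* drift fires,
	 * so completing resync I/O (curr < last) no longer counts */
	return (long)curr - (long)last > 4096;
}
```

So the comment was accurate for the old line and wrong for the new one: with the casts, a delta of -8000 no longer marks the array non-idle.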
Re: [PATCH 002 of 2] md: Improve the is_mddev_idle test
On May 10 2007 20:04, Neil Brown wrote:
>> -		if ((curr_events - rdev->last_events + 4096) > 8192) {
>> +		if ((long)curr_events - (long)rdev->last_events > 4096) {
>> 			rdev->last_events = curr_events;
>> 			idle = 0;
>> 		}
>
> 	/* sync IO will cause sync_io to increase before the disk_stats
> 	 * as sync_io is counted when a request starts, and
> 	 * disk_stats is counted when it completes.
> 	 * So resync activity will cause curr_events to be smaller than
> 	 * when there was no such activity.
> 	 * non-sync IO will cause disk_stat to increase without
> 	 * increasing sync_io so curr_events will (eventually)
> 	 * be larger than it was before. Once it becomes
> 	 * substantially larger, the test below will cause
> 	 * the array to appear non-idle, and resync will slow
> 	 * down.
> 	 * If there is a lot of outstanding resync activity when
> 	 * we set last_event to curr_events, then all that activity
> 	 * completing might cause the array to appear non-idle
> 	 * and resync will be slowed down even though there might
> 	 * not have been non-resync activity. This will only
> 	 * happen once though. 'last_events' will soon reflect
> 	 * the state where there is little or no outstanding
> 	 * resync requests, and further resync activity will
> 	 * always make curr_events less than last_events.
> 	 */
>
> Does that read at all well?

It is a more verbose explanation of your patch description, yes.

	Jan
Re: Please revert 5b479c91da90eef605f851508744bfe8269591a0 (md partition rescan)
On May 9 2007 18:51, Linus Torvalds wrote:
> (But Andrew never saw your email, I suspect: [EMAIL PROTECTED] is
> probably some strange mixup of Andrew Morton and Andi Kleen in your
> mind ;)

What do the letters "kp" stand for?

	Jan
Re: Chaining sg lists for big I/O commands: Question
On May 9 2007 15:38, Jens Axboe wrote:
>> I am an mdadm/disk/hard drive fanatic; I was curious:
>>
>>> On i386, we can at most fit 256 scatterlist elements into a page,
>>> and on x86-64 we are stuck with 128. So that puts us somewhere
>>> between 512kb and 1024kb for a single IO.
>>
>> How come 32bit is 256 and 64 is only 128? I am sure it is something
>> very fundamental/simple, but I was curious; I would think x86_64
>> would fit/support more scatterlists in a page.
>
> Because of the size of the scatterlist structure. As pointers are
> bigger on 64-bit archs, the scatterlist structure ends up being
> bigger. The page size on x86-64 is 4kb, hence the number of
> structures you can fit in a page is smaller.

I take it this problem goes away on arches with 8KB PAGE_SIZE?

	Jan
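Jens's arithmetic is just PAGE_SIZE / sizeof(struct scatterlist). A back-of-envelope sketch; the two structs below merely mimic the 2.6-era scatterlist field set with 32-bit vs. 64-bit pointers, so the exact layouts are assumptions for illustration, not the kernel's definitions:

```c
/* Why 256 entries per page on i386 but 128 on x86-64: the struct
 * roughly doubles when pointers and dma addresses go to 8 bytes.
 * Field sets are approximations of the 2.6-era scatterlist. */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u
#define PAGE_SIZE_8K 8192u

/* 32-bit-arch flavour: 4 * 4 = 16 bytes */
struct scatterlist32 {
	uint32_t page;        /* struct page * is 4 bytes here */
	uint32_t offset;
	uint32_t dma_address; /* 32-bit dma_addr_t */
	uint32_t length;
};

/* 64-bit-arch flavour: 8 + 4 + 4 + 8 + 8 = 32 bytes */
struct scatterlist64 {
	uint64_t page;        /* struct page * is 8 bytes */
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address; /* 64-bit dma_addr_t */
	uint64_t dma_length;  /* extra field some 64-bit configs carry */
};

static unsigned int sg_per_page(unsigned int page_size, size_t sg_size)
{
	return page_size / sg_size;
}
```

And it also answers the 8K question: doubling PAGE_SIZE doubles the entry count again, which is presumably why the limit is less painful on such arches.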
[PATCH 16/36] Use menuconfig objects II - MD
Change Kconfig objects from "menu, config" into "menuconfig" so that
the user can disable the whole feature without having to enter the
menu first.

Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]
---
 drivers/md/Kconfig |   15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

--- linux-2.6.21-mm_20070428.orig/drivers/md/Kconfig
+++ linux-2.6.21-mm_20070428/drivers/md/Kconfig
@@ -2,20 +2,18 @@
 # Block device driver configuration
 #

-if BLOCK
-
-menu "Multi-device support (RAID and LVM)"
-
-config MD
+menuconfig MD
 	bool "Multiple devices driver support (RAID and LVM)"
+	depends on BLOCK
 	select ASYNC_TX_DMA
 	help
 	  Support multiple physical spindles through a single logical
 	  device. Required for RAID and logical volume management.

+if MD
+
 config BLK_DEV_MD
 	tristate "RAID support"
-	depends on MD
 	---help---
 	  This driver lets you combine several hard disk partitions into one
 	  logical block device. This can be used to simply append one
@@ -190,7 +188,6 @@ config MD_FAULTY

 config BLK_DEV_DM
 	tristate "Device mapper support"
-	depends on MD
 	---help---
 	  Device-mapper is a low level volume manager.  It works by allowing
 	  people to specify mappings for ranges of logical sectors.  Various
@@ -272,6 +269,4 @@ config DM_DELAY

 	  If unsure, say N.

-endmenu
-
-endif
+endif # MD
RAID rebuild on Create
Hi list,

when a user does `mdadm -C /dev/md0 -l <any> -n <whatever fits>
<devices>`, the array gets rebuilt for at least RAID1 and RAID5, even
if the disk contents are most likely not of importance (otherwise we
would not be creating a raid array right now). Could not this
needless resync be skipped - what do you think?

	Jan
Re: RAID rebuild on Create
On Apr 30 2007 11:19, Dan Williams wrote:
>> when a user does `mdadm -C ...`, the array gets rebuilt for at
>> least RAID1 and RAID5, even if the disk contents are most likely
>> not of importance (otherwise we would not be creating a raid array
>> right now). Could not this needless resync be skipped - what do you
>> think?
>
> If you want this behavior, you can always create the array with a
> 'missing' device to hold off the resync process. Otherwise, if all
> disks are available, why not let the array make forward progress to
> a protected state?

Having a device missing in the array does not create a "protected
state", as you call it. What I was getting at: can't the array wait
with syncing blocks until I have actually written something there for
the first time? This should not impact protection. _Especially_ if
one starts out with blank disks. Then the resync process copies
zeroes to zeroes (before we even run mkfs). And it chews a bit on PCI
bandwidth.

> Also, the resync thread automatically yields to new data coming into
> the array, so you can effectively sync an array by writing to all
> the blocks.

	Jan
RE: RAID rebuild on Create
On Apr 30 2007 13:54, [EMAIL PROTECTED] wrote:
> But then the array needs to keep track of where data is, so that it
> knows what is good and what is bad.

I assume it knows that, because you can reboot while an array is
still syncing, and it Does The Right Thing. Furthermore, there is
also the relatively new mdadm -b option.

> Instead, it takes the array to a known good state to start out with,
> and you don't have to start out with blank disks.

	Jan
Re: raid10 kernel panic on sparc64
On Apr 12 2007 14:26, David Miller wrote:
> From: Jan Engelhardt [EMAIL PROTECTED]
> Date: Mon, 2 Apr 2007 02:15:57 +0200 (MEST)
>
>> Kernel is kernel-smp-2.6.16-1.2128sp4.sparc64.rpm from Aurora Corona. Perhaps it helps, otherwise hold your breath until I reproduce it.
>
> Jan, if you can reproduce this with the current 2.6.20 vanilla kernel I'd be very interested in a full trace so that I can try to fix this. With the combination of an old kernel and only part of the crash trace, there isn't much I can do with this report.

Hi David, I have not forgotten this issue, but the fact that there is no serial console attached right now makes it kinda hard to get the system back up in case I let it oops. Apologies for the delay.

Jan
--
[PATCH 14/30] Use menuconfig objects - MD
Use menuconfigs instead of menus, so the whole menu can be disabled at once instead of going through all options.

Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]

Index: linux-2.6.21-rc5/drivers/md/Kconfig
===================================================================
--- linux-2.6.21-rc5.orig/drivers/md/Kconfig
+++ linux-2.6.21-rc5/drivers/md/Kconfig
@@ -2,19 +2,17 @@
 # Block device driver configuration
 #
 
-if BLOCK
-
-menu "Multi-device support (RAID and LVM)"
-
-config MD
+menuconfig MD
 	bool "Multiple devices driver support (RAID and LVM)"
+	depends on BLOCK
 	help
 	  Support multiple physical spindles through a single logical device.
 	  Required for RAID and logical volume management.
 
+if MD
+
 config BLK_DEV_MD
 	tristate "RAID support"
-	depends on MD
 	---help---
 	  This driver lets you combine several hard disk partitions into one
 	  logical block device. This can be used to simply append one
@@ -189,7 +187,6 @@ config MD_FAULTY
 
 config BLK_DEV_DM
 	tristate "Device mapper support"
-	depends on MD
 	---help---
 	  Device-mapper is a low level volume manager. It works by allowing
 	  people to specify mappings for ranges of logical sectors. Various
@@ -262,6 +259,4 @@ config DM_MULTIPATH_EMC
 	---help---
 	  Multipath support for EMC CX/AX series hardware.
 
-endmenu
-
-endif
+endif # MD
#EOF
raid1 does not seem faster
Hello list,

normally, I'd think that combining drives into a raid1 array would give me at least a little improvement in read speed. In my setup however, this does not seem to be the case.

14:16 opteron:/var/log # hdparm -t /dev/sda
 Timing buffered disk reads: 170 MB in 3.01 seconds = 56.52 MB/sec
14:17 opteron:/var/log # hdparm -t /dev/md3
 Timing buffered disk reads: 170 MB in 3.01 seconds = 56.45 MB/sec

(and dd_rescue shows the same numbers)

The raid array was created using
# mdadm -C /dev/md3 -b internal -e 1.0 -l 1 -n 2 /dev/sd[ab]3

Jan
--
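One hedged explanation for the matching numbers above: md's raid1 tends to keep a single sequential reader (which is all `hdparm -t` is) on one mirror, so any read-speed win only shows up with concurrent readers. A sketch of how that could be checked, with commands only echoed rather than run and the offsets chosen arbitrarily:

```shell
# Two sequential readers at far-apart offsets can be balanced onto
# different mirrors; compare their combined throughput against the
# single-stream hdparm figure.
reader1="dd if=/dev/md3 of=/dev/null bs=1M count=512 skip=0"
reader2="dd if=/dev/md3 of=/dev/null bs=1M count=512 skip=8192"
echo "$reader1 &"
echo "$reader2 &"
echo "wait"
```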
raid10 kernel panic on sparc64
Hi,

just when I did

# mdadm -C /dev/md2 -b internal -e 1.0 -l 10 -n 4 /dev/sd[cdef]4
(created)
# mdadm -D /dev/md2
Killed

dmesg filled up with a kernel oops. A few seconds later, the box locked solid. Since I was only in by ssh and there is not (yet) any possibility to reset it remotely, this is all I can give right now, the last 80x25 screen:

l4: 0  l5: 0  l6: 0  l7: 0
i0: f8007f218d18  i1: f8002e3d9608  i2: 0047f974  i3: 0
i4: 0  i5: 006e2800  i6: f80008c12a41  i7: 00526
I7: elv_next_request+0x94/0x188
Caller[005263e8]: elv_next_request+0x94/0x188
Caller[10086618]: scsi_request_fn+0x60/0x3f4 [scsi_mod]
Caller[00529b70]: __generic_unplug_device+0x34/0x3c
Caller[0052a7d4]: generic_unplug_device+0x14/0x2c
Caller[00526e48]: blk_backing_dev_unplug+0x20/0x28
Caller[004a464c]: block_sync_page+0x64/0x6c
Caller[0047f9d0]: sync_page+0x64/0x74
Caller[00677e48]: __wait_on_bit_lock+0x58/0x90
Caller[0047f86c]: __lock_page+0x54/0x5c
Caller[004802ec]: do_generic_mapping_read+0x204/0x49c
Caller[00480d68]: __generic_file_aio_read+0x120/0x18c
Caller[00481fdc]: generic_file_read+0x70/0x94
Caller[004a3920]: vfs_read+0xa0/0x14c
Caller[004a3c8c]: sys_read+0x34/0x60
Caller[00406c54]: linux_sparc_syscall32+0x3c/0x40
Caller[0003c6b4]: 0x3c6bc
Instruction DUMP: 921022bd 7c0e4ea2 90122098 91d02005 80a0a020 1848000c 80
[10281cdc] sync_request+0x898/0x8e4 [raid10]
[005f6fb4] md_do_sync+0x454/0x89c
[005f69ec] md_thread+0x100/0x11c

Kernel is kernel-smp-2.6.16-1.2128sp4.sparc64.rpm from Aurora Corona. Perhaps it helps, otherwise hold your breath until I reproduce it.

Thanks,
Jan
--
Re: Reshaping raid0/10
On Mar 10 2007 12:21, H. Peter Anvin wrote:
> Neil Brown wrote:
>> If I wanted to reshape a raid0, I would just morph it into a raid4 with a missing parity drive, then use the raid5 code to restripe it. Then morph it back to regular raid0.
>
> Wow, that made my brain hurt. Given the fact that we're going to have to do this on kernel.org soon, what would be the concrete steps involved (we're going to have to change 3-member raid0 into 4-member raid0)...

Iff it were in the kernel, `mdadm -G` would suffice.

Jan
--
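For the record, the two-step `mdadm -G` invocation that would apply once raid0 reshape support exists in the kernel (a sketch with a hypothetical fourth device; commands only echoed, not run):

```shell
# Step 1: make the fourth disk available to the array as a spare.
add_cmd="mdadm /dev/md0 --add /dev/sdd1"
# Step 2: grow the member count, which would trigger the restripe
# across all four devices.
grow_cmd="mdadm --grow /dev/md0 --raid-devices=4"
echo "$add_cmd"
echo "$grow_cmd"
```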
Re: Reshaping raid0/10
On Feb 22 2007 06:59, Neil Brown wrote:
> On Wednesday February 21, [EMAIL PROTECTED] wrote:
>> are there any plans to support reshaping on raid0 and raid10?
>
> No concrete plans. It largely depends on time and motivation. I expect that the various flavours of raid5/raid6 reshape will come first. Then probably converting raid0->raid5. I really haven't given any thought to how you might reshape a raid10...

It should not be any different from raid0/raid5 reshaping, should it?

Jan
--
Re: 2.6.20: stripe_cache_size goes boom with 32mb
On Feb 23 2007 06:41, Justin Piszcz wrote:
> I was able to Alt-SysRq+b but I could not access the console/X/etc, it appeared to be frozen.

No SysRq+t? (Ah, unblanking might hang.) Well, netconsole/serial to the rescue, then ;-)

Jan
--
Reshaping raid0/10
Hello,

are there any plans to support reshaping on raid0 and raid10?

Jan
--
unknown ioctl32 cmd
Hi,

this line in mdadm-2.5.4 Detail.c:185:

	ioctl(fd, GET_BITMAP_FILE, &bmf) == 0

causes a dmesg warning when running `mdadm -D /dev/md0`:

	ioctl32(mdadm:2946): Unknown cmd fd(7) cmd(5915){10} arg(ff2905d0) on /dev/md0

on Aurora Linux corona_2.90 with 2.6.18-1.2798.al3.1smp (sparc64). The raid array was created using `mdadm -C /dev/md0 -l 1 -n 2 missing /dev/sdb2 -e 1.0`. Given that "case GET_BITMAP_FILE" is handled in (2.6.18.5), I wonder what exactly is causing this.

Keep me on Cc, but you always do that. Thanks :)

-`J'
--
Re: [PATCH 002 of 4] md: Define -congested_fn for raid1, raid10, and multipath
Since we're all about nits, I'll do my part:

> diff .prev/drivers/md/multipath.c ./drivers/md/multipath.c
> --- .prev/drivers/md/multipath.c	2006-08-29 14:52:50.000000000 +1000
> +++ ./drivers/md/multipath.c	2006-08-29 14:33:34.000000000 +1000
> @@ -228,6 +228,28 @@ static int multipath_issue_flush(request
>  	rcu_read_unlock();
>  	return ret;
>  }
> +static int multipath_congested(void *data, int bits)

Missing blank line.

> diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
> --- .prev/drivers/md/raid1.c	2006-08-29 14:52:50.000000000 +1000
> +++ ./drivers/md/raid1.c	2006-08-29 14:26:59.000000000 +1000
> @@ -601,6 +601,32 @@ static int raid1_issue_flush(request_que
>  	return ret;
>  }
>
> +static int raid1_congested(void *data, int bits)
> +{
> +	mddev_t *mddev = data;
> +	conf_t *conf = mddev_to_conf(mddev);
> +	int i, ret = 0;
> +
> +	rcu_read_lock();
> +	for (i = 0; i < mddev->raid_disks; i++) {
> +		mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev);
> +		if (rdev && !test_bit(Faulty, &rdev->flags)) {
> +			request_queue_t *q = bdev_get_queue(rdev->bdev);
> +
> +			/* Note the '|| 1' - when read_balance prefers
> +			 * non-congested targets, it can be removed
> +			 */
> +			if ((bits & (1<<BDI_write_congested)) || 1)
> +				ret |= bdi_congested(&q->backing_dev_info, bits);
> +			else
> +				ret &= bdi_congested(&q->backing_dev_info, bits);
> +		}
> +	}
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +
> /* Barriers

And one blank line too many, but YMMV ;-)

Jan Engelhardt
--
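The '|| 1' comment in the quoted hunk hinges on the difference between OR-accumulating and AND-accumulating the per-mirror congestion results. A toy shell illustration (not md code) of the two aggregation modes across three mirrors, one of them congested:

```shell
# agg_any: the array counts as congested if ANY mirror is congested
#          (appropriate for writes, which must hit every mirror).
# agg_all: the array is only stuck if ALL mirrors are congested
#          (appropriate for reads, once read_balance can prefer a
#          non-congested mirror).
agg_any=0
agg_all=1
for mirror_congested in 0 1 0; do
	agg_any=$(( agg_any | mirror_congested ))
	agg_all=$(( agg_all & mirror_congested ))
done
echo "any=$agg_any all=$agg_all"   # -> any=1 all=0
```

With one of three mirrors congested, OR-accumulation reports the array congested while AND-accumulation does not, which is why the read path could afford the laxer rule.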
Re: modifying degraded raid 1 then re-adding other members is bad
>> Why are we updating it BACKWARD in the first place?
>
> To avoid writing to spares when it isn't needed - some people want their spare drives to go to sleep.

That sounds a little dangerous. What if it decrements below 0?

Jan Engelhardt
--
Re: Raid5 Reshape Status + xfs_growfs = Success! (2.6.17.3)
On Jul 11 2006 12:03, Justin Piszcz wrote:
> Subject: Raid5 Reshape Status + xfs_growfs = Success! (2.6.17.3)

Now we just need shrink-reshaping and xfs_shrinkfs... :)

Jan Engelhardt
--
Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)
>> Hm, what's superblock 0.91? It is not mentioned in mdadm.8.
>
> Not sure, the block version perhaps?

Well yes of course, but what characteristics? The manual only lists:

	0, 0.90, default
		Use the original 0.90 format superblock. This format limits arrays to 28 component devices and limits component devices of levels 1 and greater to 2 terabytes.

	1, 1.0, 1.1, 1.2
		Use the new version-1 format superblock. This has few restrictions. The different subversions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2).

No 0.91 :(

(My mdadm is 2.2, but the problem remains in 2.5.2)

Jan Engelhardt
--
Re: Kernel 2.6.17 and RAID5 Grow Problem (critical section backup)
> md3 : active raid5 sdc1[7] sde1[6] sdd1[5] hdk1[2] hdi1[4] hde1[3] hdc1[1] hda1[0]
>       2344252416 blocks super 0.91 level 5, 512k chunk, algorithm 2 [8/8] [UUUUUUUU]
>       [>....................] reshape = 0.2% (1099280/390708736) finish=1031.7min speed=6293K/sec
>
> It is working, thanks!

Hm, what's superblock 0.91? It is not mentioned in mdadm.8.

Jan Engelhardt
--
RE: [PATCH 000 of 5] md: Introduction
> personally, I think this is useful functionality, but my personal preference is that this would be in DM/LVM2 rather than MD. but given Neil is the MD author/maintainer, I can see why he'd prefer to do it in MD. :)

Why don't MD and DM merge some bits?

Jan Engelhardt
--