Re: let md auto-detect 128+ raid members, fix potential race condition
On Tue, Aug 01, 2006 at 06:32:33PM -0300, Alexandre Oliva wrote:
>Sure enough the LVM subsystem could make things better for one to not
>need all of the PVs in the root-containing VG in order to be able to
>mount root read-write, or at all, but if you think about it, if initrd

It shouldn't need all of the PVs; you just need the PVs where the
rootfs is.

>is set up such that you only bring up the devices that hold the actual
>root device within the VG and then you change that, say by taking a
>snapshot of root, moving it around, growing it, etc, you'd be better
>off if you could still boot.  So you do want all of the VG members to
>be around, just in case.

In this case just regenerate the initramfs after modifying the VG that
contains root.  I am fairly sure that kernel upgrades are far more
frequent than the addition of PVs to the root VG.

>Yes, this is an argument against root on LVM, but there are arguments
>*for* root on LVM as well, and there's no reason to not support both
>behaviors equally well and let people figure out what works best for
>them.

No, this is just an argument against misusing root on LVM.

L.

-- 
Luca Berra -- [EMAIL PROTECTED]
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
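[Editor's sketch, not part of the original mail: the "just regenerate the
initramfs" step Luca describes might look like the following on a
Fedora/RHEL system of this era.  The VG/LV names and the mkinitrd
invocation are illustrative assumptions, not commands from the thread.]

```
# After changing the VG that holds root (adding a PV, growing root, ...),
# regenerate the initrd so it reflects the new layout:
vgextend rootvg /dev/sdc1                          # hypothetical VG and PV
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
```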
Re: let md auto-detect 128+ raid members, fix potential race condition
On Tue, Aug 01, 2006 at 05:46:38PM -0300, Alexandre Oliva wrote:
>Using the mkinitrd patch that I posted before, the result was that
>mdadm did try to bring up all raid devices but, because the raid456
>module was not loaded in initrd, the raid devices were left inactive.

Probably your initrd is broken; it should not have even tried to bring
up an md array that was not needed to mount root.

>Then, when rc.sysinit tried to bring them up with mdadm -A -s, that
>did nothing to the inactive devices, since they didn't have to be
>assembled.  Adding --run didn't help.
>
>My current work-around is to add raid456 to initrd, but that's ugly.
>Scanning /proc/mdstat for inactive devices in rc.sysinit and doing
>mdadm --run on them is feasible, but it looks ugly and error-prone.
>
>Would it be reasonable to change mdadm so as to, erhm, disassemble ;-)
>the raid devices it tried to bring up but that, for whatever reason,
>it couldn't activate?  (say, missing module, not enough members,
>whatever)

This would make sense if it were an option; patches welcome :)

L.

-- 
Luca Berra -- [EMAIL PROTECTED]
        Communication Media & Services S.r.l.
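[Editor's sketch, not from the thread: the /proc/mdstat workaround
Alexandre describes could be done roughly like this.  The helper name
`list_inactive` is made up; the /proc/mdstat line format assumed is
`mdN : inactive ...`.]

```shell
# Print the names (mdN) of arrays listed as "inactive" in an mdstat
# file (defaults to /proc/mdstat).
list_inactive() {
    f="${1:-/proc/mdstat}"
    [ -r "$f" ] || return 0
    awk '$2 == ":" && $3 == "inactive" { print $1 }' "$f"
}

# Try to start each inactive array; this is the ugly rc.sysinit
# workaround being discussed, not a recommended design.
for md in $(list_inactive); do
    mdadm --run "/dev/$md" || echo "could not start /dev/$md" >&2
done
```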
Re: let md auto-detect 128+ raid members, fix potential race condition
On Aug  1, 2006, Bill Davidsen <[EMAIL PROTECTED]> wrote:

> I rarely think you are totally wrong about anything RAID, but I do
> believe you have missed the point of autodetect. It is intended to
> work as it does now, building the array without depending on some
> user level functionality.

Well, it clearly depends on at least some user level functionality
(the ioctl that triggers autodetect).  Going from that to a
full-fledged mdadm doesn't sound like such a big deal to me.

> I don't personally see the value of autodetect for putting together
> the huge number of drives people configure. I see this as a way to
> improve boot reliability, if someone needs 64 drives for root and
> boot, they need to read a few essays on filesystem configuration.
> However, I'm aware that there are some really bizarre special cases
> out there.

There's LVM.  If you have to keep root out of the VG just because
people say so, you lose lots of benefits from LVM, such as being able
to grow root with the system running, take snapshots of root, etc.

Sure enough the LVM subsystem could make things better for one to not
need all of the PVs in the root-containing VG in order to be able to
mount root read-write, or at all, but if you think about it, if initrd
is set up such that you only bring up the devices that hold the actual
root device within the VG and then you change that, say by taking a
snapshot of root, moving it around, growing it, etc, you'd be better
off if you could still boot.  So you do want all of the VG members to
be around, just in case.

This is trivially accomplished for regular disks whose drivers are
loaded by initrd, but for raid devices, you need to tentatively bring
up every raid member you can, just in case some piece of root is
there; otherwise you may end up unable to boot.

Yes, this is an argument against root on LVM, but there are arguments
*for* root on LVM as well, and there's no reason to not support both
behaviors equally well and let people figure out what works best for
them.

-- 
Alexandre Oliva           http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America          http://www.fsfla.org/
Red Hat Compiler Engineer     [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist        [EMAIL PROTECTED], gnu.org}
Re: let md auto-detect 128+ raid members, fix potential race condition
On Aug  1, 2006, Michael Tokarev <[EMAIL PROTECTED]> wrote:

> Alexandre Oliva wrote:
> []
>> If mdadm can indeed scan all partitions to bring up all raid devices
>> in them, like nash's raidautorun does, great.  I'll give that a try,
>
> Never, ever, try to do that (again).  Mdadm (or vgscan, or whatever)
> should NOT assemble ALL arrays found, but only those which it has
> been told to assemble.  This is it again: you bring another disk into
> a system (disk which comes from another machine), and mdadm finds
> FOREIGN arrays and brings them up as /dev/md0, where YOUR root
> filesystem should be.  That's what 'homehost' option is for, for
> example.

Exactly.  So make it /all/all local/, if you must.  It's the same as
far as I'm concerned.

> If initrd should be reconfigured after some changes (be it raid
> arrays, LVM volumes, hostname, whatever), -- I for one am fine
> with that.

Feel free to be fine with it, as long as you also let me be free to
not be fine with it and try to cut a better deal :-)

-- 
Alexandre Oliva           http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America          http://www.fsfla.org/
Red Hat Compiler Engineer     [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist        [EMAIL PROTECTED], gnu.org}
Re: let md auto-detect 128+ raid members, fix potential race condition
On Aug  1, 2006, Alexandre Oliva <[EMAIL PROTECTED]> wrote:

>> I'll give it a try some time tomorrow, since I won't turn on that
>> noisy box today any more; my daughter is already asleep :-)
>
> But then, I could use my own desktop to test it :-)

But then, I wouldn't be testing quite the same scenario.  My
boot-required RAID devices were all raid 1, whereas the larger,
separate volume group was all raid 6.

Using the mkinitrd patch that I posted before, the result was that
mdadm did try to bring up all raid devices but, because the raid456
module was not loaded in initrd, the raid devices were left inactive.
Then, when rc.sysinit tried to bring them up with mdadm -A -s, that
did nothing to the inactive devices, since they didn't have to be
assembled.  Adding --run didn't help.

My current work-around is to add raid456 to initrd, but that's ugly.
Scanning /proc/mdstat for inactive devices in rc.sysinit and doing
mdadm --run on them is feasible, but it looks ugly and error-prone.

Would it be reasonable to change mdadm so as to, erhm, disassemble ;-)
the raid devices it tried to bring up but that, for whatever reason,
it couldn't activate?  (say, missing module, not enough members,
whatever)

-- 
Alexandre Oliva           http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America          http://www.fsfla.org/
Red Hat Compiler Engineer     [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist        [EMAIL PROTECTED], gnu.org}
Re: [PATCH 004 of 9] md: Factor out part of raid10d into a separate function.
On Tuesday August 1, [EMAIL PROTECTED] wrote:
> don't think this is better, NeilBrown wrote:
>
>> raid10d has too many nested blocks, so take the fix_read_error
>> functionality out into a separate function.
>
> Definite improvement in readability. Will all versions of the
> compiler do something appropriate WRT inlining or not?

As the separated function is called about once in a blue moon, it
hardly matters.  I'd probably rather it wasn't inlined so as to be
sure it doesn't clutter the L1 cache when it isn't needed, but that's
the sort of thing I really want to leave to the compiler.

Maybe it would be good to stick an 'unlikely' or 'likely' in raid10d
to tell the compiler how likely a read error is...

NeilBrown
Re: let md auto-detect 128+ raid members, fix potential race condition
Neil Brown wrote:
> [linux-raid added to cc.  Background: patch was submitted to remove
> the current hard limit of 127 partitions that can be auto-detected -
> limit set by 'detected_devices' array in md.c.]
>
> My first inclination is not to fix this problem.
>
> I consider md auto-detect to be a legacy feature.  I don't use it and
> I recommend that other people don't use it.  However I cannot justify
> removing it, so it stays there.  Having this limitation could be seen
> as a good motivation for some more users to stop using it.
>
> Why not use auto-detect?  I have three issues with it.
>
> 1/ It just isn't "right".  We don't mount filesystems from partitions
> just because they have type 'Linux'.  We don't enable swap on
> partitions just because they have type 'Linux swap'.  So why do we
> assemble md/raid from partitions that have type 'Linux raid
> autodetect'?

I rarely think you are totally wrong about anything RAID, but I do
believe you have missed the point of autodetect. It is intended to
work as it does now, building the array without depending on some user
level functionality. The name "autodetect" clearly differentiates this
type from the others you mentioned; there is no implication that swap
or Linux partitions should do anything automatically.

This is not a case of my using a feature and defending it: I don't use
it currently, for all of the reasons you enumerate. That doesn't mean
that I haven't used autodetect in the past or that I won't in the
future, particularly with embedded systems.

> 2/ It can cause problems when moving devices.  If you have two
> machines, both with an 'md0' array, and you move the drives from one
> on to the other - say because the first lost a power supply - and
> then reboot the machine that received the drives, which array gets
> assembled as 'md0'?  You might be lucky, you might not.  This isn't
> purely theoretical - there have been pleas for help on linux-raid
> resulting from exactly this - though they have been few.
>
> 3/ The information redundancy can cause a problem when it gets out of
> sync.  i.e. you add a partition to a raid array without setting the
> partition type to 'fd'.  This works, but on the next reboot the
> partition doesn't get added back into the array and you have to
> manually add it yourself.  This too is not purely theory - it has
> been reported slightly more often than '2'.
>
> So my preferred solution to the problem is to tell people not to use
> autodetect.  Quite possibly this should be documented in the code,
> and maybe even have a KERN_INFO message if more than 64 devices are
> autodetected.

I don't personally see the value of autodetect for putting together
the huge number of drives people configure. I see this as a way to
improve boot reliability; if someone needs 64 drives for root and
boot, they need to read a few essays on filesystem configuration.
However, I'm aware that there are some really bizarre special cases
out there. Maybe the limit should be in KCONFIG, with a default of 16
or so.

-- 
bill davidsen <[EMAIL PROTECTED]>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
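[Editor's sketch, not from the thread: the alternative Neil recommends
is to list arrays explicitly in /etc/mdadm.conf and assemble with
"mdadm --assemble --scan", so nothing depends on the 0xfd partition
type.  The UUIDs below are made up for illustration.]

```
# /etc/mdadm.conf - assemble only these arrays, regardless of
# partition type; UUIDs here are hypothetical placeholders.
DEVICE partitions
ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
ARRAY /dev/md1 UUID=f9c7f2a1:0b1d2e3f:4a5b6c7d:8e9f0a1b
```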
Re: [PATCH 005 of 9] md: Replace magic numbers in sb_dirty with well defined bit flags
Ingo Oeser wrote:
> Hi Neil,
>
> I think the names in this patch don't match the description at all.
> May I suggest different ones?
>
> On Monday, 31. July 2006 09:32, NeilBrown wrote:
>> Instead of magic numbers (0,1,2,3) in sb_dirty, we have some flags
>> instead:
>>   MD_CHANGE_DEVS
>>      Some device state has changed requiring superblock update on
>>      all devices.
>
> MD_SB_STALE or MD_SB_NEED_UPDATE
>
> I think STALE is better, it is unambiguous.
>
>>   MD_CHANGE_CLEAN
>>      The array has transitioned from 'clean' to 'dirty' or back,
>>      requiring a superblock update on active devices, but possibly
>>      not on spares.
>
> Maybe split this into MD_SB_DIRTY and MD_SB_CLEAN?

I don't think the split is beneficial, but I don't care for the name
much. Some name like SB_UPDATE_NEEDED or the like might be better.

>>   MD_CHANGE_PENDING
>>      A superblock update is underway.
>
> MD_SB_PENDING_UPDATE

I would have said UPDATE_PENDING, but either is more descriptive than
the original.

Neil - the logic in this code is pretty complex; all the help you can
give the occasional reader, by using very descriptive names for
things, is helpful to the reader and reduces your "question due to
misunderstanding" load.

-- 
bill davidsen <[EMAIL PROTECTED]>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
Re: [PATCH 004 of 9] md: Factor out part of raid10d into a separate function.
don't think this is better, NeilBrown wrote:
> raid10d has too many nested blocks, so take the fix_read_error
> functionality out into a separate function.

Definite improvement in readability. Will all versions of the compiler
do something appropriate WRT inlining or not?

-- 
bill davidsen <[EMAIL PROTECTED]>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
Re: let md auto-detect 128+ raid members, fix potential race condition
Alexandre Oliva wrote:
[]
> If mdadm can indeed scan all partitions to bring up all raid devices
> in them, like nash's raidautorun does, great.  I'll give that a try,

Never, ever, try to do that (again).  Mdadm (or vgscan, or whatever)
should NOT assemble ALL arrays found, but only those which it has been
told to assemble.  This is it again: you bring another disk into a
system (disk which comes from another machine), and mdadm finds
FOREIGN arrays and brings them up as /dev/md0, where YOUR root
filesystem should be.  That's what the 'homehost' option is for, for
example.

If initrd should be reconfigured after some changes (be it raid
arrays, LVM volumes, hostname, whatever) -- I for one am fine with
that.  Hopefully no one will argue that if you forgot to install an
MBR onto your replacement drive, it was entirely your own fault that
your system became unbootable, after all ;)

/mjt
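[Editor's sketch, not from the thread: the 'homehost' mechanism
Michael mentions.  With a HOMEHOST line in mdadm.conf (supported from
roughly the mdadm 2.5 era), "mdadm --assemble --scan" assembles only
arrays whose superblock records this host's name, so foreign arrays
from a transplanted disk do not steal /dev/md0.  The UUID below is a
made-up placeholder.]

```
# /etc/mdadm.conf - only assemble arrays created on this host;
# <system> means "use this machine's hostname".
HOMEHOST <system>
DEVICE partitions
ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
```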