Re: let md auto-detect 128+ raid members, fix potential race condition

2006-08-01 Thread Luca Berra

On Tue, Aug 01, 2006 at 06:32:33PM -0300, Alexandre Oliva wrote:

> Sure enough, the LVM subsystem could make things better, so that one
> would not need all of the PVs in the root-containing VG in order to
> mount root read-write, or at all.  But think about it: if initrd

It shouldn't need all of the PVs; you just need the PVs where the
root fs is.


> is set up such that you only bring up the devices that hold the actual
> root device within the VG, and then you change that, say by taking a
> snapshot of root, moving it around, growing it, etc., you'd be better
> off if you could still boot.  So you do want all of the VG members to
> be around, just in case.

In this case, just regenerate the initramfs after modifying the VG that
contains root. I am fairly sure that kernel upgrades are far more
frequent than the addition of PVs to the root VG.


> Yes, this is an argument against root on LVM, but there are arguments
> *for* root on LVM as well, and there's no reason not to support both
> behaviors equally well and let people figure out what works best for
> them.


No, this is just an argument against misusing root on LVM.

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \


Re: let md auto-detect 128+ raid members, fix potential race condition

2006-08-01 Thread Luca Berra

On Tue, Aug 01, 2006 at 05:46:38PM -0300, Alexandre Oliva wrote:

> Using the mkinitrd patch that I posted before, the result was that
> mdadm did try to bring up all raid devices but, because the raid456
> module was not loaded in initrd, the raid devices were left inactive.


Probably your initrd is broken; it should not even have tried to bring
up an md array that was not needed to mount root.


> Then, when rc.sysinit tried to bring them up with mdadm -A -s, that
> did nothing to the inactive devices, since they didn't have to be
> assembled.  Adding --run didn't help.
> 
> My current work-around is to add raid456 to initrd, but that's ugly.
> Scanning /proc/mdstat for inactive devices in rc.sysinit and doing
> mdadm --run on them is feasible, but it looks ugly and error-prone.
> 
> Would it be reasonable to change mdadm so as to, erhm, disassemble ;-)
> the raid devices it tried to bring up but that, for whatever reason,
> it couldn't activate?  (say, missing module, not enough members,
> whatever)


This would make sense if it were an option; patches welcome :)

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \


Re: let md auto-detect 128+ raid members, fix potential race condition

2006-08-01 Thread Alexandre Oliva
On Aug  1, 2006, Bill Davidsen <[EMAIL PROTECTED]> wrote:

> I rarely think you are totally wrong about anything RAID, but I do
> believe you have missed the point of autodetect. It is intended to
> work as it does now, building the array without depending on some user
> level functionality.

Well, it clearly depends on at least some user level functionality
(the ioctl that triggers autodetect).  Going from that to a
full-fledged mdadm doesn't sound like such a big deal to me.
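
For reference, that user-level trigger is tiny.  A minimal sketch of
what nash's raidautorun amounts to (error handling trimmed; the
/dev/md0 node is assumed to exist already):

    /* RAID_AUTORUN is the md ioctl from <linux/raid/md_u.h>; it asks
     * the kernel to scan for and start all partitions marked with the
     * 'Linux raid autodetect' type (0xfd). */
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/raid/md_u.h>

    int main(void)
    {
            int fd = open("/dev/md0", O_RDWR);

            if (fd < 0)
                    return 1;
            /* one ioctl call asks the kernel to autodetect and start
             * every marked partition */
            if (ioctl(fd, RAID_AUTORUN, 0) < 0) {
                    close(fd);
                    return 1;
            }
            close(fd);
            return 0;
    }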

> I don't personally see the value of autodetect for putting together
> the huge number of drives people configure. I see this as a way to
> improve boot reliability; if someone needs 64 drives for root and
> boot, they need to read a few essays on filesystem
> configuration. However, I'm aware that there are some really bizarre
> special cases out there.

There's LVM.  If you have to keep root out of the VG just because
people say so, you lose lots of benefits from LVM, such as being able
to grow root with the system running, take snapshots of root, etc.

Sure enough, the LVM subsystem could make things better, so that one
would not need all of the PVs in the root-containing VG in order to
mount root read-write, or at all.  But think about it: if initrd
is set up such that you only bring up the devices that hold the actual
root device within the VG, and then you change that, say by taking a
snapshot of root, moving it around, growing it, etc., you'd be better
off if you could still boot.  So you do want all of the VG members to
be around, just in case.

This is trivially accomplished for regular disks whose drivers are
loaded by initrd, but for raid devices, you need to tentatively bring
up every raid member you can, just in case some piece of root is
there, otherwise you may end up unable to boot.

Yes, this is an argument against root on LVM, but there are arguments
*for* root on LVM as well, and there's no reason not to support both
behaviors equally well and let people figure out what works best for
them.

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America   http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}


Re: let md auto-detect 128+ raid members, fix potential race condition

2006-08-01 Thread Alexandre Oliva
On Aug  1, 2006, Michael Tokarev <[EMAIL PROTECTED]> wrote:

> Alexandre Oliva wrote:
> []
>> If mdadm can indeed scan all partitions to bring up all raid devices
>> in them, like nash's raidautorun does, great.  I'll give that a try,

> Never, ever, try to do that (again).  Mdadm (or vgscan, or whatever)
> should NOT assemble ALL arrays found, but only those which it has
> been told to assemble.  Here is that scenario again: you bring another
> disk into a system (a disk which comes from another machine), and mdadm
> finds FOREIGN arrays and brings them up as /dev/md0, where YOUR root
> filesystem should be.  That's what the 'homehost' option is for, for
> example.

Exactly.  So make it s/all/all local/, if you must.  It's the same as
far as I'm concerned.

> If initrd should be reconfigured after some changes (be it raid
> arrays, LVM volumes, hostname, whatever) -- I for one am fine
> with that.

Feel free to be fine with it, as long as you also let me be free to
not be fine with it and try to cut a better deal :-)

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America   http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}


Re: let md auto-detect 128+ raid members, fix potential race condition

2006-08-01 Thread Alexandre Oliva
On Aug  1, 2006, Alexandre Oliva <[EMAIL PROTECTED]> wrote:

>> I'll give it a try some time tomorrow, since I won't turn on that
>> noisy box today any more; my daughter is already asleep :-)

> But then, I could use my own desktop to test it :-)

But then, I wouldn't be testing quite the same scenario.

My boot-required RAID devices were all raid 1, whereas the larger,
separate volume group was all raid 6.

Using the mkinitrd patch that I posted before, the result was that
mdadm did try to bring up all raid devices but, because the raid456
module was not loaded in initrd, the raid devices were left inactive.

Then, when rc.sysinit tried to bring them up with mdadm -A -s, that
did nothing to the inactive devices, since they didn't have to be
assembled.  Adding --run didn't help.

My current work-around is to add raid456 to initrd, but that's ugly.
Scanning /proc/mdstat for inactive devices in rc.sysinit and doing
mdadm --run on them is feasible, but it looks ugly and error-prone.
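
Just to make "ugly and error-prone" concrete, a rough sketch of that
work-around (hypothetical, with deliberately naive /proc/mdstat
parsing):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
            char line[512], dev[32], cmd[64];
            FILE *f = fopen("/proc/mdstat", "r");

            if (!f)
                    return 1;
            while (fgets(line, sizeof(line), f)) {
                    /* array status lines look like
                     * "md3 : inactive sda5[0] sdb5[1] ..." */
                    if (sscanf(line, "%31s", dev) == 1 &&
                        strncmp(dev, "md", 2) == 0 &&
                        strstr(line, " inactive ")) {
                            snprintf(cmd, sizeof(cmd),
                                     "mdadm --run /dev/%s", dev);
                            system(cmd);        /* try to force-start it */
                    }
            }
            fclose(f);
            return 0;
    }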

Would it be reasonable to change mdadm so as to, erhm, disassemble ;-)
the raid devices it tried to bring up but that, for whatever reason,
it couldn't activate?  (say, missing module, not enough members,
whatever)

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America   http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}


Re: [PATCH 004 of 9] md: Factor out part of raid10d into a separate function.

2006-08-01 Thread Neil Brown
On Tuesday August 1, [EMAIL PROTECTED] wrote:
> NeilBrown wrote:
> 
> > raid10d has too many nested blocks, so take the fix_read_error
> > functionality out into a separate function.
> 
> Definite improvement in readability. Will all versions of the compiler 
> do something appropriate WRT inlining or not?

As the separated function is called about once in a blue moon, it
hardly matters.  I'd probably rather it wasn't inlined so as to be
sure it doesn't clutter the L1 cache when it isn't needed, but that's
the sort of thing I really want to leave to the compiler.

Maybe it would be good to stick an 'unlikely' or 'likely' in raid10d
to tell the compiler how likely a read error is...
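
Something along those lines, purely as illustration (the names here are
hypothetical, not the actual raid10 code):

    /* __builtin_expect is what the kernel's likely()/unlikely()
     * macros expand to; it lets gcc move the rare repair path out
     * of line so the common path stays compact. */
    #define unlikely(x)     __builtin_expect(!!(x), 0)

    static void fix_read_error(int devnum)
    {
            /* slow, rarely taken repair path */
    }

    void handle_request(int devnum, int read_failed)
    {
            if (unlikely(read_failed))      /* read errors are rare */
                    fix_read_error(devnum);
            /* common case continues here */
    }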

NeilBrown


Re: let md auto-detect 128+ raid members, fix potential race condition

2006-08-01 Thread Bill Davidsen

Neil Brown wrote:

> [linux-raid added to cc.
>  Background: a patch was submitted to remove the current hard limit
>  of 127 partitions that can be auto-detected - the limit set by the
>  'detected_devices' array in md.c.
> ]
> 
> My first inclination is not to fix this problem.
> 
> I consider md auto-detect to be a legacy feature.
> I don't use it and I recommend that other people don't use it.
> However I cannot justify removing it, so it stays there.
> Having this limitation could be seen as a good motivation for some
> more users to stop using it.
> 
> Why not use auto-detect?
> I have three issues with it.
> 
> 1/
>    It just isn't "right".  We don't mount filesystems from partitions
>    just because they have type 'Linux'.  We don't enable swap on
>    partitions just because they have type 'Linux swap'.  So why do we
>    assemble md/raid from partitions that have type 'Linux raid
>    autodetect'?



I rarely think you are totally wrong about anything RAID, but I do 
believe you have missed the point of autodetect. It is intended to work 
as it does now, building the array without depending on some user level 
functionality. The name "autodetect" clearly differentiates this type 
from the others you mentioned; there is no implication that swap or 
Linux partitions should do anything automatically.


This is not a case of my using a feature and defending it; I don't use 
it currently, for all of the reasons you enumerate. That doesn't mean 
that I haven't used autodetect in the past or that I won't in the 
future, particularly with embedded systems.


> 2/
>    It can cause problems when moving devices.  If you have two
>    machines, both with an 'md0' array, and you move the drives from one
>    on to the other - say because the first lost a power supply - and
>    then reboot the machine that received the drives, which array gets
>    assembled as 'md0'?  You might be lucky, you might not.  This
>    isn't purely theoretical - there have been pleas for help on
>    linux-raid resulting from exactly this - though they have been
>    few.
> 
> 3/
>    The information redundancy can cause a problem when it gets out of
>    sync, i.e. you add a partition to a raid array without setting
>    the partition type to 'fd'.  This works, but on the next reboot
>    the partition doesn't get added back into the array and you have
>    to manually add it yourself.
>    This too is not purely theory - it has been reported slightly more
>    often than '2'.
> 
> So my preferred solution to the problem is to tell people not to use
> autodetect.  Quite possibly this should be documented in the code, and
> maybe even have a KERN_INFO message if more than 64 devices are
> autodetected.

I don't personally see the value of autodetect for putting together the 
huge number of drives people configure. I see this as a way to improve 
boot reliability; if someone needs 64 drives for root and boot, they 
need to read a few essays on filesystem configuration. However, I'm 
aware that there are some really bizarre special cases out there.


Maybe the limit should be in Kconfig, with a default of 16 or so.

--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [PATCH 005 of 9] md: Replace magic numbers in sb_dirty with well defined bit flags

2006-08-01 Thread Bill Davidsen

Ingo Oeser wrote:

> Hi Neil,
> 
> I think the names in this patch don't match the description at all.
> May I suggest different ones?
> 
> On Monday, 31. July 2006 09:32, NeilBrown wrote:
> 
> > Instead of magic numbers (0,1,2,3) in sb_dirty, we have
> > some flags instead:
> > MD_CHANGE_DEVS
> >   Some device state has changed, requiring a superblock update
> >   on all devices.
> 
> MD_SB_STALE or MD_SB_NEED_UPDATE
> 
> I think STALE is better; it is unambiguous.
> 
> > MD_CHANGE_CLEAN
> >   The array has transitioned from 'clean' to 'dirty' or back,
> >   requiring a superblock update on active devices, but possibly
> >   not on spares.
> 
> Maybe split this into MD_SB_DIRTY and MD_SB_CLEAN?

I don't think the split is beneficial, but I don't care for the name 
much. Some name like SB_UPDATE_NEEDED or the like might be better.


 


> > MD_CHANGE_PENDING
> >   A superblock update is underway.
> 
> MD_SB_PENDING_UPDATE

I would have said UPDATE_PENDING, but either is more descriptive than 
the original.


Neil - the logic in this code is pretty complex; all the help you can 
give the occasional reader, by using very descriptive names for things, 
reduces your "question due to misunderstanding" load.


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [PATCH 004 of 9] md: Factor out part of raid10d into a separate function.

2006-08-01 Thread Bill Davidsen

NeilBrown wrote:

> raid10d has too many nested blocks, so take the fix_read_error
> functionality out into a separate function.

Definite improvement in readability. Will all versions of the compiler 
do something appropriate WRT inlining or not?


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: let md auto-detect 128+ raid members, fix potential race condition

2006-08-01 Thread Michael Tokarev
Alexandre Oliva wrote:
[]
> If mdadm can indeed scan all partitions to bring up all raid devices
> in them, like nash's raidautorun does, great.  I'll give that a try,

Never, ever, try to do that (again).  Mdadm (or vgscan, or whatever)
should NOT assemble ALL arrays found, but only those which it has
been told to assemble.  Here is that scenario again: you bring another
disk into a system (a disk which comes from another machine), and mdadm
finds FOREIGN arrays and brings them up as /dev/md0, where YOUR root
filesystem should be.  That's what the 'homehost' option is for, for
example.

If initrd should be reconfigured after some changes (be it raid
arrays, LVM volumes, hostname, whatever) -- I for one am fine
with that.  Hopefully no one will argue that if you forgot to
install an MBR onto your replacement drive, it was entirely your
own fault that your system became unbootable, after all ;)

/mjt