On Wed, May 20, 2009 at 09:49:50AM +0100, Darren Kenny wrote:
> Hi Renee,
> 
> On 20/05/2009 01:21, Renee Danson wrote:
> > While thinking about the dhcp_wait_time property, and whether or not we
> > still need it in phase 1, I came up with some more questions about how
> > we deal with NCU priority group mode.
> > 
> > For NCUs with activation-mode set to prioritized: each NCU is assigned
> > a priority group number; one or more links may have the same number.
> > Each prioritized link also has a priority group mode (exclusive, shared,
> > all).  The mode determines how many of the group members must be available
> > in order for the group to be considered available: exclusive means that
> > one member must be available, and at most one will be enabled; shared
> > means that one member must be available, and any that are available will
> > be enabled; all means that all members must be available and will be
> > enabled.
> > 
> > So far, so good.  The question is: how do we define "available"?  The
> > most obvious answer is the link is up.  This works for ethernet devices
> > that report link state, but falls down with wireless devices, which
> > aren't really up until connected.  So when deciding which NCUs should
> > be enabled, we consider wireless links to always be available.  If we
> > are unable to complete a connection for any reason, we must then fall
> > back to the next choice on our priority list.
> 
> Certainly the iwh and ath drivers don't seem to flag a LINK as UP unless they
> are actually connected - but maybe all wireless drivers aren't the same. Alan
> added a test at the LINK level (IIRC) that checks if a wireless device is
> connected, this would seem to be the correct thing to use to make the decision
> w.r.t. the wireless LINK NCU and whether it's UP or not. I think that assuming
> all wireless links are available is a mistake.

By "available" I mean that a connection attempt can be made on it.
Because it takes explicit action, and possibly user interaction, to
get to that connected state, nwamd needs to make a decision about
whether or not it should even try to connect; that's the decision
I'm talking about here.

> From a *user* perspective, the network (as a whole) is not available until
> they have an IP address and are able to ping/browse www.google.com ;)

Of course; that's the problem I was trying to get to.

> This also applies - and will probably be an RFE we need to address at some
> point - where any application needs to be able to "phone home" and wants to
> ask the question "Do I have a network connection?", if so then do some
> action. An example where this is currently relevant would be with IPS and its
> automatic check for updates - this needs a way to know when it's on-line (and
> maybe whether it's connected to the internet or an internal LAN?)...

Yeah, the thing that's hard about that is that there's really no way
to answer the question "Do I have a network connection?".  We can say
that you have link, or that you are connected to a wlan (though the
hardware/drivers lie about each of those in some cases); we can say
that we have an assigned IP address; we can say that you have a route,
or that in the very recent past we were able to contact a name server,
or a particular address on the network.  But a) that can change at any
moment, with no visible indication; and b) none of it definitively
answers whether you can load www.google.com, or any other particular
website, in your browser.

The definitions we're using here are: a link is up if it has carrier
(ethernet) or is connected to a wlan (wifi).  An interface is up if
its underlying link is up and it has at least one address assigned
to it.
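
As a minimal sketch of those two definitions (Python, purely
illustrative; the Link and Interface types and their field names are
assumptions made for this example, not nwamd data structures):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    name: str
    wireless: bool = False
    has_carrier: bool = False     # consulted for ethernet links only
    wlan_connected: bool = False  # consulted for wifi links only


@dataclass
class Interface:
    link: Link
    addresses: List[str] = field(default_factory=list)


def link_is_up(link: Link) -> bool:
    # Ethernet: carrier. Wifi: connected to a wlan.
    return link.wlan_connected if link.wireless else link.has_carrier


def interface_is_up(iface: Interface) -> bool:
    # The underlying link must be up and at least one address assigned.
    return link_is_up(iface.link) and len(iface.addresses) > 0
```

Note that a wired link with carrier but no address yet gives
link_is_up() true and interface_is_up() false, which is exactly the
dhcp-failure case discussed below.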

> > But there's another catch.  The phase 0/0.5 policy conflates link and
> > interface configuration, so if, for example, you have a wired link
> > that's up, but cannot obtain a dhcp address on it, nwamd will (after a
> > timeout) fall back to the next available device.  But in our priority
> > group scheme, with the default policy (which is supposed to match that
> > of phase 0/0.5) in place, as long as one ethernet device has link, we
> > consider that priority group available and active.  No need to try
> > anything else.
> > 
> > To resolve this, I think we need to make our NCU condition checking a
> > little more complicated, unfortunately.  I think we need to extend
> > the state check of a link NCU to include the state of the associated
> > interface NCU.  In other words, both the link:bge0 and interface:bge0
> > must be online in order for the link:bge0 component of its priority
> > group to be considered online.
> 
> This is certainly how the GUI presents things to the user - since the GUI
> combines the LINK and IP NCUs into a single representation (that's what a User
> expects) - so we view a specific device to not be fully available unless both
> the LINK and IP NCUs are ready - it certainly would be good if nwamd
> considered this too...

The way the GUI presents things to the user can't be a driver for the
internal architecture.  There are valid reasons for separating links
from interfaces; most particularly because it won't always be the case
that all links will have ip plumbed on them.  Future link types will
build on each other; if you create an aggregation link above two physical
links, you can't plumb ip on those physical links.

> > We do need to allow time for this to happen, though.  So I suspect
> > we still need something like the dhcp_wait_time value (which probably
> > needs to be a tunable property) to bound the time we'll wait on the
> > link/interface pair to become online.  If after the timeout one or
> > the other is still in the offline* state, we should leave it there,
> > but move on to the next priority group and start trying to bring up
> > links there.
> 
> I agree whole-heartedly about this approach.
> 
> As for the time to allow - why does it need to be any different to the
> dhcp_wait_time value? Surely this is the same thing in the end, or is it that
> this value is too long for this usage?

I think the current default value--60 seconds--is probably a reasonable
default.  Though with it now including wlan connection time, it may not
be long enough.  We want to avoid creating unnecessary churn, while at
the same time preventing long waits for something to happen.  We'll likely
need to play with it once we have things working.
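
To make the bounded wait concrete, here is a rough sketch (Python,
illustrative only) of walking the priority groups with a per-group
timeout.  Everything named here is hypothetical: Mode, group_online(),
pick_group() and the probe callback are invented for the example, and
it assumes that a lower priority number is tried first.  It covers only
the availability test, not which members get enabled.

```python
import time
from enum import Enum


class Mode(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    ALL = "all"


def group_online(mode, member_online):
    # member_online: one bool per member, True only when both the link
    # NCU and its interface NCU are online.
    if not member_online:
        return False
    if mode is Mode.ALL:
        return all(member_online)
    # exclusive and shared both need at least one member online
    return any(member_online)


def pick_group(groups, probe, wait_time=60.0, poll=1.0,
               clock=time.monotonic, sleep=time.sleep):
    # groups: iterable of (priority, mode, [member names])
    # probe(name) -> bool: is this link/interface pair online yet?
    for priority, mode, members in sorted(groups, key=lambda g: g[0]):
        deadline = clock() + wait_time
        while True:
            if group_online(mode, [probe(m) for m in members]):
                return priority
            if clock() >= deadline:
                break  # leave this group as it is, try the next one
            sleep(poll)
    return None
```

Because every member of a group is probed on each pass, the wait is
bounded by one wait_time per group rather than one per member.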

Thanks for the feedback,
renee

> If it's the latter, then it would make sense (and I believe this is the impl,
> but not 100% sure) that all the interfaces in a group are brought up in
> parallel so that the switch between groups is as short as possible, i.e. not
> some multiple of the timeout value.
> 
> Thanks,
> 
> Darren.
