Hi Renee,

On 20/05/2009 01:21, Renee Danson wrote:
> While thinking about the dhcp_wait_time property, and whether or not we
> still need it in phase 1, I came up with some more questions about how
> we deal with NCU priority group mode.
> 
> For NCUs with activation-mode set to prioritized: each NCU is assigned
> a priority group number; one or more links may have the same number.
> Each prioritized link also has a priority group mode (exclusive, shared,
> all).  The mode determines how many of the group members must be available
> in order for the group to be considered available: exclusive means that
> one member must be available, and at most one will be enabled; shared
> means that one member must be available, and any that are available will
> be enabled; all means that all members must be available and will be
> enabled.
> 
> So far, so good.  The question is: how do we define "available"?  The
> most obvious answer is the link is up.  This works for ethernet devices
> that report link state, but falls down with wireless devices, which
> aren't really up until connected.  So when deciding which NCUs should
> be enabled, we consider wireless links to always be available.  If we
> are unable to complete a connection for any reason, we must then fall
> back to the next choice on our priority list.

Certainly the iwh and ath drivers don't seem to flag a LINK as UP unless they
are actually connected, but maybe not all wireless drivers behave the same way.
Alan added a test at the LINK level (IIRC) that checks whether a wireless device
is connected; this would seem to be the correct thing to use when deciding
whether the wireless LINK NCU is UP or not. I think that assuming all wireless
links are available is a mistake.

From a *user* perspective, the network (as a whole) is not available until they
have an IP address and are able to ping/browse www.google.com ;)

This also applies (and will probably be an RFE we need to address at some point)
where an application needs to "phone home" and wants to ask the question
"Do I have a network connection?" before taking some action. An example where
this is currently relevant is IPS and its automatic check for updates: this
needs a way to know when it's on-line (and maybe whether it's connected to the
internet or an internal LAN?)...

> 
> But there's another catch.  The phase 0/0.5 policy conflates link and
> interface configuration, so if, for example, you have a wired link
> that's up, but cannot obtain a dhcp address on it, nwamd will (after a
> timeout) fall back to the next available device.  But in our priority
> group scheme, with the default policy (which is supposed to match that
> of phase 0/0.5) in place, as long as one ethernet device has link, we
> consider that priority group available and active.  No need to try
> anything else.
> 
> To resolve this, I think we need to make our NCU condition checking a
> little more complicated, unfortunately.  I think we need to extend
> the state check of a link NCU to include the state of the associated
> interface NCU.  In other words, both the link:bge0 and interface:bge0
> must be online in order for the link:bge0 component of its priority
> group to be considered online.

This is certainly how the GUI presents things to the user, since the GUI
combines the LINK and IP NCUs into a single representation (that's what a user
expects). So we do not consider a specific device fully available unless both
the LINK and IP NCUs are ready; it certainly would be good if nwamd considered
this too...
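
To make the combined check concrete, here is a rough sketch of how I read the
rule, written in Python purely for illustration. The names (Ncu,
group_available, members_to_enable) are made up for this mail and are not
nwamd's actual data structures or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ncu:
    # Hypothetical pairing of a link NCU with its interface (IP) NCU.
    name: str
    link_online: bool
    ip_online: bool

    def available(self) -> bool:
        # A device counts only when both its link and interface NCUs are online.
        return self.link_online and self.ip_online


def group_available(mode: str, members: List[Ncu]) -> bool:
    """Apply the priority group mode to the combined link+interface state."""
    states = [m.available() for m in members]
    if mode in ("exclusive", "shared"):
        # At least one member must be available.
        return any(states)
    if mode == "all":
        # Every member must be available (an empty group is not available).
        return bool(states) and all(states)
    raise ValueError(f"unknown priority group mode: {mode}")


def members_to_enable(mode: str, members: List[Ncu]) -> List[Ncu]:
    """Return the members that would be enabled for an available group."""
    if not group_available(mode, members):
        return []
    ready = [m for m in members if m.available()]
    # exclusive enables at most one member; shared and all enable every ready one.
    return ready[:1] if mode == "exclusive" else ready
```

With this reading, a bge0 that has link but no DHCP address is not available,
so a group containing only bge0 is not available either, and we would go on to
the next group.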

> 
> We do need to allow time for this to happen, though.  So I suspect
> we still need something like the dhcp_wait_time value (which probably
> needs to be a tunable property) to bound the time we'll wait on the
> link/interface pair to become online.  If after the timeout one or
> the other is still in the offline* state, we should leave it there,
> but move on to the next priority group and start trying to bring up
> links there.

I agree whole-heartedly about this approach.

As for the time to allow - why does it need to be any different to the
dhcp_wait_time value? Surely this is the same thing in the end, or is it that
this value is too long for this usage?

If it's the latter, then it would make sense (and I believe this is the
implementation, but I'm not 100% sure) for all the interfaces in a group to be
brought up in parallel, so that the switch between groups is as short as
possible, i.e. one timeout per group rather than some multiple of it.
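
As a sketch of what I mean by "in parallel" (illustrative Python only; I am not
claiming this is how nwamd does it): every member of the group is started at
once and the group as a whole gets a single wait, so a group of N devices costs
at most one timeout rather than N. bring_up_group, fake_bring_up and the delay
values are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def bring_up_group(members, bring_up, wait_time):
    """Start every member at once; wait at most wait_time seconds in total.

    bring_up(member) is a caller-supplied function that returns True once the
    member's link and interface are both online.
    """
    if not members:
        return []
    pool = ThreadPoolExecutor(max_workers=len(members))
    futures = {pool.submit(bring_up, m): m for m in members}
    # One shared deadline for the whole group, not one per member.
    done, _pending = wait(futures, timeout=wait_time)
    # Do not block on stragglers; they stay "offline" from the caller's view.
    pool.shutdown(wait=False)
    online = {futures[f] for f in done if f.exception() is None and f.result()}
    # Preserve the caller's ordering of members.
    return [m for m in members if m in online]


def fake_bring_up(name, delays={"bge0": 0.01, "bge1": 0.6}):
    # Stand-in for DHCP etc.: pretend each device takes a fixed time to come up.
    time.sleep(delays[name])
    return True
```

If nothing comes back online within the wait, the caller would leave those
members as they are and start on the next priority group.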

Thanks,

Darren.
