Michael Hunter wrote:
> On Tue, 22 Sep 2009 16:40:46 +0100
> Alan Maguire <Alan.Maguire at Sun.COM> wrote:
>
>   
>> fresh webrev in place, I've retested and
>> it seems to resolve all the issues described
>> in 11437, 11092 and 10707. We're seeing
>> some weirdness with DHCPv6 and autoconf
>> that John has kindly spent some time investigating,
>> this likely results from us having to rely on regular
>> router advertisements rather than sending
>> solicitations when IPv6 is plumbed. The same
>> behaviour is observed outside of NWAM so it
>> doesn't seem to be an NWAM-specific issue.
>>
>> One other weird thing we see on John's test machine -
>> DHCPv4 takes a long time to respond sometimes
>> (>60sec), despite nwamd doing the right thing in
>> starting DHCP during the configuration phase.
>> Webrev is at:
>>
>> http://zhadum.east/export/ws/amaguire/nwam1-fixes/webrev/
>>     
> [...]
>
> ncu.c:205 This should be reworded to something like "In our current
> implementation reinitialization needs to start from scratch in order to
> make management of NCUs rational." or something like that.
>
>   
done.
> ncu.c:251 Why?  You could have received a state{less,full} address by
> now.
>
>   
This got me thinking. I've been going back and
forth between two approaches in fixing these issues.
The first, represented in this webrev, is to note if
any static addresses are configured and go online
directly if some are configured. The second is to
leave that to the handle_if_state() function, so when
it sees a static address assigned, it moves us online.
The second  approach is better, being more general,
but I'd been seeing some weirdness with spurious
RTM_DELADDRs following RTM_NEWADDRs
(I haven't fully determined the source of these yet).

I've gone back to the latter approach, reinstating
the code that drops RTM_DELADDR IF_STATE events
which occur when the address isn't really gone. Doing
this gets rid of the issue where we were bouncing from
online to offline as the mysterious RTM_DELADDRs
made us think we had no addresses left (this only
happens with static addresses which makes me
think the cause is in the order in which we do
things in add_ipaddr(), but I'm not sure. ifconfig.c
sets prefixlen etc before setting the address, that
may help here).

In looking more closely at nwamd_ncu_handle_if_state_event()
I've found a major issue - we're trying to map an
address to a logical interface number by walking the
current configuration to find which lifnum has the
address assigned to it. Problem is, if it's an
RTM_DELADDR, the address is gone and we bail
from the IF_STATE handling function without
marking the NCU down. The solution is just to use
the logical interface name from the IF_STATE event.
I also noticed there was a bug in events.c where we
try to set the nwe_name argument to the link name
rather than the logical interface name (removing the
:<lifnum>) but never do. I've fixed this so the
nwe_name field is the link name and the nwe_ifname
(a new field in the structure) contains the logical ifname.
This has allowed us to remove a bunch of unneeded code
and to icfg_open() the lif directly.
> ncu_ip.c:342 (curiosity) Was setting the flags back to what they were
> causing a problem?
>
>   
No, but when I was casting around for reasons
for the spurious RTM_DELADDRs I figured
mimimizing the cases where we change
the flags made sense.
> ncu_ip.c:390 the prefix argument can go away since you are getting the whole
> newaddr message
>
>   
Good point, I've removed it.
> ncu_ip.c:1286,1294,1294,1407 This must never happen in testing since
> you alloc the address on the first go around and free it every time
> after that ;)
>
>   
Good catch! I've relocated the retry: label.

I've retested for autoconf, and for the various
combinations of static/dhcp and static/autoconf
v6, and will see if I can retest on John's rig later
today when he's about. All looks good.
Revised webrev at:

http://zhadum.east/export/ws/amaguire/nwam1-fixes/webrev/

Thanks for the review!

Alan

Reply via email to