Michael Hunter wrote: > On Tue, 22 Sep 2009 16:40:46 +0100 > Alan Maguire <Alan.Maguire at Sun.COM> wrote: > > >> fresh webrev in place, I've retested and >> it seems to resolve all the issues described >> in 11437, 11092 and 10707. We're seeing >> some weirdness with DHCPv6 and autoconf >> that John has kindly spent some time investigating, >> this likely results from us having to rely on regular >> router advertisements rather than sending >> solicitations when IPv6 is plumbed. The same >> behaviour is observed outside of NWAM so it >> doesn't seem to be an NWAM-specific issue. >> >> One other weird thing we see on John's test machine - >> DHCPv4 takes a long time to respond sometimes >> (>60sec), despite nwamd doing the right thing in >> starting DHCP during the configuration phase. >> Webrev is at: >> >> http://zhadum.east/export/ws/amaguire/nwam1-fixes/webrev/ >> > [...] > > ncu.c:205 This should be reworded to something like "In our current > implementation reinitialization needs to start from scratch in order to > make management of NCUs rational." or something like that. > > done. > ncu.c:251 Why? You could have received a state{less,full} address by > now. > > This got me thinking. I've been going back and forth between two approaches in fixing these issues. The first, represented in this webrev, is to note if any static addresses are configured and go online directly if some are configured. The second is to leave that to the handle_if_state() function, so when it sees a static address assigned, it moves us online. The second approach is better, being more general, but I'd been seeing some weirdness with spurious RTM_DELADDRs following RTM_NEWADDRs (I haven't fully determined the source of these yet).
I've gone back to the latter approach, reinstating the code that drops RTM_DELADDR IF_STATE events which occur when the address isn't really gone. Doing this gets rid of the issue where we were bouncing from online to offline as the mysterious RTM_DELADDRs made us think we had no addresses left (this only happens with static addresses which makes me think the cause is in the order in which we do things in add_ipaddr(), but I'm not sure. ifconfig.c sets prefixlen etc before setting the address, that may help here). In looking more closely at nwamd_ncu_handle_if_state_event() I've found a major issue - we're trying to map an address to a logical interface number by walking the current configuration to find which lifnum has the address assigned to it. Problem is, if it's an RTM_DELADDR, the address is gone and we bail from the IF_STATE handling function without marking the NCU down. The solution is just to use the logical interface name from the IF_STATE event. I also noticed there was a bug in events.c where we try to set the nwe_name argument to the link name rather than the logical interface name (removing the :<lifnum>) but never do. I've fixed this so the nwe_name field is the link name and the nwe_ifname (a new field in the structure) contains the logical ifname. This has allowed us to remove a bunch of unneeded code and to icfg_open() the lif directly. > ncu_ip.c:342 (curiosity) Was setting the flags back to what they were > causing a problem? > > No, but when I was casting around for reasons for the spurious RTM_DELADDRs I figured mimimizing the cases where we change the flags made sense. > ncu_ip.c:390 the prefix argument can go away since you are getting the whole > newaddr message > > Good point, I've removed it. > ncu_ip.c:1286,1294,1294,1407 This must never happen in testing since > you alloc the address on the first go around and free it every time > after that ;) > > Good catch! I've relocated the retry: label. I've retested for autoconf, and for the various combinations of static/dhcp and static/autoconf v6, and will see if I can retest on John's rig later today when he's about. All looks good. Revised webrev at: http://zhadum.east/export/ws/amaguire/nwam1-fixes/webrev/ Thanks for the review! Alan
