Michael Hunter wrote:
> On Wed, 23 Sep 2009 12:10:44 +0100
> Alan Maguire <Alan.Maguire at Sun.COM> wrote:
>
>
>> Michael Hunter wrote:
>>
>>> On Tue, 22 Sep 2009 16:40:46 +0100
>>> Alan Maguire <Alan.Maguire at Sun.COM> wrote:
>>>
> [...]
>
>>> ncu.c:251 Why? You could have received a state{less,full} address by
>>> now.
>>>
>>>
>>>
>> This got me thinking. I've been going back and
>> forth between two approaches in fixing these issues.
>> The first, represented in this webrev, is to note if
>> any static addresses are configured and go online
>> directly if some are configured. The second is to
>> leave that to the handle_if_state() function, so when
>> it sees a static address assigned, it moves us online.
>> The second approach is better, being more general,
>>
>
> I hadn't thought along these lines but I agree it is more general.
>
> I also think it is a specific example of a more general problem in our
> implementation. We do some stuff immediately and other stuff based on
> an event/work queue. This creates event sequences where cause and
> effect get muddled.
>
>
>> but I'd been seeing some weirdness with spurious
>> RTM_DELADDRs following RTM_NEWADDRs
>> (I haven't fully determined the source of these yet).
>>
>> I've gone back to the latter approach, reinstating
>> the code that drops RTM_DELADDR IF_STATE events
>> which occur when the address isn't really gone.
>>
>
> This is really using new info to filter old events. It could be seen
> as a way of debouncing the event stream. I would rather figure out
> where the extra events are coming from. But as long as we document
> what we are doing clearly I don't see an issue with doing this for now.
>
>
>> Doing
>> this gets rid of the issue where we were bouncing from
>> online to offline as the mysterious RTM_DELADDRs
>> made us think we had no addresses left (this only
>> happens with static addresses which makes me
>> think the cause is in the order in which we do
>> things in add_ipaddr(), but I'm not sure. ifconfig.c
>> sets prefixlen etc before setting the address, that
>> may help here).
>>
>
> Write an CR to track.
>
>
>> In looking more closely at nwamd_ncu_handle_if_state_event()
>> I've found a major issue - we're trying to map an
>> address to a logical interface number by walking the
>> current configuration to find which lifnum has the
>> address assigned to it. Problem is, if it's an
>> RTM_DELADDR, the address is gone and we bail
>>
>
> I remember seeing this and thought I'd gotten rid of it. Sorry.
>
>
>> from the IF_STATE handling function without
>> marking the NCU down. The solution is just to use
>> the logical interface name from the IF_STATE event.
>>
>
> Important to make sure that a deletion of an address just triggers us
> reevaluating the NCU, not necessarily taking it down.
>
>
sorry, I should have clarified - it'll only
take the NCU down if there's no configured
addresses left IIRC (interface_ncu_up_down()
deals with this).
>> I also noticed there was a bug in events.c where we
>> try to set the nwe_name argument to the link name
>> rather than the logical interface name (removing the
>> :<lifnum>) but never do. I've fixed this so the
>> nwe_name field is the link name and the nwe_ifname
>> (a new field in the structure) contains the logical ifname.
>> This has allowed us to remove a bunch of unneeded code
>> and to icfg_open() the lif directly.
>>
>
> great!
>
>
>>> ncu_ip.c:342 (curiosity) Was setting the flags back to what they were
>>> causing a problem?
>>>
>>>
>>>
>> No, but when I was casting around for reasons
>> for the spurious RTM_DELADDRs I figured
>> mimimizing the cases where we change
>> the flags made sense.
>>
>
> understood
>
>
>>> ncu_ip.c:390 the prefix argument can go away since you are getting the whole
>>> newaddr message
>>>
>>>
>>>
>> Good point, I've removed it.
>>
>>> ncu_ip.c:1286,1294,1294,1407 This must never happen in testing since
>>> you alloc the address on the first go around and free it every time
>>> after that ;)
>>>
>>>
>>>
>> Good catch! I've relocated the retry: label.
>>
>> I've retested for autoconf, and for the various
>> combinations of static/dhcp and static/autoconf
>> v6, and will see if I can retest on John's rig later
>> today when he's about. All looks good.
>> Revised webrev at:
>>
>> http://zhadum.east/export/ws/amaguire/nwam1-fixes/webrev/
>>
>> Thanks for the review!
>>
>
> ncu_ip.c:1133 We probably can't get ourselves into the situation where
> DHCP is on anything other then the physical interface, but we will
> probably have to allow for it later. We need a CR to track that issue.
>
>
I'll file this and the RTM_DELADDR bug in
the morning.
> ncu_ip.c:1224 The thing I dislike most about this is that we spin on
> the retry. There are probably a bunch of blocking operations between
> the retry and the goto so it isn't a hard spin. Maybe better if we can
> schedule this via the work queue for a retry in a second?
>
>
True. I'll push things as they are for now and create
an RFE for this.
> Looks good.
>
>
Thanks!
Alan