On Tue, Aug 18, 2009 at 05:35:05PM -0400, Anurag S. Maskey wrote:
>
>
> Renee Danson Sommerfeld wrote:
>>> code review for 10682 Need a policy / process for handling duplicated
>>> (already used) addresses
>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=10682
>>>
>>> webrev is at
>>> http://zhadum.east/export/ws/am223141/temp/nwam1-work/webrev/
>>>
>> ncu_ip.c, line 681: Do we know at this point that the DELADDR was for
>> a v4 address? Even if that's the case, collisions are possible on v6
>> interfaces, as well, so we need these checks for both v4 and v6. I'm
>> guessing we probably want to pull the af out of the sockaddr in the
>> event data, rather than always using AF_INET.
>>
> IPv6 - always forgetting that. Thanks for pointing that out. I've
> changed the protocol line to:
>
> intf.if_protocol = evm->data.if_state.addr.ss_family;
Looks good.
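
For the archive, here is a minimal sketch of the idea of taking the
address family from the event's sockaddr rather than hard-coding
AF_INET. The helper name event_addr_family() is hypothetical and is not
part of the webrev; only struct sockaddr_storage and its ss_family
member are standard. It assumes the caller only cares about v4 and v6.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper (not from the webrev): return the address family
 * stored in a sockaddr_storage, or AF_UNSPEC if the pointer is NULL or
 * the family is neither IPv4 nor IPv6.
 */
static int
event_addr_family(const struct sockaddr_storage *ss)
{
	if (ss == NULL)
		return (AF_UNSPEC);

	switch (ss->ss_family) {
	case AF_INET:
	case AF_INET6:
		return (ss->ss_family);
	default:
		return (AF_UNSPEC);
	}
}
```

Rejecting anything other than AF_INET/AF_INET6 is a choice made for
this sketch; the one-line assignment quoted above simply copies
ss_family through.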
>> general question: are there other cases where we put objects into
>> maintenance state? I know we've talked about different cases where
>> that might make sense, but I've never actually seen it happen, so I'm
>> wondering if we've actually implemented that before. In any case,
>> I'm just thinking this is an area we need to test carefully, especially
>> if we're treading new ground with the use of maintenance state here.
>>
> These are the situations where an object enters MAINTENANCE state
> (auxiliary state in parentheses):
>
> * IP NCU: duplicate address (duplicate address detected)
> * Loc: smf_refresh_instance() returns non-zero (method/service
> enable failed). This means only that the function call failed,
> not necessarily that the refresh itself failed.
> * ENM: start script missing (method or FMRI not specified)
> * ENM: start/stop script fails (method/service enable failed)
> * ENM: invalid or missing FMRI (invalid configuration values)
> * ENM: smf_{restart,enable,restore}_instance() returns non-zero
> (method/service enable failed)
> * ENM: failure to create thread to run scripts (method/service
> enable failed)
Ah, of course, ENMs!
Since this is the first time we're doing it with NCUs, though, it's
probably worth some extra testing. What's the recovery path? We need
to make sure we come out of the state cleanly when appropriate. The
two cases I can think of are:
- the user fixes things by changing the (duplicate) static address
  assigned to the NCU
- the user fixes things by shutting down or correcting the other
  system. In this case, I think expecting a refresh of the nwam
  service is reasonable; we need to make sure that clears things up.
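
To make the two recovery cases concrete, here is a rough sketch of the
transitions worth testing. Every name in it (the state and event enums,
ncu_next_state()) is hypothetical and not taken from the nwamd source.
Going back to an offline state on recovery, so that the address gets
configured and checked again, is an assumption of this sketch rather
than a description of what the code does today.

```c
/* Hypothetical states and events; the real nwamd names may differ. */
typedef enum {
	NCU_OFFLINE,
	NCU_ONLINE,
	NCU_MAINTENANCE
} ncu_state_t;

typedef enum {
	EV_DUP_ADDR,		/* duplicate address detected */
	EV_ADDR_CHANGED,	/* user changed the static address */
	EV_SERVICE_REFRESH,	/* nwam service was refreshed */
	EV_OTHER
} ncu_event_t;

/*
 * Return the state the NCU should move to for a given event.
 * A duplicate address always leads to MAINTENANCE. From MAINTENANCE,
 * either recovery event leads to OFFLINE (assumed here so that the
 * address is brought up and checked again). Anything else leaves the
 * state unchanged.
 */
static ncu_state_t
ncu_next_state(ncu_state_t cur, ncu_event_t ev)
{
	if (ev == EV_DUP_ADDR)
		return (NCU_MAINTENANCE);

	if (cur == NCU_MAINTENANCE &&
	    (ev == EV_ADDR_CHANGED || ev == EV_SERVICE_REFRESH))
		return (NCU_OFFLINE);

	return (cur);
}
```

The test cases would then be one per row: enter maintenance on a
duplicate, leave it on each of the two recovery events, and stay put
on anything unrelated.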
Does that make sense? Have you already done all this and I'm rambling
on needlessly? :-)
-renee