[nwam-dev] [Bug 14332] nwam doesn't recover from dhcpagent failures

[email protected] Tue, 2 Feb 2010 22:43:21 GMT

http://defect.opensolaris.org/bz/show_bug.cgi?id=14332



Renee Danson Sommerfeld <renee.danson at sun.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CAUSEKNOWN
         AssignedTo|nwam-dev at opensolaris.org    |renee.danson at sun.com


--- Comment #1 from Renee Danson Sommerfeld <renee.danson at sun.com> 
2010-02-02 22:43:20 UTC ---
The particular situation I'm looking at occurs only on the first boot after
I've done an image-update, and appears to be specific to systems that have an
eventhook script.

I believe the problem originates when nwamd gets a SIGHUP soon after
acquiring a dhcp lease.  In this case, the interface and link are
re-initialized, but the dhcp release, followed quickly by an unplumb of the
interface, causes problems for dhcpagent, which is still processing the start
request because the eventhook script is still running.  By the time nwamd
tries to start dhcp again on the interface, dhcpagent has gotten into a state
where it still holds a reference to the original lif, so the start request
fails.

nwamd does see the failure, but only in the thread that was spawned to start
dhcp, and it does nothing about it.

An initial fix is to simply retry the start request when this particular
failure occurs.  The eventhook script eventually finishes, and dhcpagent clears
up its state; so subsequent start requests do work.
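
To illustrate the retry idea, here is a minimal sketch in Python.  It is not
the nwamd code (which is C); DhcpStartError, start_dhcp_with_retry, and the
attempt and delay values are all hypothetical names and numbers chosen for
illustration.  It assumes the caller can tell this particular failure apart
from other errors, and that waiting a bit gives the eventhook script time to
finish.

```python
import time


class DhcpStartError(Exception):
    """Hypothetical stand-in for the specific 'start request failed' error."""


def start_dhcp_with_retry(start_request, max_attempts=5, delay_seconds=1.0,
                          sleep=time.sleep):
    """Call start_request() until it succeeds or max_attempts is reached.

    Only DhcpStartError is retried; any other exception propagates at once.
    The last DhcpStartError is re-raised if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return start_request()
        except DhcpStartError:
            if attempt == max_attempts:
                raise
            # Give the agent time to finish the earlier request and
            # drop its stale state before asking again.
            sleep(delay_seconds)
```

Bounding the number of attempts keeps the thread from spinning forever if
dhcpagent never recovers; the right limit and delay would need to be chosen
against real eventhook run times.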

Future work (which I'll file as a separate bug) should involve at least
feedback into the state machine, or perhaps more general monitoring of network
state: does it match what nwamd thinks it should be?
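
As a rough picture of what such monitoring could check, the sketch below
compares the state the daemon believes each interface is in against the state
actually observed.  The function name, the dictionary layout, and the state
strings are assumptions made for illustration; nothing here reflects how nwamd
stores its state.

```python
def find_mismatches(expected, observed):
    """Return {interface: (expected_state, observed_state)} for every
    interface whose two states disagree.

    An interface missing from one side is reported with None on that side.
    """
    mismatches = {}
    for name in sorted(set(expected) | set(observed)):
        want = expected.get(name)
        have = observed.get(name)
        if want != have:
            mismatches[name] = (want, have)
    return mismatches


# Hypothetical usage: the daemon thinks e1000g0 has a lease, but it does not.
expected = {"e1000g0": "dhcp-bound", "iwh0": "down"}
observed = {"e1000g0": "unplumbed", "iwh0": "down"}
print(find_mismatches(expected, observed))
```

Any mismatch found this way could then be fed back into the state machine
rather than being dropped in a worker thread.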

-- 
Configure bugmail: http://defect.opensolaris.org/bz/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
You are the assignee for the bug.
