http://defect.opensolaris.org/bz/show_bug.cgi?id=12084
amaguire <alan.maguire at sun.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |CAUSEKNOWN
CC| |alan.maguire at sun.com
AssignedTo|nwam-dev at opensolaris.org |alan.maguire at sun.com
--- Comment #2 from amaguire <alan.maguire at sun.com> 2009-10-21 10:52:48 UTC ---
(In reply to comment #1)
> I haven't found any problems on the GUI side so far, but I will continue to
> evaluate it. I saw that nwamd tried to re-initialize nge0 but failed, so I am
> reassigning this to nwamd to see if something can be found there.
The nwamd log shows that we initialize nge0, start a DHCP request, and place
the link NCU in (offline*,waiting for addr). Then, at line 927 of the log, we
get a LINK_DOWN event:
Oct 19 17:49:06 octagon nwamd[68]: [ID 961658 daemon.debug] 1: (8126a08) link:nge0: running method for event LINK_STATE
Oct 19 17:49:06 octagon nwamd[68]: [ID 530693 daemon.debug] 1: nwamd_ncu_handle_link_state_event: got LINK DOWN for priority group 1
Oct 19 17:49:06 octagon nwamd[68]: [ID 223478 daemon.info] 1: nwamd_object_set_state: state event (online*, interface/link is down) for link:nge0
These events can only result from DL_NOTE_LINK_DOWN notifications, so it seems
we got a genuine link-down event at around the time we started configuring
nge0. As a consequence, we unplumb nge0.
Later, we get a LINK_UP event which should fix things, but it's ignored:
Oct 19 17:49:07 octagon nwamd[68]: [ID 720546 daemon.debug] 1: (8119a08) link:nge0: running method for event LINK_STATE
Oct 19 17:49:07 octagon nwamd[68]: [ID 695749 daemon.debug] 1: nwamd_ncu_handle_link_state_event: got LINK UP event for priority group 1, less preferred than current 0, ignoring
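A minimal C sketch of the rule behind that "less preferred than current 0, ignoring" message may help. It is only a model: link_event_is_relevant() is a hypothetical name, and it assumes (as the log wording implies) that a lower group number is more preferred and that an event is acted on only if its group is no less preferred than the group currently being activated. The real test in nwamd_ncu_handle_link_state_event() may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the "less preferred than current N, ignoring"
 * check. Assumption: lower group numbers are more preferred, so a link
 * event is acted on only if its group is no worse than the current one.
 */
bool
link_event_is_relevant(int64_t event_group, int64_t current_group)
{
	/* assumption: an event always comes from an NCU in a real group */
	assert(event_group >= 0);
	return (event_group <= current_group);
}
```

With the current group set to 0, the LINK UP for group 1 fails this test; had the current group still been 1, it would have passed.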
The problem is that there is no priority group 0 in the User NCP. We reach
this point because we ran out of priority groups, as shown here:
Oct 19 17:49:06 octagon nwamd[68]: [ID 784275 daemon.debug] 1: nwamd_ncp_find_next_priority_group: no priority groups >= 2 exist
Oct 19 17:49:06 octagon nwamd[68]: [ID 941809 daemon.debug] 1: ran out of prio groups
Oct 19 17:49:06 octagon nwamd[68]: [ID 489249 daemon.debug] 1: nwamd_ncp_activate_priority_group: activating priority group 0
We end up with group 0 because nwamd_ncp_check_priority_group() fails (the
LINK_DOWN left no NCUs active in priority group 1), and as a consequence we
reach this code:
773 /*
774 * Nothing unique could be started so try them all. Once one
775 * of them gets into a reasonable state then we will prune
776 * everything below it (see first part of this conditional).
777 */
778 prio = 0;
779 do {
780 nwamd_ncp_activate_priority_group(prio);
781 } while (nwamd_ncp_find_next_priority_group(prio + 1, &prio));
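To see why group 0 gets activated even though it does not exist, here is a small, hypothetical C model of the loop above. struct ncp_model, find_next_group() and activate_group() are stand-ins, not nwamd code; find_next_group() assumes nwamd_ncp_find_next_priority_group(min, &next) stores the lowest existing group >= min in next, which is what the "no priority groups >= 2 exist" message suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTIVATIONS	8

/* Hypothetical model of an NCP's configured priority groups. */
struct ncp_model {
	const int64_t *groups;
	size_t ngroups;
	int64_t activated[MAX_ACTIVATIONS];	/* groups activated, in order */
	size_t nactivated;
};

/* Lowest existing group >= min, stored in *next; false if none exists. */
static bool
find_next_group(const struct ncp_model *ncp, int64_t min, int64_t *next)
{
	bool found = false;

	for (size_t i = 0; i < ncp->ngroups; i++) {
		if (ncp->groups[i] >= min && (!found || ncp->groups[i] < *next)) {
			*next = ncp->groups[i];
			found = true;
		}
	}
	return (found);
}

/* Stand-in for nwamd_ncp_activate_priority_group(): just records prio. */
static void
activate_group(struct ncp_model *ncp, int64_t prio)
{
	assert(ncp->nactivated < MAX_ACTIVATIONS);
	ncp->activated[ncp->nactivated++] = prio;
}

/* Same shape as lines 778-781: group 0 is activated unconditionally. */
void
fallback_original(struct ncp_model *ncp)
{
	int64_t prio = 0;

	do {
		activate_group(ncp, prio);
	} while (find_next_group(ncp, prio + 1, &prio));
}
```

With a User NCP containing only group 1, the model activates group 0 first and group 1 second. That first, bogus activation is the window in which a LINK UP event for group 1 looks less preferred than the current group.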
We are unlucky in that the link state event arrived just as we tried to
activate priority group 0; as a consequence, it was ignored because it referred
to a less preferred priority group. We should never try to activate a priority
group that doesn't exist. I think the right approach is to replace lines
778-781 with:
prio = INVALID_PRIORITY_GROUP;
/* visit only groups that exist, starting from the lowest one */
while (nwamd_ncp_find_next_priority_group(prio + 1, &prio)) {
	nwamd_ncp_activate_priority_group(prio);
}
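The same kind of model can be used to check the proposed replacement, written here against the two-argument finder form used at line 781. Again this is a hypothetical sketch, not nwamd code: it assumes INVALID_PRIORITY_GROUP is -1 (so the first search starts at group 0) and that the finder stores the lowest existing group >= min.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the sentinel sits just below the lowest valid group, 0. */
#define INVALID_PRIORITY_GROUP	(-1)
#define MAX_ACTIVATIONS		8

/* Hypothetical model of an NCP's configured priority groups. */
struct ncp_model {
	const int64_t *groups;
	size_t ngroups;
	int64_t activated[MAX_ACTIVATIONS];	/* groups activated, in order */
	size_t nactivated;
};

/* Lowest existing group >= min, stored in *next; false if none exists. */
static bool
find_next_group(const struct ncp_model *ncp, int64_t min, int64_t *next)
{
	bool found = false;

	for (size_t i = 0; i < ncp->ngroups; i++) {
		if (ncp->groups[i] >= min && (!found || ncp->groups[i] < *next)) {
			*next = ncp->groups[i];
			found = true;
		}
	}
	return (found);
}

/* Stand-in for nwamd_ncp_activate_priority_group(): just records prio. */
static void
activate_group(struct ncp_model *ncp, int64_t prio)
{
	assert(ncp->nactivated < MAX_ACTIVATIONS);
	ncp->activated[ncp->nactivated++] = prio;
}

/* Proposed loop: only groups returned by the finder are activated. */
void
fallback_fixed(struct ncp_model *ncp)
{
	int64_t prio = INVALID_PRIORITY_GROUP;

	while (find_next_group(ncp, prio + 1, &prio))
		activate_group(ncp, prio);
}
```

For a User NCP with only group 1, this activates group 1 alone; with groups 3, 1 and 2 it visits them in the order 1, 2, 3; and with no groups at all it activates nothing rather than falling back to a group that doesn't exist.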
If this were done, we would never reach a state where the current priority
group does not exist and the LINK_STATE event is ignored. Why the link state
events are happening at all is a separate matter, but I think fixing the
priority group issue will make nwamd less brittle with respect to link state
events.