http://defect.opensolaris.org/bz/show_bug.cgi?id=12084


amaguire <alan.maguire at sun.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CAUSEKNOWN
                 CC|                            |alan.maguire at sun.com
         AssignedTo|nwam-dev at opensolaris.org    |alan.maguire at sun.com


--- Comment #2 from amaguire <alan.maguire at sun.com> 2009-10-21 10:52:48 UTC ---
(In reply to comment #1)
> I haven't found any problems on the GUI side so far, but I will continue to
> evaluate it. I saw nwamd was trying to re-initialize nge0, but it failed. So I
> reassigned it to nwamd to see if something can be found there.

Examining the nwamd log, we initialize nge0, start a DHCP request, and place
the link NCU in (offline*,waiting for addr). Then (line 927 of the log) we get
a LINK_DOWN event:

Oct 19 17:49:06 octagon nwamd[68]: [ID 961658 daemon.debug] 1: (8126a08)
link:nge0: running method for event LINK_STATE
Oct 19 17:49:06 octagon nwamd[68]: [ID 530693 daemon.debug] 1:
nwamd_ncu_handle_link_state_event: got LINK DOWN for priority group 1
Oct 19 17:49:06 octagon nwamd[68]: [ID 223478 daemon.info] 1:
nwamd_object_set_state: state event (online*, interface/link is down) for
link:nge0

These events can only result from DL_NOTE_LINK_DOWN notifications, so it seems
like we got a genuine link down event around when we started configuring nge0.
As a consequence we unplumb nge0.

Later, we get a LINK_UP event which should fix things, but it's ignored:

Oct 19 17:49:07 octagon nwamd[68]: [ID 720546 daemon.debug] 1: (8119a08)
link:nge0: running method for event LINK_STATE
Oct 19 17:49:07 octagon nwamd[68]: [ID 695749 daemon.debug] 1:
nwamd_ncu_handle_link_state_event: got LINK UP event for priority group 1, less
preferred than current 0, ignoring
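
The "less preferred than current" check can be pictured with a small
standalone sketch. This is an assumption drawn only from the log message, not
the actual nwamd code: link_event_is_ignored() and replay_log_cases() are
hypothetical names, and it is assumed that a lower number means a more
preferred group and that an event for the current group itself is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the check: an event is dropped when its priority
 * group is numerically greater (less preferred) than the current one.
 */
static bool
link_event_is_ignored(int64_t event_prio, int64_t current_prio)
{
    return (event_prio > current_prio);
}

/* Replays the case from the log plus the case we would have wanted. */
static void
replay_log_cases(void)
{
    /* Current group is 0, LINK UP is for group 1: ignored, as logged. */
    assert(link_event_is_ignored(1, 0));
    /* Had the current group been 1, the same event would be handled. */
    assert(!link_event_is_ignored(1, 1));
}
```

Under this model, any current group numbered below 1 is enough to hide the
LINK UP for nge0, whether or not that group exists in the NCP.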

The problem is that there is no priority group 0 in the User NCP. We get to
this point as a result of running out of priority groups, denoted here:

Oct 19 17:49:06 octagon nwamd[68]: [ID 784275 daemon.debug] 1:
nwamd_ncp_find_next_priority_group: no priority groups >= 2 exist
Oct 19 17:49:06 octagon nwamd[68]: [ID 941809 daemon.debug] 1: ran out of prio
groups
Oct 19 17:49:06 octagon nwamd[68]: [ID 489249 daemon.debug] 1:
nwamd_ncp_activate_priority_group: activating priority group 0

We end up with 0 because nwamd_ncp_check_priority_group() fails (since the
LINK_DOWN means no NCUs are active in priority group 1), and as a
consequence we end up here:

773         /*
774          * Nothing unique could be started so try them all.  Once one
775          * of them gets into a reasonable state then we will prune
776          * everything below it (see first part of this conditional).
777          */
778         prio = 0;
779         do {
780             nwamd_ncp_activate_priority_group(prio);
781         } while (nwamd_ncp_find_next_priority_group(prio + 1, &prio));
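
The effect of starting at prio = 0 can be reproduced with a small standalone
model of this loop. It is only a sketch: the groups array,
find_next_priority_group(), activate_priority_group() and the activated log
are hypothetical stand-ins for the nwamd functions, and the lookup is assumed
to return the smallest existing group at or above its first argument
(consistent with the "no priority groups >= 2 exist" message).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTIVATED 8

/* Hypothetical NCP contents: only priority group 1 exists. */
static const int64_t groups[] = { 1 };
static const size_t ngroups = sizeof (groups) / sizeof (groups[0]);

/* Log of every group the loop asked to activate, in order. */
static int64_t activated[MAX_ACTIVATED];
static size_t nactivated;

/* Assumed lookup semantics: smallest existing group >= min, if any. */
static bool
find_next_priority_group(int64_t min, int64_t *nextp)
{
    bool found = false;
    int64_t best = 0;

    for (size_t i = 0; i < ngroups; i++) {
        if (groups[i] >= min && (!found || groups[i] < best)) {
            best = groups[i];
            found = true;
        }
    }
    if (found)
        *nextp = best;
    return (found);
}

static void
activate_priority_group(int64_t prio)
{
    assert(nactivated < MAX_ACTIVATED);
    activated[nactivated++] = prio;
}

/* Same shape as lines 778-781: group 0 is activated unconditionally. */
static void
activate_all_groups(void)
{
    int64_t prio = 0;

    nactivated = 0;
    do {
        activate_priority_group(prio);
    } while (find_next_priority_group(prio + 1, &prio));
}
```

With only group 1 defined, calling activate_all_groups() leaves activated
holding 0 followed by 1, so group 0 is activated even though it does not exist.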

We are unlucky in that the link state event arrived just as we tried to
activate priority group 0. As a consequence, it was ignored, since it referred
to a less preferred priority group. It should never be the case that we try to
activate a priority group that doesn't exist. I think the right approach here
(replacing lines 778-781) is:

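prio = INVALID_PRIORITY_GROUP;

/*
 * Only activate groups the lookup reports. Assumes the two-argument form
 * used above and that INVALID_PRIORITY_GROUP + 1 is the lowest valid group.
 */
while (nwamd_ncp_find_next_priority_group(prio + 1, &prio)) {
       nwamd_ncp_activate_priority_group(prio);
}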
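
A minimal standalone sketch of that approach, under stated assumptions: the
groups array, find_next_priority_group(), activate_priority_group() and the
activated log are hypothetical stand-ins for the nwamd functions, the lookup
is assumed to return the smallest existing group at or above its first
argument, and INVALID_PRIORITY_GROUP is assumed to be -1 for the purposes of
this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTIVATED 8
#define INVALID_PRIORITY_GROUP (-1) /* assumed sentinel value */

/* Hypothetical NCP contents: groups 1 and 3 exist, group 0 does not. */
static const int64_t groups[] = { 3, 1 };
static const size_t ngroups = sizeof (groups) / sizeof (groups[0]);

/* Log of every group the loop asked to activate, in order. */
static int64_t activated[MAX_ACTIVATED];
static size_t nactivated;

/* Assumed lookup semantics: smallest existing group >= min, if any. */
static bool
find_next_priority_group(int64_t min, int64_t *nextp)
{
    bool found = false;
    int64_t best = 0;

    for (size_t i = 0; i < ngroups; i++) {
        if (groups[i] >= min && (!found || groups[i] < best)) {
            best = groups[i];
            found = true;
        }
    }
    if (found)
        *nextp = best;
    return (found);
}

static void
activate_priority_group(int64_t prio)
{
    assert(nactivated < MAX_ACTIVATED);
    activated[nactivated++] = prio;
}

/* Proposed shape: look up a group first, activate only what was found. */
static void
activate_existing_groups(void)
{
    int64_t prio = INVALID_PRIORITY_GROUP;

    nactivated = 0;
    while (find_next_priority_group(prio + 1, &prio))
        activate_priority_group(prio);
}
```

In this model, activate_existing_groups() records groups 1 and 3 and never
touches group 0, so the current priority group always names a group that
exists in the NCP.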


If this were done, we would never get to a point where the priority group was
invalid and the LINK_STATE event ignored. A separate matter is why the link
state events are happening, but I think fixing the priority group issue will
make things less brittle wrt link state events.

-- 
Configure bugmail: http://defect.opensolaris.org/bz/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
You are the assignee for the bug.
