http://defect.opensolaris.org/bz/show_bug.cgi?id=14244
amaguire <alan.maguire at sun.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |alan.maguire at sun.com
--- Comment #5 from amaguire <alan.maguire at sun.com> 2010-01-30 14:19:37 UTC ---
Regarding the nge0 issues:
Renee noticed that we are ignoring link state events for nge0. We ignore link
state events for priority groups lower in priority than the current one. The
link state up event for nge0 (which is alone in priority group 2, in exclusive
mode) comes in just _after_ the link down events for e1000g0/1 and just before
the link state event for nge1. The thing is, e1000g0/1 and nge1 are all in the
shared priority group 1, so we shouldn't move priority group until all of them
have gone down.
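To make the rule above concrete, here is a rough sketch in Python. It is an illustration only, not nwamd's actual C code: the names GROUPS, group_of, should_ignore_link_event and shared_group_active are hypothetical, and it assumes that a lower group number means a higher priority.

```python
# Hypothetical model of the priority groups in this report (not nwamd code).
# Assumption: a lower group number means a higher priority.
GROUPS = {
    1: {"mode": "shared", "links": ["e1000g0", "e1000g1", "nge1"]},
    2: {"mode": "exclusive", "links": ["nge0"]},
}


def group_of(link):
    """Return the number of the priority group that contains the link."""
    for number, group in GROUPS.items():
        if link in group["links"]:
            return number
    raise KeyError(link)


def should_ignore_link_event(link, current_group):
    """Ignore events for links whose group is lower in priority than the current one."""
    return group_of(link) > current_group


def shared_group_active(group_number, link_up):
    """A shared group stays active while at least one of its links is up."""
    return any(link_up.get(link, False) for link in GROUPS[group_number]["links"])
```

With group 1 current, should_ignore_link_event("nge0", 1) is true, and shared_group_active(1, ...) only becomes false once e1000g0, e1000g1 and nge1 are all down.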
When we do our NCU check after all this (line 4451 in the log), we correctly
determine that the conditions for priority group 1 to be active are not met.
nwam_activate_ncus() calls nwamd_ncp_check_priority_group() with -1, and we
start with priority group 1. This fails since the number of shared online NCUs
is 0. Then we move on to check priority group 2; this also fails, and we run
out of priority groups:
Jan 28 18:43:08 zomby nwamd[436]: [ID 784279 daemon.debug] 1: nwamd_ncp_find_next_priority_group: no priority groups >= 3 exist
Jan 28 18:43:08 zomby nwamd[436]: [ID 941809 daemon.debug] 1: ran out of prio groups
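The walk through the priority groups described above could be pictured roughly like this. Again this is a Python sketch under assumptions, not the nwamd implementation: find_next_priority_group and first_satisfied_group are hypothetical helpers, and starting the search at 0 is my own stand-in for the -1 starting value.

```python
def find_next_priority_group(groups, minimum):
    """Return the lowest group number >= minimum, or None if none exists."""
    candidates = [g for g in groups if g >= minimum]
    return min(candidates) if candidates else None


def first_satisfied_group(groups, conditions_met):
    """Walk the groups in ascending order; return the first whose conditions hold."""
    current = find_next_priority_group(groups, 0)
    while current is not None:
        if conditions_met(current):
            return current
        current = find_next_priority_group(groups, current + 1)
    return None  # ran out of priority groups
```

With groups 1 and 2 and neither satisfying its conditions, the search for a group >= 3 comes back empty, which corresponds to the "ran out of prio groups" message.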
At this point we try and activate each priority group in turn,
getting to group 2 at line 4484:
Jan 28 18:43:08 zomby nwamd[436]: [ID 489251 daemon.debug] 1: nwamd_ncp_activate_priority_group: activating priority group 2
We then reinit nge0, get the link state event, go online, and
propagate the event to IP. The interface is specified to use DHCP
for v4, and we start DHCP. We then get a flurry of interface flag
events for v4 and v6 and a bunch of noise from VRRP, followed by
a link down, and IFF_RUNNING (0x40) disappears from the v4 flags.
Here's the show-events stream (at least I _think_ it is the one
for the relevant time period):
OBJECT_STATE ncu interface:nge0 -> state offline*, (re)initialized b
OBJECT_STATE ncu interface:nge0 -> state offline*, waiting for IP ad
IF_STATE nge0 -> state (5) flags 1000842
IF_STATE nge0 -> state (5) flags 1000842
IF_STATE nge0 -> state (5) flags 1000843
IF_STATE nge0 -> state (5) flags 2000840
IF_STATE nge0 -> state (5) flags 2000840
IF_STATE nge0 -> state (5) flags 1000803
IF_STATE nge0 -> state (5) flags 2000801
LINK_STATE nge0 -> state down
IF_STATE nge0 -> state (5) flags 1004803
OBJECT_STATE ncu link:nge0 -> state online*, interface/link is down
OBJECT_STATE ncu link:nge0 -> state offline, interface/link is down
OBJECT_STATE ncu interface:nge0 -> state online*, conditions for act
IF_STATE nge0 -> state (5) flags 1004802
OBJECT_STATE ncu interface:nge0 -> state offline, conditions for act
IF_STATE nge0 -> state (5) flags 2000800
At that point, we move back to e1000g0 etc., so it seems like we're trying all
the priority groups again.
The v4 interface flags move from
1000842 IFF_IPV4|IFF_MULTICAST|IFF_RUNNING|IFF_BROADCAST to
1000843 IFF_IPV4|IFF_MULTICAST|IFF_RUNNING|IFF_BROADCAST|IFF_UP to
1000803 IFF_IPV4|IFF_MULTICAST|IFF_BROADCAST|IFF_UP to
1004803 IFF_IPV4|IFF_DHCPRUNNING|IFF_MULTICAST|IFF_BROADCAST|IFF_UP to
1004802 IFF_IPV4|IFF_DHCPRUNNING|IFF_MULTICAST|IFF_BROADCAST
So in other words, the link seems to switch off IFF_RUNNING on the interface
before we can start DHCP. Then IFF_UP is turned off.
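For anyone wanting to check the decoding, here is a small Python sketch that turns the hex values from the show-events output into flag names and shows which bits change between two values. The helper names decode_flags and flags_changed are hypothetical; the bit values are the Solaris <net/if.h> ones as far as I know them, and they are consistent with the decoding listed above. Only the bits seen in this report are included.

```python
# Bit values assumed from Solaris <net/if.h>; only the flags seen here.
IFF_FLAGS = {
    0x0000001: "IFF_UP",
    0x0000002: "IFF_BROADCAST",
    0x0000040: "IFF_RUNNING",
    0x0000800: "IFF_MULTICAST",
    0x0004000: "IFF_DHCPRUNNING",
    0x1000000: "IFF_IPV4",
    0x2000000: "IFF_IPV6",
}


def decode_flags(hex_flags):
    """Return the names of the flags set in a hex string such as '1000842'."""
    value = int(hex_flags, 16)
    ordered = sorted(IFF_FLAGS.items(), reverse=True)  # highest bit first
    return "|".join(name for bit, name in ordered if value & bit)


def flags_changed(old_hex, new_hex):
    """Return (cleared, added) flag names between two hex flag values."""
    old, new = int(old_hex, 16), int(new_hex, 16)
    ordered = sorted(IFF_FLAGS.items())
    cleared = [name for bit, name in ordered if (old & bit) and not (new & bit)]
    added = [name for bit, name in ordered if (new & bit) and not (old & bit)]
    return cleared, added
```

For example, flags_changed("1000843", "1000803") reports IFF_RUNNING as cleared, and flags_changed("1004803", "1004802") reports IFF_UP as cleared.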
What could cause us to lose IFF_RUNNING? Is this a link flap issue made worse
by the User NCP configuration policy enforced by nwamd? I'll investigate
further on Monday, but can I confirm that the Automatic NCP always works? And
is this consistently reproducible or intermittent? Thanks!