http://defect.opensolaris.org/bz/show_bug.cgi?id=8769
Renee Danson <renee.danson at sun.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |renee.danson at sun.com
             Status|NEW                         |ACCEPTED
--- Comment #1 from Renee Danson <renee.danson at sun.com> 2009-05-08 15:07:42 ---
This comment proceeds on the assumption that Darren is seeing the same
thing I am, which involves the dlpi_recv() loop in dlpi_thread() spinning
wildly.
In almost all cases, the dlpi_recv() call returns immediately with
DLPI_EINHANDLE ("invalid DLPI handle"). I've seen the system get into this
state after a couple of NCP switches, and when it's in this state, I see
many more dlpi_thread() threads than I would expect, which suggests that
somehow these aren't getting cleaned up properly. One possible smoking gun
is this check in nwamd_ncu_handle_disable_event():
        /* If the NCU is not online, return */
        if (ncu_obj->object_state != NWAM_STATE_ONLINE &&
            ncu_obj->object_state != NWAM_STATE_OFFLINE) {
                nwamd_object_unlock(ncu_obj);
                return;
        }
This check happens before the call to dlpi_delete_link(), which is what would
clean up the dlpi thread. I don't believe all of the phys NCUs were in the
online or offline state every time I switched NCPs.
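As an illustration only, here is a sketch of one possible adjustment, assuming
(hypothetically) that dlpi_delete_link() can be called for an NCU in this
state; the argument passed to dlpi_delete_link() is a guess on my part, and
the surrounding code is paraphrased from the snippet above rather than taken
from the nwamd source:

        if (ncu_obj->object_state != NWAM_STATE_ONLINE &&
            ncu_obj->object_state != NWAM_STATE_OFFLINE) {
                /*
                 * Hypothetical fix: clean up the dlpi thread on this
                 * early-return path too, so an NCU that is in neither
                 * state doesn't leave dlpi_thread() spinning behind it.
                 * (Argument to dlpi_delete_link() guessed for illustration.)
                 */
                dlpi_delete_link(ncu_obj);
                nwamd_object_unlock(ncu_obj);
                return;
        }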
Another nit in the code is that I think the guard in dlpi_thread() is not
correct. That function has the following loop:
        do {
                rc = dlpi_recv(*dh, NULL, NULL, msgbuf, &msglen, -1,
                    &recvdata);
                if (rc != DLPI_SUCCESS) {
                        failures++;
                } else {
                        nlog(LOG_DEBUG, "dlpi_recv message");
                        failures = 0;
                }
        } while (rc != DLPI_EINVAL || failures > 3);
In this case, rc was never DLPI_EINVAL and was almost never DLPI_SUCCESS, so
failures ended up being a huge number. I think that second check should be
'failures < 3'?
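For reference, a minimal sketch of what I assume the exit condition intends:
leave the loop once the handle has gone bad or after a few consecutive
failures. Since rc here was never DLPI_EINVAL, the first half of the '||'
keeps the loop alive on its own, so the '||' presumably needs to become '&&'
along with the comparison flipping; this is my reading of the intent, not a
committed fix:

        do {
                rc = dlpi_recv(*dh, NULL, NULL, msgbuf, &msglen, -1,
                    &recvdata);
                if (rc != DLPI_SUCCESS) {
                        /* count consecutive failures */
                        failures++;
                } else {
                        nlog(LOG_DEBUG, "dlpi_recv message");
                        /* reset the count on a successful receive */
                        failures = 0;
                }
                /*
                 * Keep receiving only while the handle is still usable and
                 * we haven't seen several consecutive failures; with '&&'
                 * and 'failures < 3' a bad handle (DLPI_EINHANDLE included)
                 * can no longer spin this loop forever.
                 */
        } while (rc != DLPI_EINVAL && failures < 3);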