http://defect.opensolaris.org/bz/show_bug.cgi?id=8769


Renee Danson <renee.danson at sun.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |renee.danson at sun.com
             Status|NEW                         |ACCEPTED




--- Comment #1 from Renee Danson <renee.danson at sun.com>  2009-05-08 15:07:42 ---
This comment assumes that Darren is seeing the same thing I am: the
dlpi_recv() loop in dlpi_thread() spinning wildly without ever blocking.

In almost all cases, dlpi_recv() returns immediately with DLPI_EINHANDLE
(invalid DLPI handle).  I've seen the system get into this state after a
couple of NCP switches, and when it's in this state I see many more
dlpi_thread() threads than I would expect, which suggests that they are
not being cleaned up properly.  One possible smoking gun is this check in
nwamd_ncu_handle_disable_event():

        /* If the NCU is not online, return */
        if (ncu_obj->object_state != NWAM_STATE_ONLINE &&
            ncu_obj->object_state != NWAM_STATE_OFFLINE) {
                nwamd_object_unlock(ncu_obj);
                return;
        }

This check happens before the call to dlpi_delete_link(), which would clean
up the DLPI thread, so an NCU in any other state returns early and keeps its
thread.  I don't believe all the physical NCUs were in the online or offline
state every time I switched NCPs.
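To illustrate the suspected leak, here is a minimal standalone model, not nwamd code.  The state names, the ncu_t struct, and delete_link() are hypothetical stand-ins (delete_link() plays the role of dlpi_delete_link()); only the shape of the early-return guard is taken from the snippet above.  An NCU that happens to be in a transitional state when it is disabled skips cleanup and keeps its receive thread.

```c
#include <assert.h>

/* Hypothetical stand-ins for NWAM object states (not the real header). */
typedef enum {
	STATE_OFFLINE,
	STATE_OFFLINE_TO_ONLINE,	/* a transitional state */
	STATE_ONLINE
} ncu_state_t;

typedef struct {
	ncu_state_t state;
	int dlpi_thread_running;	/* 1 while a receive thread exists */
} ncu_t;

/* Hypothetical stand-in for dlpi_delete_link(): tears down the thread. */
static void
delete_link(ncu_t *ncu)
{
	ncu->dlpi_thread_running = 0;
}

/* Mirrors the early-return guard in nwamd_ncu_handle_disable_event(). */
static void
handle_disable(ncu_t *ncu)
{
	if (ncu->state != STATE_ONLINE && ncu->state != STATE_OFFLINE)
		return;		/* cleanup below is skipped */
	delete_link(ncu);
}

/* Disables every NCU and returns how many receive threads survive. */
static int
leaked_threads(ncu_t *ncus, int n)
{
	int i, leaked = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		handle_disable(&ncus[i]);
		leaked += ncus[i].dlpi_thread_running;
	}
	return (leaked);
}
```

With one NCU each in the online, offline, and offline-to-online states, leaked_threads() reports one surviving thread per NCP switch, which would accumulate over repeated switches.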

Another nit: I think the loop guard in dlpi_thread() is not correct.  That
function has the following loop:

        do {
                rc = dlpi_recv(*dh, NULL, NULL, msgbuf, &msglen, -1, &recvdata);
                if (rc != DLPI_SUCCESS) {
                        failures++;
                } else {
                        nlog(LOG_DEBUG, "dlpi_recv message");
                        failures = 0;
                }
        } while (rc != DLPI_EINVAL || failures > 3);

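In this case, rc was never DLPI_EINVAL and almost never DLPI_SUCCESS, so
failures ended up being a huge number.  Because the two tests are joined
with ||, the loop keeps going whenever rc != DLPI_EINVAL, and the failure
count can never end it.  I think the second check should be 'failures < 3',
with the two tests joined by && rather than ||?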
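A small standalone simulation makes the difference between the two guards concrete.  It is a sketch, not libdlpi code: the DLPI_* values below are hypothetical local stand-ins (the real ones come from <libdlpi.h>), and fake_dlpi_recv() simply reproduces the observed behaviour of failing every time with DLPI_EINHANDLE.  An iteration cap keeps the original guard from hanging the simulation.

```c
#include <assert.h>

/* Hypothetical stand-ins for libdlpi return codes (values are made up). */
enum { DLPI_SUCCESS = 0, DLPI_EINVAL = 1, DLPI_EINHANDLE = 2 };

/* Simulates the observed behaviour: every receive fails immediately. */
static int
fake_dlpi_recv(void)
{
	return (DLPI_EINHANDLE);
}

/*
 * Runs the receive loop with the original guard (use_fixed_guard == 0)
 * or the proposed one, stopping after max_iter iterations so the
 * unbounded original can be observed.  Returns the iteration count.
 */
static int
run_loop(int use_fixed_guard, int max_iter)
{
	int rc, failures = 0, iter = 0;
	int keep_going;

	assert(max_iter > 0);
	do {
		rc = fake_dlpi_recv();
		if (rc != DLPI_SUCCESS)
			failures++;
		else
			failures = 0;
		iter++;
		if (use_fixed_guard)
			keep_going = (rc != DLPI_EINVAL && failures < 3);
		else
			keep_going = (rc != DLPI_EINVAL || failures > 3);
	} while (keep_going && iter < max_iter);

	return (iter);
}
```

With the original guard, run_loop(0, 1000) only stops at the artificial cap of 1000, while run_loop(1, 1000) exits after three consecutive failures.  Whether three is the right threshold, and whether the thread should log or tear down the handle on exit, is left open here.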
