I think I have found the cause (it does not look related to us), but I am not sure.

When switching NCPs, nwam and network/location are refreshed.  So, there 
are two svc.startd instances wedged on a door() call, with the pstack 
that has been emailed around a few times.  I checked the pstack of the 
nscd daemon and there seems to be a deadlock: two threads have the same 
pstack, shown below, and both are parked in mutex_lock() under 
rpc_fd_lock().

bash-3.2# pstack `pgrep nscd`
[...]
-----------------  lwp# 63 / thread# 63  --------------------
 c52e2489 lwp_park (0, 0, 0)
 c52d8ea3 mutex_lock_impl (81570a4, 0, 0, c52d8fd7, 6) + 163
 c52d900a mutex_lock (81570a4, 6, c0f36c98, c501a0aa) + 3f
 c501a128 rpc_fd_lock (8157008, 6, 0, c5013569) + 98
 c5013587 clnt_dg_call (80cee08, 3, c503af6c, c0f36de0, c503b034, c0f36df0) + 2f
 c503a8c4 domatch  (80c5dd0, c51d70d8, c0f373f0, 9, 80dd270) + 78
 c503a1e3 __yp_match_cflookup (80c5dd0, c51d70d8, c0f373f0, 9, c0f36ef8, c0f36efc) + 147
 c51d65b8 _nss_nis_ypmatch (80c5dd0, c51d70d8, c0f373f0, c0f36ef8, c0f36efc, 0) + 38
 c51d666d _nss_nis_lookup (809de28, c0f37070, 1, c51d70d8, c0f373f0, 0) + 35
 c51d3a29 getbyname (809de28, c0f37070, 0, c52d7d9d) + 109
 0806a416 nss_search (c507d268, c4ff99a4, 4, c0f37070) + 762
 c4ffe00b _switch_getipnodebyname_r (c0f373f0, 814b414, 814b428, 2120, 1a, 3) + 6b
 c4ffcf6b _get_hostserv_inetnetdir_byname (80dd590, c0f371a0, c0f37178, c4ff7ccd) + bc3
 c4ff7d84 netdir_getbyname (80dd590, c0f371f8, c0f371f4, c5021fae) + c4
 c5022161 _getclnthandle_timed (c0f373f0) + 1c1
 c5022b8b __rpcb_findaddr_timed (186a4, 2, 80dd540, c0f373f0, c0f372fc, 0) + 2df
 c5015dde clnt_tp_create_timed (c0f373f0, 186a4, 2, 80dd540, 0, c507c000) + 3e
 c50156c7 clnt_create_timed (c0f373f0, 186a4, 2, c5068b0c, 0, 0) + 18f
 c5015529 clnt_create (c0f373f0, 186a4, 2, c5068b0c) + 29
 c5037174 __yp_all_cflookup (80c5dd0, c51d7038, c0f37928, 0) + 1f8
 c51d6a83 _nss_nis_do_all (809d1a8, c0f37ab0, c0f37d50, c526c72d) + 4f
 c51d3472 getbymember (809d1a8, c0f37ab0, 0, 806a483) + 7a
 0806a416 nss_search (0, 80695ec, 6, c0f37ab0) + 762
 0806af9c nss_psearch (c0f37c90, 80000, c0f37c78, 8130748) + f0
 0805d6f3 lookup_int (c0fbbca8, 0, 0, c0f37cb0) + efb
 0805d8b4 nsc_lookup (c0fbbca8, 0, 10, c0f37cb0) + 18
 0806faad lookup   (c0fbbd38, c8, 0, 1) + 13d
 08070049 switcher (deadbeed, c0fbbd38, c8, 0, 0, 806fe3c) + 20d
 c52e7ff0 __door_return () + 60
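
One way to confirm that several threads are parked on the same mutex is 
to group the LWPs in the pstack output by the first argument of 
mutex_lock_impl.  The following is only a minimal sketch, assuming the 
pstack format shown above; the function name group_by_mutex and the 
SAMPLE text are illustrative, and lwp# 99 in the sample is a made-up 
second thread, not one taken from the real output.

```python
import re
from collections import defaultdict

# Header lines look like: "-----  lwp# 63 / thread# 63  -----"
LWP_RE = re.compile(r"lwp#\s*(\d+)")
# Frame lines look like: " c52d8ea3 mutex_lock_impl (81570a4, 0, ...) + 163"
LOCK_RE = re.compile(r"\bmutex_lock_impl\s*\(([0-9a-fA-F]+)")


def group_by_mutex(pstack_text):
    """Return {mutex address: [LWP ids with a mutex_lock_impl frame on it]}."""
    waiters = defaultdict(list)
    current_lwp = None
    for line in pstack_text.splitlines():
        header = LWP_RE.search(line)
        if header:
            current_lwp = int(header.group(1))
            continue
        frame = LOCK_RE.search(line)
        if frame and current_lwp is not None:
            waiters[frame.group(1)].append(current_lwp)
    return dict(waiters)


# Illustrative input only: lwp# 63 is copied from the trace above,
# lwp# 99 is a hypothetical second thread with the same top frames.
SAMPLE = """\
-----------------  lwp# 63 / thread# 63  --------------------
 c52e2489 lwp_park (0, 0, 0)
 c52d8ea3 mutex_lock_impl (81570a4, 0, 0, c52d8fd7, 6) + 163
 c52d900a mutex_lock (81570a4, 6, c0f36c98, c501a0aa) + 3f
-----------------  lwp# 99 / thread# 99  --------------------
 c52e2489 lwp_park (0, 0, 0)
 c52d8ea3 mutex_lock_impl (81570a4, 0, 0, c52d8fd7, 6) + 163
 c52d900a mutex_lock (81570a4, 6, c0f36c98, c501a0aa) + 3f
"""

if __name__ == "__main__":
    for addr, lwps in group_by_mutex(SAMPLE).items():
        print(addr, lwps)
```

This only lists the waiters on each mutex; it does not say which thread 
holds the lock, so the owner still has to be found separately.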

A change was made to nscd in snv_127 (our gate is synced with this 
build) as the fix for

        6863709 nscd dumps core after receiving SIGHUP

I haven't done a thorough read of the evaluation, but I wonder whether 
this is the root cause.

Anurag
