On Thu, Nov 12, 2009 at 01:38:17PM -0500, Anurag S. Maskey wrote:
> 
> >>1. add project=default entries to the netadm/netcfg
> >>users in /etc/user_attr. Needed to prevent startd
> >>(the SMF master restarter) getting hung in project
> >>lookups to nsswitch-specified name services as
> >>discussed in getdefaultproj(3PROJECT). When
> >>methods are run as non-root user (such as nwam's
> >>methods) the default project is looked up for that user
> >>and we can get snared in nameservice lookups.
> >>
> >>2. to prevent nis/client from going into maintenance due to being
> >>restarted too quickly, we may need to use smf_kill_contract
> >>to empty the contract rather than the simple :kill stop method.
> >>
> >>3. copy /etc/nsswitch.files to /etc/nsswitch.conf during do_ns()
> >>as part of location application prior to disabling name services.
> >>This prevents us from having references to nis in /etc/nsswitch.conf
> >>when nis/client is disabled.
> >>
> >>7. When activating locations, if a manual location is specified
> >>it is activated regardless of whether any IP NCUs are available.
> >>We should revert to activating NoNet in that case.
> >So, can we get a test build with items 1, 2, 3, and 7 in it, and see
> >how that works?  Or has that combination already been tested?
> My testing shows that 1,2,3,7 does not work. I rebooted. Then after
> logging in waited 10 minutes doing nothing. Then, I switched NCPs
> and nwam and network/location got stuck.
> 
> I had not seen nis/client go into maintenance, so I tested both
> with and without 3, and the bug exists in both cases.
> 
> The *important* stuff here is to wait until all refreshes to
> network/location and any related refresh/restarts have been done.

Okay, let me make sure I understand what's happening.

You boot up, with the User location enabled (and set to enable NIS).

The system comes up, and things get configured and set up as expected.

Then you switch NCPs.  This results in everything getting wedged as
described below.

Do you think disabling the location before the NCP switch would help?
It doesn't seem unreasonable, as we know we're going to be tearing
everything down at that point.  And it would take NIS out of the picture,
which seems like it would help.  What do you think?

-renee

> Services get stuck like this:
> 
> bash-3.2# svcs -p nis/client nwam location
> STATE          STIME    FMRI
> online         17:32:38 svc:/network/nis/client:default
>                17:28:32   109823 ypbind
>                17:33:27   110321 ypbind
> online*        17:33:02 svc:/network/physical:nwam
>                17:20:50   103347 nwamd
>                17:33:02   110319 svc.startd
> online*        17:33:02 svc:/network/location:default
>                17:33:02   110320 svc.startd
> 
> Both svc.startd have the same pstack as below:
> 
> bash-3.2# pstack 110319
> 110324: /lib/svc/bin/svc.startd
> c52f61aa door (3, c38bec20, 0, 0, 0, 3)
> c5284a40 _nsc_try1door (c5385f48, c38bed54, c38bed58, c38bed5c, c38bec9c, c38bee00) + 64
> c5284d8e _nsc_trydoorcall_ext (c38bed54, c38bed58, c38bed5c, c52946d5) + 236
> c5294766 _nsc_search (c5385d28, c527b938, 6, c38bee00) + b6
> c5293212 nss_search (c5385d28, c527b938, 6, c38bee00) + 2a
> c527bed1 _getgroupsbymember (8193688, 890dda8, 10, 1) + 9d
> c5286906 initgroups (8193688, 41, 8369208, c4e79042) + 66
> c4e79279 restarter_set_method_context (811ed08, c38beecc, c38beed8, 8079772) + 245
> 08070392 exec_method (81f0cc0, 2, 89089e0, 811ed08, 0, 1) + 86
> 08070b1e method_run (c38befa8, 2, c38befac, 80712f9) + 3ee
> 080713c8 method_thread (8927140, c5385000, c38befe8, c52f10da) + 184
> c52f112f _thrp_setup (c3a08a00) + 9b
> c52f13b0 _lwp_start (c3a08a00, 0, 0, 0, 0, 0)
> 
> Then nwam and network/location go into maintenance. Then clearing
> both services brings things back to normal.
> 
> Anurag
> 
> 
