http://defect.opensolaris.org/bz/show_bug.cgi?id=10116


Anurag S. Maskey <Anurag.Maskey at Sun.COM> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|CLI                         |ON daemon
             Status|INCOMPLETE                  |CAUSEKNOWN




--- Comment #12 from Anurag S. Maskey <Anurag.Maskey at Sun.COM>  2009-08-04 
13:05:47 ---
Found the cause for the original segfault.  nwamd debug logs would have
definitely made the investigation 100x faster/easier, as the logs would have
indicated when the failure occurred.

This segfault doesn't happen if nwamd is running normally.  It only happens in
the upgrade case (from phase 0/0.5).  On first run after upgrade,
manifest-import hasn't yet imported the phase 1 manifest, so nwamd runs as
root.  We have a special case in main() to handle this (it sets up the signal
handling and pause()s).  There is no event handling thread and there are no
event queues.  When manifest-import imports the new manifest, it refreshes
nwam (i.e., sends a SIGHUP), which causes nwamd to exit.  svc.startd(1M) will
then start nwamd again, this time correctly.

The problem arises when nwamd is running as root and SIGTERM is received
(rather than SIGHUP as mentioned above).  In this case, graceful_shutdown()
tries to enqueue FINI and SHUTDOWN events (the NoNet and Automatic locations
have already been created by network/location).  BUT, there are no event
queues to enqueue these events on.  Thus, the SIGSEGV.
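
To make the failure mode concrete, here is a minimal standalone sketch.  It
is not nwamd code: event_queue, event_queue_init() and event_enqueue() are
hypothetical stand-ins, and the NULL check is only one way to make the missing
queue visible, not the fix proposed in this comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real nwamd types. */
typedef enum { EVENT_FINI, EVENT_SHUTDOWN } event_type_t;

typedef struct event {
	event_type_t type;
	struct event *next;
} event_t;

typedef struct event_queue {
	event_t *head;
	event_t *tail;
} event_queue_t;

/* Stays NULL in the upgrade special case: no event thread, no queue. */
static event_queue_t *event_queue = NULL;

/* Allocates the queue; only a normal startup path would call this. */
static bool
event_queue_init(void)
{
	event_queue = calloc(1, sizeof (*event_queue));
	return (event_queue != NULL);
}

/*
 * Appends an event to the queue.  Returns false when the queue was never
 * created, instead of dereferencing the NULL pointer (which is the crash
 * pattern described above).
 */
static bool
event_enqueue(event_type_t type)
{
	event_t *ev;

	if (event_queue == NULL)
		return (false);

	ev = calloc(1, sizeof (*ev));
	if (ev == NULL)
		return (false);
	ev->type = type;

	if (event_queue->tail == NULL)
		event_queue->head = ev;
	else
		event_queue->tail->next = ev;
	event_queue->tail = ev;
	return (true);
}
```

Without the NULL check, the first access to event_queue->tail is the kind of
dereference that would fault when no queue exists.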

The question now is: what is sending the SIGTERM to nwamd at this time?

I think the solution is to generalize the special case for handling SIGHUP
during upgrade to all signals.  That way, any signal will cause nwamd to
restart.  If the new manifest hasn't been imported, then we repeat and wait for
the import followed by refresh.

In code, something like:

@@ -145,6 +145,30 @@
 static void *
 sighandler(void *arg)
 {
        int sig;
        while (!shutting_down) {
                sig = sigwait(&sigwaitset);
                nlog(LOG_DEBUG, "signal %s caught", strsignal(sig));
+
+               /*
+                * When manifest-import imports the Phase 1 manifest, it refreshes
+                * NWAM.  The NWAM Phase 1 properties must be available.  If not,
+                * NWAM was refreshed too soon, fail.
+                */
+               if (nwamd_lookup_count_property(OUR_FMRI, OUR_PG,
+                   OUR_NCU_WAIT_TIME_PROP_NAME, &propval) != 0)
+                       pfail("Warning: Phase 1 properties not available.  Failing...");
+               
+               /*
+                * Now that we're certain the phase 1 manifest has been imported,
+                * we check the version property.  If it does not yet exist, that
+                * means the upgrade from phase 0/0.5 has not happened; we need to
+                * exit so that nwamd can be restarted and perform the upgrade.
+                */
+               if (nwamd_lookup_count_property(OUR_FMRI, OUR_PG, OUR_VERSION_PROP_NAME,
+                   &propval) != 0) {
+                       nlog(LOG_ERR, "Warning: Phase 1 properties available, "
+                            "but NWAM has not been upgraded.  "
+                            "Exiting to let svc.startd(1M) restart NWAM");
+                       exit(EXIT_FAILURE);
+               }
+
                switch (sig) {
                case SIGALRM:
                        /*
@@ -295,28 +319,7 @@
 static void
 nwamd_refresh(void)
 {
        uint64_t propval;

-       /*
-        * When manifest-import imports the Phase 1 manifest, it refreshes
-        * NWAM.  The NWAM Phase 1 properties must be available.  If not,
-        * NWAM was refreshed too soon, fail.
-        */
-       if (nwamd_lookup_count_property(OUR_FMRI, OUR_PG,
-           OUR_NCU_WAIT_TIME_PROP_NAME, &propval) != 0)
-               pfail("Warning: Phase 1 properties not available.  Failing...");

-       /*
-        * Now that we're certain the phase 1 manifest has been imported,
-        * we check the version property.  If it does not yet exist, that
-        * means the upgrade from phase 0/0.5 has not happened; we need to
-        * exit so that nwamd can be restarted and perform the upgrade.
-        */
-       if (nwamd_lookup_count_property(OUR_FMRI, OUR_PG, OUR_VERSION_PROP_NAME,
-           &propval) != 0) {
-               nlog(LOG_ERR, "Warning: Phase 1 properties available, "
-                   "but NWAM has not been upgraded.  "
-                   "Exiting to let svc.startd(1M) restart NWAM");
-               exit(EXIT_FAILURE);
-       }
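
As a standalone illustration of the idea above (any signal received during the
upgrade window leads to an exit and a restart), here is a minimal sketch.  It
is not nwamd code: block_restart_signals() and wait_for_restart_signal() are
hypothetical names, only SIGHUP and SIGTERM are covered, and it uses the
two-argument POSIX sigwait() rather than the one-argument form in the diff.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Builds a set containing SIGHUP and SIGTERM and blocks those signals, so
 * that they stay pending until sigwait() collects them.  Returns 0 on
 * success, -1 on failure.  A threaded daemon would use pthread_sigmask()
 * here instead of sigprocmask().
 */
static int
block_restart_signals(sigset_t *set)
{
	if (sigemptyset(set) != 0 ||
	    sigaddset(set, SIGHUP) != 0 ||
	    sigaddset(set, SIGTERM) != 0)
		return (-1);
	return (sigprocmask(SIG_BLOCK, set, NULL));
}

/*
 * Waits for any signal in the set and returns its number, or -1 on error.
 * The caller treats every returned signal the same way.
 */
static int
wait_for_restart_signal(const sigset_t *set)
{
	int sig;

	if (sigwait(set, &sig) != 0)
		return (-1);
	return (sig);
}
```

A caller in the upgrade special case would call block_restart_signals() once,
then wait_for_restart_signal(), and exit with a failure status whichever
signal arrives, leaving the restart to svc.startd(1M).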

