Quoth Dave Challis on Thu, Nov 22, 2007 at 07:35:55AM -0800:
> 1. Periodically (I'm still looking into why this is happening), apache
> dies, leaving behind a defunct process.  SMF sees that the apache
> process isn't running, so tries to restart it.  It seems to be unable
> to kill the defunct process as part of its stop method though.
> 
> Looking in /var/svc/log/network-http-CSWapache2, there were the messages:
> [ Nov 22 03:10:03 Stopping because service restarting. ]
> [ Nov 22 03:10:03 Executing stop method ("/lib/svc/method/http-CSWapache2 stop") ]
> [ Nov 22 03:10:04 Method "stop" exited with status 0 ]
> [ Nov 22 03:11:04 Method or service exit timed out.  Killing contract 108 ]
> [ Nov 22 03:11:05 Method or service exit timed out.  Killing contract 108 ]
> [ Nov 22 03:11:06 Method or service exit timed out.  Killing contract 108 ]
> This message was repeated every second in the log for several hours.
...
> Using the 'ps' command then showed process 3158 as a defunct apache
> process, which I'm guessing SMF wasn't able to kill.
> 
> After removing the defunct process with 'preap 3158', the
> messages about contract 108 in the log files stopped.

I don't think we've seen this problem before.  My best guess right now
is that when the process died, the kernel reparented it to init, which is
supposed to reap it, but your init isn't doing so.  Try "ps -lp 1" to
verify that your init is still running.
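
If it recurs, it may also help to see which process the zombie's parent
actually is before you preap it.  A rough sketch, assuming your ps accepts
"-o pid,ppid,s,comm" and reports zombies with state Z (the function name
list_zombies is made up for illustration):

```shell
# Print "PID PPID" for every process whose state column is Z (zombie).
# Expects `ps -eo pid,ppid,s,comm` style output on stdin; the header
# line is skipped automatically because its third field is "S", not "Z".
list_zombies() {
    awk '$3 == "Z" { print $1, $2 }'
}

# Assumed usage on the affected host:
#   ps -eo pid,ppid,s,comm | list_zombies
```

A PPID of 1 in that output would be consistent with the guess above that
the process was handed to init and never reaped.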

> 2. After fixing the problem mentioned above, I'm unable to clear the
> maintenance state on this service.
> 
> I manually set the maintenance state using:
> bash-3.00# svcadm -v mark maintenance http-CSWapache2
> Action maint_on set for svc:/network/http-CSWapache2:CSWapache2.
> 
> I then tried to clear this state using:
> bash-3.00# svcadm -v clear http-CSWapache2
> Action maint_off set for svc:/network/http-CSWapache2:CSWapache2.
> 
> However, if I use svcs to report on the state of this service, it
> still reports it as in maintenance:
> bash-3.00# svcs -xv http-CSWapache2
> svc:/network/http-CSWapache2:CSWapache2 (Apache 2 HTTP server)
>  State: maintenance since 22 November 2007 12:28:35 GMT
> Reason: Maintenance requested by an administrator.
>    See: http://sun.com/msg/SMF-8000-63
>    See: man -M /opt/csw/apache2/man -s 8 httpd
> Impact: This service is not running.
> 
> How can I force svcadm to clear this state?

I presume the log didn't show the service being cleared and then going
right back into maintenance.  In that case, the svcadm clear probably
didn't change the time on the State: line, which indicates that
svc.startd isn't responding to your requests.  If so, the output
of "echo ::startd_log | mdb -p `pgrep startd`" might be useful.
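
One way to check whether the State: line moved is to capture it before and
after the clear and compare.  This is only a sketch: the helper name
state_line and the five-second wait are arbitrary choices of mine, and the
service name is the one from your mail.

```shell
# Pull the "State:" line out of `svcs -xv` output read on stdin.
state_line() {
    grep 'State:'
}

# Assumed usage on the affected host:
#   before=$(svcs -xv http-CSWapache2 | state_line)
#   svcadm clear http-CSWapache2
#   sleep 5    # arbitrary pause to give the restarter time to react
#   after=$(svcs -xv http-CSWapache2 | state_line)
#   [ "$before" = "$after" ] && echo "State: line unchanged"
```

If the two lines are identical, the timestamp did not change, which would
fit with svc.startd not acting on the request.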


David
