In Watchdog.stop() try the following:

  WatchdogTask task = _task;
  _task = null;

  if (task != null)

-- Scott

On May 14, 2009, at 2:43 PM, Rob Lockstone wrote:

On May 14, 2009, at 14:03, Scott Ferguson wrote:

On May 14, 2009, at 12:57 PM, Rob Lockstone wrote:

Environment: Resin Pro 3.1.9 (100 Server License) on 64-bit Windows
2003/08 Server with Java 1.5_18.

This bug is still present in
Resin Pro 3.1.9. I've already updated the bug, but figured I would
post here because I don't know how often bugs are read/updated.

Looking at the code, the socket timeout is only 1s, which is pretty
short.  The timeout is in com.caucho.boot.WatchdogProcess.runInstance
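
As an illustration only (this is not Resin's actual code), here's why a 1s SO_TIMEOUT is tight: any blocking read on the status socket throws SocketTimeoutException if the peer takes more than a second to respond, and on a heavily loaded machine a slow child looks the same as a dead one.

```java
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch of the failure mode: a 1s read timeout on a status socket.
public class ShortSoTimeout {
    // Returns true if the read timed out before any data arrived.
    public static boolean readTimedOut(Socket socket) throws Exception {
        socket.setSoTimeout(1000); // the 1s timeout in question

        InputStream in = socket.getInputStream();
        try {
            in.read(); // blocks until data arrives or the timeout fires
            return false;
        } catch (SocketTimeoutException e) {
            return true; // a slow peer is indistinguishable from a dead one
        }
    }
}
```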

Agreed. The logs indicate that the closingInstance() method is getting called, but look at the timestamps between that call and the attempt to restart the WatchdogTask (which is when the problem occurs):

[2009/05/14 06:29:58.873] WatchdogProcess[Watchdog[],1] stopping Resin
[2009/05/14 06:30:01.827] java.lang.IllegalStateException: Can't start new task because of old task 'WatchdogTask[Watchdog[]]'

That's about 3 seconds, so I don't think the socket timeout is the issue. It does get to the closingInstance() method.

In the closingInstance() method, line 322 is:

        int status = process.exitValue();

So that means it has to wait for the process itself to exit. The destroy() method does a waitFor() on the process, which is reasonable. However, that waitFor() isn't bounded by any maximum wait time, and Java doesn't offer a waitFor() variant that takes a timeout. In code I've written that uses Processes, I always run the waitFor() in a separate thread and specify a maximum amount of time that I'm willing to wait for the process to end. In theory, this could lead to leaks if the process *never* ends. But at least that condition can be logged.
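
The pattern described above can be sketched like this (my own utility code, not Resin's): run waitFor() in a worker thread and bound the wait with Thread.join(timeout), since pre-Java-8 Process has no timed waitFor().

```java
// Sketch: bound Process.waitFor() with a maximum wait time by running it
// in a worker thread and joining with a timeout.
public class BoundedWaitFor {
    /**
     * Waits up to timeoutMillis for the process to exit.
     * Returns the exit value, or null if the process is still running.
     */
    public static Integer waitFor(final Process process, long timeoutMillis)
        throws InterruptedException
    {
        final Integer[] exitValue = new Integer[1];

        Thread waiter = new Thread(new Runnable() {
            public void run() {
                try {
                    exitValue[0] = process.waitFor();
                } catch (InterruptedException e) {
                    // interrupted: leave exitValue[0] as null
                }
            }
        });
        waiter.setDaemon(true); // don't keep the JVM alive for a stuck process
        waiter.start();
        waiter.join(timeoutMillis);

        if (waiter.isAlive()) {
            // process still running after the deadline: give up on the wait;
            // the caller can log this and decide whether to destroy()
            waiter.interrupt();
            return null;
        }
        return exitValue[0];
    }
}
```

As noted above, if the process never exits the daemon waiter thread is leaked, but the caller at least gets control back and can log the condition.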

But anyway, none of this explains why the specified WatchdogTask isn't being set to null in the Watchdog (it does, but only in the kill() method), so that means the same Watchdog instance is getting re-used. That's where I'm getting a little lost in the code. I see the WatchdogManager, and I see that there are a few places that Watchdog instances get added to the Map, but I don't see where they get removed. Should there be some point where a Watchdog instance gets removed from the WatchdogManager?
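
To make the question concrete, here's a hypothetical sketch (none of these names are Resin's actual API) of the fix being asked about: if the manager removed a Watchdog from its map when that watchdog stops, a later start would build a fresh instance instead of reusing one with a stale task.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry sketch -- illustrates removing an entry on stop,
// not just on kill(), so a stale instance can't be reused.
public class WatchdogRegistrySketch {
    private final Map<String, Object> _watchdogMap
        = new HashMap<String, Object>();

    public synchronized void register(String serverId, Object watchdog) {
        _watchdogMap.put(serverId, watchdog);
    }

    // would be called from the stop path as well as the kill path
    public synchronized Object unregister(String serverId) {
        return _watchdogMap.remove(serverId);
    }

    public synchronized boolean isRegistered(String serverId) {
        return _watchdogMap.containsKey(serverId);
    }
}
```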

Then again, given that resin is getting completely shut down, why is there even a WatchdogManager and list of Watchdogs to worry about? Shouldn't all that disappear along with everything else?


-- Scott

Our deployment system uses the Windows SC (Service Controller)
commands to stop and then start resin. I built in a five second delay
between the time the SC query command notifies me that resin has
stopped and the time that it attempts to start it up again. My
concern, of course, is that five seconds might not always be enough
time for the watchdog to completely exit.

The original reporter of this bug indicates that, on a busy machine,
he has to wait as long as 15 seconds. However, it's unclear to me if
he's confirmed that resin has stopped before initiating the 15 second delay. The five second delay I put in kicks in *after* I've confirmed
that resin has stopped as reported by the sc query command.

Is there any way to know if this five second delay is a legitimate
hack? Waiting for the dev/QA cycle for the 3.1.10 release isn't going
to work for us on our timetable. I'm looking at the watchdog code now
to see if I can figure out a fix, but I'm not going to be able to
spend too much time on it, I'm afraid.


resin-interest mailing list
