RE: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

Phil Black-Knight Thu, 02 Oct 2014 06:26:21 -0700

I was helping to look into this with Nick & I think we may have figured out
the core of the problem...


The problem is easily reproducible by starting replication on the slave and
then sending a shutdown command to tomcat (e.g. catalina.sh stop).

With a debugger attached, it looks like the fsyncService thread is blocking
VM shutdown because it is created as a non-daemon thread.

Essentially what seems to be happening is that the fsyncService thread is
running when 'catalina.sh stop' is executed. This goes in and calls
SnapPuller.destroy() which aborts the current sync. Around line 517 of the
SnapPuller, there is code that is supposed to cleanup the fsyncService
thread, but I don't think it is getting executed because the thread that
called SnapPuller.fetchLatestIndex() is configured as a daemon Thread, so
the JVM ends up shutting that down before it can cleanup the fysncService...

So... it seems like:

    if (fsyncService != null)
ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);
could be added around line 1706 of SnapPuller.java,  or

          puller.setDaemon(*false*);
could be added around line 230 of ReplicationHandler.java, however this
needs some additional work (and I think it might need to be added
regardless) since the cleanup code in SnapPuller(around 517) that shuts
down the fsync thread never gets execute since
logReplicationTimeAndConfFiles() can throw IO exceptions bypassing the rest
of the finally block...So the call to
logReplicationTimeAndConfFiles() around line 512 would need to get wrapped
with a try/catch block to catch the IO exception...

I can submit patches if needed... and cross post to the dev mailing list...

-Phil

RE: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

Reply via email to