I was helping Nick look into this, and I think we may have figured out the core of the problem.
The problem is easy to reproduce: start replication on the slave, then send Tomcat a shutdown command (e.g. catalina.sh stop). With a debugger attached, it looks like the fsyncService thread is blocking JVM shutdown because it is created as a non-daemon thread.

Essentially, the fsyncService thread is still running when 'catalina.sh stop' is executed. The shutdown calls SnapPuller.destroy(), which aborts the current sync. Around line 517 of SnapPuller there is code that is supposed to clean up the fsyncService thread, but I don't think it gets executed. The thread that called SnapPuller.fetchLatestIndex() is configured as a daemon thread, so the JVM shuts that thread down before it can clean up the fsyncService.

So it seems like we could either add if (fsyncService != null) ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService); around line 1706 of SnapPuller.java, or add puller.setDaemon(false); around line 230 of ReplicationHandler.java.

The setDaemon(false) option needs some additional work, though (and I think that work might be needed regardless). The cleanup code in SnapPuller around line 517 that shuts down the fsync thread never gets executed, because logReplicationTimeAndConfFiles() can throw an IOException, which bypasses the rest of the finally block. So the call to logReplicationTimeAndConfFiles() around line 512 would need to be wrapped in a try/catch block that catches the IOException.

I can submit patches if needed, and cross-post to the dev mailing list.

-Phil
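A minimal, self-contained sketch of the finally-block problem described above, assuming a simplified shape for SnapPuller's cleanup. logReplicationDetails() and shutdownNowAndAwait() are hypothetical stand-ins for logReplicationTimeAndConfFiles() and ExecutorUtil.shutdownNowAndAwaitTermination(), and fetchAsToday()/fetchWithGuard() are hypothetical, not the real fetchLatestIndex(). It shows that an IOException thrown inside a finally block skips the executor shutdown that follows it, and that catching the exception around the logging call lets the shutdown run.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class FsyncCleanupSketch {

    // Hypothetical stand-in for SnapPuller.logReplicationTimeAndConfFiles(),
    // which can throw an IOException.
    static void logReplicationDetails(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated failure while logging replication details");
        }
    }

    // Hypothetical stand-in for ExecutorUtil.shutdownNowAndAwaitTermination().
    static void shutdownNowAndAwait(ExecutorService pool) {
        pool.shutdownNow();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Current shape: an IOException thrown inside the finally block
    // skips the executor shutdown that follows it.
    static ExecutorService fetchAsToday(boolean failLogging) {
        // The default thread factory used by Executors creates non-daemon threads.
        ExecutorService fsyncService = Executors.newSingleThreadExecutor();
        try {
            try {
                fsyncService.submit(() -> { /* fsync work would go here */ });
            } finally {
                logReplicationDetails(failLogging);
                shutdownNowAndAwait(fsyncService); // skipped if logging throws
            }
        } catch (IOException e) {
            System.err.println("fetch failed: " + e.getMessage());
        }
        return fsyncService;
    }

    // Guarded shape: the logging call is wrapped in its own try/catch,
    // so the executor shutdown always runs.
    static ExecutorService fetchWithGuard(boolean failLogging) {
        ExecutorService fsyncService = Executors.newSingleThreadExecutor();
        try {
            fsyncService.submit(() -> { /* fsync work would go here */ });
        } finally {
            try {
                logReplicationDetails(failLogging);
            } catch (IOException e) {
                // Log and continue so the cleanup below is not bypassed.
                System.err.println("could not log replication details: " + e.getMessage());
            }
            shutdownNowAndAwait(fsyncService);
        }
        return fsyncService;
    }

    public static void main(String[] args) {
        ExecutorService leaked = fetchAsToday(true);
        System.out.println("current shape, executor shut down? " + leaked.isShutdown()); // false
        // Without this call the idle non-daemon worker would keep the JVM alive.
        leaked.shutdownNow();

        ExecutorService cleaned = fetchWithGuard(true);
        System.out.println("guarded shape, executor shut down? " + cleaned.isShutdown()); // true
    }
}
```

Note that main() has to shut the leaked executor down by hand. Its worker is a live non-daemon thread, so otherwise the JVM would never exit, which mirrors the Tomcat shutdown hang.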