Shane Cruz wrote:
> So, with full debug logging turned on, I did see this exception in the
> logs right before the restart:
>
> [13:55:37.603] com.caucho.log.EnvironmentLogger.log com.caucho.config.ConfigException: OpenSSL can't open certificate-chain-file '/nfs/certs/mysite.crt'
> [13:55:37.603] at com.caucho.vfs.OpenSSLFactory.open(Native Method)
> [13:55:37.603] at com.caucho.vfs.OpenSSLFactory.accept(OpenSSLFactory.java:419)
> [13:55:37.603] at com.caucho.server.port.Port.accept(Port.java:813)
> [13:55:37.603] at com.caucho.server.port.TcpConnection.run(TcpConnection.java:495)
> [13:55:37.603] at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:520)
> [13:55:37.603] at com.caucho.util.ThreadPool.run(ThreadPool.java:442)
> [13:55:37.603] at java.lang.Thread.run(Thread.java:619)
> [13:55:37.603]
> [13:55:49.109] com.caucho.log.EnvironmentLogger.log Server[myserver1] starting
>
> That certificate is getting loaded over NFS. Is there a chance that a
> certificate loading failure due to an NFS issue could cause the JVM to
> exit? I thought the certificate would just be loaded one time at
> startup, but it looks like maybe it accesses it during runtime as well.
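On the question of whether the certificate is read after startup: the trace shows the open happening on the accept path (Port.accept calls OpenSSLFactory.accept, which calls OpenSSLFactory.open), not only at startup. One way to check the NFS side independently of Resin is to run a small probe in a separate JVM on the same host that keeps reading the file and logs every failure with a timestamp in the same HH:mm:ss.SSS form as the Resin log. This is a minimal sketch under assumptions: CertProbe is a hypothetical class, not part of Resin, and the default path is simply the one from the log. It sticks to Java 6 syntax to match the JDK 1.6 in use.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Hypothetical stand-alone probe (not part of Resin): repeatedly opens and
 * reads a file, e.g. the NFS-hosted certificate, and logs any failure with a
 * timestamp so it can be lined up against the Resin log.
 */
public class CertProbe {

    /** Returns null if the file was read completely, otherwise a description of the failure. */
    public static String probe(File file) {
        FileInputStream in = null;
        try {
            in = new FileInputStream(file);
            byte[] buf = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            // An empty certificate file is as useless to OpenSSL as a missing one.
            return total == 0 ? "file is empty" : null;
        } catch (IOException e) {
            // Covers FileNotFoundException, permission errors, and NFS read errors.
            return e.toString();
        } finally {
            if (in != null) {
                try { in.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        File file = new File(args.length > 0 ? args[0] : "/nfs/certs/mysite.crt");
        SimpleDateFormat fmt = new SimpleDateFormat("HH:mm:ss.SSS");
        while (true) {
            String failure = probe(file);
            if (failure != null) {
                System.err.println("[" + fmt.format(new Date()) + "] cannot read "
                        + file + ": " + failure);
            }
            Thread.sleep(1000L); // one probe per second (an arbitrary choice)
        }
    }
}
```

Because the probe runs in its own process, it has its own file descriptor limit, so it exercises the NFS mount without being affected by how many descriptors Resin has open. If the probe never fails while Resin does, that points back at the Resin process rather than at the mount.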
Possibly an issue running out of file descriptors?

That exception shouldn't cause a restart directly. It would cause that
thread to exit, but would also start up a new thread to listen to that
port (because it assumes the current thread is broken for some reason).

But you could get a "can't open" if you run out of file descriptors, and
running out of file descriptors can force a restart.

-- Scott

>
> Unfortunately, on a different JVM, there was a crash that doesn't seem
> to have the same exception:
>
> [13:36:03.102] com.caucho.log.EnvironmentLogger.log allocate PoolItem[jdbc/db1,3340053,com.caucho.sql.ManagedConnectionImpl@744ab820]
> [13:36:03.102] com.caucho.log.EnvironmentLogger.log allocate PoolItem[jdbc/db2,1020267,com.caucho.sql.ManagedConnectionImpl@2a121a07]
> [13:36:16.815] com.caucho.log.EnvironmentLogger.log Server[myserver2] starting
>
> Scott, what are your thoughts on the certificate issue? To be safe,
> we should probably start by not loading the certificate over an NFS share.
>
> Thanks,
> Shane
>
> On Fri, Feb 11, 2011 at 1:40 PM, Scott Ferguson <[email protected]> wrote:
>
> Shane Cruz wrote:
> > We are running Resin Pro 3.0.25 on RHEL 5.5 and using 64-bit Sun JDK
> > 1.6.0_05. Recently, we have started seeing several incidents where
> > the Resin JVM seems to just randomly get restarted. There is nothing
> > in the logs to indicate that the JVM was shut down cleanly or that a
> > restart was attempted; the log files just go from displaying regular
> > log lines to displaying the following:
> The logging for 4.0 is much more informative. With 3.0 it's a bit
> trickier.
> >
> > [11:24:18.095] com.caucho.log.EnvironmentLogger.log Server[myserver] starting
> >
> > Things that have already been checked:
> >
> > 1. There doesn't appear to be a JVM crash, as no HotSpot error log
> >    files are created as they usually would be.
> >
> > 2. There are no signs in the sudo logs that anyone is manually
> >    restarting the JVM.
> >
> > 3. There are no signs in the logs that Resin is restarting itself, even
> >    though we have a "min-free-memory" setting of 1M. With higher values
> >    of that setting we have seen the JVM get restarted due to low memory,
> >    but I am pretty sure logging always indicated that the JVM was
> >    restarting when this happened before.
> >
> > 4. We are not using the Resin "ping" check that might restart the JVM
> >    if it is unresponsive.
> >
> > 5. Kernel logging is enabled and it doesn't look like the kernel
> >    is killing it for any reason.
> >
> > It almost seems as if the JVM is just getting a kill -9 and then the
> > wrapper script is starting it back up. What is the best way to track
> > down what might be killing the JVM? We are in the process of testing
> > an upgrade to a newer version of the JDK, but I am not very confident
> > that will fix the problem. I am going to try to turn on full Resin
> > debug logging, but I thought I would reach out in case anyone else had
> > an idea of how to track this down. Is there a way to wrap the Linux
> > kill command to find out if that is being run? Any other suggestions
> > on where to look?
> Since a phantom kill is pretty unlikely, I wouldn't spend too much time
> on that theory.
>
> Since you're not getting an hs_* error log, the most likely cause would
> be something calling System.exit() or Runtime.halt(), possibly Resin
> itself for something like running out of threads or memory (although,
> as you pointed out, that should be logged).
>
> Other than that, the restart should only happen if the config files
> change (theoretically something like NFS or 'touch' could trigger that,
> but I assume that's not happening).
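The theory that something is calling System.exit can be tested with a JVM shutdown hook. Hooks run when System.exit() is called and when the JVM receives SIGTERM, SIGINT or SIGHUP, but not on Runtime.halt(), kill -9, or a native crash. So if a dump from the hook appears before an unexplained restart, the shutdown was orderly, and the thread blocked inside System.exit() in that dump identifies the caller; if nothing appears, halt, SIGKILL, or a crash becomes more likely. This is a sketch under assumptions: ExitTracer is a hypothetical helper, not a Resin API, and calling install() from something that runs early inside Resin's JVM (for example a servlet's init() method) is one assumed way to register it.

```java
import java.io.PrintStream;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

/**
 * Hypothetical diagnostic (not part of Resin): registers a shutdown hook that
 * dumps every thread's stack when the JVM begins an orderly shutdown.
 */
public final class ExitTracer {

    private ExitTracer() { }

    /** Builds a text dump of the stacks of all live threads. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Creates, but does not register, a hook thread that writes the dump to the given stream. */
    public static Thread newHook(final PrintStream out) {
        return new Thread("exit-tracer") {
            @Override
            public void run() {
                String now = new SimpleDateFormat("HH:mm:ss.SSS").format(new Date());
                out.println("[" + now + "] JVM shutdown hook fired; thread stacks follow:");
                out.print(dumpAllThreads());
                out.flush();
            }
        };
    }

    /** Registers the hook; call once, as early as possible in the JVM's life. */
    public static void install(PrintStream out) {
        Runtime.getRuntime().addShutdownHook(newHook(out));
    }
}
```

Where System.err ends up depends on how the wrapper script redirects it, so passing a PrintStream opened on a local file may be the safer choice.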
>
> -- Scott
>
> > _______________________________________________
> > resin-interest mailing list
> > [email protected] <mailto:[email protected]>
> > http://maillist.caucho.com/mailman/listinfo/resin-interest
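The file-descriptor theory can be checked from inside the JVM on Sun/Oracle JDKs, which expose the process's open and maximum descriptor counts through com.sun.management.UnixOperatingSystemMXBean. The sketch below reads them through the platform MXBean and returns null on JVMs that do not provide that interface. FdHeadroom is a hypothetical helper: it reports on whatever JVM it runs in, so to see Resin's numbers it has to be called from code running inside Resin (for example a diagnostic page). On Linux the same open count can be read from outside the process by counting the entries in /proc/<pid>/fd.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper (not part of Resin): reports file descriptor usage of the current JVM. */
public final class FdHeadroom {

    private FdHeadroom() { }

    /**
     * Returns {open, max} descriptor counts for this JVM, or null if the
     * platform MXBean does not expose them (non-Unix OS or a different JVM vendor).
     */
    public static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount() };
        }
        return null;
    }

    /** Percentage of the descriptor limit in use, rounded down. */
    public static long percentUsed(long open, long max) {
        if (max <= 0) {
            throw new IllegalArgumentException("max must be positive");
        }
        return open * 100 / max;
    }

    public static void main(String[] args) {
        long[] fds = openAndMax();
        if (fds == null) {
            System.out.println("file descriptor counts not available on this JVM");
        } else {
            System.out.println("open=" + fds[0] + " max=" + fds[1]
                    + " (" + percentUsed(fds[0], fds[1]) + "% used)");
        }
    }
}
```

Logging these numbers periodically and checking whether the open count sits close to the maximum just before a restart would help confirm or rule out descriptor exhaustion.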
