On 10/21/2014 1:24 AM, Salman Akram wrote:
> Yes so the most imp thing is what's the best way to 'know' that there is
> OOM? Some script of a ping with 1-2 mins time?
To touch on both your question and the one posed by Toke Eskildsen: Java itself has a configuration option that runs a program or script when an OOME occurs (on HotSpot-based JVMs this is -XX:OnOutOfMemoryError). The idea is that this script kills the application and starts it back up. Lucene builds its indexes in a way that has built-in protections, which should keep the index from becoming corrupt when the app is killed at an arbitrary point.

Even relatively simple applications tend to have several layers; complicated ones may have dozens or hundreds. Lucene and Solr are not simple. Dealing with all the possible fallout from OOME in application code is *hard*. It involves extra code and careful planning. Engineering a safe exit from the entire application is even harder, something that even an experienced programmer who's in charge of the whole application might not be able to do easily.

I see a number of try/catch blocks in the code that trap Throwable rather than Exception. OutOfMemoryError is an Error, not an Exception, so catching Throwable also catches OOME, which means an OOME will not cause the entire program to die. We might want OOME to kill the program ... but getting there will involve a lot of tedious work: examining existing code to determine which errors are possible, handling each one specifically, and letting OOME bubble up to the point where it can kill the app. Even then, it would likely kill only Solr, not the servlet container, which means Solr probably won't restart unless the OOM option is set on the JVM.

> The reason I want auto restart or at least some error (so that it can
> switch to another slave) is I want to have a good sleep if something goes
> wrong at night so that the systems keep on working and can look into
> details in the morning. That's the whole purpose of having a fail over
> implemented.
>
> On a side node the instance where we had this OOM didn't have an explicit
> Xmx set (on 64 bit Windows) so in that case is there some default max?
> There was ample mem available so why would it throw OOM?
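For the default-max question, one way to see what limit a particular JVM actually picked is to ask the running JVM itself. This is a minimal sketch, assuming it is compiled and run with the same java executable (and the same options) that start the servlet container; the class name MaxHeapCheck and its mb helper are hypothetical, not part of Solr. Runtime.maxMemory() reports the heap ceiling (the -Xmx value, or the JVM's default when -Xmx is absent), and the memory-pool MXBeans also list non-heap pools such as PermGen (Java 7 and earlier) or Metaspace (Java 8 and later).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/**
 * Hypothetical helper (not part of Solr): prints the heap ceiling and the
 * per-pool limits that the running JVM actually chose.
 */
public class MaxHeapCheck {

    // Converts a byte count to whole mebibytes; -1 means "undefined" in the MXBean API.
    static String mb(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / (1024 * 1024)) + " MB";
    }

    public static void main(String[] args) {
        // Maximum the heap may grow to: reflects -Xmx, or the JVM default if -Xmx is absent.
        // Returns Long.MAX_VALUE if the JVM reports no inherent limit.
        System.out.println("Max heap: " + mb(Runtime.getRuntime().maxMemory()));

        // Per-pool limits, including non-heap pools (PermGen or Metaspace),
        // which can run out independently of the main heap.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            System.out.println(pool.getType() + " / " + pool.getName()
                    + ": max " + mb(usage.getMax()));
        }
    }
}
```

Running it as `java MaxHeapCheck` with no -Xmx shows that JVM's default. The numbers only describe Solr's situation if the container is launched with the same JVM and options.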
The default max heap depends on the specific Java implementation: whether it's 32-bit or 64-bit, whether it's a client or a server JVM, and so on. It will usually depend on how much memory the system has installed, too, and it is typically only a fraction of that memory, so the heap can hit its limit while the machine still has plenty free. The first answer on this SO question explains how to find out what the default is for your system:

http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined

If you're getting OOME, then some aspect of your memory configuration wasn't large enough for your index, configuration, or query pattern. One limit that can be exceeded even when everything looks like it should be fine is PermGen, which on Java 7 and earlier is sized separately from the main heap (-XX:MaxPermSize), so ample heap or system memory doesn't help once PermGen runs out. No error stacktrace was ever included on this thread, so we have no idea exactly what kind of error we're dealing with.

Thanks,
Shawn