On 10/21/2014 1:24 AM, Salman Akram wrote:
> Yes so the most imp thing is what's the best way to 'know' that there is
> OOM? Some script of a ping with 1-2 mins time?

To touch on both your question and that posed by Toke Eskildsen:

Java itself has a configuration option to call a program or script when
OOME occurs (on HotSpot JVMs, this is -XX:OnOutOfMemoryError).  The idea
is that this script should kill the application and start it back up.
Lucene and the way it builds indexes have built-in protections that
should keep the index from becoming corrupt when the app is killed at an
unknown location.
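
As a rough sketch of how that option could be wired up from a launcher,
assuming a HotSpot JVM (where the option is -XX:OnOutOfMemoryError): the
class name, the script path and the jar name below are hypothetical
placeholders, not anything Solr ships.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: builds a command line that starts a JVM with HotSpot's
// -XX:OnOutOfMemoryError option. The script path and jar name passed in
// main() are hypothetical placeholders.
class OomRestartLauncher {

    static List<String> buildCommand(String restartScript, String jar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // HotSpot runs this command when an OutOfMemoryError is first thrown.
        cmd.add("-XX:OnOutOfMemoryError=" + restartScript);
        cmd.add("-jar");
        cmd.add(jar);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("/path/to/restart-solr.sh", "start.jar");
        System.out.println(String.join(" ", cmd));
        // To actually launch the process:
        // new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

The script itself would have to kill the old process and start a new
one; on Windows it would be a batch file rather than a shell script.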

Even relatively simple applications tend to have several layers;
complicated ones may have dozens or hundreds of layers.  Lucene and Solr
are not simple.  Dealing with all possible fallout from OOME in
application code is *hard*.  It involves extra code and careful
planning.  Engineering a safe exit from the entire application is even
harder, something that even an experienced programmer who's in charge of
the entire application might not be able to easily do.

I see a number of try/catch cases in the code where a Throwable is
trapped, rather than an Exception.  This means that OOME will not result
in the entire program dying.  We might want OOME to result in the
program dying ... but getting there will involve a lot of tedious work
examining existing code to determine what errors are possible and how to
handle each one specifically, allowing OOME to bubble up to the point
where it can kill the app.  Even then, it is likely to only kill Solr,
not the servlet container ... which means that it probably won't restart
without the OOM config option on the JRE.
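
To illustrate the difference between trapping Throwable and trapping
Exception, here is a minimal standalone sketch.  It is not Solr code; the
class and method names are made up, and the OOME is thrown by hand
because a real one would come from the JVM.

```java
// Sketch only: shows why catching Throwable keeps an OutOfMemoryError
// from bubbling up, while catching Exception lets it through.
class ThrowableCatchDemo {

    static void failingTask() {
        // Simulated; a real OOME is raised by the JVM itself.
        throw new OutOfMemoryError("simulated");
    }

    // OutOfMemoryError is an Error, which is a Throwable, so it is trapped.
    static String catchThrowable() {
        try {
            failingTask();
            return "completed";
        } catch (Throwable t) {
            return "swallowed " + t.getClass().getSimpleName();
        }
    }

    // Error is not a subclass of Exception, so the OOME propagates.
    static String catchException() {
        try {
            failingTask();
            return "completed";
        } catch (Exception e) {
            return "swallowed " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(catchThrowable());
        try {
            System.out.println(catchException());
        } catch (OutOfMemoryError e) {
            System.out.println("propagated " + e.getClass().getSimpleName());
        }
    }
}
```

Only the second form lets the error travel up the stack, and every layer
above it would need the same treatment before the error could reach the
top.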

> The reason I want auto restart or at least some error (so that it can
> switch to another slave) is I want to have a good sleep if something goes
> wrong at night so that the systems keep on working and can look into
> details in the morning. That's the whole purpose of having a fail over
> implemented.
> 
> On a side note, the instance where we had this OOM didn't have an explicit
> Xmx set (on 64 bit Windows) so in that case is there some default max?
> There was ample mem available so why would it throw OOM?

The default max heap is dependent on the specific Java implementation --
whether it's 32-bit or 64-bit, whether it's a client JVM or a server
JVM, etc.  It will usually depend on how much memory the system has
installed, too.  The first answer on this SO question will let you find
out what the default is for your system:

http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined
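
Another way to see the limit a given JVM actually chose is to ask it from
inside a running program.  This is a small sketch using the standard
Runtime.maxMemory() call; the class name is made up.  Run it with the
same java binary and without -Xmx to see the default.

```java
// Sketch only: reports the maximum heap the current JVM will try to use.
class DefaultHeapReport {

    static long maxHeapBytes() {
        // Returns Long.MAX_VALUE if the JVM has no inherent limit.
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long bytes = maxHeapBytes();
        System.out.println("Max heap: " + (bytes / (1024 * 1024)) + " MB");
    }
}
```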

If you're getting OOME, then some aspect of your memory configuration
wasn't large enough for your index, configuration, or query pattern.
One thing that can get exceeded when everything looks like it should be
fine is PermGen (on Java 7 and earlier; Java 8 replaced it with
Metaspace).  An error stacktrace was never included in this thread, so we
have no idea exactly what kind of error we're dealing with.
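
The message that follows "java.lang.OutOfMemoryError:" in the log is what
tells these cases apart.  As a rough sketch, assuming the message texts
commonly printed by HotSpot JVMs (they can vary between versions and
vendors), a helper like this hypothetical one maps a message to the
likely cause:

```java
import java.util.Locale;

// Sketch only: maps common HotSpot OutOfMemoryError messages to a likely
// cause. Message texts are assumptions and may differ between JVMs.
class OomKind {

    static String describe(String message) {
        if (message == null) {
            return "unknown (no message)";
        }
        String m = message.toLowerCase(Locale.ROOT);
        if (m.contains("permgen")) {
            return "permanent generation full (Java 7 and earlier); see -XX:MaxPermSize";
        }
        if (m.contains("metaspace")) {
            return "class metadata space full (Java 8 and later); see -XX:MaxMetaspaceSize";
        }
        if (m.contains("gc overhead limit")) {
            return "heap nearly full, most time spent in garbage collection; see -Xmx";
        }
        if (m.contains("java heap space")) {
            return "heap exhausted; see -Xmx";
        }
        if (m.contains("native thread")) {
            return "thread creation failed; OS or process limits rather than heap";
        }
        return "unrecognized: " + message;
    }

    public static void main(String[] args) {
        System.out.println(describe("Java heap space"));
        System.out.println(describe("PermGen space"));
    }
}
```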

Thanks,
Shawn
