" That's why it is considered better to crash the program and restart it
for OOME."

So in the end, aren't you saying the same thing, or have I misunderstood
something?

We don't get this issue on the master server (indexing). Our real concern is
the slaves, where it happens only rarely, so it is not an obvious heap config
issue. But when it does happen, our failover (moving to another slave) doesn't
even work, because no error is reported. So I just want a good way to detect
an OOM and then either shift to a failover or have that server restarted.
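
To make the idea concrete, here is a rough sketch of the kind of handler I have
in mind, assuming the JVM is started with the HotSpot option
-XX:OnOutOfMemoryError pointing at a small script. The script name
(oom_handler.py), the marker directory and the marker file naming are all
hypothetical, and I have not tested this against our setup. It leaves a marker
file that a failover monitor could watch for, then kills the Solr process so it
can be restarted.

```python
import os
import signal
import subprocess
import sys
import time
from pathlib import Path


def default_kill(pid: int) -> None:
    """Forcefully stop the JVM so a restart or failover can take over."""
    if os.name == "nt":
        # taskkill is the standard Windows tool; /F forces termination.
        subprocess.call(["taskkill", "/F", "/PID", str(pid)])
    else:
        os.kill(pid, signal.SIGKILL)


def handle_oom(pid: int, marker_dir: str, kill=default_kill) -> Path:
    """Write a marker file for the given pid, then kill that process."""
    # Hypothetical naming scheme; a monitor would watch marker_dir for it.
    marker = Path(marker_dir) / f"solr-oom-{pid}.marker"
    marker.write_text(f"OOME at {time.strftime('%Y-%m-%d %H:%M:%S')}\n")
    kill(pid)
    return marker


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed invocation: python oom_handler.py <pid> <marker_dir>
    handle_oom(int(sys.argv[1]), sys.argv[2])
```

As I understand it, the JVM option would look something like
-XX:OnOutOfMemoryError="python oom_handler.py %p C:\solr\oom", where %p is
replaced with the process id and the path is only a placeholder.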




On Mon, Oct 20, 2014 at 7:25 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/19/2014 11:32 PM, Ramzi Alqrainy wrote:
> > You can create a script to ping on Solr every 10 sec. if no response, then
> > restart it (Kill process id and run Solr again).
> > This is the fastest and easiest way to do that on windows.
>
> I wouldn't do this myself.  Any temporary problem that results in a long
> query time might result in a true outage while Solr restarts.  If OOME
> is a problem, then you can deal with that by providing a program for
> Java to call when OOME occurs.
>
> Sending notification when ping times get excessive is a good idea, but I
> wouldn't make it automatically restart, unless you've got a threshold
> for that action so it only happens when the ping time is *REALLY* high.
>
> The real fix for OOME is to make the heap larger or to reduce the heap
> requirements by changing how Solr is configured or used.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> Writing a program that has deterministic behavior in an out of memory
> condition is very difficult.  The Lucene devs *have* done this hard work
> in the lower levels of IndexWriter and the specific Directory
> implementations, so that OOME doesn't cause *index corruption*.
>
> In general, once OOME happens, program operation (and in some cases the
> status of the most recently indexed documents) is completely
> undetermined.  We can be sure that the data which has already been
> written to disk will be correct, but nothing beyond that.  That's why it
> is considered better to crash the program and restart it for OOME.
>
> Thanks,
> Shawn
>
>
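
For the ping idea quoted above, a minimal sketch of the "notify first, restart
only past a high threshold" rule might look like the following. The threshold
values (5 and 60 seconds) and the function names are made up for illustration;
the URL passed to timed_ping would be whatever ping handler the core exposes,
for example something like http://localhost:8983/solr/collection1/admin/ping
(host and core name are placeholders).

```python
import time
import urllib.request


def classify_ping(elapsed_s, warn_after_s=5.0, restart_after_s=60.0):
    """Map a ping time in seconds (None = no response) to an action."""
    if elapsed_s is None or elapsed_s >= restart_after_s:
        return "restart"
    if elapsed_s >= warn_after_s:
        return "notify"
    return "ok"


def timed_ping(url, timeout_s=60.0):
    """Return the seconds a successful ping took, or None if it failed.

    Failure here means any OSError: connection refused, timeout,
    or an HTTP error status.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
    except OSError:
        return None
    return time.monotonic() - start
```

A scheduled job could call classify_ping(timed_ping(url)) and only act on
"restart", sending a notification on "notify".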


-- 
Regards,

Salman Akram
