Re: OOM spreads to other replica's/HA when OOM

Shawn Heisey Mon, 18 Dec 2017 08:38:03 -0800

On 12/18/2017 9:01 AM, Susheel Kumar wrote:
> Any thoughts on how one can provide HA in these situations.


As I have already said a couple of times today on other threads, there
are *exactly* two ways to deal with an OutOfMemoryError (OOME).  No other
solution is possible.

1) Configure the system to allow the process to access more of the
resource that it's running out of.  This is the solution most people
choose.  In your case, that means making the Java heap larger.

2) Change the configuration or the environment so fewer resources are
required.
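
For option 1, the heap on a typical Solr install is set in the include
script that bin/solr reads at startup.  A minimal sketch, assuming a
Linux install where that file is solr.in.sh; the 8g value is only a
placeholder for illustration, not a recommendation for your index:

```shell
# solr.in.sh (sourced by bin/solr at startup)
# SOLR_HEAP sets both the minimum and maximum Java heap (-Xms and -Xmx).
# 8g is an illustrative value; size it from your own observed heap usage.
SOLR_HEAP="8g"
```

The same value can be given for a single start with "bin/solr start -m 8g".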

OOME is special.  It is a problem that all the high availability steps
in the world cannot protect you from, for precisely the reasons that
Emir and I have described.  You must ensure that Solr is set up so there
are enough resources that OOME cannot occur.

I can see a general argument for making it possible to configure or
disable any retry mechanism in SolrCloud, but that is not the solution
here.  It would most likely only *delay* the problem to a later query. 
The OOME itself must be fixed, using one of the two solutions already
outlined.

Thanks,
Shawn
