On 12/18/2017 9:01 AM, Susheel Kumar wrote:
> Any thoughts on how one can provide HA in these situations.
As I have said a couple of times today on other threads, there are *exactly* two ways to deal with OOME (OutOfMemoryError). No other solution is possible.

1) Configure the system to allow the process to access more of the resource that it's running out of. This is typically the solution that people use. In your case, you would need to make the heap larger.

2) Change the configuration or the environment so that fewer resources are required.

OOME is special. All the high availability steps in the world cannot protect you from it, for precisely the reasons that Emir and I have described. You must ensure that Solr is set up with enough resources that OOME cannot occur.

I can see a general argument for making any retry mechanism in SolrCloud configurable or possible to disable, but that is not the solution here. It would most likely only *delay* the problem to a later query. The OOME itself must be fixed, using one of the two solutions outlined above.

Thanks,
Shawn
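For option 1, the heap ceiling is the JVM's `-Xmx` value. In a typical Solr install it is usually set through `SOLR_HEAP` in `solr.in.sh` or the `-m` option of `bin/solr`, but check the documentation for your version. The sketch below is a hypothetical helper (`HeapCheck` is an illustrative name, not part of Solr). It prints the heap limits a JVM actually received, which can help confirm that a larger heap setting took effect. It uses only `java.lang.Runtime`.

```java
// HeapCheck.java: hypothetical illustration, not part of Solr.
// Run it with the same heap options you give Solr (e.g. java -Xmx8g HeapCheck)
// to see the limit the JVM actually applies.
public class HeapCheck {
    // Converts a byte count to mebibytes for readable output.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();            // ceiling set by -Xmx (Long.MAX_VALUE if no limit)
        long committed = rt.totalMemory();    // heap currently reserved from the OS
        long used = committed - rt.freeMemory(); // heap currently occupied by objects
        System.out.println("max heap:  " + toMiB(max) + " MiB");
        System.out.println("committed: " + toMiB(committed) + " MiB");
        System.out.println("used:      " + toMiB(used) + " MiB");
    }
}
```

If "max heap" is smaller than you expect, the larger setting never reached the JVM, and raising it is still the first thing to fix.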