Re: Proactively going down in anticipation of high load / bad state

2015-02-17 Thread S G
Hi Erick,

I have submitted a patch at https://issues.apache.org/jira/browse/SOLR-7121
Will add tests to this if the approach is acceptable.

Thanks
Sachin


On Tue, Jan 6, 2015 at 12:26 PM, Erick Erickson wrote:

> If you have done some coding, it's always appropriate to open a JIRA
> and attach the patch for discussion.
>
> Yonik's Law of Patches:
>
> "A half-baked patch in Jira, with no documentation, no tests
> and no backwards compatibility is better than no patch at all."
>
> Even if the approach is shot down, it may spawn alternative approaches
> or stimulate thinking.
>
> Best,
> Erick
>
> On Tue, Jan 6, 2015 at 11:50 AM, S G  wrote:
> > Hi,
> >
> > In SolrCloud, is there a setting that allows a core to proactively go
> > down if it's able to detect some temporary issues like high GC, high
> > thread counts, a temporary network slowdown, etc.?
> > Currently we see that a node gets into a distributed deadlock because it's
> > not able to detect such situations.
> >
> > I am exploring the Solr code to see if it's possible to take some proactive
> > action in such cases.
> > One way could be to have configurable limits for GC time, thread count,
> > response time, 5-minute rate, etc. and make a core shut down if it senses
> > problems.
> > Once that happens, a background thread will monitor the trouble-causing
> > parameters and recover the downed core when the situation improves.
> >
> >
> > My current patch can bring down a core for:
> > 1) High thread-counts,
> > 2) High 95thPcRequestTime,
> > 3) Huge # of heavy queries in a given time.
> >
> > The patch also recovers the core when its health improves.
> >
> >
> > If the above seems doable, then I can create a JIRA for more discussion
> > and implementation.
> >
> >
> > Thanks
> > Sachin
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Proactively going down in anticipation of high load / bad state

2015-01-06 Thread Erick Erickson
If you have done some coding, it's always appropriate to open a JIRA
and attach the patch for discussion.

Yonik's Law of Patches:

"A half-baked patch in Jira, with no documentation, no tests
and no backwards compatibility is better than no patch at all."

Even if the approach is shot down, it may spawn alternative approaches
or stimulate thinking.

Best,
Erick

On Tue, Jan 6, 2015 at 11:50 AM, S G  wrote:
> Hi,
>
> In SolrCloud, is there a setting that allows a core to proactively go
> down if it's able to detect some temporary issues like high GC, high
> thread counts, a temporary network slowdown, etc.?
> Currently we see that a node gets into a distributed deadlock because it's not
> able to detect such situations.
>
> I am exploring the Solr code to see if it's possible to take some proactive
> action in such cases.
> One way could be to have configurable limits for GC time, thread count,
> response time, 5-minute rate, etc. and make a core shut down if it senses
> problems.
> Once that happens, a background thread will monitor the trouble-causing
> parameters and recover the downed core when the situation improves.
>
>
> My current patch can bring down a core for:
> 1) High thread-counts,
> 2) High 95thPcRequestTime,
> 3) Huge # of heavy queries in a given time.
>
> The patch also recovers the core when its health improves.
>
>
> If the above seems doable, then I can create a JIRA for more discussion and
> implementation.
>
>
> Thanks
> Sachin




Proactively going down in anticipation of high load / bad state

2015-01-06 Thread S G
Hi,

In SolrCloud, is there a setting that allows a core to proactively go
down if it's able to detect some temporary issues like high GC, high
thread counts, a temporary network slowdown, etc.?
Currently we see that a node gets into a distributed deadlock because it's not
able to detect such situations.

I am exploring the Solr code to see if it's possible to take some proactive
action in such cases.
One way could be to have configurable limits for GC time, thread count,
response time, 5-minute rate, etc. and make a core shut down if it senses
problems.
Once that happens, a background thread will monitor the trouble-causing
parameters and recover the downed core when the situation improves.
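
To make the idea concrete, here is a minimal Java sketch of configurable
limits plus a background monitor. It is only an illustration and not the
code in the patch: CoreHealthGate, Sample, and every field and method name
below are hypothetical, and "down" is just a flag here rather than a real
Solr core state change.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from Solr itself. */
class CoreHealthGate {
    /** One reading of the monitored parameters (illustrative fields only). */
    static final class Sample {
        final int threadCount;
        final double p95RequestTimeMs;

        Sample(int threadCount, double p95RequestTimeMs) {
            this.threadCount = threadCount;
            this.p95RequestTimeMs = p95RequestTimeMs;
        }
    }

    private final int maxThreads;       // configurable thread-count limit
    private final double maxP95Ms;      // configurable response-time limit
    private final AtomicBoolean down = new AtomicBoolean(false);

    CoreHealthGate(int maxThreads, double maxP95Ms) {
        this.maxThreads = maxThreads;
        this.maxP95Ms = maxP95Ms;
    }

    /** True when any configured limit is exceeded. */
    boolean isUnhealthy(Sample s) {
        return s.threadCount > maxThreads || s.p95RequestTimeMs > maxP95Ms;
    }

    /** Applies one sample: mark the core down on trouble, up again once healthy. */
    void apply(Sample s) {
        down.set(isUnhealthy(s));
    }

    boolean isDown() {
        return down.get();
    }

    /** Starts the background monitor; the caller shuts the returned executor down. */
    ScheduledExecutorService startMonitor(Supplier<Sample> source, long periodMs) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "core-health-monitor");
            t.setDaemon(true); // do not keep the JVM alive just for monitoring
            return t;
        });
        exec.scheduleWithFixedDelay(() -> apply(source.get()), 0, periodMs,
                TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

A real implementation would probably want separate "go down" and "recover"
thresholds so that a core hovering near a limit does not flap between states.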


My current patch can bring down a core for:
1) High thread-counts,
2) High 95thPcRequestTime,
3) Huge # of heavy queries in a given time.

The patch also recovers the core when its health improves.
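
As an illustration of the third trigger, "heavy queries in a given time" can
be tracked with a sliding time window. The sketch below is an assumption
about how such a check might look, not the logic in the patch:
HeavyQueryWindow and its parameters are hypothetical names, and timestamps
are passed in explicitly so the behavior is easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: counts "heavy" queries inside a sliding time window. */
class HeavyQueryWindow {
    private final long windowMs;          // length of the observation window
    private final long heavyThresholdMs;  // a query at least this slow is "heavy"
    private final int maxHeavyQueries;    // allowed heavy queries per window
    private final Deque<Long> heavyTimestamps = new ArrayDeque<>();

    HeavyQueryWindow(long windowMs, long heavyThresholdMs, int maxHeavyQueries) {
        this.windowMs = windowMs;
        this.heavyThresholdMs = heavyThresholdMs;
        this.maxHeavyQueries = maxHeavyQueries;
    }

    /** Records one finished query; only heavy ones are remembered. */
    synchronized void record(long nowMs, long elapsedMs) {
        if (elapsedMs >= heavyThresholdMs) {
            heavyTimestamps.addLast(nowMs);
        }
        evict(nowMs);
    }

    /** True while more than the allowed number of heavy queries are in the window. */
    synchronized boolean overLimit(long nowMs) {
        evict(nowMs);
        return heavyTimestamps.size() > maxHeavyQueries;
    }

    /** Drops timestamps that have aged out of the window. */
    private void evict(long nowMs) {
        while (!heavyTimestamps.isEmpty()
                && nowMs - heavyTimestamps.peekFirst() >= windowMs) {
            heavyTimestamps.removeFirst();
        }
    }
}
```

Because old entries age out on their own, the same check that takes the core
down also reports healthy again once the burst of heavy queries has passed.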


If the above seems doable, then I can create a JIRA for more discussion and
implementation.


Thanks
Sachin