[jira] [Comment Edited] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284021#comment-16284021
 ] 

Scott Blum edited comment on SOLR-11423 at 12/8/17 6:52 PM:


I'm fixing solr/CHANGES.txt while cherry-picking into 7x and 7_2.  I'll push a 
change to master to move it when that's done.

Note: the solr/CHANGES.txt that shipped with 7.1 does not erroneously report 
this bugfix as included; only master/7x/7_2 have it wrong.



was (Author: dragonsinth):
I'm fixing solr/CHANGES.txt while cherry-picking into 7x and 7_2.  I'll push a 
change to master to move it when that's done.

> Overseer queue needs a hard cap (maximum size) that clients respect
> ---
>
> Key: SOLR-11423
> URL: https://issues.apache.org/jira/browse/SOLR-11423
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Scott Blum
>Assignee: Scott Blum
>
> When Solr gets into pathological GC thrashing states, it can fill the 
> overseer queue with literally thousands and thousands of queued state 
> changes.  Many of these end up being duplicated up/down state updates.  Our 
> production cluster has gotten to the 100k queued items level many times, and 
> there's nothing useful you can do at this point except manually purge the 
> queue in ZK.  Recently, it hit 3 million queued items, at which point our 
> entire ZK cluster exploded.
> I propose a hard cap.  Any client trying to enqueue an item when the queue is 
> full would throw an exception.  I was thinking maybe 10,000 items would be a 
> reasonable limit.  Thoughts?
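The proposal above amounts to a size check before enqueue that fails fast instead of growing without bound. A minimal in-memory sketch of that behavior follows; the real change would check the ZooKeeper children count of the overseer queue node, and the class and constant names here are hypothetical, not Solr's actual API. The 10,000 limit is the figure floated in the issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sketch of a hard-capped queue: enqueue fails fast when full. */
class CappedStateQueue {
    // 10,000 is the limit suggested in the issue; the constant name is hypothetical.
    static final int MAX_QUEUE_SIZE = 10_000;

    private final Deque<String> items = new ArrayDeque<>();

    /** Rejects the item with an exception if the queue is at capacity. */
    synchronized void offer(String stateUpdate) {
        if (items.size() >= MAX_QUEUE_SIZE) {
            throw new IllegalStateException(
                "Overseer queue is full (" + items.size() + " items); rejecting enqueue");
        }
        items.addLast(stateUpdate);
    }

    synchronized int size() {
        return items.size();
    }
}
```

The key design point is that the cap is enforced on the client's enqueue path, so a GC-thrashing node surfaces an error locally rather than flooding ZK with hundreds of thousands of duplicate state updates.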



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283981#comment-16283981
 ] 

Tomás Fernández Löbbe edited comment on SOLR-11423 at 12/8/17 6:37 PM:
---

bq. Sounds good to me. So backport to branch_7_2 and branch_7x?
+1. And let's fix CHANGES.txt.
Is this OK [~jpountz]? This is not a bugfix, but as you said, it's in an odd 
state right now, and since it was included in the 7.1 CHANGES I feel we should 
correct it ASAP.


was (Author: tomasflobbe):
bq. Sounds good to me. So backport to branch_7_2 and branch_7x?
+1. And let's fix CHANGES.txt.







[jira] [Comment Edited] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-12-08 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283972#comment-16283972
 ] 

Scott Blum edited comment on SOLR-11423 at 12/8/17 6:21 PM:


Sounds good to me.  So backport to branch_7_2 and branch_7x?


was (Author: dragonsinth):
Sounds good to me.







[jira] [Comment Edited] (SOLR-11423) Overseer queue needs a hard cap (maximum size) that clients respect

2017-10-16 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206976#comment-16206976
 ] 

Cao Manh Dat edited comment on SOLR-11423 at 10/17/17 4:14 AM:
---

Should we modify the behavior here a little bit? Instead of throwing an 
IllegalStateException (which can lead to many errors, because it is an 
unchecked exception), should we try first and, if the queue is full, retry 
until a timeout?
[~dragonsinth] I really want to hear about your cluster status after SOLR-11443 
gets applied (maybe we do not need this hard cap at all if the Overseer can 
process messages fast enough).


was (Author: caomanhdat):
Should we modify the behavior here a little bit? Instead of throwing an 
IllegalStateException (which can lead to many errors, because it is an 
unchecked exception), should we try first and, if the queue is full, retry 
until a timeout?
[~dragonsinth] I really want to hear about your cluster status after SOLR-11443 
gets applied.
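The retry-until-timeout alternative suggested above could be sketched roughly as the wrapper below. This is not Solr's code; it stands in a ZK-backed queue with a standard bounded BlockingQueue (whose offer returns false when full), and the method name and parameters are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeoutException;

/** Sketch: retry a bounded enqueue until it succeeds or a deadline passes. */
class RetryingEnqueue {
    static void offerWithTimeout(BlockingQueue<String> queue, String item,
                                 long timeoutMs, long retryDelayMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        // offer() returns false (rather than blocking) when the queue is full.
        while (!queue.offer(item)) {
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException(
                    "queue still full after " + timeoutMs + " ms");
            }
            Thread.sleep(retryDelayMs); // back off before retrying
        }
    }
}
```

Compared with throwing an unchecked IllegalStateException at the call site, this gives transiently overloaded callers a chance to succeed, while a checked TimeoutException still forces callers to handle the persistently-full case.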



