[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-12-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262144#comment-14262144
 ] 

ASF GitHub Bot commented on SOLR-6336:
--

Github user andyetitmoves closed the pull request at:

https://github.com/apache/lucene-solr/pull/81


 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 4.10, Trunk


 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-10-16 Thread Jessica Cheng Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174420#comment-14174420
 ] 

Jessica Cheng Mallet commented on SOLR-6336:


Please let me know if I'm supposed to open a new issue (not sure what the 
policy is).

I'm encountering a bug from this patch where now I'm stuck in a loop making 
getChildren() request to zookeeper with this thread dump:
{quote}
Thread-51 [WAITING] CPU time: 1d 15h 0m 57s
java.lang.Object.wait()
org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, 
ZooKeeper$WatchRegistration)
org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher)
org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation)
org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean)
org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher)
org.apache.solr.cloud.DistributedQueue.getChildren(long)
org.apache.solr.cloud.DistributedQueue.peek(long)
org.apache.solr.cloud.DistributedQueue.peek(boolean)
org.apache.solr.cloud.Overseer$ClusterStateUpdater.run()
java.lang.Thread.run()
{quote}

Looking at the code, I think the issue is that LatchChildWatcher#process always 
sets the event to its member, regardless of its type, but the problem is that 
once an event is set, the await no longer waits. In this state, the while loop 
in getChildren(long), when called with wait being Integer.MAX_VALUE will come 
back, NOT wait at await because event != null, but then it still will not get 
any children.

{quote}
while (true) {
  if (!children.isEmpty()) break;
  watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait);
  if (watcher.getWatchedEvent() != null) {
children = orderedChildren(null);
  }
  if (wait != Long.MAX_VALUE) break;
}
{quote}

I think the fix would be to only set the event in the watcher if the type is a 
NodeChildrenChanged.

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 4.10, Trunk


 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-10-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174514#comment-14174514
 ] 

Shawn Heisey commented on SOLR-6336:


[~mewmewball], because this issue is resolved and you're having what looks to 
you like a related bug, it's standard practice to open a new issue.  Sometimes 
discussion may continue on an issue after it's resolved, typically that would 
be for clarification purposes, to decide whether a new issue should be filed.  
You can link the new issue to this as a related issue after you create it.

I do not understand the cloud/zookeeper internals well enough to know what you 
were saying above.  One day I hope to.


 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 4.10, Trunk


 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-10-16 Thread Jessica Cheng Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174519#comment-14174519
 ] 

Jessica Cheng Mallet commented on SOLR-6336:


Thanks for the clarification [~elyograg]! I'll open a new issue. Thanks!

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 4.10, Trunk


 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-10-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174541#comment-14174541
 ] 

Mark Miller commented on SOLR-6336:
---

Best rule of thumb: if the issue is released, new issue and link it to the 
related one. If it's not released, reopen the issue. 

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 4.10, Trunk


 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-10-16 Thread Jessica Cheng Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174556#comment-14174556
 ] 

Jessica Cheng Mallet commented on SOLR-6336:


Got it! Thanks!

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 4.10, Trunk


 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-13 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095461#comment-14095461
 ] 

Ramkumar Aiyengar commented on SOLR-6336:
-

I raised SOLR-6370 to catch such issues proactively.

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
 Fix For: 5.0, 4.10


 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090630#comment-14090630
 ] 

ASF GitHub Bot commented on SOLR-6336:
--

GitHub user andyetitmoves opened a pull request:

https://github.com/apache/lucene-solr/pull/81

Cache children as well so that they can be returned when the watcher is 
reused

Fixes an issue with apache/lucene-solr#80 pulled to SOLR-6336. If the first 
`getChildren` actually returns nodes, and the second request happens before the 
watch is fired, currently it will return no children.

The rest of the patch is just minor code cleanup.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr trunk-reuse-latch-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #81


commit f603e7506d1fe5d956a75cdb13897b1b7af7ac70
Author: Ramkumar Aiyengar andyetitmo...@gmail.com
Date:   2014-08-08T07:44:10Z

Cache children as well so that they can be returned when the watcher is 
reused




 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-08 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090634#comment-14090634
 ] 

Ramkumar Aiyengar commented on SOLR-6336:
-

Fixed a bug with children not being returned when the watcher is reused.

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090795#comment-14090795
 ] 

ASF subversion and git services commented on SOLR-6336:
---

Commit 1616771 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1616771 ]

SOLR-6336: Cache children as well so that they can be returned when the watcher 
is reused.

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090797#comment-14090797
 ] 

ASF subversion and git services commented on SOLR-6336:
---

Commit 1616772 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1616772 ]

SOLR-6336: Cache children as well so that they can be returned when the watcher 
is reused.

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089230#comment-14089230
 ] 

ASF GitHub Bot commented on SOLR-6336:
--

GitHub user andyetitmoves opened a pull request:

https://github.com/apache/lucene-solr/pull/80

Reuse watcher in DistributedQueue across peek/take

Initial patch for SOLR-6336. Some more work can probably be done (other 
functions in the queue still do this, and probably tests would be good to check 
this in general), but here's an initial fix which passes tests and fixes 
Jenkins failures currently happening..

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr fix-watch-leak

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/80.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #80


commit d61f5de2056a4ea841c2d637b9def1c2cb8597b0
Author: Ramkumar Aiyengar raiyen...@bloomberg.net
Date:   2014-08-07T12:30:22Z

Reuse watcher in DistributedQueue across peek/take




 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-07 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089527#comment-14089527
 ] 

Ramkumar Aiyengar commented on SOLR-6336:
-

bq. other functions in the queue still do this

That's only {{offer}}, which is best managed by the caller as the watcher 
depends on the arguments passed in. In any case, that's handled reasonably well 
by CollectionsHandler which uses it.

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090140#comment-14090140
 ] 

ASF subversion and git services commented on SOLR-6336:
---

Commit 1616655 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1616655 ]

SOLR-6336: DistributedQueue can easily create too many ZooKeeper Watches. 
(closes #80)

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090139#comment-14090139
 ] 

ASF subversion and git services commented on SOLR-6336:
---

Commit 1616654 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1616654 ]

SOLR-6336: DistributedQueue can easily create too many ZooKeeper Watches. 
(closes #80)

 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches

2014-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090164#comment-14090164
 ] 

ASF GitHub Bot commented on SOLR-6336:
--

Github user asfgit closed the pull request at:

https://github.com/apache/lucene-solr/pull/80


 DistributedQueue (and it's use in OCP) leaks ZK Watches
 ---

 Key: SOLR-6336
 URL: https://issues.apache.org/jira/browse/SOLR-6336
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller

 The current {{DistributedQueue}} implementation leaks ZK watches whenever it 
 finds children or times out on finding one. OCP uses this in its event loop 
 and can loop tight in some conditions (when exclusivity checks fail), leading 
 to lots of watches which get triggered together on the next event (could be a 
 while for some activities like shard splitting).
 This gets exposed by SOLR-6261 which spawns a new thread for every parallel 
 watch event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org