[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-24 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073066#comment-14073066
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

Forgot to mention: the tests pass. Let me know if the changes look good.

 Run checkIfIamLeader in a separate thread
 -

 Key: SOLR-6261
 URL: https://issues.apache.org/jira/browse/SOLR-6261
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.9
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
Priority: Minor

 Currently checking for leadership (due to the leader's ephemeral node going 
 away) happens in ZK's event thread. If there are many cores and all of them 
 are due leadership, then they would have to serially go through the two-way 
 sync and leadership takeover.
 For tens of cores, this could mean 30-40s without leadership before the last 
 in the list even gets to start the leadership process. If the leadership 
 process happens in a separate thread, then the cores could all take over in 
 parallel.
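
To make the serial-versus-parallel difference concrete, here is a minimal sketch. It is not Solr code: `takeOverLeadership` is a hypothetical stand-in that only sleeps, and the 50 ms cost, the core names, and the cached thread pool are assumptions chosen for illustration. The first method models every core waiting its turn on one thread, as on ZK's event thread; the second hands each core to its own worker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class LeaderCheckSketch {

    // Hypothetical stand-in for one core's two-way sync and leadership takeover.
    static void takeOverLeadership(String core, long workMillis) {
        try {
            Thread.sleep(workMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs every takeover on the calling thread, one after another. */
    public static long serialMillis(List<String> cores, long workMillis) {
        long start = System.nanoTime();
        for (String core : cores) {
            takeOverLeadership(core, workMillis);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Hands each takeover to a worker thread and waits for all of them. */
    public static long parallelMillis(List<String> cores, long workMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String core : cores) {
                pending.add(pool.submit(() -> takeOverLeadership(core, workMillis)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for this core's takeover to finish
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        List<String> cores = Arrays.asList("core1", "core2", "core3", "core4", "core5");
        System.out.println("serial ms:   " + serialMillis(cores, 50));
        System.out.println("parallel ms: " + parallelMillis(cores, 50));
    }
}
```

In the serial case the last core cannot even start until all the others have finished, which is the delay described above. In the parallel case the total is bounded by the slowest single takeover.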



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071766#comment-14071766
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

Added tests for the leader failover case (the original symptoms) and for the 
parallel watching functionality. Let me know if this approach works. If so, we 
have three transition options:

 * Always have `SolrZkClient` use the new way (probably not a great idea, esp. 
considering this is in SolrJ)
 * Have an option per `SolrZkClient`. This will force all or most uses within 
Solr to use the new approach, but allow external uses to continue as they are
 * Keep it the way it currently is, decided on a per-watch basis

I am wavering between the second and third options; opinions welcome.





[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071845#comment-14071845
 ] 

Mark Miller commented on SOLR-6261:
---

I actually kind of like option 1. What is your concern around it being in 
SolrJ? I think, at this point, it's pretty unlikely anyone is counting on the 
current behavior - it's generally probably a bug. We have also already treated 
a lot of this at the cloud level as subject to change a bit because a lot of it 
is so early. Depending on the impact, we need some flexibility to get things 
right.

I guess I just don't see a lot of downside or negative impact if we choose 1.

The upside of doing 1, IMO, is that it becomes a lot harder for other/future 
devs to screw up. The default makes it hard to do.

2 is not too bad, but it relies on future developers consistently choosing the 
right flag to pass, so that our zk thread always gets to crank along.

3 is the least preferable to me.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071933#comment-14071933
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

I agree (1) is ideal. I guess I was just being paranoid, since I am not that 
well-versed in how this class is used outside Solr. I am happy to stick to your 
judgement in this case.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071946#comment-14071946
 ] 

Mark Miller commented on SOLR-6261:
---

I think it's worth considering for sure, but weighing both sides, I think 
enforcing it for all is probably just a really overall beneficial change in 
this case. Getting out of the way of the notification thread without going out 
of your way is great.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071979#comment-14071979
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

Updated for option (1); the tests are still running, though.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-22 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070379#comment-14070379
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

Alternative approach using an executor. It is just a sketch at this point (it 
still fails a few tests). It has an `instanceof`, which is a bit ugly, but any 
other method of maintaining existing behaviour when needed can be used; this 
was just the simplest. Once we are settled on the approach, we can hunt down 
other stuff using the event thread.

https://github.com/apache/lucene-solr/pull/66/files

(would be nice if commits to a pull showed up here..)
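
For readers without the pull request open, the executor-plus-`instanceof` idea can be sketched roughly as below. This is an illustration under stated assumptions, not the code in the pull request: `Watcher` here is a local stand-in for ZooKeeper's interface so the example runs on its own, and `InlineWatcher`, `wrapWatcher`, and `callbackThread` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WatcherOffloadSketch {

    /** Local stand-in for a ZooKeeper-style watcher (assumption, not the real interface). */
    interface Watcher {
        void process(String event);
    }

    /** Hypothetical marker: watchers that must keep running on the event thread. */
    interface InlineWatcher extends Watcher {}

    private final ExecutorService callbackExecutor = Executors.newCachedThreadPool();

    /** Wraps a watcher so its callback runs on the executor unless it opts out. */
    Watcher wrapWatcher(Watcher watcher) {
        if (watcher == null || watcher instanceof InlineWatcher) {
            return watcher; // existing behaviour: callback runs on the caller's thread
        }
        // The event thread only enqueues the work and returns immediately.
        return event -> callbackExecutor.execute(() -> watcher.process(event));
    }

    void shutdown() {
        callbackExecutor.shutdown();
    }

    /** Delivers one simulated event and returns the name of the thread that handled it. */
    public static String callbackThread(boolean inline) {
        WatcherOffloadSketch client = new WatcherOffloadSketch();
        CountDownLatch done = new CountDownLatch(1);
        String[] ranOn = new String[1];
        Runnable record = () -> {
            ranOn[0] = Thread.currentThread().getName();
            done.countDown();
        };
        Watcher target;
        if (inline) {
            InlineWatcher w = event -> record.run();
            target = w;
        } else {
            target = event -> record.run();
        }
        try {
            client.wrapWatcher(target).process("NodeDeleted");
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            client.shutdown();
        }
        return ranOn[0];
    }

    public static void main(String[] args) {
        System.out.println("caller:    " + Thread.currentThread().getName());
        System.out.println("inline:    " + callbackThread(true));
        System.out.println("offloaded: " + callbackThread(false));
    }
}
```

The marker interface is only one way to keep the old behaviour for selected watchers; a flag on the client or on the call would serve the same purpose.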




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070397#comment-14070397
 ] 

Mark Miller commented on SOLR-6261:
---

Hmm...that is a very interesting approach. I'll have to spend some time 
thinking about this one.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070607#comment-14070607
 ] 

Mark Miller commented on SOLR-6261:
---

I really kind of like this idea of just ensuring the zk process thread is 
humming along. The more I think about it, the more I like it.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-21 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069369#comment-14069369
 ] 

Mark Miller commented on SOLR-6261:
---

Hmm...I'm a little hesitant to fire up a new thread for every one rather than 
use the Update executor or something. Seems like a good step forward though.
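
The trade-off between a fresh thread per event and a shared executor can be illustrated with a small sketch. It does not model Solr's update executor; the pool size, the event count, and all names here are assumptions for illustration only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventThreadingSketch {

    /** Starts one fresh thread per event and returns how many distinct threads ran. */
    public static int threadPerEvent(int events) {
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(events);
        for (int i = 0; i < events; i++) {
            new Thread(() -> {
                seen.add(Thread.currentThread());
                done.countDown();
            }).start();
        }
        await(done);
        return seen.size();
    }

    /** Submits every event to one shared pool capped at poolSize threads. */
    public static int sharedPool(int events, int poolSize) {
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(events);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < events; i++) {
                pool.execute(() -> {
                    seen.add(Thread.currentThread());
                    done.countDown();
                });
            }
            await(done);
        } finally {
            pool.shutdown();
        }
        return seen.size();
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads used, one per event: " + threadPerEvent(20));
        System.out.println("threads used, shared pool:   " + sharedPool(20, 4));
    }
}
```

A shared pool bounds the number of threads regardless of how many events arrive, at the cost of queueing work when all workers are busy.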




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-21 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069428#comment-14069428
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

Yeah, I thought of pooling this up as well initially, but then this is really a 
function of the number of cores in the instance, and a lot of thread pools are 
already a function of the number of cores?

I can still look into changing it.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-21 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069436#comment-14069436
 ] 

Mark Miller commented on SOLR-6261:
---

I dunno - I struggled with it when I first saw it and quickly got lazy about 
it. There is something nicer about it: I think it's best to use pools to spin 
up threads, but I have a hard time worrying about it too much in this case.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-21 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069438#comment-14069438
 ] 

Mark Miller commented on SOLR-6261:
---

We should look across our process methods and make sure there are not other 
obvious spots we are holding things up.




[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068034#comment-14068034
 ] 

ASF GitHub Bot commented on SOLR-6261:
--

GitHub user andyetitmoves opened a pull request:

https://github.com/apache/lucene-solr/pull/66

Run checkIfIamLeader in a separate thread

Initial patch for 
[SOLR-6261](https://issues.apache.org/jira/browse/SOLR-6261) to run 
`checkIfIamLeader` in a separate thread; it passes all tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bloomberg/lucene-solr trunk-parallel-leader

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/66.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #66


commit 6b0c98c6462a05c24dbf111450c14e53a447b6d3
Author: Ramkumar Aiyengar andyetitmo...@gmail.com
Date:   2014-07-20T19:08:58Z

Run checkIfIamLeader in a separate thread




 Run checkIfIamLeader in a separate thread
 -

 Key: SOLR-6261
 URL: https://issues.apache.org/jira/browse/SOLR-6261
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.9
Reporter: Ramkumar Aiyengar
Priority: Minor

 Currently checking for leadership (due to the leader's ephemeral node going 
 away) happens in ZK's event thread. If there are many cores and all of them 
 are due leadership, then they would have to serially go through the two-way 
 sync and leadership takeover.
 For tens of cores, this could mean 30-40s without leadership before the last 
 in the list even gets to start the leadership process. If the leadership 
 process happens in a separate thread, then the cores could all take over in 
 parallel.


