[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073066#comment-14073066 ]

Ramkumar Aiyengar commented on SOLR-6261:
-----------------------------------------

Forgot to mention, tests pass. Let me know if the changes look good.

Run checkIfIamLeader in a separate thread
-----------------------------------------

Key: SOLR-6261
URL: https://issues.apache.org/jira/browse/SOLR-6261
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Affects Versions: 4.9
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
Priority: Minor

Currently checking for leadership (due to the leader's ephemeral node going away) happens in ZK's event thread. If there are many cores and all of them are due leadership, then they would have to serially go through the two-way sync and leadership takeover. For tens of cores, this could mean 30-40s without leadership before the last in the list even gets to start the leadership process. If the leadership process happens in a separate thread, then the cores could all take over in parallel.

--
This message was sent by Atlassian JIRA (v6.2#6252)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
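The serial-versus-parallel argument in the issue description can be illustrated with a small, self-contained sketch. It is not Solr or ZooKeeper code: `LeaderCheckDispatchSketch`, `checksOverlap`, the single-thread executor standing in for ZK's event thread, and the 500 ms wait are all assumptions made for illustration. A simulated check only counts as overlapping if every other check has started while it is still running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; none of these names come from Solr or ZooKeeper. */
final class LeaderCheckDispatchSketch {

  /** Returns true if all simulated leadership checks were in flight at the same time. */
  static boolean checksOverlap(int cores, boolean handOff) {
    // Stand-in for ZK's event thread: notifications are delivered one at a time.
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    // Stand-in for the separate threads proposed in the issue.
    ExecutorService workers = Executors.newCachedThreadPool();
    CountDownLatch allStarted = new CountDownLatch(cores);
    CountDownLatch overlapped = new CountDownLatch(cores);

    // Stand-in for sync plus takeover: it only "overlaps" if every other check has started too.
    Runnable check = () -> {
      allStarted.countDown();
      try {
        if (allStarted.await(500, TimeUnit.MILLISECONDS)) {
          overlapped.countDown();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    // Either run the check on the event thread itself, or only schedule it from there.
    Runnable delivered = handOff ? () -> workers.execute(check) : check;

    try {
      for (int i = 0; i < cores; i++) {
        eventThread.execute(delivered);
      }
      eventThread.shutdown();
      eventThread.awaitTermination(10, TimeUnit.SECONDS);
      workers.shutdown();
      workers.awaitTermination(10, TimeUnit.SECONDS);
      return overlapped.getCount() == 0;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("on event thread, checks overlap: " + checksOverlap(4, false));
    System.out.println("handed off, checks overlap:      " + checksOverlap(4, true));
  }
}
```

With `handOff` set to false, each check holds the event thread until its wait expires, so later checks cannot even start; with it set to true, the event thread only schedules the work and moves on.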
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071766#comment-14071766 ]

Ramkumar Aiyengar commented on SOLR-6261:
-----------------------------------------

Added tests for the leader failover case (original symptoms), and the parallel watching functionality. Let me know if this approach works; if so, we have three transition approaches:

* Always have `SolrZkClient` use the new way (probably not a great idea, esp. considering this is in SolrJ)
* Have an option per `SolrZkClient`; this will force all or most uses within Solr to use the new approach, but allow external uses to continue as they are
* The way it currently is, decided on a per-watch basis

I am sort of wavering between the second and third options, opinions welcome.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071845#comment-14071845 ]

Mark Miller commented on SOLR-6261:
-----------------------------------

I actually kind of like option 1. What is your concern around it being in SolrJ? I think, at this point, it's pretty unlikely anyone is counting on the current behavior - it's generally probably a bug. We have also already treated a lot of this at the cloud level as subject to change a bit because a lot of it is so early. Depending on the impact, we need some flexibility to get things right. I guess I just don't see a lot of downside or negative impact if we choose 1.

The upside of doing 1, IMO, is that it becomes a lot harder for other/future devs to screw up. The default makes that hard to do.

2 is not too bad, but it relies on future developers consistently choosing the right flag to pass, so that our zk thread always gets to crank along.

3 is the least preferable to me.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071933#comment-14071933 ]

Ramkumar Aiyengar commented on SOLR-6261:
-----------------------------------------

I agree (1) is ideal, and I guess I was just being paranoid since I am not that well-versed in how this class is used outside Solr. I am happy to stick to your judgement in this case.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071946#comment-14071946 ]

Mark Miller commented on SOLR-6261:
-----------------------------------

I think it's worth considering for sure, but weighing both sides, I think enforcing it for all is probably just a really overall beneficial change in this case. Getting out of the way of the notification thread without going out of your way is great.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071979#comment-14071979 ]

Ramkumar Aiyengar commented on SOLR-6261:
-----------------------------------------

Updated for Option (1), tests are still running though.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070379#comment-14070379 ]

Ramkumar Aiyengar commented on SOLR-6261:
-----------------------------------------

Alternative approach using an executor, just a sketch at this point (still fails a few tests). It has an `instanceof`, which is a bit ugly, but any other method of maintaining the existing behaviour when needed could be used; this was just the simplest. Once we are settled on the approach, we can hunt down other stuff using the event thread.

https://github.com/apache/lucene-solr/pull/66/files

(would be nice if commits to a pull showed up here..)
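For readers who have not opened the pull request, the following is one way an executor-backed wrapper with an `instanceof` escape hatch might look. It is a guess at the shape of the idea, not the code from the pull request: `Callback` stands in for a ZooKeeper watcher, and `RunsOnEventThread`, `wrap` and `ranOnDeliveringThread` are hypothetical names.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only; not the code from the pull request. */
final class WatcherWrapSketch {

  /** Stand-in for a ZooKeeper watcher. */
  interface Callback {
    void process(String event);
  }

  /** Hypothetical marker: callbacks implementing it keep the existing behaviour. */
  interface RunsOnEventThread extends Callback {}

  /** Wraps a callback so that its work leaves the delivering thread unless it opts out. */
  static Callback wrap(Callback callback, Executor executor) {
    if (callback instanceof RunsOnEventThread) {
      return callback; // existing behaviour: run on whichever thread delivers the event
    }
    return event -> executor.execute(() -> callback.process(event));
  }

  /** Delivers one event through wrap() and reports whether it ran on the delivering thread. */
  static boolean ranOnDeliveringThread(boolean marked) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    AtomicReference<Thread> ranOn = new AtomicReference<>();
    Callback plain = event -> ranOn.set(Thread.currentThread());
    RunsOnEventThread inline = event -> ranOn.set(Thread.currentThread());

    // The calling thread plays the role of the event thread here.
    wrap(marked ? inline : plain, executor).process("leader node deleted");

    executor.shutdown();
    try {
      executor.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return ranOn.get() == Thread.currentThread();
  }

  public static void main(String[] args) {
    System.out.println("marked callback ran inline:   " + ranOnDeliveringThread(true));
    System.out.println("unmarked callback ran inline: " + ranOnDeliveringThread(false));
  }
}
```

The marker interface is what makes the decision per watch; a constructor flag on the client, or no opt-out at all, would replace the `instanceof` check.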
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070397#comment-14070397 ]

Mark Miller commented on SOLR-6261:
-----------------------------------

Hmm...that is a very interesting approach. I'll have to spend some time thinking about this one.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070607#comment-14070607 ]

Mark Miller commented on SOLR-6261:
-----------------------------------

I really kind of like this idea of just ensuring the zk process thread is humming along. The more I think about it, the more I like it.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069369#comment-14069369 ]

Mark Miller commented on SOLR-6261:
-----------------------------------

Hmm...I'm a little hesitant to fire up a new thread for every one rather than use the Update executor or something. Seems like a good step forward though.
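The trade-off raised here, a fresh thread per notification versus a shared executor, can be made concrete with a small sketch. Everything in it is an assumption for illustration: `ThreadPerEventVsPoolSketch` and `distinctThreads` are hypothetical names, and the fixed pool of two threads is arbitrary and is not Solr's update executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; the pool here is not Solr's update executor. */
final class ThreadPerEventVsPoolSketch {

  /** Runs one trivial task per event and returns how many distinct threads executed them. */
  static int distinctThreads(int events, boolean pooled) {
    Set<Thread> seen = ConcurrentHashMap.newKeySet();
    Runnable task = () -> seen.add(Thread.currentThread());
    try {
      if (pooled) {
        // Shared executor: the thread count is bounded by the pool, not by the event count.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < events; i++) {
          pool.execute(task);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
      } else {
        // Thread per event: every notification pays for a brand-new thread.
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < events; i++) {
          Thread t = new Thread(task);
          threads.add(t);
          t.start();
        }
        for (Thread t : threads) {
          t.join();
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen.size();
  }

  public static void main(String[] args) {
    System.out.println("thread per event: " + distinctThreads(10, false) + " threads");
    System.out.println("shared pool:      " + distinctThreads(10, true) + " threads");
  }
}
```

A bounded pool caps thread creation but also caps how many takeovers can run at once, which is the tension the following replies discuss.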
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069428#comment-14069428 ]

Ramkumar Aiyengar commented on SOLR-6261:
-----------------------------------------

Yeah, I thought of pooling this up as well initially, but then this is really a function of number of cores in the instance and a lot of threadpools are a function of the number of cores already? Can still look into changing it.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069436#comment-14069436 ]

Mark Miller commented on SOLR-6261:
-----------------------------------

I dunno - I struggled with it when I first saw it and quickly got lazy about it. There is something nicer about it; I think it's best to use pools to spin up threads, but I have a hard time worrying about it too much in this case.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069438#comment-14069438 ]

Mark Miller commented on SOLR-6261:
-----------------------------------

We should look across our process methods and make sure there are not other obvious spots we are holding things up.
[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread
[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068034#comment-14068034 ]

ASF GitHub Bot commented on SOLR-6261:
--------------------------------------

GitHub user andyetitmoves opened a pull request:

    https://github.com/apache/lucene-solr/pull/66

    Run checkIfIamLeader in a separate thread

    Initial patch for [SOLR-6261](https://issues.apache.org/jira/browse/SOLR-6261) to run `checkIfIAmLeader` in a separate thread, passes all tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bloomberg/lucene-solr trunk-parallel-leader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/66.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #66

commit 6b0c98c6462a05c24dbf111450c14e53a447b6d3
Author: Ramkumar Aiyengar andyetitmo...@gmail.com
Date: 2014-07-20T19:08:58Z

    Run checkIfIamLeader in a separate thread