[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009179#comment-15009179
 ] 

ASF subversion and git services commented on SOLR-7569:
---

Commit 1714844 from [~noble.paul] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1714844 ]

SOLR-7569 test failure fix

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Noble Paul
>  Labels: difficulty-medium, impact-high
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-7569-testfix.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr will not elect a leader for a shard, e.g. all
> replicas' last published state was recovery, or bugs which cause a
> leader to be marked as 'down'. While the best solution is that shards never get
> into this state, we need a manual way to fix it when they do.
> Right now we can go through a dance involving bouncing the node
> (since recovery paths between bouncing and REQUESTRECOVERY are different),
> but that is difficult when running a large cluster. Although
> such a manual API may lead to some data loss, in some cases it is
> the only possible option to restore availability.
> This issue proposes to build a new collection API which can be used to force
> replicas into recovering a leader while avoiding data loss on a best effort
> basis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-17 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009186#comment-15009186
 ] 

Mike Drob commented on SOLR-7569:
-

bq. Can we close this now, and create new JIRAs for future enhancements? Mark 
Miller, Shalin Shekhar Mangar, Noble Paul?
I agree with this.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009164#comment-15009164
 ] 

ASF subversion and git services commented on SOLR-7569:
---

Commit 1714842 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1714842 ]

SOLR-7569 test failure fix




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009213#comment-15009213
 ] 

Mark Miller commented on SOLR-7569:
---

This was only reopened because the test was ignored due to reverting SOLR-7989. 
With that resolved, this should be fine.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-16 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006835#comment-15006835
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


Can we close this now, and create new JIRAs for future enhancements? 
[~mark.mil...@oblivion.ch], [~shalinmangar], [~noble.paul]?




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-12 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002251#comment-15002251
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


bq. I've taken a crack at making SOLR-7989 work.
Thanks!

bq. Perhaps the last thing the API should do is run through each shard and see 
if the registered leader is DOWN, and if it is make it ACTIVE (preferably by 
asking it to publish itself as ACTIVE - we don't want to publish for someone 
else). If the call waits around to make sure all the leaders come up, this 
should be simple.
This makes sense. I think this is something that Shalin alluded to (please 
excuse me if I'm mistaken) when he said, {{1. Leader is live but 'down' -> mark 
it 'active'}}. The suggestion for the replicas to mark themselves ACTIVE, 
instead of someone else publishing that state for them, seems like a good thing to do.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000726#comment-15000726
 ] 

ASF subversion and git services commented on SOLR-7569:
---

Commit 1713899 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1713899 ]

SOLR-7989, SOLR-7569: Ignore this test.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000753#comment-15000753
 ] 

Mark Miller commented on SOLR-7569:
---

A better approach is probably for this API to deal with a DOWN but valid leader 
itself. It should only ever happen due to manually screwing up LIR and if this 
API is messing with LIR, it should also fix the ramifications.

Perhaps the last thing the API should do is run through each shard and see if 
the registered leader is DOWN, and if it is make it ACTIVE (preferably by 
asking it to publish itself as ACTIVE - we don't want to publish for someone 
else). If the call waits around to make sure all the leaders come up, this 
should be simple.
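
The final pass described above could be sketched roughly as follows. This is only an illustration in plain Java: the Leader class, the State enum, and the sweep method are hypothetical stand-ins, not Solr classes, and the real code would go through ZooKeeper and the cluster state rather than an in-memory map. The one point it tries to capture is that the leader publishes its own state; the caller only asks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeaderStateSweep {
    // Hypothetical replica states; names chosen for illustration only.
    enum State { ACTIVE, DOWN, RECOVERING }

    // Hypothetical stand-in for a shard's registered leader (not a Solr class).
    static final class Leader {
        final String name;
        State published;

        Leader(String name, State published) {
            this.name = name;
            this.published = published;
        }

        // The leader publishes its own state; nobody publishes on its behalf.
        void publishSelfActive() {
            this.published = State.ACTIVE;
        }
    }

    // Walks every shard and asks any registered leader that is DOWN to
    // republish itself as ACTIVE. Returns the shards whose leader was asked.
    static List<String> sweep(Map<String, Leader> leadersByShard) {
        List<String> asked = new ArrayList<>();
        for (Map.Entry<String, Leader> entry : leadersByShard.entrySet()) {
            Leader leader = entry.getValue();
            if (leader != null && leader.published == State.DOWN) {
                leader.publishSelfActive();
                asked.add(entry.getKey());
            }
        }
        return asked;
    }

    public static void main(String[] args) {
        Map<String, Leader> leaders = new LinkedHashMap<>();
        leaders.put("shard1", new Leader("core_node1", State.DOWN));
        leaders.put("shard2", new Leader("core_node2", State.ACTIVE));
        System.out.println("Asked leaders of: " + sweep(leaders));
    }
}
```

A shard with no registered leader is skipped here; that case would need a real election first and is outside this sketch.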




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000722#comment-15000722
 ] 

ASF subversion and git services commented on SOLR-7569:
---

Commit 1713898 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1713898 ]

SOLR-7989, SOLR-7569: Ignore this test.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000797#comment-15000797
 ] 

Mark Miller commented on SOLR-7569:
---

bq. It should only ever happen due to manually screwing up LIR and if this API 
is messing with LIR

Down the road though, we will want to solve this for SOLR-7034 and SOLR-7065.

I've taken a crack at making SOLR-7989 work.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-05 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991913#comment-14991913
 ] 

Mark Miller commented on SOLR-7569:
---

I'm kind of split on where it should go. For something simple and brute force 
like this, CollectionsHandler is probably fine. Either way seems ok.

I wouldn't really worry about it being async if it stays in CollectionsHandler.


> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992322#comment-14992322
 ] 

ASF subversion and git services commented on SOLR-7569:
---

Commit 1712854 from [~noble.paul] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1712854 ]

SOLR-7569: A collection API called FORCELEADER when all replicas in a shard are 
down
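
For readers who want to try the command, a request might be built as in the sketch below. The action, collection and shard parameter names follow the Collections API as documented in the Solr Reference Guide for released versions; the exact form in this particular commit may have differed. The base URL, collection name and shard name are placeholders, and the ForceLeaderRequest class is a hypothetical helper, not part of Solr or SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ForceLeaderRequest {
    // Builds a Collections API URL for the FORCELEADER action.
    // solrBaseUrl is assumed to end in "/solr" with no trailing slash.
    static String buildUrl(String solrBaseUrl, String collection, String shard) {
        return solrBaseUrl + "/admin/collections?action=FORCELEADER"
                + "&collection=" + URLEncoder.encode(collection, StandardCharsets.UTF_8)
                + "&shard=" + URLEncoder.encode(shard, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder host, collection and shard names.
        System.out.println(buildUrl("http://localhost:8983/solr", "collection1", "shard1"));
    }
}
```

The resulting URL can be sent with any HTTP client. As the issue description notes, forcing a leader may lose data, so it is a last resort for a shard that has no leader.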




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992304#comment-14992304
 ] 

ASF subversion and git services commented on SOLR-7569:
---

Commit 1712851 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1712851 ]

SOLR-7569: A collection API called FORCELEADER when all replicas in a shard are 
down




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992310#comment-14992310
 ] 

ASF subversion and git services commented on SOLR-7569:
---

Commit 1712852 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1712852 ]

SOLR-7569: changed message




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-04 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989439#comment-14989439
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


One downside of not having something like OVERRIDELASTPUBLISHED is that in the 
test, I couldn't set the last published state to DOWN and check whether it was 
set back to ACTIVE by FORCELEADER. In this updated patch with 
FORCEPREPAREFORLEADERSHIP, the test has no easy way of setting the last 
published state to DOWN before the API command is called. Not a deal breaker, 
but just putting it out there. I'm personally fine either way (or with another 
name, if one is more suitable).




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-11-04 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989269#comment-14989269
 ] 

Noble Paul commented on SOLR-7569:
--

Let's not keep the core admin command as OVERRIDELASTPUBLISHED. That name makes 
it a generic enough API that others may abuse it for other things. Let's not 
tell others what we are doing internally, and keep the command name opaque.

This particular collection admin operation does not really have to go to the 
overseer. It can be performed by the receiving node itself, because the 
clearing of the LIR node does not have to be done at the overseer anyway.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-10-29 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980440#comment-14980440
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


bq. It seems like what we really want is to make sure the last published state 
for each replica does not prevent it from becoming the leader?
It seems to me that there's no easy way to set the last published state of a 
replica without the replicas doing it themselves. Do you think we should be 
doing that instead of marking them as active? Or do you think that just 
clearing the LIR is enough?




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-10-29 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980581#comment-14980581
 ] 

Mark Miller commented on SOLR-7569:
---

There are two main things, I think, that prevent a replica from becoming a 
leader: its last published state on the clouddescriptor is not ACTIVE, or it is 
in LIR. I thought we would want to clear LIR and perhaps add an ADMIN command 
that will set the last published state on the clouddescriptor to ACTIVE for 
each replica.
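
As a rough illustration of the two blockers described above, here is a minimal 
Java sketch. It is not Solr code: the class LeaderEligibilitySketch, the 
ReplicaView type, its fields, and the unblock() helper are all hypothetical 
names made up for this example. It only models the idea that a replica is 
blocked when its last published state is not ACTIVE or when it is in LIR, and 
that clearing both removes the block.

```java
public class LeaderEligibilitySketch {

    // Hypothetical stand-in for a replica's last published state.
    enum State { ACTIVE, RECOVERING, DOWN }

    // Hypothetical, simplified view of one replica; not a Solr class.
    static final class ReplicaView {
        final String name;
        final State lastPublished;
        final boolean inLir; // true if a leader-initiated-recovery marker exists

        ReplicaView(String name, State lastPublished, boolean inLir) {
            this.name = name;
            this.lastPublished = lastPublished;
            this.inLir = inLir;
        }
    }

    // A replica is eligible only if neither blocker applies.
    static boolean canBecomeLeader(ReplicaView r) {
        return r.lastPublished == State.ACTIVE && !r.inLir;
    }

    // Models the proposed fix: clear LIR and set last published state to ACTIVE.
    static ReplicaView unblock(ReplicaView r) {
        return new ReplicaView(r.name, State.ACTIVE, false);
    }

    public static void main(String[] args) {
        ReplicaView stuck = new ReplicaView("core_node1", State.DOWN, true);
        System.out.println(canBecomeLeader(stuck));          // prints false
        System.out.println(canBecomeLeader(unblock(stuck))); // prints true
    }
}
```

Both conditions are checked independently here, so clearing only LIR would 
still leave a replica with a non-ACTIVE last published state blocked in this 
model.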




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-10-23 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971221#comment-14971221
 ] 

Shalin Shekhar Mangar commented on SOLR-7569:
-

Thanks Ishan, but I think you missed the test in your latest patch? Its size 
has decreased from 36kb to 8kb.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-10-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971226#comment-14971226
 ] 

Mark Miller commented on SOLR-7569:
---

It seems like what we really want is to make sure the last published state for 
each replica does not prevent it from becoming the leader?




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-10-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971223#comment-14971223
 ] 

Mark Miller commented on SOLR-7569:
---

bq.  // Marking all live nodes as active.

Why do we do this manually like this? Shouldn't we allow this to happen 
naturally?




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-10-23 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971406#comment-14971406
 ] 

Shalin Shekhar Mangar commented on SOLR-7569:
-

bq. It seems like what we really want is to make sure the last published state 
for each replica does not prevent it from becoming the leader?

Do you mean that removing blockers like LIR is enough?




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-10-20 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964979#comment-14964979
 ] 

Shalin Shekhar Mangar commented on SOLR-7569:
-

Thanks Ishan. 

# ForceLeaderTest.testReplicasInLIRNoLeader has a 5 second sleep, why? Isn't 
waitForRecoveriesToFinish() enough?
# Similarly, ForceLeaderTest.testLeaderDown has a 15 second sleep for steady 
state to be reached. What is this steady state, and is there a better way than 
waiting for an arbitrary amount of time? In general, Thread.sleep should be 
avoided as much as possible as a way to reach steady state.
# Can you please add some javadocs on the various test methods describing the 
scenario that they are testing? 
# minor nit - can you use assertEquals instead of assertTrue when testing 
equality of state etc.? The advantage of assertEquals is that it logs the 
mismatched values in the exception message.
# In OverseerCollectionMessageHandler, lirPath can never be null. The LIR path 
should probably be logged at DEBUG rather than INFO.
{code}
// Clear out any LIR state
String lirPath = overseer.getZkController()
    .getLeaderInitiatedRecoveryZnodePath(collection, sliceId);
if (lirPath != null && zkStateReader.getZkClient().exists(lirPath, true)) {
  StringBuilder sb = new StringBuilder();
  zkStateReader.getZkClient().printLayout(lirPath, 4, sb);
  log.info("Cleaning out LIR data, which was: " + sb);
  zkStateReader.getZkClient().clean(lirPath);
}
{code}
# There's no need to send an empty string as the role while publishing the 
state of the replica.
# minor nit - you can compare enums directly using == instead of .equals
# Referring to the following, what is the thinking behind it? When can this 
happen? Is there a test which specifically exercises this scenario? It seems 
like this could interfere with the leader election if the election was taking 
some time.
{code}
// If we still don't have an active leader by now, it maybe possible that the
// replica at the head of the election queue was the leader at some point and
// never left the queue, but got marked as down. So, if the election queue is
// not empty, and the replica at the head of the queue is live, then mark it
// as a leader.
{code}
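
On the Thread.sleep point above, one common alternative is to poll for the 
condition with a timeout instead of sleeping for a fixed time. The sketch below 
assumes a hypothetical helper named waitFor (it is not an existing Solr test 
utility). The condition passed in would be whatever defines "steady state" for 
the test, for example a check that the shard has an active leader.

```java
import java.util.function.BooleanSupplier;

public class WaitForSketch {

    // Polls the condition every pollMs milliseconds until it holds or
    // timeoutMs elapses. Returns true if the condition held in time.
    // Hypothetical helper for illustration only.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 50 ms after start.
        boolean reached = waitFor(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println("condition reached: " + reached);
    }
}
```

A test written this way returns as soon as the condition holds, and a timeout 
shows up as an explicit failure rather than as a later, harder to diagnose 
assertion error.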




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-19 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876938#comment-14876938
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


bq. what happened to the idea of allowing the user to pick the leader as part 
of the recover shard request?
I had a look at your patch for SOLR-6236, and I think we can tackle this using 
that approach. At this point, I'm inclined to keep this patch at this and 
tackle it separately. Most likely, the system will pick a reasonable leader, it 
will sync with the other replicas, and the shard will be restored.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877089#comment-14877089
 ] 

Mark Miller commented on SOLR-7569:
---

I wonder if Recover is the right terminology. It seems so broad and "fix 
anything" like. Perhaps it should be something close to 'forceleader' - 
something that is specific about what is happening and gives an idea that you 
are overriding the system as you are.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-19 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877101#comment-14877101
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


I had the same dilemma while naming this. Recover does seem like it will fix 
things if anything is broken, which can be misleading since at this time we 
aren't doing anything other than helping fix the LIR state to bring the shard 
back up.
On the other hand, I am not sure about force leader, because we aren't really 
forcing a leader, but just paving things for an election to happen. I'm really 
not totally sure either way.

How about keeping this as recover shard, documenting it as an advanced API 
which can potentially cause data loss, and later adding to this same API 
whatever else we need in order to recover the system?




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877114#comment-14877114
 ] 

Mark Miller commented on SOLR-7569:
---

That sounds reasonable as long as we have good doc warning about it.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877117#comment-14877117
 ] 

Mark Miller commented on SOLR-7569:
---

bq. because we aren't really forcing a leader, but just paving things for an 
election to happen. 

I guess it comes down to how you want to think about it. When you use this, it 
will be because the system is blocking a leader from taking over. By running 
this API command, you remove the blocks, thus 'forcing' a leader the system 
would not normally pick - or at least attempting to force a leader the system 
would not normally pick. It depends on whether you want to get bogged down in 
implementation or design.

I think your proposal is fine though.

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-19 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877156#comment-14877156
 ] 

Timothy Potter commented on SOLR-7569:
--

+1 on FORCE_LEADER for the name of the action.
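
For orientation, a Collections API call for such an action might be assembled as in the sketch below. Several things here are assumptions rather than anything settled in this thread: the action is spelled FORCELEADER (the spelling used by released Solr, whereas this thread was still discussing FORCE_LEADER), the `collection` and `shard` parameters follow the usual Collections API convention, and the base URL, collection, and shard values are placeholders. The class name `ForceLeaderUrl` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: builds the request URL, it does not contact a Solr node.
final class ForceLeaderUrl {
    // Assumption: spelling used by released Solr; the thread discussed FORCE_LEADER.
    static final String ACTION = "FORCELEADER";

    /** Builds the Collections API URL for one shard of one collection. */
    static String build(String baseUrl, String collection, String shard) {
        return baseUrl + "/admin/collections?action=" + ACTION
            + "&collection=" + encode(collection)
            + "&shard=" + encode(shard);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

The URL is only built, not sent, so the sketch can be tried without a running cluster.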

bq. was stuck with "address already in use" exception
You should have access to the SocketProxy if you need to close it down before 
trying to restart the original leader's Jetty. If not, we should fix that.

bq. I'm inclined to keep this patch at this and tackle it separately

sounds good ... may not ever be needed in the wild with the solution you've 
created here ;-)

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740864#comment-14740864
 ] 

Mark Miller commented on SOLR-7569:
---

bq.  what happened to the idea of allowing the user to pick the leader as part 
of the recover shard request?

As long as it's optional and documented so that users understand the risks, 
it's probably okay. But I think that in most cases the system will beat most 
users at understanding who should really be the leader.

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-11 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740684#comment-14740684
 ] 

Timothy Potter commented on SOLR-7569:
--

Looks good Ishan. Sorry for the delay getting a review done. In 
putNonLeadersIntoLIR, you probably want to wait a little bit before killing the 
leader after sending doc #2 to give the leader time to put the replicas into 
LIR; this works quickly on our local workstations but can take a little more 
time on Jenkins.
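
One way to avoid a fixed sleep is to poll for the expected state with a timeout. The sketch below is a generic helper, not code from the patch: `WaitFor` and its method are hypothetical names, and the condition passed in (for example "all non-leader replicas are in LIR") is left to the test to supply.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: poll a condition instead of sleeping a fixed time.
final class WaitFor {
    /**
     * Polls the condition every pollMs milliseconds until it holds or
     * timeoutMs milliseconds have elapsed. Returns true if the condition held.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so nanoTime wrap-around is handled correctly.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }
}
```

A test would call this after sending doc #2 and kill the leader only once it returns true, which keeps the test fast on a workstation while tolerating a slower Jenkins run.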

I'm also wondering if you should bring the original downed leader back into the 
mix (the one that got killed in the putNonLeadersIntoLIR method) in the 
testReplicasInLIRNoLeader test after the new leader is selected and see what 
state it comes back to. Also, try sending another doc #5 once the Jetty hosting 
the original leader is back online.

Lastly, what happened to the idea of allowing the user to pick the leader as 
part of the recover shard request? I read the comments above and agree that 
just triggering a re-election is preferred, but sometimes we humans actually 
know which replica is best. It seems reasonable to me to accept an optional 
parameter that specifies the replica that should be selected. However, if 
others don't like that idea, then I'm fine with this for now.

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-03 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729291#comment-14729291
 ] 

Timothy Potter commented on SOLR-7569:
--

Hi, will dig into this in detail later today, sorry for the delay (been on 
another project) ...

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-02 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727740#comment-14727740
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


bq. At a high-level, the issue boils down to giving SolrCloud operators a way 
to either a) manually force a leader to be elected, or b) set an optional 
configuration property that triggers the force leader behavior after seeing so 
many failed recoveries due to no leader.
I think the difference is that in this issue, we're just trying to (manually) 
clean up the LIR state and mark affected down replicas as active and hope that 
normal leader election is initiated and normalcy is restored. Based on an 
initial glance at SOLR-6236, it seems that the intention there is to actually 
force one of the replicas to become a leader (either manually or automatically).

I am not sure which path we should take, but it seems the approach taken here 
is less intrusive/safer, if/when it works.

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727720#comment-14727720
 ] 

Mark Miller commented on SOLR-7569:
---

This seems to have some overlap with SOLR-6236 based on the comments. Tim did 
some work here as well:

{quote}At a high-level, the issue boils down to giving SolrCloud operators a 
way to either a) manually force a leader to be elected, or b) set an optional 
configuration property that triggers the force leader behavior after seeing so 
many failed recoveries due to no leader. So this can be considered an optional 
availability-over-consistency mode with respect to leader-failover.{quote}

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-02 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727549#comment-14727549
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:



bq. 1. nit - RecoverShardTest has an unused notLeader1 variable
Thanks. I refactored the test and this has gone away now.

bq. 2. Shouldn't the "Wait for a long time for a steady state" piece of 
code be before the proxies for the two replicas are reopened? The LIR state 
will surely be set at indexing time and only if the proxy is closed. Also if 
you move that wait before the proxy is reopened then you are sure to have the 
LIR state as 'down'.
This makes sense, I've made the change.

bq. 3. The check for 'numActiveReplicas' and 'numReplicasOnLiveNodes' 
should be done after force refreshing the cluster state of the cloudClient 
otherwise spurious failures can happen
I didn't know about this force update of the cluster state; I've now added it.

bq. 4. nit - Why is sendDoc overridden in RecoverShardTest? The minRf is the 
same, just the max retries has been increased and the wait between retries has 
been decreased
The tests were (and still are) taking too long, and reducing the wait from 
30sec to 1sec was helpful.

bq. 5. The OCMH.recoverShard() isn't unsetting the leader properly. It 
should be as simple as:
Thanks, I've cleaned this up.

bq. 6. Can you please write a test to ensure that this API works with 
'async' parameter?
TODO.

bq. Leader is live but 'down' -> mark it 'active'
This works now. Added testLeaderDown() method.

bq. Leader itself is in LIR -> delete the LIR node
This should work, since the API method first clears the LIR state. Couldn't add 
a test for this, since I couldn't simulate this state in a test.

bq. Leader is not live: Replicas are live but 'down' or 'recovering' 
-> mark them 'active'
This works now. Added testAllReplicasDownNoLeader() method.

bq. Leader is not live: Replicas are live but in LIR -> delete the LIR 
nodes
This works as of the last patch. The corresponding test is now 
testReplicasInLIRNoLeader().

bq. Did you find out why/how that happened? If this is reproducible, can you 
please create an issue and post the test there?
Added SOLR-7989 for this, will look deeper soon.

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727757#comment-14727757
 ] 

Mark Miller commented on SOLR-7569:
---

[~thelabdude], what is your impression?

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-02 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727760#comment-14727760
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


bq. This seems to have some overlap with SOLR-6236 based on the comments.
I think I should've posted the patches there, instead of here. Seems like I'm 
trying to solve the same problem here (albeit in a different way).

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-09-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727759#comment-14727759
 ] 

Mark Miller commented on SOLR-7569:
---

bq. we're just trying to (manually) clean up the LIR state and mark affected 
down replicas as active and hope that normal leader election is initiated and 
normalcy is restored.

I like that approach too, but I want to make sure we consider SOLR-6236.

> Create an API to force a leader election between nodes
> --
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>  Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>



[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-27 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716934#comment-14716934
 ] 

Shalin Shekhar Mangar commented on SOLR-7569:
-

Thanks Ishan! A few comments:
# nit - RecoverShardTest has an unused notLeader1 variable
# Shouldn't the "Wait for a long time for a steady state" piece of code be 
*before* the proxies for the two replicas are reopened? The LIR state will 
surely be set at indexing time and only if the proxy is closed. Also if you 
move that wait before the proxy is reopened then you are sure to have the LIR 
state as 'down'.
# The check for 'numActiveReplicas' and 'numReplicasOnLiveNodes' should be done 
after force refreshing the cluster state of the cloudClient otherwise spurious 
failures can happen
# nit - Why is sendDoc overridden in RecoverShardTest? The minRf is the same, 
just the max retries has been increased and the wait between retries has been 
decreased
# The OCMH.recoverShard() isn't unsetting the leader properly. It should be as 
simple as:
{code}
// Unset the shard leader by queueing a LEADER operation for this shard.
ZkNodeProps m = new ZkNodeProps(
    Overseer.QUEUE_OPERATION, OverseerAction.LEADER.toLower(),
    ZkStateReader.SHARD_ID_PROP, shardId,
    ZkStateReader.COLLECTION_PROP, collection);
Overseer.getInQueue(zkClient).offer(Utils.toJSON(m));
{code}
# Can you please write a test to ensure that this API works with 'async' 
parameter?

I think some simple scenarios are not being taken care of. This command only 
helps if a LIR node exists, but we can do a bit more:
# Leader is live but 'down' -> mark it 'active'
# Leader itself is in LIR -> delete the LIR node
# Leader is not live:
## Replicas are live but 'down' or 'recovering' -> mark them 'active'
## Replicas are live but in LIR -> delete the LIR nodes

Can you please add some tests exercising each of the above scenarios?
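
The scenarios listed above can be read as a small decision table. The Java sketch below models only those decisions; `ForceLeaderPlan`, `Replica`, `State`, and the action strings are hypothetical names for illustration and are not Solr's classes or the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the scenarios; it plans actions, it does not touch ZooKeeper.
final class ForceLeaderPlan {
    enum State { ACTIVE, DOWN, RECOVERING }

    static final class Replica {
        final String name;
        final boolean leader;  // registered as the shard leader
        final boolean live;    // hosted on a live node
        final State state;     // last published state
        final boolean inLir;   // has a leader-initiated-recovery (LIR) node

        Replica(String name, boolean leader, boolean live, State state, boolean inLir) {
            this.name = name;
            this.leader = leader;
            this.live = live;
            this.state = state;
            this.inLir = inLir;
        }
    }

    /** Returns the corrective actions for one shard, in the order they would run. */
    static List<String> plan(List<Replica> replicas) {
        boolean leaderIsLive = false;
        for (Replica r : replicas) {
            if (r.leader && r.live) {
                leaderIsLive = true;
            }
        }
        List<String> actions = new ArrayList<>();
        for (Replica r : replicas) {
            if (!r.live) {
                continue; // nothing can be fixed on a dead node
            }
            if (leaderIsLive && !r.leader) {
                continue; // a live leader exists, so only the leader is repaired
            }
            if (r.inLir) {
                actions.add("delete LIR node: " + r.name);
            }
            if (r.state != State.ACTIVE) {
                actions.add("mark active: " + r.name);
            }
        }
        return actions;
    }
}
```

Clearing LIR before marking a replica active follows the order described elsewhere in this thread (the API first clears the LIR state); the election itself is still left to the normal process.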

bq. I also tried to mark just one of the replicas as active instead of all the 
replicas, hoping it will become leader and others would recover from it. 
However, this resulted in one of the other down replicas becoming leader but 
still staying down. Looking into why that could be happening; bug?

Did you find out why/how that happened? If this is reproducible, can you please 
create an issue and post the test there?

 Create an API to force a leader election between nodes
 --

 Key: SOLR-7569
 URL: https://issues.apache.org/jira/browse/SOLR-7569
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
  Labels: difficulty-medium, impact-high
 Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
 SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
 SOLR-7569_lir_down_state_test.patch





[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-24 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709330#comment-14709330
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


I just tried to simulate the scenario where all the replicas are in the down 
state due to LIR, and there is no leader. In this state, the leader election 
queue is empty.

So, I am thinking of some way to have the replicas (that are on live nodes) 
join the leader election. Is there any clean way of doing that, short of a core 
reload?

 Create an API to force a leader election between nodes
 --

 Key: SOLR-7569
 URL: https://issues.apache.org/jira/browse/SOLR-7569
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
  Labels: difficulty-medium, impact-high
 Attachments: SOLR-7569.patch, SOLR-7569.patch, 
 SOLR-7569_lir_down_state_test.patch





[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-24 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709403#comment-14709403
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


> In this state, the leader election queue is empty.
Ignore that, I was catching that state before the replicas had a chance to 
rejoin the election. The last assert in the patch is inappropriate.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-21 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706723#comment-14706723
 ] 

Mark Miller commented on SOLR-7569:
---

I don't really like the idea of choosing a leader. It seems to me this feature 
should force a new election and address the state that prevents someone from 
becoming leader somehow. You still want the sync stage and the system to pick 
the best leader though. This should just get you out of the state that is 
preventing a leader from being elected.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-21 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706311#comment-14706311
 ] 

Varun Thacker commented on SOLR-7569:
-

bq. Pick the next leader: If the leader election queue is not empty and the 
first replica in the queue is on a live node, choose the replica as the next 
leader. Otherwise, pick a random replica, which is on a live node, to become 
the next leader (TODO: we can have the user specify which replica he/she wants 
as the next leader).

Maybe pick as leader the replica that has the latest commit 
timestamp?
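
The two selection rules above can be sketched side by side as follows. This is only an illustration under stated assumptions: `ReplicaInfo`, `NextLeaderSketch`, and their fields are hypothetical stand-ins rather than Solr classes, and the election queue, replica list, and live-node set are passed in as plain collections instead of being read from ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical, simplified view of a replica; not a Solr class.
final class ReplicaInfo {
    final String name;
    final String nodeName;
    final long lastCommitMillis; // assumed to be known for every replica

    ReplicaInfo(String name, String nodeName, long lastCommitMillis) {
        this.name = name;
        this.nodeName = nodeName;
        this.lastCommitMillis = lastCommitMillis;
    }
}

final class NextLeaderSketch {

    // Quoted proposal: take the head of the election queue if it is on a
    // live node, otherwise fall back to a random replica on a live node.
    static Optional<ReplicaInfo> pickByQueue(List<ReplicaInfo> electionQueue,
                                             List<ReplicaInfo> allReplicas,
                                             Set<String> liveNodes) {
        if (!electionQueue.isEmpty()
                && liveNodes.contains(electionQueue.get(0).nodeName)) {
            return Optional.of(electionQueue.get(0));
        }
        List<ReplicaInfo> live = new ArrayList<>();
        for (ReplicaInfo r : allReplicas) {
            if (liveNodes.contains(r.nodeName)) {
                live.add(r);
            }
        }
        if (live.isEmpty()) {
            return Optional.empty();
        }
        int idx = ThreadLocalRandom.current().nextInt(live.size());
        return Optional.of(live.get(idx));
    }

    // Suggested alternative: the live replica with the latest commit timestamp.
    static Optional<ReplicaInfo> pickByLatestCommit(List<ReplicaInfo> allReplicas,
                                                    Set<String> liveNodes) {
        return allReplicas.stream()
                .filter(r -> liveNodes.contains(r.nodeName))
                .max(Comparator.comparingLong((ReplicaInfo r) -> r.lastCommitMillis));
    }
}
```

Both methods return an empty `Optional` when no replica is on a live node, which is the case where no leader can be chosen at all.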




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-18 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701387#comment-14701387
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:


Thanks [~markrmil...@gmail.com] for the pointer to the issues. 
I agree this is a sledgehammer to undo the effects of bugs, which shouldn't be 
needed if we go by an improved design. We have observed the effects of these 
bugs in production clusters of our clients, and this is to help them in such a 
scenario. Do you think we should continue down this sledgehammer path, in 
parallel with fixing the bugs?




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701392#comment-14701392
 ] 

Mark Miller commented on SOLR-7569:
---

Yes, I think having this option is useful in the short term and in the longer 
term. The system will generally refuse to continue on if it thinks it may have 
data loss and stopping could allow a user to possibly recover that data. This 
could act as a way for a user to override that.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701327#comment-14701327
 ] 

Mark Miller commented on SOLR-7569:
---

bq. maybe due to bugs?

The current design allows for this.

See SOLR-7034 and SOLR-7065 as possible improvement steps.

This is kind of a hack solution to a current production problem, or a way to 
force a leader election even when we know it probably means data loss. Those 
issues are closer to what is supposed to come next in terms of improving the 
current design.

I may have also just seen a bug where LIR info in ZK prevents anyone from 
becoming the leader even on full restart. SOLR-7065 should address those kinds 
of bugs if done right.




[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

2015-08-18 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701503#comment-14701503
 ] 

Erick Erickson commented on SOLR-7569:
--

bq: If the chosen leader is not at the head of the leader election queue, have 
it join the election at the head (similar to what REBALANCELEADERS tries to do).

Be _really_ careful if you are trying to manipulate the leader election stuff, 
it's very easy to get wrong. Or at least it was last time I looked, perhaps 
it's changed a lot since then. I'd be glad to chat about what I remember if 
you'd like.
