[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2018-10-03 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637183#comment-16637183 ]

ASF subversion and git services commented on SOLR-8034:
---

Commit 1854bb1ff8377390c1117dcfb8a35c6480977c21 in lucene-solr's branch 
refs/heads/branch_7x from Tomas Fernandez Lobbe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1854bb1 ]

SOLR-12767: Always include the achieved rf in the response

This commit deprecates the min_rf parameter. Solr now always includes the
achieved replication factor in the update requests (as if min_rf was always
specified). Also, reverts the changes introduced in SOLR-8034, replicas that
don't ack an update will have to recover to prevent inconsistent shards.
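With the achieved rf always present in the response, a client that still wants a minimum replication factor can compare the reported value itself. A minimal Java sketch of that check follows. It assumes the response header exposes the achieved replication factor under the key `rf` as a number; `AchievedRfCheck` and `reachedDesiredRf` are hypothetical names, not part of SolrJ.

```java
import java.util.Map;

// Hypothetical helper, not a SolrJ API: checks the achieved replication
// factor reported in an update response header.
final class AchievedRfCheck {

    private AchievedRfCheck() {}

    /**
     * Returns true if the header reports an achieved rf of at least desiredRf.
     * Assumes the value sits under the key "rf" as a Number; a missing or
     * non-numeric value is treated as "not achieved".
     */
    static boolean reachedDesiredRf(Map<String, Object> responseHeader, int desiredRf) {
        Object rf = responseHeader.get("rf");
        if (!(rf instanceof Number)) {
            return false; // be conservative when the header carries no usable rf
        }
        return ((Number) rf).intValue() >= desiredRf;
    }
}
```

For example, `reachedDesiredRf(Map.of("status", 0, "rf", 1), 2)` returns false: only one replica (the leader) applied the update, so a client wanting rf=2 would treat it as not achieved.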


> If minRF is not satisfied, leader should not put replicas in recovery
> -
>
> Key: SOLR-8034
> URL: https://issues.apache.org/jira/browse/SOLR-8034
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Jessica Cheng Mallet
>Assignee: Anshum Gupta
>Priority: Major
>  Labels: solrcloud
> Fix For: 5.4, 6.0
>
> Attachments: SOLR-8034.patch, SOLR-8034.patch
>
>
> If the minimum replication factor parameter (minRf) in a Solr update request
> is not satisfied (i.e. the update was successfully applied on fewer than
> minRf replicas), the shard leader should not put the failed replicas in
> "leader initiated recovery", and the client should retry the update instead.
> This is so that, in the scenario where minRf is not satisfied, the failed
> replicas can still be eligible to become leader in case of leader failure,
> since from the client's perspective this update did not succeed.
> This came up in a network partition scenario where the leader became cut
> off from its two followers, but all three could still talk to ZooKeeper.
> The partitioned leader put its two followers into leader initiated
> recovery, so we couldn't just kill off the partitioned node and have a
> follower take over leadership. For the minRf=1 case, this is the correct
> behavior, because the partitioned leader would have accepted updates that
> the followers don't have, so we can't switch leadership or we'd lose those
> updates. However, in the minRf=2 case, Solr never accepted any update from
> the client's point of view, so the partitioned leader doesn't actually
> have any accepted update that the followers lack, and therefore the
> followers should be eligible to become leader. Thus, I'm proposing
> modifying the leader initiated recovery logic to not put the followers in
> recovery if the minRf parameter is present and is not satisfied.
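The proposed rule (later reverted by SOLR-12767) can be modeled as a small decision function. This is a simplified sketch, not Solr's actual recovery code: all class and method names are hypothetical, and the achieved rf is assumed to count the leader itself, so a leader cut off from both followers reports an achieved rf of 1.

```java
import java.util.Collections;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical model of the rule proposed in SOLR-8034 (later reverted);
// names do not correspond to Solr's real classes.
final class LirDecision {

    private LirDecision() {}

    /**
     * Returns the replicas the leader would put into leader-initiated recovery.
     *
     * @param minRf          the client's minRf, if the request carried one
     * @param achievedRf     replicas (leader included) that applied the update
     * @param failedReplicas replicas the leader could not forward the update to
     */
    static List<String> replicasToRecover(OptionalInt minRf, int achievedRf, List<String> failedReplicas) {
        if (minRf.isPresent() && achievedRf < minRf.getAsInt()) {
            // The update failed from the client's point of view: leave the
            // replicas eligible for leadership and let the client retry.
            return Collections.emptyList();
        }
        // The update counts as accepted, so replicas that missed it must recover.
        return List.copyOf(failedReplicas);
    }
}
```

In the partition scenario above, achievedRf is 1: with minRf=1 both followers would still be put into recovery, while with minRf=2 neither would, leaving them eligible to take over leadership.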



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2018-10-03 Thread ASF subversion and git services (JIRA)


[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637181#comment-16637181 ]

ASF subversion and git services commented on SOLR-8034:
---

Commit 46f753d7c6df52c06d970a13d3b742310276f2ca in lucene-solr's branch 
refs/heads/master from Tomas Fernandez Lobbe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=46f753d ]

SOLR-12767: Always include the achieved rf in the response

This commit deprecates the min_rf parameter. Solr now always includes the
achieved replication factor in the update requests (as if min_rf was always
specified). Also, reverts the changes introduced in SOLR-8034, replicas that
don't ack an update will have to recover to prevent inconsistent shards.





[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-16 Thread Anshum Gupta (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790539#comment-14790539 ]

Anshum Gupta commented on SOLR-8034:


Thanks [~mewmewball].




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-16 Thread Timothy Potter (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14768990#comment-14768990 ]

Timothy Potter commented on SOLR-8034:
--

Cool - I created SOLR-8062 to handle that change.




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-16 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14768979#comment-14768979 ]

Mark Miller commented on SOLR-8034:
---

bq. Previously, we talked about throwing an exception instead of just returning
a value for the client to interpret, and maybe that makes it more explicit that
clients MUST handle minRf not being achieved.

I still think that is how this should work.
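The "throw instead of return" approach could look roughly like this on the client side: a missed minRf becomes an error that cannot be mistaken for success. Both classes are hypothetical illustrations and do not exist in SolrJ.

```java
// Hypothetical exception signalling that an update missed the requested minRf.
final class MinRfNotAchievedException extends RuntimeException {
    final int minRf;
    final int achievedRf;

    MinRfNotAchievedException(int minRf, int achievedRf) {
        super("min_rf=" + minRf + " not achieved (rf=" + achievedRf + ")");
        this.minRf = minRf;
        this.achievedRf = achievedRf;
    }
}

// Hypothetical helper: turns a below-minRf result into an exception.
final class StrictMinRf {

    private StrictMinRf() {}

    /** Returns achievedRf unchanged, or throws if it is below minRf. */
    static int requireMinRf(int minRf, int achievedRf) {
        if (achievedRf < minRf) {
            throw new MinRfNotAchievedException(minRf, achievedRf);
        }
        return achievedRf;
    }
}
```

An unchecked exception keeps existing call sites compiling while still stopping a failed update from passing silently; a checked exception would force handling at compile time at the cost of a noisier API.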




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-15 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746353#comment-14746353 ]

ASF subversion and git services commented on SOLR-8034:
---

Commit 1703302 from [~anshumg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1703302 ]

SOLR-8034: Leader no longer puts replicas in recovery in case of a failed 
update, when minRF isn't achieved. (merge from trunk)




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-15 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746127#comment-14746127 ]

ASF subversion and git services commented on SOLR-8034:
---

Commit 1703289 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1703289 ]

SOLR-8034: Leader no longer puts replicas in recovery in case of a failed 
update, when minRF isn't achieved.




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-15 Thread Jessica Cheng Mallet (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745746#comment-14745746 ]

Jessica Cheng Mallet commented on SOLR-8034:


Oops, sorry! Didn't know there's another Tim Potter. :P




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-15 Thread Anshum Gupta (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745737#comment-14745737 ]

Anshum Gupta commented on SOLR-8034:


Thanks for fixing the assert.

bq. The replica will not realize it's down on its own, since the partition is
between the leader and the replica, not between the replica and ZooKeeper, so it
won't be set to down until the leader tries to forward the document to it and fails

Right, should've realized that.

Also, about my opinion being split: I wasn't in on this, but I thought about it
more and it makes more sense to go with this.

Thanks [~mewmewball]. LGTM overall; I'll commit this.




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-15 Thread Timothy Potter (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745735#comment-14745735 ]

Timothy Potter commented on SOLR-8034:
--

[~tpot] oops! Jessica was actually pinging me {{thelabdude}} (same name, 
different handle)

[~mewmewball] this looks good to me ... nice test case! Also, I agree that if
the client is using {{minRf}}, then it is their responsibility to handle the
response correctly. Previously, we talked about throwing an exception instead
of just returning a value for the client to interpret, and maybe that makes it
more explicit that clients MUST handle minRf not being achieved. We should
handle that in another ticket though.




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-15 Thread Jessica Cheng Mallet (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745722#comment-14745722 ]

Jessica Cheng Mallet commented on SOLR-8034:


Ah, and regarding

{quote}
I'm kind of split on this as the replica here would be out of sync from the 
leader and would never know about it, increasing the odds of inconsistency when 
the client doesn't handle it the right way i.e. it kind of self-heals at this 
point, and that would stop happening.
{quote}
I'd hope that if the user is explicitly using minRf, they handle it the
right way (i.e. retry if minRf isn't achieved). The contract would be: if the
request fails, it needs to be retried, or we can possibly see inconsistent
state. I think this is already true for a normal update: if the forwarded
parallel update to the followers succeeds but somehow fails on the leader, a
failure is returned to the user, yet the update could be present on the
followers.
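The retry contract described above can be sketched as a bounded loop. `sendUpdate` is a hypothetical stand-in for whatever client call sends the update and returns the achieved rf; it is not a SolrJ method, and `MinRfRetry` is likewise an illustrative name.

```java
import java.util.function.IntSupplier;

// Hypothetical retry loop: an update that misses minRf is treated as failed
// and sent again, up to maxAttempts times.
final class MinRfRetry {

    private MinRfRetry() {}

    /**
     * Re-sends the update until it reaches minRf or maxAttempts is used up.
     * Returns the number of attempts needed, or -1 if minRf was never
     * reached (the caller must then treat the update as failed).
     */
    static int sendWithRetry(IntSupplier sendUpdate, int minRf, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int achievedRf = sendUpdate.getAsInt();
            if (achievedRf >= minRf) {
                return attempt;
            }
            // A real client would typically back off here before retrying.
        }
        return -1;
    }
}
```

Retrying like this is only safe for idempotent updates, such as re-adding a full document with the same unique key; an atomic `inc`, for example, would be applied a second time on any replica that already has it.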




[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-15 Thread Jessica Cheng Mallet (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745706#comment-14745706 ]

Jessica Cheng Mallet commented on SOLR-8034:


[~anshumg], I fixed the comment for the assertion, but I didn't add the test
that the replica is down after the first network partition, because the point
is that the replica will not realize it's down on its own: the partition is
between the leader and the replica, not between the replica and ZooKeeper. So
it won't be set to down until the leader tries to forward the document to it,
fails, and then puts it in leader-initiated recovery.
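A toy model of the situation described above: a replica stays ACTIVE as long as its own ZooKeeper session is healthy, and only the leader's failed forward marks it DOWN. Everything here is hypothetical and ignores real ZooKeeper mechanics; it models the behavior without this patch, where every failed forward triggers leader-initiated recovery.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: replica state as published in ZooKeeper, plus
// whether the leader can reach each replica over the network.
final class PartitionModel {
    enum State { ACTIVE, DOWN }

    private final Map<String, State> stateInZk = new HashMap<>();
    private final Map<String, Boolean> reachableFromLeader = new HashMap<>();

    void addReplica(String name, boolean reachableFromLeader) {
        // The replica registers itself through its own (healthy) ZK session.
        stateInZk.put(name, State.ACTIVE);
        this.reachableFromLeader.put(name, reachableFromLeader);
    }

    State state(String name) {
        return stateInZk.get(name);
    }

    /** Leader forwards an update; unreachable replicas are marked DOWN. Returns achieved rf. */
    int forwardUpdate() {
        int achievedRf = 1; // the leader itself applied the update
        for (Map.Entry<String, Boolean> e : reachableFromLeader.entrySet()) {
            if (e.getValue()) {
                achievedRf++;
            } else {
                stateInZk.put(e.getKey(), State.DOWN); // leader-initiated recovery
            }
        }
        return achievedRf;
    }
}
```

Before any update is forwarded, both partitioned followers still read as ACTIVE; only `forwardUpdate()` changes that, which is why a test cannot expect the replica to be down right after the partition.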

[~tpot], we discussed this in ticket 4072.

> If minRF is not satisfied, leader should not put replicas in recovery
> -
>
> Key: SOLR-8034
> URL: https://issues.apache.org/jira/browse/SOLR-8034
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Jessica Cheng Mallet
>  Labels: solrcloud
> Attachments: SOLR-8034.patch, SOLR-8034.patch
>
>
> If the minimum replication factor parameter (minRf) in a Solr update request 
> is not satisfied -- i.e. if the update was not successfully applied on at 
> least minRf replicas -- the shard leader should not put the failed replicas 
> in "leader initiated recovery", and the client should retry the update instead.
> This is so that in the scenario where minRf is not satisfied, the failed 
> replicas can still be eligible to become a leader in case of leader failure, 
> since from the client's perspective this update did not succeed.
> This came up from a network partition scenario where the leader becomes 
> sectioned off from its two followers, but they all could still talk to 
> ZooKeeper. The partitioned leader put its two followers in leader 
> initiated recovery, so we couldn't just kill off the partitioned node and 
> have a follower take over leadership. For a minRf=1 case, this is the correct 
> behavior because the partitioned leader would have accepted updates that the 
> followers don't have, and therefore we can't switch leadership or we'd lose 
> those updates. However, in the case of minRf=2, Solr never accepted any 
> update from the client's point of view, so in fact the partitioned leader 
> doesn't have any accepted update that the followers don't have, and therefore 
> the followers should be eligible to become leaders. Thus, I'm proposing 
> modifying the leader initiated recovery logic to not put the followers in 
> recovery if the minRf parameter is present and is not satisfied.
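The proposed rule boils down to one decision on the leader once it knows how many replicas acknowledged an update. A minimal sketch, assuming the achieved replication factor counts the leader plus every replica that acked; `LirDecision` and `shouldStartLeaderInitiatedRecovery` are hypothetical names, not Solr APIs. The SOLR-12767 commit quoted earlier in this archive later reverted this behavior, so the sketch illustrates the proposal only.

```java
import java.util.OptionalInt;

// Hypothetical sketch of the proposed leader-side rule; not Solr's implementation.
final class LirDecision {

    private LirDecision() {}

    /**
     * @param minRf      the minRf request parameter, if the client sent one
     * @param achievedRf leader + replicas that acknowledged the update (assumed)
     * @return true if replicas that failed to ack should be put in
     *         leader-initiated recovery
     */
    static boolean shouldStartLeaderInitiatedRecovery(OptionalInt minRf, int achievedRf) {
        if (!minRf.isPresent()) {
            // No minRf: keep the existing behavior and put failed replicas in recovery.
            return true;
        }
        // minRf satisfied: the client treats the update as accepted, so the
        // failed replicas are behind and must recover before becoming leader.
        // minRf not satisfied: the client treats the update as failed and
        // should retry, so the failed replicas stay eligible for leadership.
        return achievedRf >= minRf.getAsInt();
    }

    public static void main(String[] args) {
        // Partition scenario from the description: the leader cannot reach
        // either follower, so only the leader applies the update (rf = 1).
        int achievedRf = 1;
        System.out.println(shouldStartLeaderInitiatedRecovery(OptionalInt.of(1), achievedRf));  // true
        System.out.println(shouldStartLeaderInitiatedRecovery(OptionalInt.of(2), achievedRf));  // false
        System.out.println(shouldStartLeaderInitiatedRecovery(OptionalInt.empty(), achievedRf)); // true
    }
}
```

With the leader cut off from both followers, minRf=1 is satisfied and the followers go into recovery, while minRf=2 is not, so the followers stay eligible for leadership and the client is expected to retry.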





[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-14 Thread Tim Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744936#comment-14744936
 ] 

Tim Potter commented on SOLR-8034:
--

Hi [~mewmewball].  I don't recall talking with you - perhaps you meant someone 
else?

> If minRF is not satisfied, leader should not put replicas in recovery
> -
>
> Key: SOLR-8034
> URL: https://issues.apache.org/jira/browse/SOLR-8034
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Jessica Cheng Mallet
>  Labels: solrcloud
> Attachments: SOLR-8034.patch





[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-11 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741577#comment-14741577
 ] 

Anshum Gupta commented on SOLR-8034:


I'm kind of split on this, as the replica here would be out of sync with the 
leader and would never know about it. That increases the odds of inconsistency 
when the client doesn't handle it the right way: today it kind of self-heals at 
this point (the leader puts the replica in recovery), and that would stop 
happening.

At the same time, I kind of like the idea as it allows for failover in the case 
you've mentioned above.
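The trade-off described above can be made concrete with a toy model of one follower that misses an update during a partition. This is a hypothetical sketch (`SelfHealTradeoff` is not Solr code), and recovery is simplified to the follower copying the leader's documents.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model: one follower misses update "doc1" during a
// partition. Returns whether leader and follower hold the same documents
// once the partition heals.
final class SelfHealTradeoff {

    private SelfHealTradeoff() {}

    static boolean consistentAfterPartitionHeals(boolean leaderInitiatedRecovery,
                                                 boolean clientRetries) {
        Set<String> leaderDocs = new HashSet<>();
        Set<String> followerDocs = new HashSet<>();

        // The leader applies doc1, but forwarding it to the follower fails.
        leaderDocs.add("doc1");

        if (leaderInitiatedRecovery) {
            // Simplified recovery: the follower syncs from the leader ("self-heals").
            followerDocs.addAll(leaderDocs);
        }
        if (clientRetries) {
            // The retried update (same id) reaches both replicas after the partition heals.
            leaderDocs.add("doc1");
            followerDocs.add("doc1");
        }
        return leaderDocs.equals(followerDocs);
    }

    public static void main(String[] args) {
        System.out.println(consistentAfterPartitionHeals(true, false));  // true: recovery fixes it
        System.out.println(consistentAfterPartitionHeals(false, true));  // true: the retry fixes it
        System.out.println(consistentAfterPartitionHeals(false, false)); // false: replicas diverge
    }
}
```

The last case is the inconsistency risk: under the proposal, if minRf is not satisfied and the client does not retry, nothing brings the follower back in sync.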

For the patch, LGTM. Can you also test that the replica is down right after 
the first network partition, but before you send the document? That would 
ensure that the achieved replication factor is 2 because of the partition and 
not due to some other variable.

Also, can you fix the message for the assertion failure below? You're expecting 
2 _non-leader_ replicas, not 2 _replicas_:
{code}
+List notLeaders =
+ensureAllReplicasAreActive(testCollectionName, "shard1", 1, 3, maxWaitSecsToSeeAllActive);
+assertTrue("Expected 2 replicas for collection " + testCollectionName
++ " but found " + notLeaders.size() + "; clusterState: "
++ printClusterStateInfo(testCollectionName),
+notLeaders.size() == 2);
{code}
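For reference, the corrected check could read as follows. This is a stand-alone sketch rather than the actual test code: `NonLeaderCheck` and its methods are hypothetical, replicas are represented as plain strings, the collection and replica names are made up, and the cluster-state dump from the original message is omitted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone version of the check with the corrected wording.
// Replica names are plain strings here and the cluster-state dump is omitted.
final class NonLeaderCheck {

    private NonLeaderCheck() {}

    static String failureMessage(String collection, int expected, int found) {
        return "Expected " + expected + " non-leader replicas for collection "
            + collection + " but found " + found;
    }

    static void assertNonLeaderCount(String collection, List<String> notLeaders, int expected) {
        if (notLeaders.size() != expected) {
            throw new AssertionError(failureMessage(collection, expected, notLeaders.size()));
        }
    }

    public static void main(String[] args) {
        // Passes: two non-leaders, as expected with three replicas in shard1.
        assertNonLeaderCount("c8034", Arrays.asList("core_node2", "core_node3"), 2);
        // Prints: Expected 2 non-leader replicas for collection c8034 but found 1
        System.out.println(failureMessage("c8034", 2, 1));
    }
}
```

Inside the JUnit-based test itself, assuming JUnit's `Assert` methods are in scope, `assertEquals(message, 2, notLeaders.size())` would be an alternative to `assertTrue(..., notLeaders.size() == 2)`, since it reports the expected and actual values on failure.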

> If minRF is not satisfied, leader should not put replicas in recovery
> -
>
> Key: SOLR-8034
> URL: https://issues.apache.org/jira/browse/SOLR-8034
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Jessica Cheng Mallet
>  Labels: solrcloud
> Attachments: SOLR-8034.patch





[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery

2015-09-11 Thread Jessica Cheng Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741536#comment-14741536
 ] 

Jessica Cheng Mallet commented on SOLR-8034:


[~tpot], this is what we discussed a while ago. Will you please give it a look? 
Thanks!

> If minRF is not satisfied, leader should not put replicas in recovery
> -
>
> Key: SOLR-8034
> URL: https://issues.apache.org/jira/browse/SOLR-8034
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Jessica Cheng Mallet
>  Labels: solrcloud
> Attachments: SOLR-8034.patch


