[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2017-04-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985005#comment-15985005
 ] 

Mark Miller commented on SOLR-7141:
---

[~mihaly.toth], on lowering that for tests, see SOLR-9849 ("Use a very low value 
for solr.cloud.wait-for-updates-with-stale-state-pause in tests").

> RecoveryStrategy: Raise time that we wait for any updates from the leader 
> before they saw the recovery state to have finished.
> --
>
> Key: SOLR-7141
> URL: https://issues.apache.org/jira/browse/SOLR-7141
> Project: Solr
> Issue Type: Improvement
> Reporter: Mark Miller
> Assignee: Mark Miller
> Fix For: 5.1, 6.0
>
> Attachments: SOLR-7141.patch
>
>
> The current wait of 3 seconds is pushing the envelope a bit.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2017-04-26 Thread Mano Kovacs (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984679#comment-15984679
 ] 

Mano Kovacs commented on SOLR-7141:
---

Sorry, the 7 seconds was already lowered in SOLR-9848.







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2017-04-26 Thread Mano Kovacs (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984423#comment-15984423
 ] 

Mano Kovacs commented on SOLR-7141:
---

[~mihaly.toth], I created SOLR-10570 about your proposal.







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2017-04-25 Thread Mihaly Toth (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983639#comment-15983639
 ] 

Mihaly Toth commented on SOLR-7141:
---

Until SOLR-7427 gets fixed, 
{{solr.cloud.wait-for-updates-with-stale-state-pause}} could be set to a shorter 
value in tests. I have seen failures caused by this long delay a couple of times.
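
A minimal sketch of how a test could lower the pause for its own duration and then put the old value back. It assumes the setting is read as a JVM system property holding milliseconds; the class name `LowPauseInTests` and the helper `withPause` are hypothetical and are not part of Solr's test framework.

```java
public class LowPauseInTests {
    // Property name as quoted in this thread.
    static final String PAUSE_PROP = "solr.cloud.wait-for-updates-with-stale-state-pause";

    // Hypothetical helper: runs the test body with the pause property set to
    // the given value (assumed to be milliseconds), then restores whatever
    // value, if any, was set before.
    static void withPause(int millis, Runnable testBody) {
        String previous = System.getProperty(PAUSE_PROP);
        System.setProperty(PAUSE_PROP, Integer.toString(millis));
        try {
            testBody.run();
        } finally {
            if (previous == null) {
                System.clearProperty(PAUSE_PROP);
            } else {
                System.setProperty(PAUSE_PROP, previous);
            }
        }
    }

    public static void main(String[] args) {
        withPause(100, () ->
            System.out.println("pause during test: " + System.getProperty(PAUSE_PROP)));
        System.out.println("pause after test: " + System.getProperty(PAUSE_PROP));
    }
}
```

Restoring in a `finally` block keeps a short test value from leaking into other tests that run in the same JVM.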







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-04-19 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502056#comment-14502056
 ] 

Shalin Shekhar Mangar commented on SOLR-7141:
-

I opened SOLR-7427.







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-04-19 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502048#comment-14502048
 ] 

Shalin Shekhar Mangar commented on SOLR-7141:
-

Thanks for explaining in detail, Yonik. This is indeed tricky. Shard splitting 
can suffer from the same problem. I'll open an issue to track this.







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-04-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501622#comment-14501622
 ] 

Yonik Seeley commented on SOLR-7141:
---

It's tricky ;-)

From memory, here's how it's supposed to work:
1. replica tells leader it wants to recover
2. leader starts forwarding updates to replica (which the replica buffers since 
it's in recovery)
3. leader executes a hard commit (so replica can replicate the current index)
4. replica starts replicating index from the last leader commit point

Note that the ordering of #2 and #3 is very important. If we did #3 first and 
then #2, some updates wouldn't make it into the commit and also wouldn't be 
forwarded to the replica (and that leads to data loss).

Now the issue: even though we do #2 first and #3 after... it's possible to have 
an unfortunately scheduled update in a different thread that started before we 
did #2, and doesn't complete until after #3, so that update was not forwarded, 
and it's also not in the replicated index.  The sleep (which should be between 
steps #2 and #3) is to try and give time for this update to complete and make 
it into the index.
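
The race described above can be sketched as a small deterministic simulation. This is a toy model, not Solr code: the class `RecoveryRaceSketch`, the `Leader` type, and the split of an update into `beginUpdate`/`finishUpdate` are all assumptions made so that the stall between "deciding whether to forward" and "landing in the index" can be shown without real threads.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryRaceSketch {
    // Toy leader: uncommitted docs, committed docs, and the docs it forwarded.
    static class Leader {
        final List<String> pending = new ArrayList<>();
        final List<String> committed = new ArrayList<>();
        final List<String> replicaBuffer = new ArrayList<>();
        boolean forwarding = false;

        // Phase 1 of an update: decide whether to forward, based on the
        // state visible at this moment.
        boolean beginUpdate() {
            return forwarding;
        }

        // Phase 2 of an update: land in the local index, and forward only
        // if phase 1 saw the replica as recovering.
        void finishUpdate(String doc, boolean sawForwarding) {
            pending.add(doc);
            if (sawForwarding) {
                replicaBuffer.add(doc);
            }
        }

        void startForwarding() {   // step 2
            forwarding = true;
        }

        void hardCommit() {        // step 3
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Returns true if the replica ends up with "doc1", either through the
    // replicated commit point or through its buffer of forwarded updates.
    static boolean replicaSeesDoc(boolean pauseBeforeCommit) {
        Leader leader = new Leader();
        boolean sawForwarding = leader.beginUpdate(); // update starts before step 2
        leader.startForwarding();                     // step 2
        if (pauseBeforeCommit) {
            // The pause gives the slow update time to finish before the commit.
            leader.finishUpdate("doc1", sawForwarding);
        }
        leader.hardCommit();                          // step 3
        List<String> replicated = new ArrayList<>(leader.committed); // step 4
        if (!pauseBeforeCommit) {
            // Without the pause the slow update completes only after the commit.
            leader.finishUpdate("doc1", sawForwarding);
        }
        return replicated.contains("doc1") || leader.replicaBuffer.contains("doc1");
    }

    public static void main(String[] args) {
        System.out.println("without pause: replica sees doc1 = " + replicaSeesDoc(false));
        System.out.println("with pause:    replica sees doc1 = " + replicaSeesDoc(true));
    }
}
```

In this model `replicaSeesDoc(false)` returns `false` (the update is neither forwarded nor in the commit) and `replicaSeesDoc(true)` returns `true`. A real pause only makes the bad interleaving less likely; it does not rule it out.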

It occurs to me that the Lucene IndexWriter thread stealing (same issue that 
caused this: SOLR-6820) could make this much more likely than we would have 
thought.

One possible alternative is to block updates for a commit of this type 
(replication commit).  Any blocked updates would need to see that they need to 
be forwarded to the replica too (once they are unblocked) - I don't know if the 
code is currently written that way.
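
One way to picture this alternative is a read/write lock: updates share the read lock, and the replication commit takes the write lock. The sketch below is only an illustration under that assumption. `BlockingCommitSketch` and its methods are hypothetical, and it additionally assumes that forwarding is switched on inside the same critical section as the commit, which is not something this thread establishes about the existing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BlockingCommitSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private final List<String> forwarded = new ArrayList<>();
    private boolean forwarding = false; // written under the write lock only

    // Updates hold the read lock: they may run concurrently with each other
    // but never overlap a replication commit. An update that was blocked by
    // the commit reads the forwarding flag only after it is unblocked, so it
    // sees that it must be forwarded.
    void update(String doc) {
        lock.readLock().lock();
        try {
            synchronized (pending) {
                pending.add(doc);
                if (forwarding) {
                    forwarded.add(doc);
                }
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    // The replication commit holds the write lock: it waits for in-flight
    // updates, blocks new ones, turns forwarding on, and commits. Returns a
    // snapshot of the commit point the replica would replicate.
    List<String> replicationCommit() {
        lock.writeLock().lock();
        try {
            forwarding = true;
            committed.addAll(pending);
            pending.clear();
            return new ArrayList<>(committed);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Copy of the updates forwarded to the replica so far.
    List<String> forwardedDocs() {
        synchronized (pending) {
            return new ArrayList<>(forwarded);
        }
    }

    public static void main(String[] args) {
        BlockingCommitSketch leader = new BlockingCommitSketch();
        leader.update("a");
        System.out.println("commit point: " + leader.replicationCommit());
        leader.update("b");
        System.out.println("forwarded:    " + leader.forwardedDocs());
    }
}
```

With this arrangement every update is either finished before the commit (and so in the commit point) or starts after it (and so forwarded), at the cost of stalling updates while the commit runs.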







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-04-18 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501487#comment-14501487
 ] 

Shalin Shekhar Mangar commented on SOLR-7141:
-

Thanks Mark. What do you mean by "wrong" in this context? If the leader saw 
this node's state as down then it wouldn't send any updates our way; if it saw 
this node's state as recovering or active then it would send the exact same 
request. What kinds of requests or scenarios are we trying to prevent by 
waiting?







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-04-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501446#comment-14501446
 ] 

Mark Miller commented on SOLR-7141:
---

It's as the comment says:

{noformat}
// we wait a bit so that any updates on the leader
// that started before they saw recovering state 
// are sure to have finished
{noformat}

Slow updates based on old state can be wrong. We started this at 2 seconds, 
more or less out of thin air. I had to raise it because I saw it wasn't enough 
in some HDFS ChaosMonkey test failures, so we need to make sure we have plenty 
of padding for real-world hardware.

It might be nice to do something more concrete, but it's very tricky to solve 
nicely.







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-04-18 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501426#comment-14501426
 ] 

Shalin Shekhar Mangar commented on SOLR-7141:
-

Hi Mark, can you please explain why this wait is necessary?







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-03-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377822#comment-14377822
 ] 

ASF subversion and git services commented on SOLR-7141:
---

Commit 1668884 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1668884 ]

SOLR-7141: RecoveryStrategy: Raise time that we wait for any updates from the 
leader before they saw the recovery state to have finished.







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-03-22 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375039#comment-14375039
 ] 

ASF subversion and git services commented on SOLR-7141:
---

Commit 1668396 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1668396 ]

SOLR-7141: RecoveryStrategy: Raise time that we wait for any updates from the 
leader before they saw the recovery state to have finished.







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-02-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332275#comment-14332275
 ] 

Mark Miller commented on SOLR-7141:
---

I'll also make it configurable.
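
A minimal sketch of what a configurable version of the wait could look like. It assumes the value comes from the system property named elsewhere in this thread and is expressed in milliseconds; the class `ConfigurablePause`, its method names, and the fallback argument are hypothetical and are not taken from the attached patch.

```java
public class ConfigurablePause {
    // Property name as it appears elsewhere in this thread.
    static final String PAUSE_PROP = "solr.cloud.wait-for-updates-with-stale-state-pause";

    // Reads the pause (assumed to be milliseconds). Falls back to the given
    // value when the property is unset or not a valid integer, and never
    // returns a negative number.
    static int pauseMillis(int fallbackMillis) {
        return Math.max(0, Integer.getInteger(PAUSE_PROP, fallbackMillis));
    }

    // Waits so that updates which started on the leader before it saw the
    // recovering state have a chance to finish. Returns false if the wait
    // was interrupted, leaving the thread's interrupt flag set.
    static boolean waitForStaleUpdates(int fallbackMillis) {
        try {
            Thread.sleep(pauseMillis(fallbackMillis));
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.setProperty(PAUSE_PROP, "50");
        System.out.println("configured pause: " + pauseMillis(3000) + " ms");
        System.out.println("wait completed: " + waitForStaleUpdates(3000));
        System.clearProperty(PAUSE_PROP);
    }
}
```

Passing the fallback as an argument keeps the sketch neutral about which default is chosen, since the thread mentions several candidate values.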







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-02-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332273#comment-14332273
 ] 

Mark Miller commented on SOLR-7141:
---

I guess we should probably go up to like 10 seconds. Longer term, perhaps there 
is something else we can try that would better deal with a really badly timed 
long GC or something.







[jira] [Commented] (SOLR-7141) RecoveryStrategy: Raise time that we wait for any updates from the leader before they saw the recovery state to have finished.

2015-02-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332259#comment-14332259
 ] 

Mark Miller commented on SOLR-7141:
---

{noformat}
// we wait a bit so that any updates on the leader
// that started before they saw recovering state 
// are sure to have finished
try {
  Thread.sleep(3000);
} catch (InterruptedException e) {
  Thread.currentThread().interrupt();
}
{noformat}



