[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-19 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516747#comment-16516747
 ] 

Cao Manh Dat commented on SOLR-11216:
-

Attached a patch including the optimization metionted above, including more 
javadocs and TODO notes. I will beast the test with PeerSyncReplicationTest to 
make sure it works correct.
If the result ok and no objections, I will commit the patch soon.

> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch, SOLR-11216.patch, SOLR-11216.patch, 
> SOLR-11216.patch, SOLR-11216.patch, SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-17 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515335#comment-16515335
 ] 

Cao Manh Dat commented on SOLR-11216:
-

Attached a patch with refactoring. The code seems much cleaner now.

> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch, SOLR-11216.patch, SOLR-11216.patch, 
> SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-15 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514525#comment-16514525
 ] 

Cao Manh Dat commented on SOLR-11216:
-

Thank guys for your reviews. This is a rough patch which needs to change/move 
things around to make it cleaner. To be more clear the process of the new 
PeerSync (PeerSyncWithLeader) is
* Replica gets its recent updates versions
* Replica requests recent updates versions + fingerprint from the leader
* Replica requests missed updates (updates in buffer tlog are considered missed 
updates) up to leader's {{fingerprint.maxVersionEncountered}}
* Replica apply missed updates then compare its fingerprint with leader's 
fingerprint in step 2

The reason for getting the fingerprint in step 2 is we do not trust 
{{fingerprint.maxVersionSpecified}}. Therefore we must use the fingerprint of 
the leader with {{fingerprint.maxVersionSpecified==Long.MAX_VALUE}} (or 
fingerprint of leader's index at the time of step 2). We may need to block 
updates between getting recent versions and computing fingerprint on the 
leader's side, but let do it later.

By request updates up to {{fingerprint.maxVersionEncountered}}. We will make 
sure that after apply updates, {{replica.maxVersionEncountered}} will equal 
with the leader, hence its fingerprint will be the same as the leader.

Another optimization here is on step 3, instead of considering buffered updates 
as missed updates, we just need to memo the buffered updates need to be applied 
on step 4.





> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch, SOLR-11216.patch, SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-15 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514244#comment-16514244
 ] 

Yonik Seeley commented on SOLR-11216:
-

{quote}
SolrQueryRequest req = new LocalSolrQueryRequest(core,
new ModifiableSolrParams()); 

request is not safely closed, is this intentional? won't this break the 
reference count mechanism?
{quote}

Yeah, it does look like it should be closed.  A SolrQueryRequest grabs a 
searcher reference on-demand, so that may be why it isn't causing an issue with 
any tests (the commit command doesn't grab a searcher reference with the 
provided request).  It should be fixed anyway though.



> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch, SOLR-11216.patch, SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-15 Thread hamada (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514102#comment-16514102
 ] 

hamada commented on SOLR-11216:
---

General review comments, some not related to the patch but relevant In general.

PeerSyncWithLeader 

use 

startingVersions.isEmpty() rather than size() == 0, same for 215

The following try/finally can return, in which case proc is not closed, Is this 
intentional, and if so please add a comment to the effect

line 299, consider sizing the List properly to avoid  garbage side effect from 
growing the list, same applies to line 317

 

HttpShardHandler.java 

if (urls.size()==0) { with if (urls.isEmpty()) {

 

RecoveryStrategy.java

line 223 and 613, 235 (on 
core.getDeletionPolicy().getLatestCommit().getGeneration()) may result in an 
NPE 

line 436 

SolrQueryRequest req = new LocalSolrQueryRequest(core,
 new ModifiableSolrParams()); 

request is not safely closed, is this intentional? won't this break the 
reference count mechanism?

 

 

> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch, SOLR-11216.patch, SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-15 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513887#comment-16513887
 ] 

Cao Manh Dat commented on SOLR-11216:
-

Attached patch for Solution 2. Created a new class PeerSyncWithLeader with some 
duplications with its original class (PeerSync) but what we will gain here is 
an easier to understand flow (fewer flags) and optimized for doing peerSync on 
recovery.
Any objections about this separations? [~shalinmangar] [~markrmil...@gmail.com]


> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch, SOLR-11216.patch, SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-13 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511005#comment-16511005
 ] 

Cao Manh Dat commented on SOLR-11216:
-

Attached patch for anyone wants to reproduce problems of Solution 3.

> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch, SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-13 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511004#comment-16511004
 ] 

Cao Manh Dat commented on SOLR-11216:
-

After spent a day adding more test and debugging problem. I think that with the 
current IndexFingerprint implementation we can't go with Solution 3.
Firstly, to go with Solution 3, we must compute the fingerprint of the index up 
to a specified point. But just by looking at the current index, we can't do 
that. Ie:
A leader :
- with updates: doc1(v=0), doc2(v=1), doc3(v=3), delete(doc3, v=4), doc2(v=5).
- its index will be: doc1(v=0), doc2(v=5)

A replica :
- with index: doc1(v=0), doc2(v=1)

Case 1:
A replica asks for updates and fingerprint up to (include) v=3. The Leader will 
return updates doc3(v=3)
- leader's fingerprint will be hash of doc1(v=0) (it will skip doc2, since its 
version = 5 > specified version 3)
- replica' fingerprint will be hash of  doc1(v=0), doc2(v=1), doc3(v=3)
-> incorrect fingerprint.

There are many other cases which are very tricky to solve. Therefore I think 
the best thing to do now is Solution 2.


> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-12 Thread hamada (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509971#comment-16509971
 ] 

hamada commented on SOLR-11216:
---

from 20,000 foot level. Any time based solution is just brittle, solution 2 
sounds like a workaround. solution 3 seems to fit the bill.

 

> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-12 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509691#comment-16509691
 ] 

Cao Manh Dat commented on SOLR-11216:
-

Attached a patch for this ticket, based on Solution 3. It needs more test, but 
the overall result seems good.

> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-11216.patch
>
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2018-06-10 Thread Cao Manh Dat (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507649#comment-16507649
 ] 

Cao Manh Dat commented on SOLR-11216:
-

The problem here relates to "on-wire" updates which get returned in 
"getVersions" request but do not present in replica's buffering tlog. Here are 
couples of solution
* Solution 1: After submitting "getVersions" request, the replica will wait for 
some time. Therefore "on-wire" updates will land on buffering tlog. This is the 
simplest solution but less robust than solution 2.
* Solution 2: On finding missed updates, the replica will consider buffered 
updates as missed one. Hence will request these updates from the leader and 
apply them to its local index -> It will make the fingerprint comparison 
success.
* Solution 3: On finding missed updates, the replica will consider any updates 
with version larger than minVersion(buffered updates) are non-missed updates 
(the "on-wire" updates will be filled on applyBufferedUpdates() call). We only 
do fingerprint comparison up-to minVersion(buffered updates).

[~praste] Yeah, that case will be very tricky to solve, but at least we should 
solve some common cases.


> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions) when replica get recent versions ( so it will be 
> 1,2,3,4,5,6,7,8,9 )
> ## replica do peersync and request updates 5, 6, 9 from leader 
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2017-11-20 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260117#comment-16260117
 ] 

Pushkar Raste commented on SOLR-11216:
--

[~caomanhdat]
Can this fail if the leader processes updates out of order e.g. what if leader 
processed updates in the order 6 and has yet to process 5. Now the replica 
requests update 6. However, leader has just finished processing 5 (including a 
soft/hard commit) and when leader calculates index fingerprint up to 6, the 
leader's fingerprint will include version 5 as well. 

Considering all the race conditions, I think making fingerprint robust is 
tricky. 

> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> First of all, I will change the issue's title with a better name when I have.
> When digging into SOLR-10126. I found a case that can make peerSync fail.
> * leader and replica receive update from 1 to 4
> * replica stop
> * replica miss updates 5, 6
> * replica start recovery
> ## replica buffer updates 7, 8
> ## replica request versions from leader, 
> ## replica get recent versions which is 1,2,3,4,7,8
> ## in the same time leader receive update 9, so it will return updates from 1 
> to 9 (for request versions)
> ## replica do peersync and request updates 5, 6, 9 from leader
> ## replica apply updates 5, 6, 9. Its index does not have update 7, 8 and 
> maxVersionSpecified for fingerprint is 9, therefore compare fingerprint will 
> fail
> My idea here is why replica request update 9 (step 6) while it knows that 
> updates with lower version ( update 7, 8 ) are on its buffering tlog. Should 
> we request only updates that lower than the lowest update in its buffering 
> tlog ( < 7 )?
> Someone my ask that what if replica won't receive update 9. In that case, 
> leader will put the replica into LIR state, so replica will run recovery 
> process again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org