[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-04-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456851#comment-16456851
 ] 

Mark Miller commented on SOLR-11702:


This looks like a massive improvement for many long-standing issues, great work 
[~caomanhdat]!

> Redesign current LIR implementation
> ---
>
> Key: SOLR-11702
> URL: https://issues.apache.org/jira/browse/SOLR-11702
> Project: Solr
>  Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Cao Manh Dat
> Assignee: Cao Manh Dat
> Priority: Major
> Fix For: 7.3, master (8.0)
>
> Attachments: SOLR-11702.patch, SOLR-11702.patch
>
>
> I recently looked into some problems related to races between LIR and 
> Recovering. I would like to propose a totally new approach to solve the 
> SOLR-5495 problem, because patching the current implementation with a 
> band-aid will lead us to other problems (we cannot prove the correctness of 
> the implementation).
> Feel free to give comments/thoughts about this new scheme.
> https://docs.google.com/document/d/1dM2GKMULsS45ZMuvtztVnM2m3fdUeRYNCyJorIIisEo/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400266#comment-16400266
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit 1afe333844bf133538923a6ca1a3de0b2076d788 in lucene-solr's branch 
refs/heads/branch_7_3 from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1afe333 ]

SOLR-11702: Minor edits to log and exception messages

(cherry picked from commit dab739a)

(cherry picked from commit 4b52a19)





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400263#comment-16400263
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit dab739ae4cc8c3ff4ece24992ad8c633f7a4b19c in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dab739a ]

SOLR-11702: Minor edits to log and exception messages





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400264#comment-16400264
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit 4b52a19f4adfed57c9265ebee85d4e03321f6dbb in lucene-solr's branch 
refs/heads/branch_7x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4b52a19 ]

SOLR-11702: Minor edits to log and exception messages

(cherry picked from commit dab739a)





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394368#comment-16394368
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit 1f994c97301fbe8926115925102c78a8a133e26b in lucene-solr's branch 
refs/heads/branch_7x from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1f994c9 ]

SOLR-11702: Remove noise of exception messages on failed to ping leader





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394367#comment-16394367
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit e926f435d7e318b30b2d9ec38be87ad9ab7eed45 in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e926f43 ]

SOLR-11702: Remove noise of exception messages on failed to ping leader





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392349#comment-16392349
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit b992bbb2d7480d4cf2ff1d9302a7e20732c1100c in lucene-solr's branch 
refs/heads/branch_7x from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b992bbb ]

SOLR-11702: Fix precommit, only throw error to client if the replica is not in 
the same shard as leader





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392347#comment-16392347
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit dae572819ba479bffd990ea7d8f0c4f7b76da5b0 in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dae5728 ]

SOLR-11702: Fix precommit, only throw error to client if the replica is not in 
the same shard as leader





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383286#comment-16383286
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit ce2386aaabc401bc89990597279eefeb67a914b0 in lucene-solr's branch 
refs/heads/branch_7x from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ce2386a ]

SOLR-11702: Remove old LIR call in SolrCmdDistributor and let 
DistributedUpdateProcessor handle it on finish()





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-03-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383285#comment-16383285
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit f1ce5419eebfa361f572802eb4a8b637c2849bb5 in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f1ce541 ]

SOLR-11702: Remove old LIR call in SolrCmdDistributor and let 
DistributedUpdateProcessor handle it on finish()





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343096#comment-16343096
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit 8c8d78a4bb6c0f3322471af5765a01848247409c in lucene-solr's branch 
refs/heads/branch_7x from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8c8d78a ]

SOLR-11702: Redesign current LIR implementation





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343094#comment-16343094
 ] 

ASF subversion and git services commented on SOLR-11702:


Commit 27ef6530646a9af6f8fdf491afd80185bc4f7fee in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=27ef653 ]

SOLR-11702: Redesign current LIR implementation





[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-22 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335360#comment-16335360
 ] 

Cao Manh Dat commented on SOLR-11702:
-

bq. LIRRollingUpdatesTest.testNewReplicaOldLeader – why is the proxy closed for 
both leader and replica? Isn't closing for replica sufficient to force LIR?
Yeah, you're right, closing the leader's proxy is not necessary. That call is 
only for safety; I just wanted to simulate a real network partition between 
leader and replica.

bq. LIRRollingUpdatesTest calls TestInjection.reset() in tearDown but fault 
injection isn't used anywhere in the test so it can be removed.
+1

bq. Javadocs for ZkShardTerms.ensureTermIsHigher says "Ensure that leader's 
term is lower than some replica's terms" but shouldn't the leader have a higher 
term? This is also mentioned in the design document "The idea of term is only 
replicas (in the same shard) with highest term are considered healthy". The 
impl is doing the opposite i.e. it is increasing the replica's term to 
leaderTerm+1.
+1, the javadoc is mistyped.

bq. Can you add javadocs to the various methods in the ZkShardTerms.Terms class?
Sure
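
The term rule from the third point above can be pictured with a small sketch. This is not the ZkShardTerms code: the class ShardTermsSketch and its methods are hypothetical names, and the only rule taken from the thread is the one quoted from the design document, that only replicas with the highest term are considered healthy. The markLagging step, which moves the leader and every non-lagging replica at the leader's term to leaderTerm+1, is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of per-shard terms; not the real ZkShardTerms class. */
final class ShardTermsSketch {
    // replica name -> term; every replica starts at term 0
    private final Map<String, Long> terms = new HashMap<>();

    void register(String replica) {
        terms.putIfAbsent(replica, 0L);
    }

    long maxTerm() {
        long max = 0L;
        for (long t : terms.values()) {
            max = Math.max(max, t);
        }
        return max;
    }

    /** A replica is healthy only if it holds the highest term in the shard. */
    boolean isHealthy(String replica) {
        Long t = terms.get(replica);
        return t != null && t == maxTerm();
    }

    /**
     * Assumed behaviour: the leader (which must be registered) and every
     * replica that is not lagging and shares the leader's term move to
     * leaderTerm + 1, so the lagging replicas end up with a lower term.
     */
    void markLagging(String leader, Set<String> lagging) {
        long leaderTerm = terms.get(leader);
        for (Map.Entry<String, Long> e : terms.entrySet()) {
            if (!lagging.contains(e.getKey()) && e.getValue() == leaderTerm) {
                e.setValue(leaderTerm + 1);
            }
        }
    }
}
```

With three registered replicas, calling markLagging("leader", Set.of("r2")) leaves "leader" and "r1" healthy and "r2" behind by one term.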




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-22 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335353#comment-16335353
 ] 

Shalin Shekhar Mangar commented on SOLR-11702:
--

Ok, thanks for clarifying Dat. A few more questions/comments:
 # LIRRollingUpdatesTest.testNewReplicaOldLeader -- why is the proxy closed for 
both leader and replica? Isn't closing for replica sufficient to force LIR?
 # LIRRollingUpdatesTest calls TestInjection.reset() in tearDown but fault 
injection isn't used anywhere in the test so it can be removed.
 # Javadocs for ZkShardTerms.ensureTermIsHigher says "Ensure that leader's term 
is lower than some replica's terms" but shouldn't the leader have a higher 
term? This is also mentioned in the design document "The idea of _term_ is only 
replicas (in the same shard) with highest term are considered healthy". The 
impl is doing the opposite i.e. it is increasing the replica's term to 
leaderTerm+1.
 # Can you add javadocs to the various methods in the ZkShardTerms.Terms class?




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-15 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326193#comment-16326193
 ] 

Cao Manh Dat commented on SOLR-11702:
-

{quote}
I'm not sure that ZkShardTerms.refreshTerms behaves correctly on ZK Reconnect. 
Say a watcher was set (numWatcher=1) but not fired and the zk client 
disconnects. Then on re-connect, the OnReconnectListener in ZkController fires 
which re-registers cores and calls refreshTerms again. Now watcher won't be 
initialized in this method (because numWatcher=1) and therefore won't be set on 
terms znode anymore. Can you please verify?
{quote}
The logic you described matches the code, but as far as I have observed, the 
watcher is always fired on reconnect, at least on the DISCONNECT event.




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-15 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326166#comment-16326166
 ] 

Shalin Shekhar Mangar commented on SOLR-11702:
--

I'm not sure that ZkShardTerms.refreshTerms behaves correctly on ZK Reconnect. 
Say a watcher was set (numWatcher=1) but not fired and the zk client 
disconnects. Then on re-connect, the OnReconnectListener in ZkController fires 
which re-registers cores and calls refreshTerms again. Now watcher won't be 
initialized in this method (because numWatcher=1) and therefore won't be set on 
terms znode anymore. Can you please verify?




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-14 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325907#comment-16325907
 ] 

Cao Manh Dat commented on SOLR-11702:
-

Thanks [~shalinmangar]
{quote}
1. DUP.setupRequest skips replicas having terms. If I understand correctly, 
this will mean that updates are no longer forwarded to replicas until they 
publish themselves in recovery? Is that right?
{quote}
Right, if the term of a replica is less than the leader's term, the leader 
will stop sending updates to that replica.

{quote}
2. CreateCollectionCmd – throw InterruptedException directly from the method 
instead of trying to handle it here
{quote}
The code that deletes old term nodes in CreateCollectionCmd handles it exactly 
the same way as the code below it, so I do not understand the problem here.

{quote}
3. Mark LIR related classes/methods as deprecated – those are more likely to 
get attention right before 8.0 I think.
{quote}
Sure, this is a good idea

{quote}
5. RecoveringCoreTermWatcher – Shouldn't lastTermDoRecovery be set after 
recovery completes? If not, how do we ensure that recoveries are stacked up?
{quote}
I do not see any problem in the current implementation; after we call 
{{doRecovery}}, the recovery process will start shortly.

{quote}
6. RecoveringCoreTermWatcher catches NullPointerException. Do a null check 
instead.
{quote}
Sure!
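
As a generic illustration of the change agreed to here (this is not the RecoveringCoreTermWatcher code; NullCheckSketch, shardOf, and the core-to-shard map are hypothetical), an explicit null check makes the "value is missing" case visible instead of hiding it behind a caught NullPointerException:

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical example of preferring a null check over catching NPE. */
final class NullCheckSketch {
    /**
     * Returns the trimmed shard name for a core, or empty when the core is
     * unknown. The missing case is handled by an explicit check, so a
     * NullPointerException thrown elsewhere is never swallowed by accident.
     */
    static Optional<String> shardOf(Map<String, String> coreToShard, String coreName) {
        String shard = coreToShard.get(coreName);
        if (shard == null) {
            return Optional.empty();
        }
        return Optional.of(shard.trim());
    }
}
```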

{quote}
7. RecoveryStrategy – why pingLeader? isn't it sufficient to use 
ZkStateReader.getLeaderRetry as we used to do earlier?
{quote}
Imagine this case, where there is a network partition between leader and 
replica:
* Leader increases the term of the replica
* RecoveringCoreTermWatcher triggers the recovery process of the replica, and 
the replica goes into recovery (hence increasing its term)
* Leader increases the term of the replica (because it failed to send an 
update to the replica, and the replica's term is now equal to the leader's 
term)
* RecoveringCoreTermWatcher triggers the recovery process of the replica, and 
the replica goes into recovery (hence increasing its term)
* ... this process repeats forever until the network is healed
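
The loop in the list above can be mimicked with a toy model. This is only a sketch under assumptions: RecoveryLoopSketch and attempts are hypothetical names, and gating each recovery attempt on a successful ping is one reading of why pingLeader helps, not a description of the RecoveryStrategy code.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical toy model of the recovery loop during a network partition. */
final class RecoveryLoopSketch {
    /**
     * Counts how many recovery attempts are started within maxRounds rounds.
     * leaderReachable models the network; pingFirst models a replica that
     * pings the leader before it starts a recovery attempt.
     */
    static int attempts(BooleanSupplier leaderReachable, boolean pingFirst, int maxRounds) {
        int started = 0;
        for (int round = 0; round < maxRounds; round++) {
            boolean reachable = leaderReachable.getAsBoolean();
            if (pingFirst && !reachable) {
                continue; // wait for the next round instead of starting recovery
            }
            started++; // a recovery attempt begins (and the term would be bumped)
            if (reachable) {
                break; // recovery can complete, so the loop ends
            }
        }
        return started;
    }
}
```

In this model an unreachable leader produces one wasted attempt per round without the ping and none with it; once the leader is reachable, a single attempt is enough either way.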

{quote}
8. ZkCollectionTerms – if getShard and remove methods need to be synchronized 
then seems like close can interfere. Perhaps better to synchronize on the terms 
map itself.
{quote}
This is a good idea
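
A minimal sketch of the suggestion, assuming nothing about the real ZkCollectionTerms beyond the method names mentioned in the review: CollectionTermsSketch and ShardHandle are hypothetical, and the point is only that getShard, remove, and close all take the same lock, the terms map itself, so close cannot interleave with the other two.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: every operation synchronizes on the terms map. */
final class CollectionTermsSketch implements AutoCloseable {
    /** Stand-in for a per-shard terms object that must be closed. */
    static final class ShardHandle {
        private volatile boolean closed;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    private final Map<String, ShardHandle> terms = new HashMap<>();
    private boolean closed; // guarded by the lock on terms

    ShardHandle getShard(String shardId) {
        synchronized (terms) {
            if (closed) {
                throw new IllegalStateException("already closed");
            }
            return terms.computeIfAbsent(shardId, id -> new ShardHandle());
        }
    }

    void remove(String shardId) {
        synchronized (terms) {
            ShardHandle handle = terms.remove(shardId);
            if (handle != null) {
                handle.close();
            }
        }
    }

    @Override
    public void close() {
        synchronized (terms) {
            closed = true;
            terms.values().forEach(ShardHandle::close);
            terms.clear();
        }
    }
}
```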

{quote}
9. Can you explain the purpose of "new".equals(cd.getCoreProperty("lirVersion", 
"new"))) used in various places?
{quote}
That flag is mostly used for testing rolling updates and can be removed in 
SOLR-11812.
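
The expression quoted in the question reads a property with a default, so a core that never sets lirVersion is treated as "new". A small stand-in using java.util.Properties shows the same pattern; LirVersionSketch and usesNewLir are hypothetical names, and Properties only approximates the core descriptor used in the patch.

```java
import java.util.Properties;

/** Hypothetical stand-in for the lirVersion check quoted in the review. */
final class LirVersionSketch {
    /** True unless the core explicitly sets lirVersion to something else. */
    static boolean usesNewLir(Properties coreProps) {
        // getProperty(key, default) returns the default when the key is absent
        return "new".equals(coreProps.getProperty("lirVersion", "new"));
    }
}
```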




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325893#comment-16325893
 ] 

Shalin Shekhar Mangar commented on SOLR-11702:
--

Thanks Dat. A few comments/questions:

# DUP.setupRequest skips replicas having terms. If I understand correctly, this 
will mean that updates are no longer forwarded to replicas until they publish 
themselves in recovery? Is that right?
# CreateCollectionCmd -- throw InterruptedException directly from the method 
instead of trying to handle it here
# Mark LIR related classes/methods as deprecated -- those are more likely to 
get attention right before 8.0 I think.
# ElectionContext -- Minor typo - "this replica is registered its term" -- 
s/is/has
# RecoveringCoreTermWatcher -- Shouldn't lastTermDoRecovery be set after 
recovery completes? If not, how do we ensure that recoveries are stacked up?
# RecoveringCoreTermWatcher catches NullPointerException. Do a null check 
instead.
# RecoveryStrategy -- why pingLeader? isn't it sufficient to use 
ZkStateReader.getLeaderRetry as we used to do earlier?
# ZkCollectionTerms -- if getShard and remove methods need to be synchronized 
then seems like close can interfere. Perhaps better to synchronize on the terms 
map itself.
# Can you explain the purpose of "new".equals(cd.getCoreProperty("lirVersion", 
"new"))) used in various places?

I'm still going through the rest of the changes. I'll add some more comments 
later.
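
As an illustration of the ZkCollectionTerms point above, here is a minimal 
sketch of guarding getShard, remove and close with a single lock on the terms 
map, so that close cannot interleave with the other two. The class 
CollectionTermsSketch and the nested ShardHandle are hypothetical stand-ins, 
not the classes in the patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-collection registry of shard terms. */
final class CollectionTermsSketch implements AutoCloseable {

    /** Hypothetical stand-in for a per-shard terms object. */
    static final class ShardHandle {
        final String shardId;
        volatile boolean closed;

        ShardHandle(String shardId) {
            this.shardId = shardId;
        }
    }

    private final Map<String, ShardHandle> terms = new HashMap<>();
    private boolean closed; // guarded by the lock on `terms`

    /** Returns the handle for a shard, creating it on first use. */
    ShardHandle getShard(String shardId) {
        synchronized (terms) {
            if (closed) {
                throw new IllegalStateException("already closed");
            }
            return terms.computeIfAbsent(shardId, ShardHandle::new);
        }
    }

    /** Removes and closes the handle for a shard, if one exists. */
    void remove(String shardId) {
        synchronized (terms) {
            ShardHandle handle = terms.remove(shardId);
            if (handle != null) {
                handle.closed = true;
            }
        }
    }

    /** Closes every handle; holds the same lock as getShard and remove. */
    @Override
    public void close() {
        synchronized (terms) {
            closed = true;
            for (ShardHandle handle : terms.values()) {
                handle.closed = true;
            }
            terms.clear();
        }
    }
}
```

Because all three methods take the same monitor, a getShard call that races 
with close either completes before close starts or fails with 
IllegalStateException; it can never hand out a handle that close has missed.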




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-09 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318845#comment-16318845
 ] 

Erick Erickson commented on SOLR-11702:
---

OK, reporting back. My problem was totally unrelated unfortunately. Mine went 
away with upgrading Jetty.




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2018-01-01 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307487#comment-16307487
 ] 

Erick Erickson commented on SOLR-11702:
---

OK, I'm giving it a try. My test case is quite simple: set up a 1-shard, 
4-replica collection and fire a bunch of updates at it. So far this happens on 
6.3 (where the problem was first reported) _and_ on 7.x. I suspect it happens 
on master too, but I don't want to spend the time since it happens on 7.x.

Anyway, the patch applied cleanly and I'm running the test now. Basic auth 
doesn't seem to be necessary. I'll report back later.




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2017-12-31 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307357#comment-16307357
 ] 

Cao Manh Dat commented on SOLR-11702:
-

The current logic is quite stable. If the test can help us find some bugs in 
the current implementation, that would be great!




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2017-12-31 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307273#comment-16307273
 ] 

Erick Erickson commented on SOLR-11702:
---

[~caomanhdat] As it happens I'm working on understanding why a replica is going 
into LIR and I have a test setup that lets me reproduce it reasonably reliably 
(although it may take a few hours). I'm determining whether having basic auth 
enabled is necessary or not. I _believe_ I've seen this on 7x and master.

The point is, when you think the patch is ready I'd be happy to give it a go in 
my test environment, although it may take me a week. Let me know.






[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2017-12-31 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307221#comment-16307221
 ] 

Cao Manh Dat commented on SOLR-11702:
-

[~shalinmangar] [~markrmil...@gmail.com] I pushed all the changes to 
jira/solr-11702. Would you mind taking a look? Thanks!




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2017-12-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16285422#comment-16285422
 ] 

Cao Manh Dat commented on SOLR-11702:
-

[~manokovacs] Yeah, that's the idea of _term_. But the current design does not 
say anything about DOWN replicas. I am postponing the fix for SOLR-7065 and 
SOLR-7034 to another issue, where we will introduce a new rule like this: "only 
return success if all DOWN replicas have a term less than the leader's term".
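
The rule quoted above can be read as a simple predicate over the term map. The 
sketch below is only an illustration of that reading: the class and method 
names are hypothetical, and treating a DOWN replica with no recorded term as 
unsafe is an assumption, not something the design states.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper illustrating the proposed DOWN-replica rule. */
final class DownReplicaRuleSketch {
    private DownReplicaRuleSketch() {}

    /**
     * Returns true only if every DOWN replica has a term strictly lower than
     * the leader's term. Assumption: a DOWN replica without a recorded term,
     * or a missing leader term, makes the answer false.
     */
    static boolean canReturnSuccess(Map<String, Long> terms,
                                    String leader,
                                    Set<String> downReplicas) {
        Long leaderTerm = terms.get(leader);
        if (leaderTerm == null) {
            return false;
        }
        for (String replica : downReplicas) {
            Long term = terms.get(replica);
            if (term == null || term >= leaderTerm) {
                return false;
            }
        }
        return true;
    }
}
```

With terms {leader=2, r1=2, r2=1}, the check passes when only r2 is DOWN and 
fails when r1 is DOWN, because r1 still carries the leader's term.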




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2017-12-08 Thread Mano Kovacs (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283455#comment-16283455
 ] 

Mano Kovacs commented on SOLR-11702:


Really like this approach, [~caomanhdat]. It is not just cleaner and more 
robust, but I believe it could be an alternative solution for the problems 
that motivate SOLR-7065. Correct me if I am wrong, but a replica could become 
leader regardless of its previous state or the number of replicas 
participating, as its (and the others') term number would explicitly say 
whether it is in sync or behind. Is my assumption correct?
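
To make the question concrete, here is a minimal sketch of the eligibility 
check it implies, assuming "in sync" means holding the highest term recorded 
for the shard. The class and method names are hypothetical and are not taken 
from the patch.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper: decides leader eligibility from term numbers alone. */
final class LeaderEligibilitySketch {
    private LeaderEligibilitySketch() {}

    /**
     * Returns true if the replica holds the highest term recorded for the
     * shard, independent of its previous state or of how many replicas
     * take part in the election.
     */
    static boolean canBecomeLeader(Map<String, Long> terms, String replica) {
        Long term = terms.get(replica);
        if (term == null) {
            return false; // no recorded term: treated as not eligible (assumption)
        }
        long highest = Collections.max(terms.values());
        return term == highest; // Long is unboxed here, so this compares values
    }
}
```

With terms {r1=3, r2=3, r3=2}, r1 and r2 are eligible and r3 is not, however 
many of them are live at election time.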




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2017-12-07 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282999#comment-16282999
 ] 

Cao Manh Dat commented on SOLR-11702:
-

[~mdrob] That's right. I borrowed the idea of terms from Raft. Any replica can 
update its term to equal the leader's term. Only the leader can increase the 
terms of other replicas.
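
The two rules above can be sketched with a small in-memory model. This is a 
hedged illustration, not the code in the patch: ShardTermsSketch and its 
method names are hypothetical, the real terms live in ZooKeeper, and the 
detail that the leader raises its own term together with the terms of the 
replicas that received the update is an assumption about how a failed replica 
ends up behind.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of per-replica terms for one shard. */
final class ShardTermsSketch {
    private final Map<String, Long> terms = new HashMap<>();

    /** Registers a replica with term 0 if it is not already known. */
    void register(String replica) {
        terms.putIfAbsent(replica, 0L);
    }

    /** Returns the current term of a registered replica. */
    long termOf(String replica) {
        Long term = terms.get(replica);
        if (term == null) {
            throw new IllegalArgumentException("unknown replica: " + replica);
        }
        return term;
    }

    /**
     * Leader-only operation, used after an update failed on some replicas.
     * Assumption: every replica that was at the leader's term and did not
     * fail (the leader included) moves to leaderTerm + 1, which leaves the
     * failed replicas one term behind.
     */
    void leaderMarksFailed(String leader, Set<String> failedReplicas) {
        long leaderTerm = termOf(leader);
        for (Map.Entry<String, Long> entry : terms.entrySet()) {
            boolean wasInSync = entry.getValue() == leaderTerm;
            if (wasInSync && !failedReplicas.contains(entry.getKey())) {
                entry.setValue(leaderTerm + 1);
            }
        }
    }

    /** Replica operation: after recovery it adopts the leader's term. */
    void replicaCaughtUp(String replica, String leader) {
        termOf(replica); // validates that the replica is registered
        terms.put(replica, termOf(leader));
    }
}
```

A replica never raises its term above the leader's in this model; it can only 
copy the leader's value, which is what makes a lower term a reliable signal 
that the replica missed updates.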




[jira] [Commented] (SOLR-11702) Redesign current LIR implementation

2017-12-07 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16282992#comment-16282992
 ] 

Mike Drob commented on SOLR-11702:
--

Ooooh, good approach. This is similar in concept to how Raft works, I think.

One thing that is unclear from the design doc (I haven't looked at the code 
yet) is who updates the ZK terms when a replica joins recovery. Is that the 
result of the leader acknowledging the PrepRecoveryCmd?
