[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-07-11 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082512#comment-16082512
 ] 

Cao Manh Dat commented on SOLR-9835:


Thanks [~tomasflobbe]!

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Fix For: 7.0
>
> Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-05-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014895#comment-16014895
 ] 

Tomás Fernández Löbbe commented on SOLR-9835:
-

[~caomanhdat], I have a nocommit in SOLR-10233 related to a change you did as 
part of this Jira:
{noformat}
@@ -260,7 +263,7 @@ public class RecoveryStrategy extends Thread implements 
Closeable {
   UpdateRequest ureq = new UpdateRequest();
   ureq.setParams(new ModifiableSolrParams());
   ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
-  ureq.getParams().set(UpdateParams.OPEN_SEARCHER, false);
+  ureq.getParams().set(UpdateParams.OPEN_SEARCHER, onlyLeaderIndexes);
   ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, true).process(
   client);
 }
{noformat}
Why do we need to open searcher in the leader when doing a commit on leader for 
recovery?

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-04-03 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954479#comment-15954479
 ] 

Cassandra Targett commented on SOLR-9835:
-

bq. will this be ported to 6x and what is the expected timeframe if so? Or will 
it just be a 7.0 feature?

In this comment to SOLR-10233: 
https://issues.apache.org/jira/browse/SOLR-10233?focusedCommentId=15901980=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15901980,
 Tomás notes that some API changes will be made in the course of developing 
that feature, and those changes will impact the work done on this issue 
already. Because of this, these changes will be kept on master for the 
time-being to avoid back-compat issues.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-04-03 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954328#comment-15954328
 ] 

Erick Erickson commented on SOLR-9835:
--

will this be ported to 6x and what is the expected timeframe if so? Or will it 
just be a 7.0 feature?


> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943368#comment-15943368
 ] 

ASF subversion and git services commented on SOLR-9835:
---

Commit cd66a5ff51b7a9ff55edaa9fb5d7df5af42707e4 in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cd66a5f ]

SOLR-9835: Fixed precommit failure.


> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-27 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942950#comment-15942950
 ] 

ASF subversion and git services commented on SOLR-9835:
---

Commit d156aafd1d5cfda2d7b56e4c73c6ddee43d48e91 in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d156aaf ]

SOLR-9835: TestInjection.waitForInSyncWithLeader() should rely on commit point 
of searcher


> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-19 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932053#comment-15932053
 ] 

ASF subversion and git services commented on SOLR-9835:
---

Commit 4bc75dbf235145fad5ec1001004c663e15449523 in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4bc75db ]

SOLR-9835: Fix OnlyLeaderIndexesTest failure, inplace updates is not copied 
over properly


> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931066#comment-15931066
 ] 

Mark Miller commented on SOLR-9835:
---

Yeah, I see this too. It doesn't beast too terribly though. I'll attach some 
fail logs.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930253#comment-15930253
 ] 

Yonik Seeley commented on SOLR-9835:


OnlyLeaderIndexesTest just failed for me. Seems like it fails sometimes for 
jenkins too:
https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/6455/

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923899#comment-15923899
 ] 

Mark Miller commented on SOLR-9835:
---

Nevermind, some local glitch. Got it cleaned up.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923897#comment-15923897
 ] 

Mark Miller commented on SOLR-9835:
---

Looks like this broke the build? I compile errors about missing ZkController 
methods.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923854#comment-15923854
 ] 

ASF subversion and git services commented on SOLR-9835:
---

Commit 120274b451646ddd9e1de9e12a4904414f881c7c in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=120274b ]

SOLR-9835: Update CHANGES.txt


> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923724#comment-15923724
 ] 

ASF subversion and git services commented on SOLR-9835:
---

Commit 7830462d4b7da3acefff6353419e71cde62d5fee in lucene-solr's branch 
refs/heads/master from [~caomanhdat]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7830462 ]

SOLR-9835: Create another replication mode for SolrCloud


> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905981#comment-15905981
 ] 

Cao Manh Dat commented on SOLR-9835:


That's right. I will fix that problem today.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905980#comment-15905980
 ] 

Tomás Fernández Löbbe commented on SOLR-9835:
-

bq. The reason for that is LIR is being kicked off when we add document 7 ( 
SOLR-9555 ). 
Yes, saw that in my testing too
bq. But I don't think the code you mentioned is an error.
{{fail("Doc1 is deleted but it's still exist")}} will throw an 
{{AssertionError}} if hit, but this code is swallowing that error

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905973#comment-15905973
 ] 

Cao Manh Dat commented on SOLR-9835:


@tomasflobbe I also see the tests failed but in this line
{code}
checkRTG(3,7, cluster.getJettySolrRunners());
{code}
The reason for that is LIR is being kicked off when we add document 7 ( 
SOLR-9555 ). But I don't think the code you mentioned is an error.


> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905479#comment-15905479
 ] 

Tomás Fernández Löbbe commented on SOLR-9835:
-

[~caomanhdat], I do see some random failures in {{OnlyLeaderIndexesTest}} when 
beasting, do you too?
Also, I noticed this code in the test
{code:java}
try {
  checkRTG(1, 1, cluster.getJettySolrRunners());
  fail("Doc1 is deleted but it's still exist");
} catch (AssertionError e) {}
{code}
which seems wrong

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-06 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898465#comment-15898465
 ] 

Cao Manh Dat commented on SOLR-9835:


[~tomasflobbe] That sounds great. I will create a branch for this ticket and 
run some jenkins tests for the patch before committing.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-03-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898427#comment-15898427
 ] 

Tomás Fernández Löbbe commented on SOLR-9835:
-

[~caomanhdat], I created SOLR-10233 with some related work

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-02-27 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885586#comment-15885586
 ] 

Shalin Shekhar Mangar commented on SOLR-9835:
-

# Dat, can you please add javadocs for all the new methods in UpdateLog?
# I was looking into SOLR-9706 and it applies to this patch too. However, the 
fix is easy because most of the corruption checks added by SOLR-6640 aren't 
necessary in this new replication mode so we can safely skip them.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : if we *do not use DQBs*, replicas will consistence 
> with master just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-02-23 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880130#comment-15880130
 ] 

Shalin Shekhar Mangar commented on SOLR-9835:
-

bq. I don't think so, switchToNewTlog() is based on commit version at lucene 
index level (commit.getUserData().get(SolrIndexWriter.COMMIT_COMMAND_VERSION)), 
so we will always roll over updates in right way.

I understand that we use the commit version of the latest commit but the 
copyOverOldUpdates method only copies updates from the last tlog. If a hard 
commit happened between the time that we started replication and finished 
replication, the last tlog will not have all updates that should be copied 
over. Example:
* Leader has the following tlog with these versions:
** tlog0: 1,2,3,4,commit
** tlog1: 5,6
* Replica has tlogs:
** tlog0: 1,2,3,4,5
** replication from leader starts
** user calls explicit commit on replica
** tlog1: 6
** replication completes and we call switchToNewTLog which copies over all 
versions greater than 4 from the last tlog
** tlog2: 6

In this case, the update with version 5 is lost and will no longer be available 
in case this replica becomes leader.

bq. CDCR is very complex, I don't think we should support CDCR in this new 
replication mode now.

Okay, let's create a follow-up issue for this. CDCR is important enough that we 
must support it eventually. But I think backup/restore must be supported and 
tested.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-02-22 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879841#comment-15879841
 ] 

Ishan Chattopadhyaya commented on SOLR-9835:


Also, lets add a simple test to ensure that in-place updates work on a replica:
# Index few documents to leader, including a full document with id=0.
# Commit
# Index few more documents
# Commit
# Update id=0 document in-place
# Commit
# Assert that the document with id=0 has the same updated value in the leader 
and the replica.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-02-22 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879721#comment-15879721
 ] 

Cao Manh Dat commented on SOLR-9835:


Thanks [~shalinmangar]!
bq. LeaderInitiatedRecoveryThread – What is the reason behind adding 
SocketTimeoutException in the list of communication errors on which no more 
retries are made?
This change come from a jepsen test. This bug is also affect current mode. I 
created another issue for this bug SOLR-9913. We can skip this change for this 
ticket.
bq.ZkController.register method – The condition for !isLeader && 
onlyLeaderIndexes can be replaced by the isReplicaInOnlyLeaderIndexes variable.
Yeah, that's right
bq. Since there is no log replay on startup on replicas anymore, what if the 
replica is killed (which keeps its state as 'active' in ZK) and then the 
cluster is restarted and the replica becomes leader candidate? If we do not 
replay the discarded log then it could lead to data loss?
Very good catch, I try to resolve this problem.
bq. UpdateLog – Can you please add javadocs outlining the motivation/purpose of 
the new methods such as copyOverBufferingUpdates and switchToNewTlog e.g. why 
does switchToNewTlog require copying over some updates from the old tlog?
Sure!
bq.It seems that any commits that might be triggered explicitly by the user can 
interfere with the index replication. Suppose that a replication is in progress 
and a user explicitly calls commit which is distributed to all replicas, in 
such a case the tlogs will be rolled over and then when the ReplicateFromLeader 
calls switchToNewTlog(), the previous tlog may not have all the updates that 
should have been copied over. We should have a way to either disable explicit 
commits or protect against them on the replicas.
I don't think so, switchToNewTlog() is based on commit version at lucene index 
level ({{commit.getUserData().get(SolrIndexWriter.COMMIT_COMMAND_VERSION)}}), 
so we will always roll over updates in right way.
bq.UpdateLog – why does copyOverBufferUpdates block updates while calling 
switchToNewTlog but ReplicateFromLeader doesn't? How are they both safe?
Good catch I think we should blockUpdates in switchToNewTlog as well.
bq.Can we add tests for testing CDCR and backup/restore with this new 
replication scheme?
CDCR is very complex, I don't think we should support CDCR in this new 
replication mode now.
bq. ZkController.startReplicationFromLeader – Using a ConcurrentHashMap is not 
enough to prevent two simultaneous replications from happening concurrently. 
You should use the atomic putIfAbsent to put a core to the map before starting 
replication.
Yeah, that's sounds a good idea.
bq.Aren't some of the guarantees of real-time-get are relaxed in this new mode 
especially around delete-by-queries which no longer apply on replicas? Can you 
please document them as a comment on the issue that we can transfer to the ref 
guide in future?
I will update the ticket description now. Basically RTG is not consistency for 
DBQs

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a 

[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-02-22 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879022#comment-15879022
 ] 

Shalin Shekhar Mangar commented on SOLR-9835:
-

Thanks Dat. Sorry it took me a while to finish reviewing. A few 
questions/comments

# LeaderInitiatedRecoveryThread -- What is the reason behind adding 
SocketTimeoutException in the list of communication errors on which no more 
retries are made?
# ZkController.register method -- The condition for !isLeader && 
onlyLeaderIndexes can be replaced by the isReplicaInOnlyLeaderIndexes variable.
# Since there is no log replay on startup on replicas anymore, what if the 
replica is killed (which keeps its state as 'active' in ZK) and then the 
cluster is restarted and the replica becomes leader candidate? If we do not 
replay the discarded log then it could lead to data loss?
# UpdateLog -- Can you please add javadocs outlining the motivation/purpose of 
the new methods such as copyOverBufferingUpdates and switchToNewTlog e.g. why 
does switchToNewTlog require copying over some updates from the old tlog?
# It seems that any commits that might be triggered explicitly by the user can 
interfere with the index replication. Suppose that a replication is in progress 
and a user explicitly calls commit which is distributed to all replicas, in 
such a case the tlogs will be rolled over and then when the ReplicateFromLeader 
calls switchToNewTlog(), the previous tlog may not have all the updates that 
should have been copied over. We should have a way to either disable explicit 
commits or protect against them on the replicas.
# UpdateLog -- why does copyOverBufferUpdates block updates while calling 
switchToNewTlog but ReplicateFromLeader doesn't? How are they both safe?
# Can we add tests for testing CDCR and backup/restore with this new 
replication scheme?
# ZkController.startReplicationFromLeader -- Using a ConcurrentHashMap is not 
enough to prevent two simultaneous replications from happening concurrently. 
You should use the atomic putIfAbsent to put a core to the map before starting 
replication.
# Aren't some of the guarantees of real-time-get are relaxed in this new mode 
especially around delete-by-queries which no longer apply on replicas? Can you 
please document them as a comment on the issue that we can transfer to the ref 
guide in future?

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> On CAP point of view, this ticket will trying to promise to end users a 
> distributed systems :
> - Partition tolerance
> - Weak Consistency for normal query : clusters can serve stale data. This 
> happen when leader finish a commit and slave is fetching for latest segment. 
> This period can at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG : just like original SolrCloud mode
> - Weak Availability : just like original SolrCloud mode. If a leader down, 
> client must wait until new leader being elected.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-13 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822697#comment-15822697
 ] 

Shalin Shekhar Mangar commented on SOLR-9835:
-

Thanks Dat. I have started reviewing your patch.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816959#comment-15816959
 ] 

Cao Manh Dat commented on SOLR-9835:


I created a new ReplicationHandler to do the replication process, so it will 
not use the same instance from different threads.
{code}
replicationProcess = new ReplicationHandler();
replicationProcess.init(replicationConfig);
replicationProcess.inform(core);
{code}

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816951#comment-15816951
 ] 

Tomás Fernández Löbbe commented on SOLR-9835:
-

bq. I don't see any reason why we need masterUrl to be volatile? IndexFetcher 
instance is not being shared across threads.
Isn't it being used by the ReplicationHandler? Different requests would use the 
same instance from different threads, right? 

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816918#comment-15816918
 ] 

Cao Manh Dat commented on SOLR-9835:


Thanks a lot for your comments!

bq. Maybe add a method to DocCollection like {{isOnlyLeaderIndexes()}}
As you know, we won't use {{isOnlyLeaderIndexes()}} in the future ( it won't 
have enough information ), so I just do not want to add a public method on 
DocCollection ( solrj ) and remove it in the future.
bq. Does this need to be synchronized?
Yeah, I think we should.
bq. should masterUrl now be volatile?
I don't see any reason why we need masterUrl to be volatile? IndexFetcher 
instance is not being shared across threads.
bq. In many cases in the tests the leader will change before the replication 
happens, right? Does it make sense to discover the leader inside of the loop? 
Also, is there a way to remove that Thread.sleep(1000) at the beginning? This 
code will be called very frequently in tests.
That's a good idea.



> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816584#comment-15816584
 ] 

Tomás Fernández Löbbe commented on SOLR-9835:
-

As I said, I understand the reason why it was done this way, but the code right 
now is only using the count to identify the mode. In the future, the count 
alone won't be enough in any of those sections to determine what to do, you'll 
have to identify the mode first and then take some more actions (e.g. Am I in 
the list of replicas that index or not?). 

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816564#comment-15816564
 ] 

Noble Paul commented on SOLR-9835:
--

It will have a proper count in the future. A mixed mode has to be there 
eventually. Enum can't accommodate that

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread Yago Riveiro (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816535#comment-15816535
 ] 

Yago Riveiro commented on SOLR-9835:


bq. how about getLiveReplicasCount() ?

If I'm reading the code and found a method called getLiveReplicasCount(), I 
expected that return the number of live replicas for a shard, and if the only 
value that can return is 1 for onlyLeaderIndexes and -1 for the rest is not a 
good name.

Something like: 
{{zkStateReader.getClusterState().getCollection(collection).getReplicationMode()}}
 that returns an enum(ONLY_LEADER_INDEXES, ALL_REPLICAS_INDEXES) or something 
like that.



> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816283#comment-15816283
 ] 

Tomás Fernández Löbbe commented on SOLR-9835:
-

I'm just saying, we use the count to detect the mode, and everywhere we are 
getting the count and inferring the mode depending on the count (count==1 -> 
onlyLeaderIndexes, count==-1 -> default mode), Instead of that, lets put that 
logic inside of the DocCollection and ask it for the mode. Anyway, this is not 
too important, just a suggestion. 

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816241#comment-15816241
 ] 

Noble Paul commented on SOLR-9835:
--

bq.Maybe add a method to DocCollection like isOnlyLeaderIndexes() (or choose 
other name)? 

how about {{getLiveReplicasCount()}} ?

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815930#comment-15815930
 ] 

Tomás Fernández Löbbe commented on SOLR-9835:
-

Great idea! just took a quick look at the patch to understand this better. I 
have a couple of questions/comments, I know this is work in progress, so feel 
free to disregard any of my comments if you are working on them:

{code}
onlyLeaderIndexes = 
zkStateReader.getClusterState().getCollection(collection).getLiveReplicas() == 
1;
{code}
Maybe add a method to DocCollection like {{isOnlyLeaderIndexes()}} (or choose 
other name)? I understand why you did this, but this code is repeated many 
times, maybe can be improved for now.

{code}
private Map replicateFromLeaders = new HashMap<>();
{code}
Does this need to be synchronized?

{code}
-  private final String masterUrl;
+  private String masterUrl;
{code}
should {{masterUrl}} now be volatile?

{code}
+  public static boolean waitForInSyncWithLeader(SolrCore core, Replica 
leaderReplica) throws InterruptedException {
+if (waitForReplicasInSync == null) return true;
+
+Pair pair = parseValue(waitForReplicasInSync);
+boolean enabled = pair.first();
+if (!enabled) return true;
+
+Thread.sleep(1000);
+HttpSolrClient leaderClient = new 
HttpSolrClient.Builder(leaderReplica.getCoreUrl()).build();
+long leaderVersion = -1;
+String localVersion = null;
+try {
+  for (int i = 0; i < pair.second(); i++) {
+if (core.isClosed()) return true;
+ModifiableSolrParams params = new ModifiableSolrParams();
+params.set(CommonParams.QT, ReplicationHandler.PATH);
+params.set(COMMAND, CMD_DETAILS);
+
+NamedList response = leaderClient.request(new 
QueryRequest(params));
+leaderVersion = (long) 
((NamedList)response.get("details")).get("indexVersion");
+
+localVersion = 
core.getDeletionPolicy().getLatestCommit().getUserData().get(SolrIndexWriter.COMMIT_TIME_MSEC_KEY);
+if (localVersion == null && leaderVersion == 0) return true;
+
+if (localVersion != null && Long.parseLong(localVersion) == 
leaderVersion) {
+  return true;
+} else {
+  Thread.sleep(500);
+}
+  }
+
+} catch (Exception e) {
+  log.error("Exception when wait for replicas in sync with master");
+} finally {
+  try {
+if (leaderClient != null) leaderClient.close();
+  } catch (IOException e) {
+e.printStackTrace();
+  }
+}
+
+return false;
+  }

{code}
In many cases in the tests the leader will change before the replication 
happens, right? Does it make sense to discover the leader inside of the loop? 
Also, is there a way to remove that Thread.sleep(1000) at the beginning? This 
code will be called very frequently in tests.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an 
> additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: 

[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796982#comment-15796982
 ] 

Noble Paul commented on SOLR-9835:
--

I would like this to be added to the description of the ticket

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-03 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796970#comment-15796970
 ] 

Cao Manh Dat commented on SOLR-9835:


Right now, to create a collection in {{onlyLeaderIndexesMode}} we must use 
"Create Collection API" with additional parameter : onlyLeaderIndexes=true.

In further ticket, we can support switch between mode dynamically.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2017-01-03 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795916#comment-15795916
 ] 

Noble Paul commented on SOLR-9835:
--

What is the public interface for this feature? How do I enable/disable this 
feature? Is there a command?

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, 
> SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-30 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788811#comment-15788811
 ] 

Cao Manh Dat commented on SOLR-9835:


Currently, PeerSync sync on tlog, so it is not a problem if the indexes on all 
replicas are the same. So Leader Election will be the same as today except that 
when new leader sync success with other replicas, it must replay its tlog to 
make all necessary changes to its index.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-30 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788716#comment-15788716
 ] 

Pushkar Raste commented on SOLR-9835:
-

How are we handling leader failure here. if replicas are some what out of sync 
with the original leader, how would we elect a new leader. 

When the leader fails and a new leader gets elected, the  new leader asks all 
the replicas to sync with the new leader. My understanding is, "since we are 
replicating index by fetching segments from leader, most of the segments on all 
the replicas should look the same, hence all the replicas will not go into full 
index copying". Is that correct ?

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-26 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15778369#comment-15778369
 ] 

Cao Manh Dat commented on SOLR-9835:


[~yo...@apache.org][~ysee...@gmail.com] : Here are scenario for the problem 
that I encountered today
- an replica ( let's call it rep1 ) is on recovering mode -> its ulog will be 
on buffering state.
- rep1 receives an update ( contain doc1 ), rep1 will write the update to its 
tlog without updating ulog.map for real-time-get
- rep1 replay buffered updates, rep1 will write doc1 to its index, and update 
ulog.map for real-time-get ( but in this case, ulog.map will point doc1 -> 
position = -1 because we don't write updateCommand with REPLAY flag to tlog )
- client call real-time-get for doc1
- rep1 will always open a real-time-searcher for this case

I just wonder why we do that currently? Why don't we just write the update to 
tlog and ulog.map so we don't have to open a new real-time-searcher for this 
case?


> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-14 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748538#comment-15748538
 ] 

Ishan Chattopadhyaya commented on SOLR-9835:


{quote}
Collection property to switch replication scheme

A new property called “onlyLeaderIndexes” will be added to the collection. Any 
collection that has this property set to true will only index to the elected 
leader and the rest of the replicas will only fetch index segments from the 
leader as described above in the document. This property must be set during 
collection creation. It will default to “false”. Existing collections cannot be 
switched to using the new replication scheme. Future work can attempt to fix 
that.
{quote}

Instead of a hardcoded property name, i.e. onlyLeaderIndexes=true/false, I 
suggest that we have something like "replicationScheme=onlyLeaderIndexes" (or 
some other value for the current replication scheme). That would keep the door 
open for us to add any other replication scheme in future.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-12 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15741742#comment-15741742
 ] 

Shalin Shekhar Mangar commented on SOLR-9835:
-

bq. Instead of periodic polling, can leader upon receiving and processing a 
commit command, send a notification to replicas asking them to sync up?

We thought about that but we ended up favoring the polling model because the 
poll request is cheap and we can use existing replication handler code. We will 
keep the poll interval at half the auto commit time for a start. For this 
issue, we want to keep the changes limited as much as possible but we can 
consider a notification feature in later enhancements.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-09 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736051#comment-15736051
 ] 

Pushkar Raste commented on SOLR-9835:
-

Instead of periodic polling, can leader upon receiving and processing a commit 
command, send a notification to replicas asking them to sync up?

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-09 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735547#comment-15735547
 ] 

Cao Manh Dat commented on SOLR-9835:


Here are the detail design that I and Shalin have been working for recently

h2. Overview
The current replication mechanism of SolrCloud is called state machine, which 
replicas start in same initial state and for each input, the input is 
distributed across replicas so all replicas will end up with same next state.

But this type of replication has some drawbacks :
- The indexing and commits have to run on all replicas
- Slow recovery, because if replica miss more than N updates on its down time, 
the replica has to (usually) download the entire index from its leader.

So we introduce another replication for SolrCloud called state transfer, which 
acts like master/slave replication. Basically:
- Leader distribute the update to other replicas, but only the leader applies 
the update to IndexWriter; other replicas just store the update to UpdateLog 
(act like replication)
- Replicas frequently polling the latest segments from leader.

Pros:
- Lightweight for indexing, because only leader are running the updates and 
commits
- Very fast recovery. If a replica fails, it just has to download newest 
segments, instead of re-downloading entire index.

Cons :
- Leader can become a hotspot when there are many replicas (we can optimize 
later)
- Longer turnaround time compared to current NRT replication

h2. Commit
When we commit, we write the version of update command into the commit data in 
addition to the timestamp that is written today.

h2. Update
# When a replica receive an update request it will forward the update request 
to 
#* Correct leader ( in case of add document, delete by id )
#* All leaders ( in case of delete by query, commit )
# Leader assigns version to update as it does today, writes to update log and 
applies the update to IndexWriter
# Leader distributes the update to its replicas
# When replica receives an update, it writes the update to update log and 
return successfully
# Leader return successful

h2. Periodic Segment Replication
Replica will poll the leader for the latest commit point generation and version 
and compare against the latest commit point locally. If it is the same, then 
replication is successful and nothing needs to be done. If not, then:
# Replica downloads all files since the local commit point from the leader
# Installs into the index, synchronize on the update log, close the old tlog, 
create a new one and copy over all records from the old tlog which have version 
greater than the latest commit point’s version. This is to ensure that any 
update which hasn’t made it to the index is preserved in the current tlog. This 
might lead to duplication of updates in the previous and the current tlogs but 
that is okay.
# Replica re-opens the searcher

For example:
Leader has the following tlogs
- TLog1 has versions 1,2,3,commit
- TLog2 has versions 4,5

Before segment replication, the replica have the following tlogs:
- TLog1 - 1,2,3,4,commit

After segment replication, the replica will have the following tlogs:
- TLog1 - 1,2,3,4,commit
- TLog2 - 4,5


During this process, the replica does not need to be put into ‘recovery’ state. 
It continues to be ‘active’ and participate in indexing and searches.

h2. Replica recovery
# Replica puts itself in ‘recovery’ state.
# Replica compares its latest commit point against the leader
# If the index is same as leader, then it performs ‘peersync’ with the leader 
but only writes the peersync updates to its update log
# If the index is not the same as leader or if peersync fails, the replica:
#* Puts its update log in “buffering” state
#* Issues a hard commit to the leader
#* Copies the index commit points from the leader that do not exist locally
#* Publishes itself as ‘active’
# The “buffering” state in the above steps ensures that any updates that 
haven’t been committed to the leader are also present/replicated to the 
replica’s current transaction log

With respect to the current recovery strategy in Solr, we need only one change 
which is to check the index version of leader vs replica before we attempt a 
peersync.

h2. Leader Election
When a leader dies, a candidate replica will become a new leader. The leader 
election algorithm remains mostly the same except that after the “sync” step, 
the leader candidate will replay its transaction log before publishing itself 
as the leader.

h2. Collection property to switch replication scheme
A new property called “onlyLeaderIndexes” will be added to the collection. Any 
collection that has this property set to true will only index to the elected 
leader and the rest of the replicas will only fetch index segments from the 
leader as described above in the document. This property must be set during 
collection creation. It will default to “false”. Existing 

[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-08 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732576#comment-15732576
 ] 

Shalin Shekhar Mangar commented on SOLR-9835:
-

Hold your questions for a bit. Dat and I are working on a design and we will 
try to answer your questions as much as we can. We will post it in a couple of 
days.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-08 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732554#comment-15732554
 ] 

Ishan Chattopadhyaya commented on SOLR-9835:


I'm curious about the handling of searcher reopens (which would create a new 
segment, afaict). Such a searcher reopen can happen if an RTG request uses 
filters. 
{code}
 ulog.openRealtimeSearcher();  // force open a new realtime 
searcher
{code}

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732527#comment-15732527
 ] 

Mark Miller commented on SOLR-9835:
---

Soft commits are for near realtime and doing master->slave replication won't 
benefit from that. Effectively, soft commits won't be useful.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-08 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732174#comment-15732174
 ] 

Pushkar Raste commented on SOLR-9835:
-

I am curious to know how soft commits (in memory segments) would be handled.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-07 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729828#comment-15729828
 ] 

Erick Erickson commented on SOLR-9835:
--

I believe that if we do implement this, it should make SOLR-9706 obsolete. The 
proposed replication mode sounds like it would make manually issuing a 
fetchIndex unnecessary since the use-case for SOLR-9706 I'm familiar with is 
exactly only indexing to the leader and having the followers use the regular 
polling mechanism which doesn't block during replication.

At least we should verify whether SOLR-9706 is made obsolete by this proposal.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is called state machine, which 
> replicas start in same initial state and for each input, the input is 
> distributed across replicas so all replicas will end up with same next state. 
> But this type of replication have some drawbacks
> - The commit (which costly) have to run on all replicas
> - Slow recovery, because if replica miss more than N updates on its down 
> time, the replica have to download entire index from its leader.
> So we create create another replication mode for SolrCloud called state 
> transfer, which acts like master/slave replication. In basically
> - Leader distribute the update to other replicas, but the leader only apply 
> the update to IW, other replicas just store the update to UpdateLog (act like 
> replication).
> - Replicas frequently polling the latest segments from leader.
> Pros:
> - Lightweight for indexing, because only leader are running the commit, 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org