[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082512#comment-16082512 ] Cao Manh Dat commented on SOLR-9835: Thanks [~tomasflobbe]! > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Fix For: 7.0 > > Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014895#comment-16014895 ] Tomás Fernández Löbbe commented on SOLR-9835: - [~caomanhdat], I have a nocommit in SOLR-10233 related to a change you did as part of this Jira: {noformat} @@ -260,7 +263,7 @@ public class RecoveryStrategy extends Thread implements Closeable { UpdateRequest ureq = new UpdateRequest(); ureq.setParams(new ModifiableSolrParams()); ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true); - ureq.getParams().set(UpdateParams.OPEN_SEARCHER, false); + ureq.getParams().set(UpdateParams.OPEN_SEARCHER, onlyLeaderIndexes); ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, true).process( client); } {noformat} Why do we need to open searcher in the leader when doing a commit on leader for recovery? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954479#comment-15954479 ] Cassandra Targett commented on SOLR-9835: - bq. will this be ported to 6x and what is the expected timeframe if so? Or will it just be a 7.0 feature? In this comment to SOLR-10233: https://issues.apache.org/jira/browse/SOLR-10233?focusedCommentId=15901980=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15901980, Tomás notes that some API changes will be made in the course of developing that feature, and those changes will impact the work done on this issue already. Because of this, these changes will be kept on master for the time-being to avoid back-compat issues. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954328#comment-15954328 ] Erick Erickson commented on SOLR-9835: -- will this be ported to 6x and what is the expected timeframe if so? Or will it just be a 7.0 feature? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943368#comment-15943368 ] ASF subversion and git services commented on SOLR-9835: --- Commit cd66a5ff51b7a9ff55edaa9fb5d7df5af42707e4 in lucene-solr's branch refs/heads/master from [~shalinmangar] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cd66a5f ] SOLR-9835: Fixed precommit failure. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942950#comment-15942950 ] ASF subversion and git services commented on SOLR-9835: --- Commit d156aafd1d5cfda2d7b56e4c73c6ddee43d48e91 in lucene-solr's branch refs/heads/master from [~caomanhdat] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d156aaf ] SOLR-9835: TestInjection.waitForInSyncWithLeader() should rely on commit point of searcher > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932053#comment-15932053 ] ASF subversion and git services commented on SOLR-9835: --- Commit 4bc75dbf235145fad5ec1001004c663e15449523 in lucene-solr's branch refs/heads/master from [~caomanhdat] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4bc75db ] SOLR-9835: Fix OnlyLeaderIndexesTest failure, inplace updates is not copied over properly > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931066#comment-15931066 ] Mark Miller commented on SOLR-9835: --- Yeah, I see this too. It doesn't beast too terribly though. I'll attach some fail logs. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: OnlyLeaderIndexesTest-fail-1.zip, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930253#comment-15930253 ] Yonik Seeley commented on SOLR-9835: OnlyLeaderIndexesTest just failed for me. Seems like it fails sometimes for jenkins too: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/6455/ > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923899#comment-15923899 ] Mark Miller commented on SOLR-9835: --- Nevermind, some local glitch. Got it cleaned up. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923897#comment-15923897 ] Mark Miller commented on SOLR-9835: --- Looks like this broke the build? I compile errors about missing ZkController methods. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923854#comment-15923854 ] ASF subversion and git services commented on SOLR-9835: --- Commit 120274b451646ddd9e1de9e12a4904414f881c7c in lucene-solr's branch refs/heads/master from [~caomanhdat] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=120274b ] SOLR-9835: Update CHANGES.txt > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923724#comment-15923724 ] ASF subversion and git services commented on SOLR-9835: --- Commit 7830462d4b7da3acefff6353419e71cde62d5fee in lucene-solr's branch refs/heads/master from [~caomanhdat] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7830462 ] SOLR-9835: Create another replication mode for SolrCloud > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905981#comment-15905981 ] Cao Manh Dat commented on SOLR-9835: That's right. I will fix that problem today. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905980#comment-15905980 ] Tomás Fernández Löbbe commented on SOLR-9835: - bq. The reason for that is LIR is being kicked off when we add document 7 ( SOLR-9555 ). Yes, saw that in my testing too bq. But I don't think the code you mentioned is an error. {{fail("Doc1 is deleted but it's still exist")}} will throw an {{AssertionError}} if hit, but this code is swallowing that error > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905973#comment-15905973 ] Cao Manh Dat commented on SOLR-9835: @tomasflobbe I also see the tests failed but in this line {code} checkRTG(3,7, cluster.getJettySolrRunners()); {code} The reason for that is LIR is being kicked off when we add document 7 ( SOLR-9555 ). But I don't think the code you mentioned is an error. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905479#comment-15905479 ] Tomás Fernández Löbbe commented on SOLR-9835: - [~caomanhdat], I do see some random failures in {{OnlyLeaderIndexesTest}} when beasting, do you too? Also, I noticed this code in the test {code:java} try { checkRTG(1, 1, cluster.getJettySolrRunners()); fail("Doc1 is deleted but it's still exist"); } catch (AssertionError e) {} {code} which seems wrong > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898465#comment-15898465 ] Cao Manh Dat commented on SOLR-9835: [~tomasflobbe] That sounds great. I will create a branch for this ticket and run some jenkins tests for the patch before committing. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898427#comment-15898427 ] Tomás Fernández Löbbe commented on SOLR-9835: - [~caomanhdat], I created SOLR-10233 with some related work > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885586#comment-15885586 ] Shalin Shekhar Mangar commented on SOLR-9835: - # Dat, can you please add javadocs for all the new methods in UpdateLog? # I was looking into SOLR-9706 and it applies to this patch too. However, the fix is easy because most of the corruption checks added by SOLR-6640 aren't necessary in this new replication mode so we can safely skip them. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : if we *do not use DQBs*, replicas will consistence > with master just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880130#comment-15880130 ] Shalin Shekhar Mangar commented on SOLR-9835: - bq. I don't think so, switchToNewTlog() is based on commit version at lucene index level (commit.getUserData().get(SolrIndexWriter.COMMIT_COMMAND_VERSION)), so we will always roll over updates in right way. I understand that we use the commit version of the latest commit but the copyOverOldUpdates method only copies updates from the last tlog. If a hard commit happened between the time that we started replication and finished replication, the last tlog will not have all updates that should be copied over. Example: * Leader has the following tlog with these versions: ** tlog0: 1,2,3,4,commit ** tlog1: 5,6 * Replica has tlogs: ** tlog0: 1,2,3,4,5 ** replication from leader starts ** user calls explicit commit on replica ** tlog1: 6 ** replication completes and we call switchToNewTLog which copies over all versions greater than 4 from the last tlog ** tlog2: 6 In this case, the update with version 5 is lost and will no longer be available in case this replica becomes leader. bq. CDCR is very complex, I don't think we should support CDCR in this new replication mode now. Okay, let's create a follow-up issue for this. CDCR is important enough that we must support it eventually. But I think backup/restore must be supported and tested. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879841#comment-15879841 ] Ishan Chattopadhyaya commented on SOLR-9835: Also, lets add a simple test to ensure that in-place updates work on a replica: # Index few documents to leader, including a full document with id=0. # Commit # Index few more documents # Commit # Update id=0 document in-place # Commit # Assert that the document with id=0 has the same updated value in the leader and the replica. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879721#comment-15879721 ] Cao Manh Dat commented on SOLR-9835: Thanks [~shalinmangar]! bq. LeaderInitiatedRecoveryThread – What is the reason behind adding SocketTimeoutException in the list of communication errors on which no more retries are made? This change come from a jepsen test. This bug is also affect current mode. I created another issue for this bug SOLR-9913. We can skip this change for this ticket. bq.ZkController.register method – The condition for !isLeader && onlyLeaderIndexes can be replaced by the isReplicaInOnlyLeaderIndexes variable. Yeah, that's right bq. Since there is no log replay on startup on replicas anymore, what if the replica is killed (which keeps its state as 'active' in ZK) and then the cluster is restarted and the replica becomes leader candidate? If we do not replay the discarded log then it could lead to data loss? Very good catch, I try to resolve this problem. bq. UpdateLog – Can you please add javadocs outlining the motivation/purpose of the new methods such as copyOverBufferingUpdates and switchToNewTlog e.g. why does switchToNewTlog require copying over some updates from the old tlog? Sure! bq.It seems that any commits that might be triggered explicitly by the user can interfere with the index replication. Suppose that a replication is in progress and a user explicitly calls commit which is distributed to all replicas, in such a case the tlogs will be rolled over and then when the ReplicateFromLeader calls switchToNewTlog(), the previous tlog may not have all the updates that should have been copied over. We should have a way to either disable explicit commits or protect against them on the replicas. I don't think so, switchToNewTlog() is based on commit version at lucene index level ({{commit.getUserData().get(SolrIndexWriter.COMMIT_COMMAND_VERSION)}}), so we will always roll over updates in right way. bq.UpdateLog – why does copyOverBufferUpdates block updates while calling switchToNewTlog but ReplicateFromLeader doesn't? How are they both safe? Good catch I think we should blockUpdates in switchToNewTlog as well. bq.Can we add tests for testing CDCR and backup/restore with this new replication scheme? CDCR is very complex, I don't think we should support CDCR in this new replication mode now. bq. ZkController.startReplicationFromLeader – Using a ConcurrentHashMap is not enough to prevent two simultaneous replications from happening concurrently. You should use the atomic putIfAbsent to put a core to the map before starting replication. Yeah, that's sounds a good idea. bq.Aren't some of the guarantees of real-time-get are relaxed in this new mode especially around delete-by-queries which no longer apply on replicas? Can you please document them as a comment on the issue that we can transfer to the ref guide in future? I will update the ticket description now. Basically RTG is not consistency for DBQs > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879022#comment-15879022 ] Shalin Shekhar Mangar commented on SOLR-9835: - Thanks Dat. Sorry it took me a while to finish reviewing. A few questions/comments # LeaderInitiatedRecoveryThread -- What is the reason behind adding SocketTimeoutException in the list of communication errors on which no more retries are made? # ZkController.register method -- The condition for !isLeader && onlyLeaderIndexes can be replaced by the isReplicaInOnlyLeaderIndexes variable. # Since there is no log replay on startup on replicas anymore, what if the replica is killed (which keeps its state as 'active' in ZK) and then the cluster is restarted and the replica becomes leader candidate? If we do not replay the discarded log then it could lead to data loss? # UpdateLog -- Can you please add javadocs outlining the motivation/purpose of the new methods such as copyOverBufferingUpdates and switchToNewTlog e.g. why does switchToNewTlog require copying over some updates from the old tlog? # It seems that any commits that might be triggered explicitly by the user can interfere with the index replication. Suppose that a replication is in progress and a user explicitly calls commit which is distributed to all replicas, in such a case the tlogs will be rolled over and then when the ReplicateFromLeader calls switchToNewTlog(), the previous tlog may not have all the updates that should have been copied over. We should have a way to either disable explicit commits or protect against them on the replicas. # UpdateLog -- why does copyOverBufferUpdates block updates while calling switchToNewTlog but ReplicateFromLeader doesn't? How are they both safe? # Can we add tests for testing CDCR and backup/restore with this new replication scheme? # ZkController.startReplicationFromLeader -- Using a ConcurrentHashMap is not enough to prevent two simultaneous replications from happening concurrently. You should use the atomic putIfAbsent to put a core to the map before starting replication. # Aren't some of the guarantees of real-time-get are relaxed in this new mode especially around delete-by-queries which no longer apply on replicas? Can you please document them as a comment on the issue that we can transfer to the ref guide in future? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > On CAP point of view, this ticket will trying to promise to end users a > distributed systems : > - Partition tolerance > - Weak Consistency for normal query : clusters can serve stale data. This > happen when leader finish a commit and slave is fetching for latest segment. > This period can at most {{pollInterval + time to fetch latest segment}}. > - Consistency for RTG : just like original SolrCloud mode > - Weak Availability : just like original SolrCloud mode. If a leader down, > client must wait until new leader being elected. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822697#comment-15822697 ] Shalin Shekhar Mangar commented on SOLR-9835: - Thanks Dat. I have started reviewing your patch. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816959#comment-15816959 ] Cao Manh Dat commented on SOLR-9835: I created a new ReplicationHandler to do the replication process, so it will not use the same instance from different threads. {code} replicationProcess = new ReplicationHandler(); replicationProcess.init(replicationConfig); replicationProcess.inform(core); {code} > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816951#comment-15816951 ] Tomás Fernández Löbbe commented on SOLR-9835: - bq. I don't see any reason why we need masterUrl to be volatile? IndexFetcher instance is not being shared across threads. Isn't it being used by the ReplicationHandler? Different requests would use the same instance from different threads, right? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816918#comment-15816918 ] Cao Manh Dat commented on SOLR-9835: Thanks a lot for your comments! bq. Maybe add a method to DocCollection like {{isOnlyLeaderIndexes()}} As you know, we won't use {{isOnlyLeaderIndexes()}} in the future ( it won't have enough information ), so I just do not want to add a public method on DocCollection ( solrj ) and remove it in the future. bq. Does this need to be synchronized? Yeah, I think we should. bq. should masterUrl now be volatile? I don't see any reason why we need masterUrl to be volatile? IndexFetcher instance is not being shared across threads. bq. In many cases in the tests the leader will change before the replication happens, right? Does it make sense to discover the leader inside of the loop? Also, is there a way to remove that Thread.sleep(1000) at the beginning? This code will be called very frequently in tests. That's a good idea. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816584#comment-15816584 ] Tomás Fernández Löbbe commented on SOLR-9835: - As I said, I understand the reason why it was done this way, but the code right now is only using the count to identify the mode. In the future, the count alone won't be enough in any of those sections to determine what to do, you'll have to identify the mode first and then take some more actions (e.g. Am I in the list of replicas that index or not?). > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816564#comment-15816564 ] Noble Paul commented on SOLR-9835: -- It will have a proper count in the future. A mixed mode has to be there eventually. Enum can't accommodate that > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816535#comment-15816535 ] Yago Riveiro commented on SOLR-9835: bq. how about getLiveReplicasCount() ? If I'm reading the code and found a method called getLiveReplicasCount(), I expected that return the number of live replicas for a shard, and if the only value that can return is 1 for onlyLeaderIndexes and -1 for the rest is not a good name. Something like: {{zkStateReader.getClusterState().getCollection(collection).getReplicationMode()}} that returns an enum(ONLY_LEADER_INDEXES, ALL_REPLICAS_INDEXES) or something like that. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816283#comment-15816283 ] Tomás Fernández Löbbe commented on SOLR-9835: - I'm just saying, we use the count to detect the mode, and everywhere we are getting the count and inferring the mode depending on the count (count==1 -> onlyLeaderIndexes, count==-1 -> default mode), Instead of that, lets put that logic inside of the DocCollection and ask it for the mode. Anyway, this is not too important, just a suggestion. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816241#comment-15816241 ] Noble Paul commented on SOLR-9835: -- bq.Maybe add a method to DocCollection like isOnlyLeaderIndexes() (or choose other name)? how about {{getLiveReplicasCount()}} ? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815930#comment-15815930 ] Tomás Fernández Löbbe commented on SOLR-9835: - Great idea! just took a quick look at the patch to understand this better. I have a couple of questions/comments, I know this is work in progress, so feel free to disregard any of my comments if you are working on them: {code} onlyLeaderIndexes = zkStateReader.getClusterState().getCollection(collection).getLiveReplicas() == 1; {code} Maybe add a method to DocCollection like {{isOnlyLeaderIndexes()}} (or choose other name)? I understand why you did this, but this code is repeated many times, maybe can be improved for now. {code} private MapreplicateFromLeaders = new HashMap<>(); {code} Does this need to be synchronized? {code} - private final String masterUrl; + private String masterUrl; {code} should {{masterUrl}} now be volatile? {code} + public static boolean waitForInSyncWithLeader(SolrCore core, Replica leaderReplica) throws InterruptedException { +if (waitForReplicasInSync == null) return true; + +Pair pair = parseValue(waitForReplicasInSync); +boolean enabled = pair.first(); +if (!enabled) return true; + +Thread.sleep(1000); +HttpSolrClient leaderClient = new HttpSolrClient.Builder(leaderReplica.getCoreUrl()).build(); +long leaderVersion = -1; +String localVersion = null; +try { + for (int i = 0; i < pair.second(); i++) { +if (core.isClosed()) return true; +ModifiableSolrParams params = new ModifiableSolrParams(); +params.set(CommonParams.QT, ReplicationHandler.PATH); +params.set(COMMAND, CMD_DETAILS); + +NamedList response = leaderClient.request(new QueryRequest(params)); +leaderVersion = (long) ((NamedList)response.get("details")).get("indexVersion"); + +localVersion = core.getDeletionPolicy().getLatestCommit().getUserData().get(SolrIndexWriter.COMMIT_TIME_MSEC_KEY); +if (localVersion == null && leaderVersion == 0) return true; + +if (localVersion != null && Long.parseLong(localVersion) == leaderVersion) { + return true; +} else { + Thread.sleep(500); +} + } + +} catch (Exception e) { + log.error("Exception when wait for replicas in sync with master"); +} finally { + try { +if (leaderClient != null) leaderClient.close(); + } catch (IOException e) { +e.printStackTrace(); + } +} + +return false; + } {code} In many cases in the tests the leader will change before the replication happens, right? Does it make sense to discover the leader inside of the loop? Also, is there a way to remove that Thread.sleep(1000) at the beginning? This code will be called very frequently in tests. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. > To use this new replication mode, a new collection must be created with an > additional parameter {{liveReplicas=1}} > {code} > http://localhost:8983/solr/admin/collections?action=CREATE=newCollection=2=1=1 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796982#comment-15796982 ] Noble Paul commented on SOLR-9835: -- I would like this to be added to the description of the ticket > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796970#comment-15796970 ] Cao Manh Dat commented on SOLR-9835: Right now, to create a collection in {{onlyLeaderIndexesMode}} we must use "Create Collection API" with additional parameter : onlyLeaderIndexes=true. In further ticket, we can support switch between mode dynamically. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795916#comment-15795916 ] Noble Paul commented on SOLR-9835: -- What is the public interface for this feature? How do I enable/disable this feature? Is there a command? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, > SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788811#comment-15788811 ] Cao Manh Dat commented on SOLR-9835: Currently, PeerSync sync on tlog, so it is not a problem if the indexes on all replicas are the same. So Leader Election will be the same as today except that when new leader sync success with other replicas, it must replay its tlog to make all necessary changes to its index. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788716#comment-15788716 ] Pushkar Raste commented on SOLR-9835: - How are we handling leader failure here. if replicas are some what out of sync with the original leader, how would we elect a new leader. When the leader fails and a new leader gets elected, the new leader asks all the replicas to sync with the new leader. My understanding is, "since we are replicating index by fetching segments from leader, most of the segments on all the replicas should look the same, hence all the replicas will not go into full index copying". Is that correct ? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15778369#comment-15778369 ] Cao Manh Dat commented on SOLR-9835: [~yo...@apache.org][~ysee...@gmail.com] : Here are scenario for the problem that I encountered today - an replica ( let's call it rep1 ) is on recovering mode -> its ulog will be on buffering state. - rep1 receives an update ( contain doc1 ), rep1 will write the update to its tlog without updating ulog.map for real-time-get - rep1 replay buffered updates, rep1 will write doc1 to its index, and update ulog.map for real-time-get ( but in this case, ulog.map will point doc1 -> position = -1 because we don't write updateCommand with REPLAY flag to tlog ) - client call real-time-get for doc1 - rep1 will always open a real-time-searcher for this case I just wonder why we do that currently? Why don't we just write the update to tlog and ulog.map so we don't have to open a new real-time-searcher for this case? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748538#comment-15748538 ] Ishan Chattopadhyaya commented on SOLR-9835: {quote} Collection property to switch replication scheme A new property called “onlyLeaderIndexes” will be added to the collection. Any collection that has this property set to true will only index to the elected leader and the rest of the replicas will only fetch index segments from the leader as described above in the document. This property must be set during collection creation. It will default to “false”. Existing collections cannot be switched to using the new replication scheme. Future work can attempt to fix that. {quote} Instead of a hardcoded property name, i.e. onlyLeaderIndexes=true/false, I suggest that we have something like "replicationScheme=onlyLeaderIndexes" (or some other value for the current replication scheme). That would keep the door open for us to add any other replication scheme in future. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15741742#comment-15741742 ] Shalin Shekhar Mangar commented on SOLR-9835: - bq. Instead of periodic polling, can leader upon receiving and processing a commit command, send a notification to replicas asking them to sync up? We thought about that but we ended up favoring the polling model because the poll request is cheap and we can use existing replication handler code. We will keep the poll interval at half the auto commit time for a start. For this issue, we want to keep the changes limited as much as possible but we can consider a notification feature in later enhancements. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736051#comment-15736051 ] Pushkar Raste commented on SOLR-9835: - Instead of periodic polling, can leader upon receiving and processing a commit command, send a notification to replicas asking them to sync up? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735547#comment-15735547 ] Cao Manh Dat commented on SOLR-9835: Here are the detail design that I and Shalin have been working for recently h2. Overview The current replication mechanism of SolrCloud is called state machine, which replicas start in same initial state and for each input, the input is distributed across replicas so all replicas will end up with same next state. But this type of replication has some drawbacks : - The indexing and commits have to run on all replicas - Slow recovery, because if replica miss more than N updates on its down time, the replica has to (usually) download the entire index from its leader. So we introduce another replication for SolrCloud called state transfer, which acts like master/slave replication. Basically: - Leader distribute the update to other replicas, but only the leader applies the update to IndexWriter; other replicas just store the update to UpdateLog (act like replication) - Replicas frequently polling the latest segments from leader. Pros: - Lightweight for indexing, because only leader are running the updates and commits - Very fast recovery. If a replica fails, it just has to download newest segments, instead of re-downloading entire index. Cons : - Leader can become a hotspot when there are many replicas (we can optimize later) - Longer turnaround time compared to current NRT replication h2. Commit When we commit, we write the version of update command into the commit data in addition to the timestamp that is written today. h2. Update # When a replica receive an update request it will forward the update request to #* Correct leader ( in case of add document, delete by id ) #* All leaders ( in case of delete by query, commit ) # Leader assigns version to update as it does today, writes to update log and applies the update to IndexWriter # Leader distributes the update to its replicas # When replica receives an update, it writes the update to update log and return successfully # Leader return successful h2. Periodic Segment Replication Replica will poll the leader for the latest commit point generation and version and compare against the latest commit point locally. If it is the same, then replication is successful and nothing needs to be done. If not, then: # Replica downloads all files since the local commit point from the leader # Installs into the index, synchronize on the update log, close the old tlog, create a new one and copy over all records from the old tlog which have version greater than the latest commit point’s version. This is to ensure that any update which hasn’t made it to the index is preserved in the current tlog. This might lead to duplication of updates in the previous and the current tlogs but that is okay. # Replica re-opens the searcher For example: Leader has the following tlogs - TLog1 has versions 1,2,3,commit - TLog2 has versions 4,5 Before segment replication, the replica have the following tlogs: - TLog1 - 1,2,3,4,commit After segment replication, the replica will have the following tlogs: - TLog1 - 1,2,3,4,commit - TLog2 - 4,5 During this process, the replica does not need to be put into ‘recovery’ state. It continues to be ‘active’ and participate in indexing and searches. h2. Replica recovery # Replica puts itself in ‘recovery’ state. # Replica compares its latest commit point against the leader # If the index is same as leader, then it performs ‘peersync’ with the leader but only writes the peersync updates to its update log # If the index is not the same as leader or if peersync fails, the replica: #* Puts its update log in “buffering” state #* Issues a hard commit to the leader #* Copies the index commit points from the leader that do not exist locally #* Publishes itself as ‘active’ # The “buffering” state in the above steps ensures that any updates that haven’t been committed to the leader are also present/replicated to the replica’s current transaction log With respect to the current recovery strategy in Solr, we need only one change which is to check the index version of leader vs replica before we attempt a peersync. h2. Leader Election When a leader dies, a candidate replica will become a new leader. The leader election algorithm remains mostly the same except that after the “sync” step, the leader candidate will replay its transaction log before publishing itself as the leader. h2. Collection property to switch replication scheme A new property called “onlyLeaderIndexes” will be added to the collection. Any collection that has this property set to true will only index to the elected leader and the rest of the replicas will only fetch index segments from the leader as described above in the document. This property must be set during collection creation. It will default to “false”. Existing
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732576#comment-15732576 ] Shalin Shekhar Mangar commented on SOLR-9835: - Hold your questions for a bit. Dat and I are working on a design and we will try to answer your questions as much as we can. We will post it in a couple of days. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732554#comment-15732554 ] Ishan Chattopadhyaya commented on SOLR-9835: I'm curious about the handling of searcher reopens (which would create a new segment, afaict). Such a searcher reopen can happen if an RTG request uses filters. {code} ulog.openRealtimeSearcher(); // force open a new realtime searcher {code} > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732527#comment-15732527 ] Mark Miller commented on SOLR-9835: --- Soft commits are for near realtime and doing master->slave replication won't benefit from that. Effectively, soft commits won't be useful. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732174#comment-15732174 ] Pushkar Raste commented on SOLR-9835: - I am curious to know how soft commits (in memory segments) would be handled. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729828#comment-15729828 ] Erick Erickson commented on SOLR-9835: -- I believe that if we do implement this, it should make SOLR-9706 obsolete. The proposed replication mode sounds like it would make manually issuing a fetchIndex unnecessary since the use-case for SOLR-9706 I'm familiar with is exactly only indexing to the leader and having the followers use the regular polling mechanism which doesn't block during replication. At least we should verify whether SOLR-9706 is made obsolete by this proposal. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates. > - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org