[jira] [Updated] (SOLR-9446) Leader failure after creating a freshly replicated index can send nodes into recovery even if index was not changed
[ https://issues.apache.org/jira/browse/SOLR-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pushkar Raste updated SOLR-9446:
--------------------------------
    Attachment: SOLR-9446.patch

> Leader failure after creating a freshly replicated index can send nodes into
> recovery even if index was not changed
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-9446
>                 URL: https://issues.apache.org/jira/browse/SOLR-9446
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: replication (java)
>            Reporter: Pushkar Raste
>            Assignee: Noble Paul
>            Priority: Minor
>         Attachments: SOLR-9446.patch
>
>
> We noticed this issue while migrating a Solr index from machines
> {{A1, A2, A3}} to {{B1, B2, B3}}. We followed the steps below (there were no
> updates during the migration process).
> * The index had replicas on machines {{A1, A2, A3}}. Let's say {{A1}} was the
> leader at the time.
> * We added 3 more replicas, {{B1, B2, B3}}. These nodes synced with the
> leader by replication. These fresh nodes do not have tlogs.
> * We shut down one of the old nodes ({{A3}}).
> * We then shut down the leader ({{A1}}).
> * A new leader was elected (let's say {{A2}}).
> * The new leader asked all the replicas to sync with it.
> * The fresh nodes (the ones without tlogs) first tried PeerSync, but since
> there was no frame of reference, PeerSync failed and the fresh nodes fell
> back to replication.
> Although replication would not copy all the segments again, it seems we
> could short-circuit the sync to put nodes back in the active state as soon as
> possible.
> If a freshly replicated index becomes the leader for some reason, it can
> still send nodes (both the other freshly replicated indexes and the old
> replicas) into recovery. Here is the scenario:
> * A freshly replicated node becomes the leader.
> * The new leader asks all the replicas to sync with it.
> * The replicas (including the old ones) ask the leader for versions, but the
> leader has no update logs, so the replicas cannot compute the missing
> versions and fall back to replication.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
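For readers of the archive, the short circuit proposed in the description above could be sketched roughly as follows. This is only an illustration of the idea, not the attached patch: the class, the enum, and the notion of comparing a single "fingerprint" value per index are assumptions made for the example and are not Solr APIs.

```java
/**
 * Illustrative sketch only. The names below (SyncShortCircuit, Action, decide,
 * the fingerprint arguments) are hypothetical and are not Solr APIs.
 */
class SyncShortCircuit {

    /** What a replica does after the new leader asks it to sync. */
    enum Action { STAY_ACTIVE, PEER_SYNC, REPLICATE }

    /**
     * @param leaderFingerprint  assumed summary of the leader's index contents
     * @param replicaFingerprint assumed summary of this replica's index contents
     * @param replicaHasTlog     whether this replica has any update log entries
     */
    static Action decide(long leaderFingerprint, long replicaFingerprint,
                         boolean replicaHasTlog) {
        // Proposed short circuit: identical indexes need no sync at all.
        if (leaderFingerprint == replicaFingerprint) {
            return Action.STAY_ACTIVE;
        }
        // Behaviour described in the issue: PeerSync needs a tlog as its frame
        // of reference; without one the replica falls back to replication.
        return replicaHasTlog ? Action.PEER_SYNC : Action.REPLICATE;
    }

    public static void main(String[] args) {
        // Fresh node without a tlog, index unchanged since it was replicated.
        System.out.println(decide(42L, 42L, false)); // STAY_ACTIVE
        // Index differs and the replica has a tlog to compare against.
        System.out.println(decide(42L, 7L, true));   // PEER_SYNC
        // Index differs and there is no tlog: full replication is the fallback.
        System.out.println(decide(42L, 7L, false));  // REPLICATE
    }
}
```

The point of checking equality first is that the tlog question only matters when the indexes actually differ, which was not the case during the migration described in the issue.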
[jira] [Updated] (SOLR-9446) Leader failure after creating a freshly replicated index can send nodes into recovery even if index was not changed
[ https://issues.apache.org/jira/browse/SOLR-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pushkar Raste updated SOLR-9446:
--------------------------------
    Description: 
We noticed this issue while migrating a Solr index from machines {{A1, A2, A3}} to {{B1, B2, B3}}. We followed the steps below (there were no updates during the migration process).

* The index had replicas on machines {{A1, A2, A3}}. Let's say {{A1}} was the leader at the time.
* We added 3 more replicas, {{B1, B2, B3}}. These nodes synced with the leader by replication. These fresh nodes do not have tlogs.
* We shut down one of the old nodes ({{A3}}).
* We then shut down the leader ({{A1}}).
* A new leader was elected (let's say {{A2}}).
* The new leader asked all the replicas to sync with it.
* The fresh nodes (the ones without tlogs) first tried PeerSync, but since there was no frame of reference, PeerSync failed and the fresh nodes fell back to replication.

Although replication would not copy all the segments again, it seems we could short-circuit the sync to put nodes back in the active state as soon as possible.

If a freshly replicated index becomes the leader for some reason, it can still send nodes (both the other freshly replicated indexes and the old replicas) into recovery. Here is the scenario:

* A freshly replicated node becomes the leader.
* The new leader asks all the replicas to sync with it.
* The replicas (including the old ones) ask the leader for versions, but the leader has no update logs, so the replicas cannot compute the missing versions and fall back to replication.

  was:
We noticed this issue while migrating a Solr index from machines {{A1, A2, A3}} to {{B1, B2, B3}}. We followed the steps below (there were no updates during the migration process).

* The index had replicas on machines {{A1, A2, A3}}. Let's say {{A1}} was the leader at the time.
* We added 3 more replicas, {{B1, B2, B3}}. These nodes synced with the leader by replication. These fresh nodes do not have tlogs.
* We shut down one of the old nodes ({{A3}}).
* We then shut down the leader ({{A1}}).
* A new leader was elected (let's say {{A2}}).
* The new leader asked all the replicas to sync with it.
* The fresh nodes (the ones without tlogs) first tried PeerSync, but since there was no frame of reference, PeerSync failed and the fresh nodes fell back to replication.

Although replication would not copy all the segments again, it seems we could short-circuit the sync to put nodes back in the active state as soon as possible.
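Both scenarios in this update come down to the same thing: one side of the version comparison has an empty update log. The toy model below is an assumption-laden sketch of that reasoning, not Solr's PeerSync code; the class and method names are hypothetical, and versions are modelled as plain lists of longs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/**
 * Toy model of a version comparison between a replica and its leader.
 * PeerSyncModel and missingVersions are hypothetical names, not Solr APIs.
 */
class PeerSyncModel {

    /**
     * Returns the leader versions the replica lacks, or Optional.empty() when
     * either side has no update log entries and so no comparison is possible.
     * In this model an empty result means "fall back to replication".
     */
    static Optional<List<Long>> missingVersions(List<Long> leaderVersions,
                                                List<Long> replicaVersions) {
        if (leaderVersions.isEmpty() || replicaVersions.isEmpty()) {
            return Optional.empty(); // no frame of reference on one side
        }
        List<Long> missing = new ArrayList<>(leaderVersions);
        missing.removeAll(replicaVersions); // keep only what the replica lacks
        return Optional.of(missing);
    }

    public static void main(String[] args) {
        List<Long> none = Collections.emptyList();

        // First scenario: a fresh replica (no tlog) syncing with an old leader.
        System.out.println(missingVersions(Arrays.asList(3L, 2L, 1L), none).isPresent()); // false

        // Second scenario: an old replica syncing with a fresh leader (no tlog).
        System.out.println(missingVersions(none, Arrays.asList(3L, 2L, 1L)).isPresent()); // false

        // Both sides have logs, so the gap can be computed.
        System.out.println(missingVersions(Arrays.asList(5L, 4L, 3L),
                                           Arrays.asList(3L, 2L, 1L)).get()); // [5, 4]
    }
}
```

In neither of the first two cases did the index itself change, which is why the issue argues that recovery could be avoided.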
[jira] [Updated] (SOLR-9446) Leader failure after creating a freshly replicated index can send nodes into recovery even if index was not changed
[ https://issues.apache.org/jira/browse/SOLR-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pushkar Raste updated SOLR-9446:
--------------------------------
    Summary: Leader failure after creating a freshly replicated index can send nodes into recovery even if index was not changed  (was: Just replicated index goes into replication recovery on leader failure even if index was not changed)