[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744143#comment-14744143 ]

Mark Miller commented on SOLR-7932:
---

I think that in SolrCloud mode it's perhaps best to always replicate and count on peer sync as the short circuit. If replication is really not needed, most of the work will be skipped properly via filenames and checksums anyway. That is better than this sloppy check, which may miss a replication in rare cases.

> Solr replication relies on timestamps to sync across machines
> ---
>
> Key: SOLR-7932
> URL: https://issues.apache.org/jira/browse/SOLR-7932
> Project: Solr
> Issue Type: Bug
> Components: replication (java)
> Reporter: Ramkumar Aiyengar
> Attachments: SOLR-7932.patch, SOLR-7932.patch
>
> Spinning off SOLR-7859: I noticed there that the wall time recorded as commit
> data on a commit is used to check whether replication needs to be done. In
> IndexFetcher, there is this code:
> {code}
> if (!forceReplication &&
>     IndexDeletionPolicyWrapper.getCommitTimestamp(commit) == latestVersion) {
>   //master and slave are already in sync just return
>   LOG.info("Slave in sync with master.");
>   successfulInstall = true;
>   return true;
> }
> {code}
> It appears as if we are comparing wall times across machines to decide whether
> we are in sync, which could go wrong.
> Once a decision is made to replicate, we do seem to use generations instead.
> The exception is the place below, which checks both generations and timestamps
> to see if a full copy is needed.
> {code}
> // if the generation of master is older than that of the slave , it means they
> // are not compatible to be copied
> // then a new index directory to be created and all the files need to be copied
> boolean isFullCopyNeeded = IndexDeletionPolicyWrapper
>     .getCommitTimestamp(commit) >= latestVersion
>     || commit.getGeneration() >= latestGeneration || forceReplication;
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
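Mark Miller says that an unneeded replication is mostly short-circuited "via filenames and checksums". That idea can be sketched as a per-file comparison: the replica downloads only the files that are missing locally or whose checksum differs from the leader's. This is a simplified model built on assumptions. The `SegmentFileDiff` class and its map-of-checksums inputs are hypothetical. They do not reproduce IndexFetcher's actual code, which may compare additional file attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: decides which index files a replica must download.
// Inputs map file name -> checksum; this is not Solr's IndexFetcher API.
final class SegmentFileDiff {
    private SegmentFileDiff() {}

    static List<String> filesToFetch(Map<String, Long> leaderFiles,
                                     Map<String, Long> localFiles) {
        List<String> toFetch = new ArrayList<>();
        // Iterate in sorted name order so the result is deterministic.
        for (Map.Entry<String, Long> e : new TreeMap<>(leaderFiles).entrySet()) {
            Long localChecksum = localFiles.get(e.getKey());
            // Fetch if the file is missing locally or its contents differ.
            if (localChecksum == null || !localChecksum.equals(e.getValue())) {
                toFetch.add(e.getKey());
            }
        }
        return toFetch;
    }
}
```

For example, `filesToFetch(Map.of("_0.cfs", 11L, "segments_2", 22L), Map.of("_0.cfs", 11L))` returns `[segments_2]`. When both sides hold identical files the list is empty, so an unnecessary "replication" costs little more than the comparison itself.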
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702911#comment-14702911 ]

Mark Miller commented on SOLR-7932:
---

bq. you still have to deal with clock skew though

Why? Don't both times come from the master?

bq. Do you agree that the timestamp check can be removed there?

I don't think it can just be removed in either case without better replacement logic.
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703067#comment-14703067 ]

Varun Thacker commented on SOLR-7932:
---

If I understand it correctly, we don't need the timestamp check in a master-slave setup. The index on the slave comes from the master, so both the timestamp and the generation will be the same. Just checking the generation will be enough, right?

In cloud mode, commits on different replicas happen at different times, so the timestamps would always be different. But this code path will only get invoked during a recovery. So we could remove it for this use case as well, right?
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702927#comment-14702927 ]

Ramkumar Aiyengar commented on SOLR-7932:
---

bq. Why? Don't both times come from the master?

A clock skew could cause two different commits to have the same time. For example, commit 1 happens at time X, then NTP sets the clock back by 200ms, and 200ms later commit 2 happens, also stamped X. It's not exactly what's in the title (i.e. relying on timestamps across machines), and you have to be a lot more unlucky, but you can't rely on wall time even on the same machine.

bq. I don't think it can just be removed in either case without better replacement logic.

How does the timestamp help currently in the first case? We use generations immediately afterwards anyway. Wouldn't you be better off comparing generations instead to check if replication can be skipped?
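The single-machine scenario above can be reproduced with a simulated clock. In this sketch, `SteppableClock` and `SkewDemo` are hypothetical names. The clock stands in for `System.currentTimeMillis()` and can be stepped backwards the way NTP might do it. `slaveThinksInSync` mirrors only the equality test from the quoted IndexFetcher snippet.

```java
// Hypothetical model of a wall clock that NTP can step backwards.
final class SteppableClock {
    private long nowMillis;

    SteppableClock(long startMillis) { this.nowMillis = startMillis; }

    long currentTimeMillis() { return nowMillis; }

    // Normal passage of real time.
    void advance(long millis) { nowMillis += millis; }

    // Models an NTP step correction: the clock jumps, possibly backwards.
    void step(long deltaMillis) { nowMillis += deltaMillis; }
}

final class SkewDemo {
    private SkewDemo() {}

    // Mirrors the quoted check: the slave skips replication when the
    // timestamp of its current commit equals the master's latest one.
    static boolean slaveThinksInSync(long slaveCommitTimestamp, long masterLatestTimestamp) {
        return slaveCommitTimestamp == masterLatestTimestamp;
    }

    // Stamps two commits on the master around a backwards clock step.
    static long[] twoCommitTimestamps() {
        SteppableClock clock = new SteppableClock(1_000_000L); // time X
        long commit1 = clock.currentTimeMillis();
        clock.step(-200);   // NTP sets the clock back by 200 ms
        clock.advance(200); // 200 ms of real time pass
        long commit2 = clock.currentTimeMillis();
        return new long[] {commit1, commit2};
    }
}
```

`twoCommitTimestamps()` returns two equal values. A slave that has already installed commit 1 therefore sees a matching timestamp and skips commit 2 entirely, which is how it could silently miss a commit.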
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704185#comment-14704185 ]

Yonik Seeley commented on SOLR-7932:
---

bq. If I understand it correctly we don't need the timestamp check in a master-slave setup. The reason being since the index on the slave is coming from the master both timestamp and generation will be the same. So just checking generation will be enough right?

Someone can always switch masters, or blow away the index and rebuild from scratch, etc.
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702773#comment-14702773 ]

Ramkumar Aiyengar commented on SOLR-7932:
---

Thanks for the comments [~ysee...@gmail.com]. With master-slave replication, yes, this is less of a problem (you still have to deal with clock skew though). There are two places where the index time is used:

- To compare whether they are equal, in order to skip replication. Unless I am mistaken, the timestamp check is not useful for detecting index re-creation in this case.
- To check whether a full index replication should be forced. I see the use here, though I don't see an easy way to do this in a cloud. You would have to stop the whole cloud, blow away the index on one but not all replicas, and make sure that replica comes up first.

I am really more concerned about the first case, as you can lose data if you are unlucky. Do you agree that the timestamp check can be removed there?

For the second, the index creation time is probably a better thing to check against than the last commit time, as it is less subject to skew? I don't know if Lucene even provides a way to know when the index was initially created, though. And this could be tackled as a separate issue.
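The suggestion for the second case could look roughly like the sketch below. It assumes a hypothetical `indexCreatedMillis` marker, written once when an index is first created and carried with every commit. As the comment notes, it is not known whether Lucene provides such a value, so `CommitInfo` and `FullCopyCheck` are illustrative only. The generation condition mirrors the quoted `isFullCopyNeeded` expression, with the last-commit timestamp replaced by the creation marker.

```java
// Hypothetical snapshot of what each side knows about its latest commit.
// indexCreatedMillis is an assumed marker set once at index creation;
// it is not claimed to be a Lucene API.
final class CommitInfo {
    final long generation;
    final long indexCreatedMillis;

    CommitInfo(long generation, long indexCreatedMillis) {
        this.generation = generation;
        this.indexCreatedMillis = indexCreatedMillis;
    }
}

final class FullCopyCheck {
    private FullCopyCheck() {}

    static boolean isFullCopyNeeded(CommitInfo slave, CommitInfo master,
                                    boolean forceReplication) {
        // A different creation marker means the master index was rebuilt,
        // so copying incrementally on top of the old files is unsafe.
        boolean differentIndex = slave.indexCreatedMillis != master.indexCreatedMillis;
        // Same generation condition as the quoted code.
        boolean slaveNotOlder = slave.generation >= master.generation;
        return differentIndex || slaveNotOlder || forceReplication;
    }
}
```

Take a slave at generation 5. A master at generation 7 with the same creation marker yields an incremental copy. A master with a different marker (a rebuilt index) forces a full copy regardless of generations. Because the marker is set once and only compared for equality, clock steps between later commits do not affect it.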
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698590#comment-14698590 ]

Varun Thacker commented on SOLR-7932:
---

{code}
boolean isFullCopyNeeded = IndexDeletionPolicyWrapper
    .getCommitTimestamp(commit) >= latestVersion
    || commit.getGeneration() >= latestGeneration || forceReplication;
{code}

I think we can just change it to {{commit.getGeneration() >= latestGeneration || forceReplication}}. Comparing timestamps would have been valid for master-slave when the actual index was rsynced, because the commit timestamp would then have been the same. The new check would not break master-slave either, since the commit generation would be the same as well.

{code}
if (!forceReplication &&
    IndexDeletionPolicyWrapper.getCommitTimestamp(commit) == latestVersion) {
  //master and slave are already in sync just return
  LOG.info("Slave in sync with master.");
  successfulInstall = true;
  return true;
}
{code}

This should also check the generation numbers, I guess. This check is only required in the master-slave architecture. In cloud mode we would never call IndexFetcher unless we wanted to replicate.
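The two generation-based checks proposed above can be written as small pure functions for discussion. This is a sketch built on assumptions. `GenerationChecks` and its parameters are hypothetical stand-ins for the local commit's generation, the master's `latestGeneration`, and the `forceReplication` flag from the quoted code.

```java
// Hypothetical sketch of the generation-only checks proposed above.
final class GenerationChecks {
    private GenerationChecks() {}

    // Replacement for the timestamp equality test: skip replication when the
    // slave already holds the master's latest generation.
    static boolean alreadyInSync(long slaveGeneration, long masterGeneration,
                                 boolean forceReplication) {
        return !forceReplication && slaveGeneration == masterGeneration;
    }

    // Proposed full-copy condition with the timestamp comparison dropped:
    // commit.getGeneration() >= latestGeneration || forceReplication
    static boolean isFullCopyNeeded(long slaveGeneration, long masterGeneration,
                                    boolean forceReplication) {
        return slaveGeneration >= masterGeneration || forceReplication;
    }
}
```

Both functions compare generations only. They say nothing about whether the two sides actually hold the same index.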
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698653#comment-14698653 ]

Ramkumar Aiyengar commented on SOLR-7932:
---

Thanks [~varunthacker], that's what I would have thought; good to have confirmation. Also a good point that all this applies only to master-slave replication.

[~elyograg], I can see why detecting this would be good for log analysis. Could you raise a separate ticket for it? I would like to keep this issue focused on the replication problem.
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698683#comment-14698683 ]

Yonik Seeley commented on SOLR-7932:
---

There's a ton of history behind the replication code, and it has had to change over time due to changes in Lucene's versions/generations.

bq. It appears as if we are checking wall times across machines to check if we are in sync,

At least in traditional master-slave replication, that's not the case... the timestamps being compared are all generated on the master (even if they are compared on the slaves?).

Anyway, using the index generation alone is certainly not enough, since one can blow away an index and re-create it, resetting the generation to a low number.
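The objection above can be made concrete with a toy model. `MasterIndex` and `GenerationOnlyCheck` are hypothetical and deliberately simplified: one document per commit, and generations counting from 1. They only illustrate that, once an index has been blown away and rebuilt, equal generations no longer imply equal index contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified master index: each commit bumps the generation,
// and wiping the index restarts the count.
final class MasterIndex {
    private long generation = 0;
    private final List<String> docs = new ArrayList<>();

    long commit(String doc) {
        docs.add(doc);
        return ++generation;
    }

    void blowAwayAndRecreate() {
        docs.clear();
        generation = 0; // a brand-new index starts counting again
    }

    long generation() { return generation; }

    List<String> docs() { return List.copyOf(docs); }
}

final class GenerationOnlyCheck {
    private GenerationOnlyCheck() {}

    // Hypothetical generation-only version of the "already in sync" test.
    static boolean inSync(long slaveGeneration, long masterGeneration) {
        return slaveGeneration == masterGeneration;
    }
}
```

Consider a slave that replicated the old index at generation 2 (documents `a`, `b`) and a rebuilt master that is also at generation 2 (documents `x`, `y`). The two pass the generation-only check, so replication would be skipped even though the documents differ.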
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698474#comment-14698474 ]

Shawn Heisey commented on SOLR-7932:
---

I wasn't suggesting that Solr should be responsible for syncing the clocks. That's likely not even possible. We could theoretically compensate for differences if we could detect what the difference is, but I don't think that's our job either.

I was suggesting that Solr should proactively DETECT clock sync problems and let the user know that there's a problem with their install. Fixing it is up to the user.
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698478#comment-14698478 ]

Shawn Heisey commented on SOLR-7932:
---

If we don't have any functionality where time skew causes problems (or if we fix it, as you're suggesting), then we don't really need to worry about it. But I think we should. Even if Solr itself doesn't care, a user who is troubleshooting a SolrCloud problem may try to compare the logs on two machines, or make those logs available to people on the user list for help.

This is probably another in the long list of things I'd like to do, can't get anyone else interested in, and may prove too difficult for me to figure out.
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698475#comment-14698475 ]

Ramkumar Aiyengar commented on SOLR-7932:
---

My point was that we shouldn't have to worry about whether the clocks are out of sync. Replication is one place where we currently do rely on clocks being in sync, and my question is whether it needs to. We have a replication bug here regardless of how well the clocks are synced: it's a race condition waiting to happen.
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698390#comment-14698390 ]

Ramkumar Aiyengar commented on SOLR-7932:
---

Personally, I would rather remove any place that relies on absolute timestamps. IMO the feature you describe doesn't belong in Solr, and there are already well-known mechanisms to keep clocks in sync (ntpd on Linux, for example).
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698308#comment-14698308 ]

Shawn Heisey commented on SOLR-7932:
---

I cannot comment on the suggested fix of using generations, but I might have something to add regarding wall clocks on multiple machines.

I think that Solr should provide an implicit system-level handler whose purpose is to return System.currentTimeMillis as quickly as possible. At certain times, a Solr node should compare its own wall clock time with the other relevant node(s) for the particular action. Those times would include SolrCloud initialization and possibly SolrCloud-related replication requests. If a large enough discrepancy is found, a warning should be logged. In the future, we might upgrade that to an error.

My initial SWAG for an acceptable discrepancy is no more than half a second.
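The proposal above could be sketched as follows. Nothing here is an existing Solr API. `ClockSkewCheck` is hypothetical, and `remoteClock` stands in for a request to the proposed time-returning handler on another node. To account for network delay, the sketch assumes the remote reading was taken at the midpoint of the round trip, which is the estimate used by Cristian's clock-synchronization algorithm. The 500 ms default comes from the half-second suggestion above.

```java
import java.util.function.LongSupplier;
import java.util.logging.Logger;

// Hypothetical skew check; remoteClock stands in for a call to a proposed
// time-returning handler on another node, not an existing Solr endpoint.
final class ClockSkewCheck {
    private static final Logger LOG = Logger.getLogger(ClockSkewCheck.class.getName());

    // Default tolerance from the "half a second" suggestion.
    static final long DEFAULT_MAX_SKEW_MILLIS = 500;

    private ClockSkewCheck() {}

    // Estimates (remote - local) time, assuming the remote reading was taken
    // halfway through the round trip (Cristian's midpoint estimate).
    static long estimateOffsetMillis(LongSupplier localClock, LongSupplier remoteClock) {
        long sent = localClock.getAsLong();
        long remote = remoteClock.getAsLong();
        long received = localClock.getAsLong();
        long midpoint = sent + (received - sent) / 2;
        return remote - midpoint;
    }

    // Logs a warning and returns true when the estimated skew exceeds the limit.
    static boolean warnIfSkewed(String remoteNode, long offsetMillis, long maxSkewMillis) {
        if (Math.abs(offsetMillis) > maxSkewMillis) {
            LOG.warning("Clock on " + remoteNode + " differs from local clock by ~"
                    + offsetMillis + " ms (limit " + maxSkewMillis + " ms)");
            return true;
        }
        return false;
    }
}
```

For instance, suppose the local readings are 1000 ms before and 1100 ms after the request, and the remote reading is 1800 ms. The estimated offset is then 750 ms, and a warning is logged. The estimate can be off by up to half the round-trip time, which is one reason to keep the limit a parameter rather than a constant.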
[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines
[ https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698311#comment-14698311 ]

Shawn Heisey commented on SOLR-7932:
---

After thinking about it for a few minutes: if we add a feature that warns about time discrepancies, the default acceptable discrepancy should be geared towards a LAN setup (half a second or smaller). It should also be configurable for those who push the limits with something like satellite networking, which has a minimum round-trip (ping) latency of over 600 milliseconds.