[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-09-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744143#comment-14744143
 ] 

Mark Miller commented on SOLR-7932:
---

I think that in SolrCloud mode it's probably best to always replicate and 
count on peer sync as the short circuit. If replication is really not needed, 
most of the work will be skipped properly via filenames and checksums anyway, 
rather than via this sloppy timestamp check that may miss a replication in 
rare cases.
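
A rough sketch of the filename-and-checksum short circuit mentioned above. This is not Solr's actual IndexFetcher logic; the class `FileDiffSketch`, the method `filesToFetch`, and the name-to-checksum maps are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr: decides which files to fetch by
// comparing file names and checksums instead of commit timestamps.
class FileDiffSketch {

  /** Returns the master files that are missing locally or whose checksum differs. */
  static List<String> filesToFetch(Map<String, Long> masterFiles, Map<String, Long> localFiles) {
    List<String> toFetch = new ArrayList<>();
    for (Map.Entry<String, Long> entry : masterFiles.entrySet()) {
      Long localChecksum = localFiles.get(entry.getKey());
      if (localChecksum == null || !localChecksum.equals(entry.getValue())) {
        toFetch.add(entry.getKey());
      }
    }
    return toFetch;
  }

  public static void main(String[] args) {
    Map<String, Long> master = new LinkedHashMap<>();
    master.put("_0.cfs", 111L);
    master.put("_1.cfs", 222L);
    master.put("segments_3", 333L);

    Map<String, Long> local = new LinkedHashMap<>();
    local.put("_0.cfs", 111L);      // same name and checksum: skipped
    local.put("segments_2", 999L);  // stale file, not on the master

    System.out.println(filesToFetch(master, local)); // prints [_1.cfs, segments_3]
  }
}
```

If nothing has changed, the returned list is empty and an "always replicate" policy costs little more than the comparison itself.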

> Solr replication relies on timestamps to sync across machines
> -
>
> Key: SOLR-7932
> URL: https://issues.apache.org/jira/browse/SOLR-7932
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Reporter: Ramkumar Aiyengar
> Attachments: SOLR-7932.patch, SOLR-7932.patch
>
>
> Spinning off SOLR-7859, where I noticed that the wall time recorded as commit 
> data on a commit is used to check if replication needs to be done. In 
> IndexFetcher, there is this code:
> {code}
>   if (!forceReplication
>       && IndexDeletionPolicyWrapper.getCommitTimestamp(commit) == latestVersion) {
>     // master and slave are already in sync, just return
>     LOG.info("Slave in sync with master.");
>     successfulInstall = true;
>     return true;
>   }
> {code}
> It appears as if we are comparing wall times across machines to check if we 
> are in sync; this could go wrong.
> Once a decision is made to replicate, we do seem to use generations instead, 
> except that the place below checks both generations and timestamps to see if 
> a full copy is needed:
> {code}
>   // if the generation of master is older than that of the slave, it
>   // means they are not compatible to be copied
>   // then a new index directory to be created and all the files need to
>   // be copied
>   boolean isFullCopyNeeded = IndexDeletionPolicyWrapper
>       .getCommitTimestamp(commit) >= latestVersion
>       || commit.getGeneration() >= latestGeneration || forceReplication;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702911#comment-14702911
 ] 

Mark Miller commented on SOLR-7932:
---

bq. you still have to deal with clock skew though

Why? Don't both times come from the master? 

bq. Do you agree that the timestamp check can be removed there?

I don't think it can just be removed in either case without better replacement 
logic.




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-19 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703067#comment-14703067
 ] 

Varun Thacker commented on SOLR-7932:
-

If I understand it correctly, we don't need the timestamp check in a 
master-slave setup. The reason is that, since the index on the slave comes 
from the master, both the timestamp and the generation will be the same. So 
just checking the generation will be enough, right?

In cloud mode, commits on different replicas happen at different times, so the 
timestamps would always be different. But this code path will only get invoked 
during a recovery. So we could remove it for this use case as well, right?




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-19 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702927#comment-14702927
 ] 

Ramkumar Aiyengar commented on SOLR-7932:
-

bq. Why? Don't both times come from the master?

A clock skew could cause two different commits to have the same time (commit 1 
happens at time X, NTP sets the clock back by 200ms. 200ms later, commit 2 
happens). It's not exactly what's in this title (i.e. relying on timestamps 
across machines), and you have to be a lot more unlucky, but you can't rely on 
wall time even in the same machine.
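
The scenario above can be simulated in a few lines. This is an illustrative sketch only: `FakeWallClock` stands in for `System.currentTimeMillis()`, and the `Commit` record with its generation counter is a hypothetical model, not Lucene's commit metadata.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative simulation; the clock and the commit record are hypothetical.
class ClockStepSketch {

  /** A fake wall clock that can be stepped, standing in for System.currentTimeMillis(). */
  static final class FakeWallClock {
    private long nowMillis;
    FakeWallClock(long startMillis) { this.nowMillis = startMillis; }
    long currentTimeMillis() { return nowMillis; }
    void advance(long millis) { nowMillis += millis; }
    void stepBack(long millis) { nowMillis -= millis; } // e.g. an NTP correction
  }

  /** A commit tagged with a wall-clock timestamp and a monotonically increasing generation. */
  static final class Commit {
    final long timestamp;
    final long generation;
    Commit(long timestamp, long generation) {
      this.timestamp = timestamp;
      this.generation = generation;
    }
  }

  static Commit[] twoCommitsAcrossClockStep() {
    FakeWallClock clock = new FakeWallClock(1_000_000L);
    AtomicLong generation = new AtomicLong();

    Commit first = new Commit(clock.currentTimeMillis(), generation.incrementAndGet());
    clock.stepBack(200); // the clock is set back by 200ms
    clock.advance(200);  // 200ms of real time pass before the next commit
    Commit second = new Commit(clock.currentTimeMillis(), generation.incrementAndGet());
    return new Commit[] {first, second};
  }

  public static void main(String[] args) {
    Commit[] commits = twoCommitsAcrossClockStep();
    System.out.println("same timestamp:  " + (commits[0].timestamp == commits[1].timestamp));   // true
    System.out.println("same generation: " + (commits[0].generation == commits[1].generation)); // false
  }
}
```

A timestamp equality check would treat the two commits as identical, while the counter still tells them apart.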

bq. I don't think it can just be removed in either case without better 
replacement logic.

How does the timestamp help currently in the first case? We are anyway using 
generations immediately following, so won't you be better off comparing 
generations instead to check if replication can be skipped?





[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704185#comment-14704185
 ] 

Yonik Seeley commented on SOLR-7932:


bq. If I understand it correctly we don't need the timestamp check in a 
master-slave setup. The reason being since the index on the slave is coming 
from the master both timestamp and generation will be the same. So just 
checking generation will be enough right?

Someone can always switch masters, or blow away the index and rebuild from 
scratch, etc.




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-19 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702773#comment-14702773
 ] 

Ramkumar Aiyengar commented on SOLR-7932:
-

Thanks for the comments [~ysee...@gmail.com]. With master-slave replication, 
yes, this is less of a problem (you still have to deal with clock skew though).

There are two places where the index time is used:

 - To compare if they are equal to skip replication. Unless I am mistaken, the 
timestamp check is not useful to detect index re-creation in this case.
 - To check if full index replication should be forced. I see the use here 
(though I don't see an easy way you can do this in a cloud without stopping the 
full cloud, blowing an index on one but not all replicas, and making sure it 
comes up first)

I am more concerned really about the first case, as you can lose data if you 
are unlucky. Do you agree that the timestamp check can be removed there?

For the second, the index creation time is probably a better thing to check 
against than the last commit time, as it is less subject to skew? I don't 
know if Lucene even provides a way to know when the index was initially 
created, though. And this could be tackled as a separate issue.




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-16 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698590#comment-14698590
 ] 

Varun Thacker commented on SOLR-7932:
-

{code}
  boolean isFullCopyNeeded = IndexDeletionPolicyWrapper
      .getCommitTimestamp(commit) >= latestVersion
      || commit.getGeneration() >= latestGeneration || forceReplication;
{code}

I think we can just change it to: {{commit.getGeneration() >= latestGeneration 
|| forceReplication}}.
Comparing timestamps would have been valid for master slave when the actual 
index was rsynced and hence the commit timestamp would have been the same. And 
the new check would not break master slave either since the commit generation 
would be the same as well.
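
To make the difference concrete, here is a small side-by-side sketch of the existing condition and the generation-only one suggested above. The parameter names follow the IndexFetcher snippet; the class and method names (`FullCopyCheckSketch`, `currentCheck`, `proposedCheck`) are made up for illustration, and I am assuming `commit` is the local commit while `latestVersion` and `latestGeneration` come from the master.

```java
// Hypothetical standalone model of the two conditions; not Solr code.
class FullCopyCheckSketch {

  /** The condition as it appears in the IndexFetcher snippet: timestamp OR generation OR force. */
  static boolean currentCheck(long commitTimestamp, long latestVersion,
                              long commitGeneration, long latestGeneration,
                              boolean forceReplication) {
    return commitTimestamp >= latestVersion
        || commitGeneration >= latestGeneration
        || forceReplication;
  }

  /** The suggested simplification: generation OR force, with no timestamp comparison. */
  static boolean proposedCheck(long commitGeneration, long latestGeneration,
                               boolean forceReplication) {
    return commitGeneration >= latestGeneration || forceReplication;
  }

  public static void main(String[] args) {
    // The local commit timestamp is ahead of the master's (for example because
    // of clock skew), but the local generation is behind the master's.
    long commitTimestamp = 2_000L;
    long latestVersion = 1_500L;
    long commitGeneration = 3L;
    long latestGeneration = 7L;

    System.out.println(currentCheck(commitTimestamp, latestVersion,
        commitGeneration, latestGeneration, false));                            // true
    System.out.println(proposedCheck(commitGeneration, latestGeneration, false)); // false
  }
}
```

The two checks only disagree when the timestamp comparison fires on its own, which is exactly the case that depends on wall clocks.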


{code}
  if (!forceReplication
      && IndexDeletionPolicyWrapper.getCommitTimestamp(commit) == latestVersion) {
    // master and slave are already in sync, just return
    LOG.info("Slave in sync with master.");
    successfulInstall = true;
    return true;
  }
{code}

This should also check with the generation numbers I guess. This check is only 
required in the master slave architecture. In cloud mode we would never call 
IndexFetcher unless we wanted to replicate. 




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-16 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698653#comment-14698653
 ] 

Ramkumar Aiyengar commented on SOLR-7932:
-

Thanks [~varunthacker], that's what I would have thought, good to have 
confirmation. Also good point regarding all this being applicable only to 
master-slave replication.

[~elyograg], I can see why detecting this would be good for log analysis, could 
you raise a separate ticket for this? I would like to keep this to solve the 
replication issue.




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698683#comment-14698683
 ] 

Yonik Seeley commented on SOLR-7932:


There's a ton of history behind the replication stuff - and it's had to change 
over time due to changes in Lucene's versions / generations.

bq. It appears as if we are checking wall times across machines to check if we 
are in sync,

At least in traditional master-slave replication, that's not the case... the 
timestamps being compared are all generated on the master (even if they are 
compared on the slaves?)

Anyway, using index generation is certainly not enough since one can blow away 
an index and re-create it, resetting the generation to a low number.
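
A minimal sketch of that failure mode, under the assumption that "in sync" is decided by generation alone. `IndexState` and its `contentFingerprint` field are hypothetical stand-ins for the real index contents, not Lucene or Solr classes.

```java
// Hypothetical model; not a Lucene or Solr class.
class GenerationResetSketch {

  static final class IndexState {
    final long generation;
    final String contentFingerprint; // stand-in for what the index actually contains
    IndexState(long generation, String contentFingerprint) {
      this.generation = generation;
      this.contentFingerprint = contentFingerprint;
    }
  }

  /** A generation-only "in sync" check. */
  static boolean looksInSync(IndexState master, IndexState slave) {
    return master.generation == slave.generation;
  }

  /** What we actually care about: do both sides hold the same index? */
  static boolean actuallyInSync(IndexState master, IndexState slave) {
    return master.contentFingerprint.equals(slave.contentFingerprint);
  }

  public static void main(String[] args) {
    // The slave replicated generation 5 of the old index.
    IndexState slave = new IndexState(5, "old-index");
    // The master's index was blown away and rebuilt; after a few commits it
    // happens to be at generation 5 again, with different contents.
    IndexState rebuiltMaster = new IndexState(5, "rebuilt-index");

    System.out.println("looks in sync:    " + looksInSync(rebuiltMaster, slave));    // true
    System.out.println("actually in sync: " + actuallyInSync(rebuiltMaster, slave)); // false
  }
}
```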




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-15 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698474#comment-14698474
 ] 

Shawn Heisey commented on SOLR-7932:


I wasn't suggesting that Solr should be responsible for syncing the clocks.  
That's likely not even possible.  We could theoretically compensate for 
differences if we can detect what the difference is, but I don't think that's 
our job either.

I was suggesting that Solr should proactively DETECT clock sync problems, and 
let the user know that there's a problem with their install.  Fixing it is up 
to the user.




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-15 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698478#comment-14698478
 ] 

Shawn Heisey commented on SOLR-7932:


If we don't have (or fix as you're suggesting) any functionality where time 
skew causes problems, then we don't really need to worry about it.

But I think we should.  Even if Solr itself doesn't care, a user who is 
troubleshooting a SolrCloud problem may try to compare the logs on two 
machines, or make those logs available to people on the user list for help.

This is probably another in the long list of things I'd like to do, can't get 
anyone else interested in, and may prove too difficult for me to figure out.




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-15 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698475#comment-14698475
 ] 

Ramkumar Aiyengar commented on SOLR-7932:
-

My point was that we shouldn't have to worry about whether the clocks are out 
of sync. This (replication) is one place where we do currently rely on clocks 
being in sync, and my question is whether it needs to. We do have a 
replication bug here regardless of how well synced the clocks are; it's a race 
condition waiting to happen.




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-15 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698390#comment-14698390
 ] 

Ramkumar Aiyengar commented on SOLR-7932:
-

Personally, I would rather remove any place relying on absolute timestamps. The 
feature you describe IMO doesn't belong in Solr, and there are well-known 
mechanisms to keep clocks in sync already (ntpd on Linux, for example).




[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-15 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698308#comment-14698308
 ] 

Shawn Heisey commented on SOLR-7932:


I cannot comment on the suggested fix of using generations, but I might have 
something to add regarding wall clocks on multiple machines.

I think that Solr should provide an implicit system-level handler whose purpose 
is to return System.currentTimeMillis as quickly as possible.  At certain 
times, which would include SolrCloud initialization and possibly 
SolrCloud-related replication requests, a Solr node should compare its own wall 
clock time with the other relevant node(s) for the particular action, and if a 
large enough discrepancy is found, a warning should be logged.  In the future, 
we might upgrade that to an error.  My initial SWAG for an acceptable 
discrepancy is no more than half a second.
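
To make the proposal concrete, here is a minimal sketch of the comparison step. It is not existing Solr code: the class ClockSkewCheck, its method names, and the LongSupplier standing in for a call to the proposed time handler are all hypothetical. It assumes the remote clock is read roughly halfway through the round trip, and it uses the half-second figure above as the default tolerance.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of Solr: estimates wall-clock skew against another node. */
final class ClockSkewCheck {

  /** Default tolerance taken from the suggestion above: half a second. */
  static final long DEFAULT_MAX_SKEW_MS = 500L;

  private ClockSkewCheck() {}

  /**
   * Estimates the remote-minus-local clock offset, assuming the remote clock
   * was read halfway between sending the request and receiving the reply.
   */
  static long estimateSkewMillis(long localSendMs, long remoteMs, long localReceiveMs) {
    long midpoint = localSendMs + (localReceiveMs - localSendMs) / 2;
    return remoteMs - midpoint;
  }

  /** True when the absolute skew is larger than the allowed tolerance. */
  static boolean exceeds(long skewMs, long maxSkewMs) {
    return Math.abs(skewMs) > maxSkewMs;
  }

  /**
   * Takes one sample. The remoteClock supplier is a stand-in (an assumption)
   * for a request to the proposed handler returning System.currentTimeMillis().
   */
  static long measure(LongSupplier remoteClock) {
    long sent = System.currentTimeMillis();
    long remote = remoteClock.getAsLong();
    long received = System.currentTimeMillis();
    return estimateSkewMillis(sent, remote, received);
  }
}
```

A caller would log a warning when ClockSkewCheck.exceeds(ClockSkewCheck.measure(remote), ClockSkewCheck.DEFAULT_MAX_SKEW_MS) returns true. Taking the midpoint of the send and receive times is only one way to discount network latency; a real implementation might sample several times and keep the reading with the shortest round trip.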








[jira] [Commented] (SOLR-7932) Solr replication relies on timestamps to sync across machines

2015-08-15 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698311#comment-14698311
 ] 

Shawn Heisey commented on SOLR-7932:


After thinking about it for a few minutes: if we have a "warn for time 
discrepancy" feature, the default acceptable discrepancy should be geared 
towards a LAN setup (half a second or smaller), and configurable for those who 
push the limits with something like satellite networking, which has a minimum 
round-trip (ping) latency of over 600 milliseconds.
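
As a rough illustration of why latency matters here, the sketch below makes the tolerance configurable and widens it by the measurement error that the round trip introduces. Everything in it is an assumption for illustration: the class SkewTolerance, the property name example.maxClockSkewMs, and the choice to add half the round trip to the configured limit are not Solr settings or APIs. The reasoning is that a remote clock reading taken somewhere inside a round trip can be off by up to half of that round trip when compared against the midpoint, so a 600 ms link alone contributes up to 300 ms of uncertainty.

```java
/** Hypothetical sketch, not a Solr API: a configurable skew tolerance that accounts for latency. */
final class SkewTolerance {

  /** Hypothetical system property name; Solr defines no such setting. */
  static final String MAX_SKEW_PROPERTY = "example.maxClockSkewMs";

  /** LAN-oriented default: half a second. */
  static final long DEFAULT_MAX_SKEW_MS = 500L;

  private SkewTolerance() {}

  /** Reads the tolerance from the system property, falling back to the default. */
  static long configuredMaxSkewMillis() {
    return Long.getLong(MAX_SKEW_PROPERTY, DEFAULT_MAX_SKEW_MS);
  }

  /** A midpoint-based estimate can be off by up to half the round trip (rounded up). */
  static long measurementErrorMillis(long roundTripMs) {
    return (roundTripMs + 1) / 2;
  }

  /** Warn only when the skew exceeds the tolerance plus what latency alone could explain. */
  static boolean shouldWarn(long estimatedSkewMs, long roundTripMs, long maxSkewMs) {
    return Math.abs(estimatedSkewMs) > maxSkewMs + measurementErrorMillis(roundTripMs);
  }
}
```

With the default tolerance, an estimated skew of 700 ms over a 600 ms round trip would not trigger a warning (the limit becomes 800 ms), while the same estimate over a 2 ms LAN round trip would.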



