[GitHub] [lucene-solr] murblanc commented on a change in pull request #1055: SOLR-13932 Review directory locking and Blob interactions

2020-01-02 Thread GitBox
murblanc commented on a change in pull request #1055: SOLR-13932 Review 
directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r362520142
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/store/blob/metadata/SharedStoreResolutionUtil.java
 ##
 @@ -147,42 +139,53 @@ public static SharedMetadataResolutionResult 
resolveMetadata(ServerSideMetadata
 
 if (local == null) {
   // The shard index data does not exist locally. All we can do is pull.  
-  // We've computed blobFilesMissingLocally and localFilesMissingOnBlob is 
empty as it should be.
+  // We've computed blobFilesMissingLocally. localFilesMissingOnBlob is 
empty as it should be.
   return new 
SharedMetadataResolutionResult(localFilesMissingOnBlob.values(), 
blobFilesMissingLocally.values(), blobFilesMissingLocally.values(), false);
 }
 
-boolean localConflictingWithBlob = false;
+// If trying to pull files from Blob, make sure similarly named files do 
not already exist outside the current commit point
+ImmutableSet allLocalFiles = local.getAllFiles();
+
+boolean downloadToNewDir = false;
 
 Review comment:
   pulling from Blob downloads files :)
   Not sure I understand what the link pasted above points to.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1055: SOLR-13932 Review directory locking and Blob interactions

2020-01-02 Thread GitBox
murblanc commented on a change in pull request #1055: SOLR-13932 Review 
directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r362517551
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/store/blob/metadata/ServerSideMetadata.java
 ##
 @@ -132,35 +127,15 @@ public ServerSideMetadata(String coreName, CoreContainer 
container, boolean take
 generation = latestCommit.getGeneration();
 latestCommitFiles = latestCommitBuilder.build();
 
-// Capture now the hash and verify again if we need to pull content 
from the Blob store into this directory,
-// to make sure there are no local changes at the same time that might 
lead to a corruption in case of interaction
-// with the download.
-// TODO: revise with "design assumptions around pull pipeline" 
mentioned in allCommits TODO below
+// Capture now the hash and verify again after files have been pulled 
and before the directory is updated (or before
+// the index is switched to use a new directory) to make sure there 
are no local changes at the same time that might
+// lead to a corruption in case of interaction with the download or 
might be a sign of other problems (it is not
+// expected that indexing can happen on a local directory of a SHARED 
replica if that replica is not up to date with
+// the Blob store version).
 directoryHash = getSolrDirectoryHash(coreDir);
 
-allCommitsFiles = latestCommitFiles;
-// TODO: allCommits was added to detect special cases where inactive 
file segments can potentially conflict
-//   with whats in shared store. But given the recent 
understanding of semantics around index directory locks
-//   we need to revise our design assumptions around pull 
pipeline, including this one.
-//   Disabling this for now so that unreliability around 
introspection of older commits 
-//   might not get in the way of steady state indexing.
-//// A note on listCommits says that it does not guarantee consistent 
results if a commit is in progress.
-//// But in blob context we serialize commits and pulls by proper 
locking therefore we should be good here.
-//List allCommits = DirectoryReader.listCommits(coreDir);
-//
-//// we should always have a commit point as verified in the beginning 
of this method.
-//assert (allCommits.size() > 1) || (allCommits.size() == 1 && 
allCommits.get(0).equals(latestCommit));
-//
-//// optimization:  normally we would only be dealing with one commit 
point. In that case just reuse latest commit files builder.
-//ImmutableCollection.Builder allCommitsBuilder = 
latestCommitBuilder;
-//if (allCommits.size() > 1) {
-//  allCommitsBuilder = new ImmutableSet.Builder<>();
-//  for (IndexCommit commit : allCommits) {
-//// no snapshot for inactive segments files
-//buildCommitFiles(coreDir, commit, allCommitsBuilder, /* 
snapshotDir */ null);
-//  }
-//}
-//allCommitsFiles = allCommitsBuilder.build();
+// Need to inventory all local files in case files that need to be 
pulled from Blob conflict with them.
+allFiles = ImmutableSet.copyOf(coreDir.listAll());
 
 Review comment:
   You're right on both counts. In order to limit complexity at this stage, I'm 
tempted to leave these optimizations for later (likely very minor improvement 
for indexing given the higher cost of pushing to Blob store).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] murblanc commented on a change in pull request #1055: SOLR-13932 Review directory locking and Blob interactions

2020-01-02 Thread GitBox
murblanc commented on a change in pull request #1055: SOLR-13932 Review 
directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r362514824
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/store/blob/metadata/CorePushPull.java
 ##
 @@ -268,7 +264,7 @@ public void pullUpdateFromBlob(long requestQueuedTimeMs, 
boolean waitForSearcher
   }
 
 
 Review comment:
   Pull is indeed exclusive, but let's not rely on this (i.e. be defensive). We 
do check the directory hasn't changed during the pull before adding back the 
files and reopening the IW, so I think we're ok.
   
   Not sure about your reference @mbwaheed to SolrIndexSplitter. The lock is 
acquired there on a directory that I'm not sure is the index directory (it is 
searcher.getRawReader().directory() on the passed instance of 
SolrIndexSearcher). We manipulate solrCore.getIndexDir().
   
   The directory and IndexWriter manipulation in 
IndexFetcher.fetchLatestIndex() is similar to the one we do here (no surprise, 
it was used as "inspiration").


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org