murblanc commented on a change in pull request #1055: SOLR-13932 Review directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r362517551
##
File path: solr/core/src/java/org/apache/solr/store/blob/metadata/ServerSideMetadata.java
##
@@ -132,35 +127,15 @@ public ServerSideMetadata(String coreName, CoreContainer container, boolean take
       generation = latestCommit.getGeneration();
       latestCommitFiles = latestCommitBuilder.build();
-      // Capture now the hash and verify again if we need to pull content from the Blob store into this directory,
-      // to make sure there are no local changes at the same time that might lead to a corruption in case of interaction
-      // with the download.
-      // TODO: revise with "design assumptions around pull pipeline" mentioned in allCommits TODO below
+      // Capture now the hash and verify again after files have been pulled and before the directory is updated (or before
+      // the index is switched to use a new directory) to make sure there are no local changes at the same time that might
+      // lead to a corruption in case of interaction with the download or might be a sign of other problems (it is not
+      // expected that indexing can happen on a local directory of a SHARED replica if that replica is not up to date with
+      // the Blob store version).
       directoryHash = getSolrDirectoryHash(coreDir);
-      allCommitsFiles = latestCommitFiles;
-      // TODO: allCommits was added to detect special cases where inactive file segments can potentially conflict
-      //       with whats in shared store. But given the recent understanding of semantics around index directory locks
-      //       we need to revise our design assumptions around pull pipeline, including this one.
-      //       Disabling this for now so that unreliability around introspection of older commits
-      //       might not get in the way of steady state indexing.
-      //      // A note on listCommits says that it does not guarantee consistent results if a commit is in progress.
-      //      // But in blob context we serialize commits and pulls by proper locking therefore we should be good here.
-      //      List<IndexCommit> allCommits = DirectoryReader.listCommits(coreDir);
-      //
-      //      // we should always have a commit point as verified in the beginning of this method.
-      //      assert (allCommits.size() > 1) || (allCommits.size() == 1 && allCommits.get(0).equals(latestCommit));
-      //
-      //      // optimization: normally we would only be dealing with one commit point. In that case just reuse latest commit files builder.
-      //      ImmutableCollection.Builder<String> allCommitsBuilder = latestCommitBuilder;
-      //      if (allCommits.size() > 1) {
-      //        allCommitsBuilder = new ImmutableSet.Builder<>();
-      //        for (IndexCommit commit : allCommits) {
-      //          // no snapshot for inactive segments files
-      //          buildCommitFiles(coreDir, commit, allCommitsBuilder, /* snapshotDir */ null);
-      //        }
-      //      }
-      //      allCommitsFiles = allCommitsBuilder.build();
+      // Need to inventory all local files in case files that need to be pulled from Blob conflict with them.
+      allFiles = ImmutableSet.copyOf(coreDir.listAll());
Review comment:
You're right on both counts. To limit complexity at this stage, I'm tempted to leave these optimizations for later (likely a very minor improvement for indexing, given the much higher cost of pushing to the Blob store).