nfsantos commented on code in PR #1577:
URL: https://github.com/apache/jackrabbit-oak/pull/1577#discussion_r1745538475
##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileNodeStoreBuilder.java:
##########
@@ -283,7 +324,7 @@ public List<IndexStore> buildList(IndexHelper indexHelper,
IndexerSupport indexe
private IndexStoreFiles createdSortedStoreFiles() throws IOException,
CompositeException {
// Check system property defined path
String sortedFilePath =
System.getProperty(OAK_INDEXER_SORTED_FILE_PATH);
- if (StringUtils.isNotBlank(sortedFilePath)) {
+ if (sortedFilePath != null && !sortedFilePath.isEmpty()) {
Review Comment:
This is not equivalent. If the string contains only whitespace,
`StringUtils::isNotBlank` returns false, while this test here returns true, as
`#isEmpty` tests that the String contains no characters.
##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/DocumentStoreIndexerBase.java:
##########
@@ -239,6 +239,12 @@ nodeStore, getMongoDocumentStore(), traversalLog))
return storeList;
}
+ public IndexStore buildTreeStore() throws IOException,
CommitFailedException {
+
System.setProperty(FlatFileNodeStoreBuilder.OAK_INDEXER_SORT_STRATEGY_TYPE,
Review Comment:
This will leave this system property behind, even after finishing building
the tree store, so it changes global state. Couldn't this cause unintended
problems?
##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileNodeStoreBuilder.java:
##########
@@ -19,17 +19,32 @@
package org.apache.jackrabbit.oak.index.indexer.document.flatfile;
-import com.mongodb.MongoClientURI;
-import com.mongodb.client.MongoDatabase;
-import org.apache.commons.lang3.StringUtils;
+import static java.util.Collections.unmodifiableSet;
+import static org.apache.jackrabbit.guava.common.base.Preconditions.checkState;
+import static
org.apache.jackrabbit.oak.index.indexer.document.indexstore.IndexStoreUtils.OAK_INDEXER_USE_LZ4;
+import static
org.apache.jackrabbit.oak.index.indexer.document.indexstore.IndexStoreUtils.OAK_INDEXER_USE_ZIP;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+
Review Comment:
The order of imports is different from the one of InteliJ. We do not have an
agreed formatting style, but since most of us are using InteliJ and the file is
already with the InteliJ ordering, I'd prefer we leave it the current order.
##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/pipelined/PipelinedTransformTask.java:
##########
@@ -160,7 +160,7 @@ public Result call() throws Exception {
for (NodeDocument nodeDoc : nodeDocumentBatch) {
statistics.incrementMongoDocumentsTraversed();
mongoObjectsProcessed++;
- if (mongoObjectsProcessed % 50000 == 0) {
+ if (mongoObjectsProcessed % 200_000 == 0) {
Review Comment:
Why the change? I don't find that this log line is printed too often. But
adding a thousand separator to the `50000` would be nice.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]