[GitHub] lucene-solr issue #531: SOLR-12768
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/531

Code changes seem good; I'll look closer and run tests. PathHierarchyTokenizerFactory _might_ eventually be used at query time if we had a syntax to say "get me all my ancestors". But since that's conditional on the syntax, it can't easily go into the query analyzer, so I think it's better to keep the query-time chain simple/direct and do this work when interpreting the syntax. I know this has evolved recently from where we were. In this issue I removed PathHierarchyTokenizer from the index side because I'd rather err on the side of lighter-weight indexing at the expense of slower queries; the cost is very likely low for typical use-cases.

RE an issue to discuss the syntax: yes, a subtask of SOLR-12298. No hurry on that one.

--- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
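To illustrate the "get me all my ancestors" idea above: a PathHierarchyTokenizer-style analysis emits one token per ancestor prefix of a path. A minimal, hypothetical sketch of that expansion (not Solr's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class AncestorPaths {
    // One token per ancestor prefix, like a path-hierarchy tokenization:
    // a query for any ancestor of a document's nest path can then match
    // a single indexed token directly.
    static List<String> ancestors(String path) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while ((i = path.indexOf('/', i + 1)) != -1) {
            out.add(path.substring(0, i));
        }
        out.add(path);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ancestors("/recipe/ingredients/name"));
        // [/recipe, /recipe/ingredients, /recipe/ingredients/name]
    }
}
```

This is also why dropping it from the index side trades index weight for query-time work: without the ancestor tokens, ancestor matching has to be synthesized in the query instead.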
[GitHub] lucene-solr issue #531: SOLR-12768
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/531

Nonetheless we could stop trying to solve this ambiguous case further _for now_ (and thus commit what you have here) since (a) this syntax is very experimental, (b) it's documented nowhere, and (c) it wasn't developed very openly. RE openness: given (a) and (b), the non-openness of (c) is okay, but if we _really_ want to make this feature known, it deserves its own issue to discuss publicly what the syntax ought to be. A syntax to match paths is big enough that it shouldn't be buried within the scope of some other issue.
[GitHub] lucene-solr issue #531: SOLR-12768
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/531

> Perhaps we could test whether "foo" is a defined field in the current collection?

Eh; that sounds too much like a "guess what the user might mean" kind of solution. Even if "foo" is in the schema, that doesn't prevent an element "foo" from being used.

> I personally prefer keeping the API as simple as possible (I'm a believer in the KISS principle).

At the expense of ambiguity? Trade-offs.
[GitHub] lucene-solr pull request #:
Github user dsmiley commented on the pull request: https://github.com/apache/lucene-solr/commit/c04a3ac306b017054173d986e32ea217e2fc6595#commitcomment-31829970

In solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java on line 162:

I think this illustrates a problem with the syntax you invented. The code here says that the absence of any '/' means it's a "regular filter, not hierarchy based". But how then do you articulate a path filter for all elements named "foo" regardless of parentage? So "childFilter" is trying to do double-duty here, leading to edge cases as we try to guess what was intended. I propose instead that we have a different local param, "pathFilter". Both filters would be AND'ed together. WDYT?
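A hypothetical sketch of the proposal above (all names invented, not existing Solr code): with a separate pathFilter, "match by nest path" and "match by field value" become independent conditions that are simply AND'ed, with no '/'-presence heuristic needed:

```java
import java.util.function.Predicate;

public class FilterSketch {
    // Hypothetical model of a child document: its nest path and one field.
    record Child(String path, String nameS) {}

    // The proposal: childFilter stays a plain field filter, pathFilter matches
    // the nest path, and the two are AND'ed -- no guessing from '/' presence.
    static Predicate<Child> combined(Predicate<Child> childFilter,
                                     Predicate<Child> pathFilter) {
        return childFilter.and(pathFilter);
    }

    public static void main(String[] args) {
        Predicate<Child> childFilter = c -> "cocoa".equals(c.nameS());
        // "all elements named foo regardless of parentage" is purely a path
        // condition, expressible without overloading childFilter:
        Predicate<Child> pathFilter = c -> c.path().endsWith("/foo");

        System.out.println(combined(childFilter, pathFilter)
                .test(new Child("/recipe/foo", "cocoa"))); // true
        System.out.println(combined(childFilter, pathFilter)
                .test(new Child("/recipe/bar", "cocoa"))); // false
    }
}
```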
[GitHub] lucene-solr issue #531: SOLR-12768
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/531

The way this field is indexed has been experimental and will remain so for a bit. 8.0 will be its more public debut, after which changing it will take more effort than it does today. Nonetheless I will make this change clear in CHANGES.txt in case someone was using it.
[GitHub] lucene-solr issue #531: SOLR-12768
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/531

Looks good. I thought you were going to add a test with something to show we don't screw it up? You could search for 'redients', which would have returned something before but now should not.

I wonder... hmmm... maybe the nest path should always start with a '/'? That would seem more correct, since paths in general usually start with one, and it would also make implementing this case simpler.
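A small sketch of the normalization being suggested (a hypothetical helper, not existing Solr code): with a guaranteed leading '/', every path comparison is uniform and the root-level label needs no special case:

```java
public class NestPath {
    // Hypothetical normalizer: always store nest paths with a leading '/'.
    // Prefix and segment comparisons then work the same at every depth.
    static String normalize(String path) {
        return path.startsWith("/") ? path : "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(normalize("ingredients/name")); // /ingredients/name
        System.out.println(normalize("/toppings"));        // /toppings
    }
}
```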
[GitHub] lucene-solr issue #531: SOLR-12768
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/531

Thanks for helping out! It's insufficient to simply prepend a `*`, since that would match other labels that share the same suffix. For example: `childFilter='ingredients/name_s:cocoa'` would get transformed into a query that might match a label `secretingredients`, but we don't want it to. I think a solution is to build an OR query where one clause has `*/` prepended to it, and the other is unmodified and so matches the term exactly. WDYT?
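A sketch of why a bare `*` over-matches and how the proposed OR of an exact clause plus a `*/`-prefixed clause fixes it. This is a plain-string simulation of the wildcard semantics (names invented), not the actual query-building code:

```java
public class PathLabelMatch {
    // The naive fix: prepend '*' -- this behaves like a suffix match, so it
    // also hits labels that merely end with the target ("secretingredients").
    static boolean naiveWildcard(String indexedLabel, String target) {
        return indexedLabel.endsWith(target);
    }

    // The proposed OR query: one clause matches the term exactly, the other
    // has "*/" prepended, so only a real path-segment boundary can match.
    static boolean orQuery(String indexedLabel, String target) {
        return indexedLabel.equals(target) || indexedLabel.endsWith("/" + target);
    }

    public static void main(String[] args) {
        System.out.println(naiveWildcard("secretingredients", "ingredients")); // true (over-match)
        System.out.println(orQuery("secretingredients", "ingredients"));       // false
        System.out.println(orQuery("recipe/ingredients", "ingredients"));      // true
        System.out.println(orQuery("ingredients", "ingredients"));             // true
    }
}
```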
[GitHub] lucene-solr issue #528: SOLR-12955 2
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/528

Man, this is a tangled mess we are trying to unravel! I wish someone had undertaken this years ago.

Let's not rename DUP#processDelete, as it is a URP-defined method we override, and I doubt it'd be worth it to expand the change to URP. postProcessDeleteByQuery can be renamed to doDistribDeleteByQuery, and postProcessDeleteById can be renamed to doDistribDeleteById.

The flows of doDeleteById and doDeleteByQuery are different, which is a bit annoying. It's hard for me to make much sense of it right now. I know the ZK version entirely overrides processCommit, but I think there the flow was more clear, at least to me. On the delete side, though, I find it confusing that the ZK version overrides doDeleteByQuery. ... gotta go for now.

Let me know when you're done with the issue and I can try some things too. It's such a time sink, unfortunately.
[GitHub] lucene-solr issue #528: SOLR-12955 2
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/528

I like that the ZK version of processCommit completely overrides the superclass, since the logic is much more complicated. But it's a little unfortunate that calling super.processCommit will do some redundant setupRequest work. One slight change would be for these method calls to go back to calling localCommit() as they did before, but also to put the cmd.setVersion logic at the front of localCommit's logic, since this is always done first.
[GitHub] lucene-solr issue #528: SOLR-12955 2
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/528

* DUP.next can be removed as it's redundant with the same field in the superclass.
* DUP.MAX_RETRIES_TO_FOLLOWERS_DEFAULT: you accidentally added an 'R' to the ** of the javadoc.
* The DUP.isLeader method ought to go after the constructor. Generally, the constructor goes before instance methods.
* DUP.isIndexChanged(): remove this method; it's strictly internal, and it's inconsistent with the fact that there are many fields that don't have accessors, so why should this one?
* DUP.versionAdd shouldClone: I think it would be a little simpler to have the second conditional check clonedDoc != null instead of "shouldClone". At that point you will only have one reference to shouldClone, and you could inline it into its initialization.
* DUP.postProcessAdd: I think this method would be better named doDistribAdd.
* Maybe we can avoid DistributedZkUpdateProcessor overriding processAdd. Somehow checkReplicationTracker() needs to get called. One way to do this would be for setupRequest to check whether the updateCommand is an instance of AddUpdateCommand.
* It'd be excellent if we could get some consistency in calling setupRequest -- all processXXX methods should first call setupRequest or, if needed, an overloaded variant.
[GitHub] lucene-solr issue #528: SOLR-12955 2
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/528

* DUP.postProcessAdd's javadoc is confusing; maybe drop it. It says "This method can be overridden to tamper with the cmd", but I'm looking at where we override it and I don't see any tampering with this.updateCommand.
* DUP: let's remove the Standalone subclass in favor of making the base class do the standalone functionality; it's not really doing much of anything other than relying on the base impl. There's practically no functionality (real code) in the standalone class right now, so it's kind of pointless.
* DistributedZkUpdateProcessor's constructor that takes a docMerger isn't called externally, so let's have just one constructor.
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r243999855

--- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedStandaloneUpdateProcessor.java ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.update.processor;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.CompletionService;
+import java.util.concurrent.ExecutorCompletionService;
+import java.util.concurrent.Future;
+
+import org.apache.solr.common.cloud.Replica;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.CommitUpdateCommand;
+import org.apache.solr.update.DeleteUpdateCommand;
+import org.apache.solr.update.UpdateCommand;
+import org.apache.solr.util.TestInjection;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class DistributedStandaloneUpdateProcessor extends DistributedUpdateProcessor {
+
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public DistributedStandaloneUpdateProcessor(SolrQueryRequest req,
+      SolrQueryResponse rsp, UpdateRequestProcessor next) {
+    super(req, rsp, next);
+  }
+
+  public DistributedStandaloneUpdateProcessor(SolrQueryRequest req,
+      SolrQueryResponse rsp, AtomicUpdateDocumentMerger docMerger,
+      UpdateRequestProcessor next) {
+    super(req, rsp, docMerger, next);
+  }
+
+  @Override
+  Replica.Type computeReplicaType() {
+    return Replica.Type.NRT;
+  }
+
+  @Override
+  public void processCommit(CommitUpdateCommand cmd) throws IOException {
+
+    assert TestInjection.injectFailUpdateRequests();
+
+    updateCommand = cmd;
+
+    CompletionService completionService = new ExecutorCompletionService<>(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());
+    Set> pending = new HashSet<>();
+    if (replicaType == Replica.Type.TLOG) {
+      if (isLeader) {
+        super.processCommit(cmd);
+      }
+    } else if (replicaType == Replica.Type.PULL) {
+      log.warn("Commit not supported on replicas of type " + Replica.Type.PULL);
+    } else {
+      // NRT replicas will always commit
+      super.processCommit(cmd);
+    }
+  }
+
+  @Override
+  public void processAdd(AddUpdateCommand cmd) throws IOException {
+    assert TestInjection.injectFailUpdateRequests();
+
+    updateCommand = cmd;
+    isLeader = getNonZkLeaderAssumption(req);
+
+    super.processAdd(cmd);
+  }
+
+  @Override
+  protected void doDeleteById(DeleteUpdateCommand cmd) throws IOException {
+    isLeader = getNonZkLeaderAssumption(req);
+    super.doDeleteById(cmd);
+  }
+
+  @Override
+  protected void doDeleteByQuery(DeleteUpdateCommand cmd) throws IOException {
+    // even in non zk mode, tests simulate updates from a leader
+    isLeader = getNonZkLeaderAssumption(req);
+    super.doDeleteByQuery(cmd, null, null);
+  }
+
+  @Override
+  public void finish() throws IOException {

--- End diff --

Nevermind; just leave this be then. Let's keep our changes to the more clear/obvious ones.
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r243847609

--- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java ---
@@ -1250,23 +1250,10 @@ private UpdateCommand fetchFullUpdateFromLeader(AddUpdateCommand inplaceAdd, lon
     params.set("onlyIfActive", true);
     SolrRequest ur = new GenericSolrRequest(METHOD.GET, "/get", params);
-    String leaderUrl = req.getParams().get(DISTRIB_FROM);
-
-    if (leaderUrl == null) {
-      // An update we're dependent upon didn't arrive! This is unexpected. Perhaps likely our leader is
-      // down or partitioned from us for some reason. Lets force refresh cluster state, and request the
-      // leader for the update.
-      if (zkController == null) { // we should be in cloud mode, but wtf? could be a unit test
-        throw new SolrException(ErrorCode.SERVER_ERROR, "Can't find document with id=" + id + ", but fetching from leader "
-            + "failed since we're not in cloud mode.");
-      }
-      Replica leader;
-      try {
-        leader = zkController.getZkStateReader().getLeaderRetry(collection, cloudDesc.getShardId());
-      } catch (InterruptedException e) {
-        throw new SolrException(ErrorCode.SERVER_ERROR, "Exception during fetching from leader.", e);
-      }
-      leaderUrl = leader.getCoreUrl();
+    String leaderUrl = getLeaderUrl(id);

--- End diff --

I looked again and it seems fine. It could be that the diff I was looking at confused the matter.
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r243847326

--- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedStandaloneUpdateProcessor.java ---
@@ -0,0 +1,121 @@
[...]
+  @Override
+  public void finish() throws IOException {

--- End diff --

Isn't this what the base finish() will do? So why subclass it? If there really is a reason, then a comment here would help.
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r243847270

--- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedStandaloneUpdateProcessor.java ---
@@ -0,0 +1,121 @@
[...]
+  @Override
+  protected void doDeleteByQuery(DeleteUpdateCommand cmd) throws IOException {
+    // even in non zk mode, tests simulate updates from a leader
+    isLeader = getNonZkLeaderAssumption(req);

--- End diff --

I think just call setupRequest instead of getNonZkLeaderAssumption? setupRequest calls that and sets isLeader. This comment applies to all/most of the methods here.
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r243844185

--- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java ---
@@ -1250,23 +1250,10 @@ private UpdateCommand fetchFullUpdateFromLeader(AddUpdateCommand inplaceAdd, lon
[...]
+    String leaderUrl = getLeaderUrl(id);

--- End diff --

I'm not sure how this diff is a refactoring (i.e. keeps semantics identical). For example, the exception message isn't the same.
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r243843542

--- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedStandaloneUpdateProcessor.java ---
@@ -42,15 +53,36 @@ public DistributedStandaloneUpdateProcessor(SolrQueryRequest req,
   }

   @Override
-  String getCollectionName(CloudDescriptor cloudDesc) {
+  String computeCollectionName(CloudDescriptor cloudDesc) {
     return null;
   }

   @Override
-  Replica.Type getReplicaType(CloudDescriptor cloudDesc) {
+  Replica.Type computeReplicaType(CloudDescriptor cloudDesc) {
     return Replica.Type.NRT;
   }

+  @Override
+  public void processCommit(CommitUpdateCommand cmd) throws IOException {
+
+    assert TestInjection.injectFailUpdateRequests();
+
+    updateCommand = cmd;
+
+    CompletionService completionService = new ExecutorCompletionService<>(req.getCore().getCoreContainer().getUpdateShardHandler().getUpdateExecutor());

--- End diff --

Why is all this stuff here in standalone mode?
[GitHub] lucene-solr issue #528: SOLR-12955 2
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/528

Looks good. More to do?
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r241967674

--- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java ---
@@ -1417,45 +1407,7 @@ private void doDeleteById(DeleteUpdateCommand cmd) throws IOException {
       return;
     }
-    if (zkEnabled && isLeader && !isSubShardLeader) {
-      DocCollection coll = zkController.getClusterState().getCollection(collection);
-      List subShardLeaders = getSubShardLeaders(coll, cloudDesc.getShardId(), cmd.getId(), null);
-      // the list will actually have only one element for an add request
-      if (subShardLeaders != null && !subShardLeaders.isEmpty()) {
-        ModifiableSolrParams params = new ModifiableSolrParams(filterParams(req.getParams()));
-        params.set(DISTRIB_UPDATE_PARAM, DistribPhase.FROMLEADER.toString());
-        params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
-            zkController.getBaseUrl(), req.getCore().getName()));
-        params.set(DISTRIB_FROM_PARENT, cloudDesc.getShardId());
-        cmdDistrib.distribDelete(cmd, subShardLeaders, params, true, null, null);
-      }
-
-      final List nodesByRoutingRules = getNodesByRoutingRules(zkController.getClusterState(), coll, cmd.getId(), null);
-      if (nodesByRoutingRules != null && !nodesByRoutingRules.isEmpty()) {
-        ModifiableSolrParams params = new ModifiableSolrParams(filterParams(req.getParams()));
-        params.set(DISTRIB_UPDATE_PARAM, DistribPhase.FROMLEADER.toString());
-        params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
-            zkController.getBaseUrl(), req.getCore().getName()));
-        params.set(DISTRIB_FROM_COLLECTION, collection);
-        params.set(DISTRIB_FROM_SHARD, cloudDesc.getShardId());
-        cmdDistrib.distribDelete(cmd, nodesByRoutingRules, params, true, null, null);
-      }
-    }
-
-    if (nodes != null) {
-      ModifiableSolrParams params = new ModifiableSolrParams(filterParams(req.getParams()));
-      params.set(DISTRIB_UPDATE_PARAM,
-          (isLeader || isSubShardLeader ? DistribPhase.FROMLEADER.toString()
-              : DistribPhase.TOLEADER.toString()));
-      params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
-          zkController.getBaseUrl(), req.getCore().getName()));
-
-      if (req.getParams().get(UpdateRequest.MIN_REPFACT) != null) {
-        // TODO: Kept for rolling upgrades only. Remove in Solr 9
-        params.add(UpdateRequest.MIN_REPFACT, req.getParams().get(UpdateRequest.MIN_REPFACT));
-      }
-      cmdDistrib.distribDelete(cmd, nodes, params, false, rollupReplicationTracker, leaderReplicationTracker);
-    }
+    postProcessDeleteById(cmd);

--- End diff --

I can understand why you took the current approach. Let's put this off until near the end to see whether it's straightforward or not. Maybe it won't be, and we shouldn't do it.
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r241967637

--- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedStandaloneUpdateProcessor.java ---
@@ -56,8 +59,15 @@ public void doDeleteByQuery(DeleteUpdateCommand cmd) throws IOException {
     super.doDeleteByQuery(cmd, Collections.emptyList(), null);
   }

+  @Override
+  protected void postProcessDeleteByQuery(DeleteUpdateCommand cmd,
+      List replicas, DocCollection coll) throws IOException {
+    // no op
+  }
+
   @Override
   boolean isLeader(UpdateCommand cmd) {
+    updateCommand = cmd;

--- End diff --

+1 to the setupRequest name. I don't think it needs to return isLeader; it should set it and return void.
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r241945926 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java --- @@ -1417,45 +1407,7 @@ private void doDeleteById(DeleteUpdateCommand cmd) throws IOException { return; } -if (zkEnabled && isLeader && !isSubShardLeader) { - DocCollection coll = zkController.getClusterState().getCollection(collection); - List subShardLeaders = getSubShardLeaders(coll, cloudDesc.getShardId(), cmd.getId(), null); - // the list will actually have only one element for an add request - if (subShardLeaders != null && !subShardLeaders.isEmpty()) { -ModifiableSolrParams params = new ModifiableSolrParams(filterParams(req.getParams())); -params.set(DISTRIB_UPDATE_PARAM, DistribPhase.FROMLEADER.toString()); -params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl( -zkController.getBaseUrl(), req.getCore().getName())); -params.set(DISTRIB_FROM_PARENT, cloudDesc.getShardId()); -cmdDistrib.distribDelete(cmd, subShardLeaders, params, true, null, null); - } - - final List nodesByRoutingRules = getNodesByRoutingRules(zkController.getClusterState(), coll, cmd.getId(), null); - if (nodesByRoutingRules != null && !nodesByRoutingRules.isEmpty()) { -ModifiableSolrParams params = new ModifiableSolrParams(filterParams(req.getParams())); -params.set(DISTRIB_UPDATE_PARAM, DistribPhase.FROMLEADER.toString()); -params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl( -zkController.getBaseUrl(), req.getCore().getName())); -params.set(DISTRIB_FROM_COLLECTION, collection); -params.set(DISTRIB_FROM_SHARD, cloudDesc.getShardId()); -cmdDistrib.distribDelete(cmd, nodesByRoutingRules, params, true, null, null); - } -} - -if (nodes != null) { - ModifiableSolrParams params = new ModifiableSolrParams(filterParams(req.getParams())); - params.set(DISTRIB_UPDATE_PARAM, - (isLeader || isSubShardLeader ? 
DistribPhase.FROMLEADER.toString() - : DistribPhase.TOLEADER.toString())); - params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl( - zkController.getBaseUrl(), req.getCore().getName())); - - if (req.getParams().get(UpdateRequest.MIN_REPFACT) != null) { -// TODO: Kept for rolling upgrades only. Remove in Solr 9 -params.add(UpdateRequest.MIN_REPFACT, req.getParams().get(UpdateRequest.MIN_REPFACT)); - } - cmdDistrib.distribDelete(cmd, nodes, params, false, rollupReplicationTracker, leaderReplicationTracker); -} +postProcessDeleteById(cmd); --- End diff -- I wonder if we actually need this postProcessDeleteById(cmd) method... could the subclass put its logic after calling super.doDeleteById()?
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r241945549 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedZkUpdateProcessor.java --- @@ -223,6 +289,7 @@ private String getLeaderUrlZk(String id) { @Override boolean isLeader(UpdateCommand cmd) { +updateCommand = cmd; --- End diff -- again; this line makes me frown --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r241945502 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedStandaloneUpdateProcessor.java --- @@ -56,8 +59,15 @@ public void doDeleteByQuery(DeleteUpdateCommand cmd) throws IOException { super.doDeleteByQuery(cmd, Collections.emptyList(), null); } + @Override + protected void postProcessDeleteByQuery(DeleteUpdateCommand cmd, + List replicas, DocCollection coll) throws IOException { +// no op + } + @Override boolean isLeader(UpdateCommand cmd) { +updateCommand = cmd; --- End diff -- ouch this piece is ugly. a method like "is..." (or get...) should not have side effects. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
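[Editor's illustration] The review point above — that an `is...` (or `get...`) method should not have side effects — can be sketched as follows. This is a hypothetical minimal sketch, not the actual Solr classes: `Processor`, `setupRequest`, and the fields are stand-ins showing the mutation moved out of the predicate into an explicitly-named setup method, as suggested elsewhere in this thread.

```java
// Hypothetical sketch of the suggested refactor: the state capture the review
// objects to (updateCommand = cmd inside isLeader) moves into setupRequest,
// leaving the predicate side-effect free.
class Processor {
    static class UpdateCommand {
        final String name;
        UpdateCommand(String name) { this.name = name; }
    }

    private UpdateCommand updateCommand;
    private boolean leader;

    // The mutation now lives in a method whose name announces it.
    void setupRequest(UpdateCommand cmd) {
        this.updateCommand = cmd;
        this.leader = cmd.name.startsWith("leader-"); // placeholder for the real leader check
    }

    // Pure query: calling it any number of times changes nothing.
    boolean isLeader() { return leader; }

    UpdateCommand getUpdateCommand() { return updateCommand; }
}
```

The design point is that a reader scanning call sites can trust `isLeader()` to be a pure query, and the one place that mutates state is named for it.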
[GitHub] lucene-solr pull request #528: SOLR-12955 2
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/528#discussion_r241945460 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedZkUpdateProcessor.java --- @@ -58,6 +75,127 @@ String getCollectionName(CloudDescriptor cloudDesc) { return cloudDesc.getReplicaType(); } + @Override + public void doDeleteByQuery(DeleteUpdateCommand cmd) throws IOException { +// even in non zk mode, tests simulate updates from a leader --- End diff -- this particular comment is I think intended for the standalone mode --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #519: WIP: SOLR-12955: rebase and split changes to smaller...
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/519 I'm not sure the "base" functionality needed to be in a new class... I could imagine simply adding subclasses and leaving DUP named as it is, with its base functionality. WDYT? It may help keep the git history cleaner too. p.s. I'm GMT+1 this week
[GitHub] lucene-solr issue #519: WIP: SOLR-12955: rebase and split changes to smaller...
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/519 good start
[GitHub] lucene-solr issue #489: SOLR-12955: separate DistribUpdateProcessor to Zk an...
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/489 There is no "development" branch, so I don't know what you mean by that. You don't _have_ to run the full test suite after each commit, but _ideally_ it ought to pass on each commit if it were to be run. I don't think I will do it. It's up to you whether to pause on each commit for my input, but I don't think that's necessary since I'm overall happy with the final result; I just want to see the incremental steps to observe where the code is going at each juncture. When I finally commit something, I'm not sure yet if I'm going to do it in one commit or do it separately... but don't let that stop you from breaking it into as small steps as you please. Please use commit comments to point out anything non-obvious. The purpose here is mostly to show what went where, since a single all-in-one change is kinda unreviewable (or at least would require committing in blind faith).
[GitHub] lucene-solr pull request #489: SOLR-12955: separate DistribUpdateProcessor t...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/489#discussion_r238527873 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedStandAloneUpdateProcessor.java --- @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.update.processor; + +import java.io.IOException; + +import org.apache.solr.common.cloud.Replica; +import org.apache.solr.core.CoreContainer; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.response.SolrQueryResponse; +import org.apache.solr.update.AddUpdateCommand; +import org.apache.solr.update.DeleteUpdateCommand; +import org.apache.solr.update.UpdateCommand; + +import static org.apache.solr.update.processor.DistributingUpdateProcessorFactory.DISTRIB_UPDATE_PARAM; + +public class DistributedStandAloneUpdateProcessor extends DistributedUpdateProcessor { --- End diff -- preferably lowercase the 'A' thus "DistributedStandaloneUpdateProcessor" --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #489: SOLR-12955: separate DistribUpdateProcessor t...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/489#discussion_r238528464 --- Diff: solr/core/src/java/org/apache/solr/update/processor/CdcrUpdateProcessor.java --- @@ -16,115 +16,12 @@ */ package org.apache.solr.update.processor; -import java.io.IOException; -import java.lang.invoke.MethodHandles; - -import org.apache.solr.common.params.CommonParams; -import org.apache.solr.common.params.ModifiableSolrParams; -import org.apache.solr.common.params.SolrParams; -import org.apache.solr.request.SolrQueryRequest; -import org.apache.solr.response.SolrQueryResponse; -import org.apache.solr.update.AddUpdateCommand; -import org.apache.solr.update.DeleteUpdateCommand; -import org.apache.solr.update.UpdateCommand; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - /** * - * Extends {@link org.apache.solr.update.processor.DistributedUpdateProcessor} to force peer sync logic - * for every updates. This ensures that the version parameter sent by the source cluster is kept - * by the target cluster. + * An Interface for Shared params for Cdcr Update Processors --- End diff -- Do we really need this interface just for this param? I think not. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #489: SOLR-12955: separate DistribUpdateProcessor t...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/489#discussion_r238527608 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessorFactory.java --- @@ -50,8 +51,16 @@ public static void addParamToDistributedRequestWhitelist(final SolrQueryRequest public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) { // note: will sometimes return DURP (no overhead) instead of wrapping --- End diff -- this comment is now misplaced. Before, it was immediately prior to TimeRoutedAliasUpdateProcessor.wrap, and I think it should be maintained as such.
[GitHub] lucene-solr issue #455: SOLR-12638
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/455 What does "so _root_ is not hard-coded into childless documents" mean? If you think SOLR-5211 is missing something, can you please post a patch file to the issue and explain what you're adding? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r235022057 --- Diff: solr/core/src/test/org/apache/solr/BlockUpdateTest.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr; + +import java.util.List; + +import org.apache.solr.SolrJettyTestBase; +import org.apache.solr.client.solrj.SolrClient; +import org.apache.solr.client.solrj.SolrQuery; +import org.apache.solr.common.SolrDocument; +import org.apache.solr.common.SolrDocumentList; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.params.CommonParams; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; + +import static org.hamcrest.CoreMatchers.is; + +public class BlockUpdateTest extends SolrJettyTestBase { --- End diff -- I think it tests with / without root field then, so it's name could reflect _that_. e.g. RootFieldTest --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r235001356 --- Diff: solr/core/src/test/org/apache/solr/BlockUpdateTest.java --- @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr; + +import java.util.List; + +import org.apache.solr.SolrJettyTestBase; +import org.apache.solr.client.solrj.SolrClient; +import org.apache.solr.client.solrj.SolrQuery; +import org.apache.solr.common.SolrDocument; +import org.apache.solr.common.SolrDocumentList; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.params.CommonParams; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; + +import static org.hamcrest.CoreMatchers.is; + +public class BlockUpdateTest extends SolrJettyTestBase { --- End diff -- Here's that "Block" terminology again. Can the tests here merge with another test for nested docs? (one you made?) --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #501: SOLR-5211
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/501 Nice. Do we even need the test config / schema at all? If they don't add anything, we shouldn't have them. Also I see there is still some "Block" terminology added here that we can avoid in favor of "Nested".
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r234869970 --- Diff: solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java --- @@ -194,20 +197,33 @@ public String getHashableId() { return null; // caller should call getLuceneDocument() instead } -String rootId = getHashableId(); - -boolean isVersion = version != 0; +final String rootId = getHashableId(); +final boolean hasVersion = version != 0; +final SolrInputField versionSif = hasVersion? solrDoc.get(CommonParams.VERSION_FIELD): null; --- End diff -- What I meant was that *if* the solrDoc has a \_version\_ then we can copy it to all nested children. We need not examine this.version at all. At least this is my understanding of reading the code; it'd be good to ensure tests pass. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr issue #501: SOLR-5211
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/501 Also please rename "block" terminology to "nested" where possible.
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r234646277 --- Diff: solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java --- @@ -975,6 +976,14 @@ private void updateDocOrDocValues(AddUpdateCommand cmd, IndexWriter writer) thro } } + boolean forciblyAddBlockId() { --- End diff -- Come to think of it, who cares if root is populated or not for "old" schemas? Lets just do it always; no conditional test. If someone doesn't want root to be populated, that same person probably doesn't have child docs and should just as well omit the root field from their schema. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r234498494 --- Diff: solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java --- @@ -194,20 +203,22 @@ public String getHashableId() { return null; // caller should call getLuceneDocument() instead } -String rootId = getHashableId(); - -boolean isVersion = version != 0; - +final String rootId = getHashableId(); for (SolrInputDocument sdoc : all) { - sdoc.setField(IndexSchema.ROOT_FIELD_NAME, rootId); - if(isVersion) sdoc.setField(CommonParams.VERSION_FIELD, version); - // TODO: if possible concurrent modification exception (if SolrInputDocument not cloned and is being forwarded to replicas) - // then we could add this field to the generated lucene document instead. + addBlockId(sdoc, rootId); } return () -> all.stream().map(sdoc -> DocumentBuilder.toDocument(sdoc, req.getSchema())).iterator(); } + private void addBlockId(SolrInputDocument sdoc, String rootId) { +boolean isVersion = version != 0; +sdoc.setField(IndexSchema.ROOT_FIELD_NAME, rootId); +if(isVersion) sdoc.setField(CommonParams.VERSION_FIELD, version); --- End diff -- although it may be a slight bit of scope creep, I think it would be clearer why this logic is needed if it merely propagated the SolrInputField from the root doc to the child docs. My understanding is that this will wind up being equivalent to what happens now, which is very non-obvious; it took a bunch of digging on my part to see.
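[Editor's illustration] The propagation idea discussed above — copy the root document's `_version_` (only when present) and the root id onto every nested child — can be sketched like this. A plain `Map` stands in for `SolrInputDocument`; only the `_root_`/`_version_` field names come from the Solr discussion, the rest is hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the suggestion: propagate the root doc's fields to
// its nested children verbatim, instead of re-deriving them from command state.
class NestedFieldPropagation {
    static void propagate(Map<String, Object> root, List<Map<String, Object>> children, String rootId) {
        Object version = root.get("_version_");
        for (Map<String, Object> child : children) {
            child.put("_root_", rootId);          // every child points at the root id
            if (version != null) {
                child.put("_version_", version);  // copied only if the root carries one
            }
        }
    }
}
```

This mirrors the reviewer's point that copying the field from the root ends up equivalent to the existing `version != 0` check, while being obvious at a glance.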
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r234496047 --- Diff: solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java --- @@ -96,12 +96,21 @@ public SolrInputDocument getSolrInputDocument() { * Any changes made to the returned Document will not be reflected in the SolrInputDocument, or future calls to this * method. * Note that the behavior of this is sensitive to {@link #isInPlaceUpdate()}. + * @param withBlockId If true, then block id is forcibly added to the doc */ - public Document getLuceneDocument() { + Document getLuceneDocument(boolean withBlockId) { --- End diff -- Actually, I'm not sure we need to overload getLuceneDocument with this boolean after all. If "isInPlaceUpdate()" then don't add root; otherwise add it. That'll be fine, right?
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r234496573 --- Diff: solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java --- @@ -975,6 +976,14 @@ private void updateDocOrDocValues(AddUpdateCommand cmd, IndexWriter writer) thro } } + boolean forciblyAddBlockId() { --- End diff -- this is here in part so that we can test _not_ doing this (i.e. current behavior)... but I don't think it's worth it. If we want to _not_ do it conditionally then it could be via luceneMatchVersion but I don't care either way as it's minor to add it unconditionally. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r234495484 --- Diff: solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java --- @@ -96,12 +96,21 @@ public SolrInputDocument getSolrInputDocument() { * Any changes made to the returned Document will not be reflected in the SolrInputDocument, or future calls to this * method. * Note that the behavior of this is sensitive to {@link #isInPlaceUpdate()}. + * @param withBlockId If true, then block id is forcibly added to the doc */ - public Document getLuceneDocument() { + Document getLuceneDocument(boolean withBlockId) { --- End diff -- can we call this addRootField ? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r234496324 --- Diff: solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java --- @@ -194,20 +203,22 @@ public String getHashableId() { return null; // caller should call getLuceneDocument() instead } -String rootId = getHashableId(); - -boolean isVersion = version != 0; - +final String rootId = getHashableId(); for (SolrInputDocument sdoc : all) { - sdoc.setField(IndexSchema.ROOT_FIELD_NAME, rootId); - if(isVersion) sdoc.setField(CommonParams.VERSION_FIELD, version); - // TODO: if possible concurrent modification exception (if SolrInputDocument not cloned and is being forwarded to replicas) - // then we could add this field to the generated lucene document instead. + addBlockId(sdoc, rootId); } return () -> all.stream().map(sdoc -> DocumentBuilder.toDocument(sdoc, req.getSchema())).iterator(); } + private void addBlockId(SolrInputDocument sdoc, String rootId) { --- End diff -- Can we rename this addRootField? and separate out the version part to addVersionField which should _not_ be called by this method; the caller should. addVersionField only needs to be called for the nested case, not by getLuceneDocument. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #501: SOLR-5211
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/501#discussion_r234496354 --- Diff: solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java --- @@ -194,20 +203,22 @@ public String getHashableId() { return null; // caller should call getLuceneDocument() instead } -String rootId = getHashableId(); - -boolean isVersion = version != 0; - +final String rootId = getHashableId(); for (SolrInputDocument sdoc : all) { - sdoc.setField(IndexSchema.ROOT_FIELD_NAME, rootId); - if(isVersion) sdoc.setField(CommonParams.VERSION_FIELD, version); - // TODO: if possible concurrent modification exception (if SolrInputDocument not cloned and is being forwarded to replicas) - // then we could add this field to the generated lucene document instead. + addBlockId(sdoc, rootId); } return () -> all.stream().map(sdoc -> DocumentBuilder.toDocument(sdoc, req.getSchema())).iterator(); } + private void addBlockId(SolrInputDocument sdoc, String rootId) { +boolean isVersion = version != 0; +sdoc.setField(IndexSchema.ROOT_FIELD_NAME, rootId); +if(isVersion) sdoc.setField(CommonParams.VERSION_FIELD, version); --- End diff -- needs a comment as said in JIRA why we add the version here --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r231566214 --- Diff: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java --- @@ -259,9 +264,8 @@ public boolean doInPlaceUpdateMerge(AddUpdateCommand cmd, Set updatedFie SolrInputDocument oldDocument = RealTimeGetComponent.getInputDocument (cmd.getReq().getCore(), idBytes, null, // don't want the version to be returned - true, // avoid stored fields from index updatedFields, - true); // resolve the full document + RealTimeGetComponent.Resolution.FULL_DOC); // get the partial document for the in-place update --- End diff -- it is not a partial document; the original comment was correct. Now with an enum, we don't even need a comment as the enum has a self explanatory name. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r231567341 --- Diff: solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java --- @@ -1164,6 +1225,31 @@ public void processGetUpdates(ResponseBuilder rb) throws IOException return new ArrayList<>(versionsToRet); } + /** + * + *Lookup strategy for {@link #getInputDocument(SolrCore, BytesRef, AtomicLong, Set, Resolution)}. + * + * + *{@link #FULL_DOC} + *{@link #DOC_CHILDREN} + *{@link #FULL_HIERARCHY} + * + */ + public static enum Resolution { --- End diff -- Suggested new names: "DOC", "DOC_WITH_CHILDREN", "ROOT_WITH_CHILDREN". Resolution is now only used where there is no "in place" so no need to mention partial documents. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
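[Editor's illustration] The rename proposed above, together with the boolean-parameter "code smell" raised elsewhere in this review, points toward a single enum replacing the flag pair. A hypothetical sketch, using the constant names suggested in the comment; the mapping and helper are illustrative only, not the real RealTimeGetComponent code:

```java
// Hypothetical sketch: one Resolution value replaces the old
// resolveRootDoc/resolveChildren boolean pair, so each call site is
// self-describing and the "children implies full document" constraint is
// a property of the type rather than an undocumented convention.
class Lookup {
    enum Resolution { DOC, DOC_WITH_CHILDREN, ROOT_WITH_CHILDREN }

    static Resolution fromFlags(boolean resolveRootDoc, boolean resolveChildren) {
        if (resolveRootDoc) return Resolution.ROOT_WITH_CHILDREN;
        if (resolveChildren) return Resolution.DOC_WITH_CHILDREN;
        return Resolution.DOC;
    }

    static boolean fetchesChildren(Resolution r) {
        return r != Resolution.DOC; // both *_WITH_CHILDREN variants walk the block
    }
}
```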
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r231127727 --- Diff: solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java --- @@ -639,12 +651,32 @@ public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes sid = new SolrInputDocument(); } else { Document luceneDocument = docFetcher.doc(docid); - sid = toSolrInputDocument(luceneDocument, core.getLatestSchema()); + sid = toSolrInputDocument(luceneDocument, schema); } -if (onlyTheseNonStoredDVs != null) { - docFetcher.decorateDocValueFields(sid, docid, onlyTheseNonStoredDVs); -} else { - docFetcher.decorateDocValueFields(sid, docid, docFetcher.getNonStoredDVsWithoutCopyTargets()); +ensureDocFieldsDecorated(onlyTheseNonStoredDVs, sid, docid, docFetcher, resolveRootDoc || +resolveChildren || schema.hasExplicitField(IndexSchema.NEST_PATH_FIELD_NAME)); +SolrInputField rootField = sid.getField(IndexSchema.ROOT_FIELD_NAME); +if((resolveChildren || resolveRootDoc) && schema.isUsableForChildDocs() && rootField!=null) { + // doc is part of a nested structure + String id = resolveRootDoc? (String) rootField.getFirstValue(): (String) sid.getField(idField.getName()).getFirstValue(); + ModifiableSolrParams params = new ModifiableSolrParams() + .set("fl", "*, _nest_path_, [child]") + .set("limit", "-1"); + SolrQueryRequest nestedReq = new LocalSolrQueryRequest(core, params); + final BytesRef rootIdBytes = new BytesRef(id); + final int rootDocId = searcher.getFirstMatch(new Term(idField.getName(), rootIdBytes)); + final DocTransformer childDocTransformer = TransformerFactory.defaultFactories.get("child").create("child", params, nestedReq); --- End diff -- no, use `core.getTransformerFactory("child")` --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #:
Github user dsmiley commented on the pull request: https://github.com/apache/lucene-solr/commit/43d7f5d104802bc3281d5995c6c2d71bebf1f369#commitcomment-31121980 I understand it's a design decision. I'm not certain about the correctness of the logic; I've been leaving that to you here; DUP is a mess to untangle. I think your method name is still inappropriate -- for example "if (sdoc.hasChildDocuments()) return true", thus even if it's not an update, we can still return true. So instead let's name the method based on why the caller cares -- it wants to know whether to get the version or not. Put it on DUP. The method is not clean enough or general enough to go on the Cmd.
[GitHub] lucene-solr pull request #:
Github user dsmiley commented on the pull request: https://github.com/apache/lucene-solr/commit/43d7f5d104802bc3281d5995c6c2d71bebf1f369#commitcomment-31096103 isNestedAtomicUpdate seems to return true even if it isn't an atomic update. So its name isn't right, or the method is doing too much at once (i.e. should be split up?). I'm not sure why it's needed -- I mean I see where you're calling it but I don't follow the rationale for its existence. 'course the overall code in this class is hideously complicated even before we came along so that doesn't help ;-)
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r228197896 --- Diff: solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java --- @@ -609,9 +618,10 @@ public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes * @param resolveFullDocument In case the document is fetched from the tlog, it could only be a partial document if the last update * was an in-place update. In that case, should this partial document be resolved to a full document (by following * back prevPointer/prevVersion)? + * @param resolveBlock Check whether the document is part of a block. If so, return the whole block. */ public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes, AtomicLong versionReturned, boolean avoidRetrievingStoredFields, - Set onlyTheseNonStoredDVs, boolean resolveFullDocument) throws IOException { + Set onlyTheseNonStoredDVs, boolean resolveFullDocument, boolean resolveBlock) throws IOException { SolrInputDocument sid = null; --- End diff -- @moshebla I'm concerned about this method having so many parameters... it's a code smell. resolveBlock & resolveChildren booleans... not sure how they differ. Isn't resolveChildren enough? if resolveChildren true, then isn't effectively resolveFullDocument also true? (if so the constraint could be documented and enforced with an assertion). versionReturned is a dubious parameter I'm skeptical needs to be here. I'm aware you didn't add it, but the number of parameters is getting troublingly long with your changes; it's good to review what's needed. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r228174440 --- Diff: solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java --- @@ -262,6 +263,11 @@ private void flattenAnonymous(List unwrappedDocs, SolrInputDo flattenAnonymous(unwrappedDocs, currentDoc, false); } + public String getRouteFieldVal() { --- End diff -- What code duplication? I think we need to standardize/harmonize route a bit. Notice UpdateCommand has setRoute & getRoute. Lets initialize route in the constructor here by the presence of \_route\_ in the params. Then lets not look for _route_ in params later since we know we can get it here. Then I think `org.apache.solr.update.processor.DistributedUpdateProcessor#setupRequest(java.lang.String, org.apache.solr.common.SolrInputDocument)` can be removed and instead insist everyone call the overloaded version that takes a route, and each caller looks up the route from the command. It's not clear to me if "null" actually should be passed to route in any circumstance. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r228026284

--- Diff: solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java ---
@@ -262,6 +263,11 @@ private void flattenAnonymous(List unwrappedDocs, SolrInputDo
     flattenAnonymous(unwrappedDocs, currentDoc, false);
   }

+  public String getRouteFieldVal() {
--- End diff --

I'm skeptical of this method. Its name seems innocent enough looking at the code here. But then also consider that some collections have a "router.field", and this method is named in such a way that one would think it returns that field's value... yet it does not. Some callers put this into a variable named "id" or similar. Given that, I propose you remove it but incorporate the logic into getHashableId, which seems the proper place for it. It _is_ the hashable ID... the hashable ID of a nested doc is its root. But... I do also wonder if we need this at all. Somewhere Solr already has code that looks at \_route\_ and acts on that if present. Perhaps the code path for an atomic update isn't doing this yet but should do it? Then we wouldn't need this change to AddUpdateCommand.
[GitHub] lucene-solr issue #455: SOLR-12638
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/455

As a general point, I feel we should prefer "nested" terminology instead of "block". If we were purely working within Lucene then I think "block" might be okay, but at the Solr layer people see this stuff as "nested" docs.
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r226709714 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -59,39 +52,147 @@ public void before() { @Test public void testMergeChildDoc() throws Exception { -SolrInputDocument doc = new SolrInputDocument(); -doc.setField("id", "1"); -doc.setField("cat_ss", new String[]{"aaa", "ccc"}); -doc.setField("child", Collections.singletonList(sdoc("id", "2", "cat_ss", "child"))); +SolrInputDocument newChildDoc = sdoc("id", "3", "cat_ss", "child"); +SolrInputDocument addedDoc = sdoc("id", "1", +"cat_ss", Collections.singletonMap("add", "bbb"), +"child", Collections.singletonMap("add", sdocs(newChildDoc))); + +SolrInputDocument dummyBlock = sdoc("id", "1", +"cat_ss", new ArrayList<>(Arrays.asList("aaa", "ccc")), +"_root_", "1", "child", new ArrayList<>(sdocs(addedDoc))); +dummyBlock.removeField(VERSION); + +SolrInputDocument preMergeDoc = new SolrInputDocument(dummyBlock); +AtomicUpdateDocumentMerger docMerger = new AtomicUpdateDocumentMerger(req()); +docMerger.merge(addedDoc, dummyBlock); +assertEquals("merged document should have the same id", preMergeDoc.getFieldValue("id"), dummyBlock.getFieldValue("id")); +assertDocContainsSubset(preMergeDoc, dummyBlock); +assertDocContainsSubset(addedDoc, dummyBlock); +assertDocContainsSubset(newChildDoc, (SolrInputDocument) ((List) dummyBlock.getFieldValues("child")).get(1)); +assertEquals(dummyBlock.getFieldValue("id"), dummyBlock.getFieldValue("id")); + } + + @Test + public void testBlockAtomicQuantities() throws Exception { +SolrInputDocument doc = sdoc("id", "1", "string_s", "root"); addDoc(adoc(doc), "nested-rtg"); -BytesRef rootDocId = new BytesRef("1"); -SolrCore core = h.getCore(); -SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); -// assert block doc has child docs -assertTrue(block.containsKey("child")); +List docs = 
IntStream.range(10, 20).mapToObj(x -> sdoc("id", String.valueOf(x), "string_s", "child")).collect(Collectors.toList()); +doc = sdoc("id", "1", "children", Collections.singletonMap("add", docs)); +addAndGetVersion(doc, params("update.chain", "nested-rtg", "wt", "json")); -assertJQ(req("q","id:1") -,"/response/numFound==0" +assertU(commit()); + +assertJQ(req("q", "_root_:1"), +"/response/numFound==11"); + +assertJQ(req("q", "string_s:child", "fl", "*", "rows", "100"), +"/response/numFound==10"); + +// ensure updates work when block has more than 10 children +for(int i = 10; i < 20; ++i) { + System.out.println("indexing " + i); + docs = IntStream.range(i * 10, (i * 10) + 5).mapToObj(x -> sdoc("id", String.valueOf(x), "string_s", "grandChild")).collect(Collectors.toList()); + doc = sdoc("id", String.valueOf(i), "grandChildren", Collections.singletonMap("add", docs)); + addAndGetVersion(doc, params("update.chain", "nested-rtg", "wt", "json")); + assertU(commit()); +} + +assertJQ(req("q", "id:114", "fl", "*", "rows", "100"), --- End diff -- Why set the "fl" or "rows" in these queries? Your assertion only checks numFound and not the content of those that were found. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r226709264 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -59,39 +52,147 @@ public void before() { @Test public void testMergeChildDoc() throws Exception { -SolrInputDocument doc = new SolrInputDocument(); -doc.setField("id", "1"); -doc.setField("cat_ss", new String[]{"aaa", "ccc"}); -doc.setField("child", Collections.singletonList(sdoc("id", "2", "cat_ss", "child"))); +SolrInputDocument newChildDoc = sdoc("id", "3", "cat_ss", "child"); +SolrInputDocument addedDoc = sdoc("id", "1", +"cat_ss", Collections.singletonMap("add", "bbb"), +"child", Collections.singletonMap("add", sdocs(newChildDoc))); + +SolrInputDocument dummyBlock = sdoc("id", "1", +"cat_ss", new ArrayList<>(Arrays.asList("aaa", "ccc")), +"_root_", "1", "child", new ArrayList<>(sdocs(addedDoc))); +dummyBlock.removeField(VERSION); + +SolrInputDocument preMergeDoc = new SolrInputDocument(dummyBlock); +AtomicUpdateDocumentMerger docMerger = new AtomicUpdateDocumentMerger(req()); +docMerger.merge(addedDoc, dummyBlock); +assertEquals("merged document should have the same id", preMergeDoc.getFieldValue("id"), dummyBlock.getFieldValue("id")); +assertDocContainsSubset(preMergeDoc, dummyBlock); +assertDocContainsSubset(addedDoc, dummyBlock); +assertDocContainsSubset(newChildDoc, (SolrInputDocument) ((List) dummyBlock.getFieldValues("child")).get(1)); +assertEquals(dummyBlock.getFieldValue("id"), dummyBlock.getFieldValue("id")); + } + + @Test + public void testBlockAtomicQuantities() throws Exception { +SolrInputDocument doc = sdoc("id", "1", "string_s", "root"); addDoc(adoc(doc), "nested-rtg"); -BytesRef rootDocId = new BytesRef("1"); -SolrCore core = h.getCore(); -SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); -// assert block doc has child docs -assertTrue(block.containsKey("child")); +List docs = 
IntStream.range(10, 20).mapToObj(x -> sdoc("id", String.valueOf(x), "string_s", "child")).collect(Collectors.toList()); +doc = sdoc("id", "1", "children", Collections.singletonMap("add", docs)); +addAndGetVersion(doc, params("update.chain", "nested-rtg", "wt", "json")); -assertJQ(req("q","id:1") -,"/response/numFound==0" +assertU(commit()); + +assertJQ(req("q", "_root_:1"), +"/response/numFound==11"); + +assertJQ(req("q", "string_s:child", "fl", "*", "rows", "100"), +"/response/numFound==10"); + +// ensure updates work when block has more than 10 children --- End diff -- Why is 10 special? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224133917 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.update.processor; + +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +import org.apache.lucene.util.BytesRef; +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.SolrInputField; +import org.apache.solr.core.SolrCore; +import org.apache.solr.handler.component.RealTimeGetComponent; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +public class AtomicUpdateBlockTest extends SolrTestCaseJ4 { + + private final static String VERSION = "_version_"; + private static String PREVIOUS_ENABLE_UPDATE_LOG_VALUE; + + @BeforeClass + public static void beforeTests() throws Exception { +PREVIOUS_ENABLE_UPDATE_LOG_VALUE = System.getProperty("enable.update.log"); +System.setProperty("enable.update.log", "true"); +initCore("solrconfig-block-atomic-update.xml", "schema-nest.xml"); // use "nest" schema + } + + @AfterClass + public static void afterTests() throws Exception { +// restore enable.update.log +System.setProperty("enable.update.log", PREVIOUS_ENABLE_UPDATE_LOG_VALUE); + } + + @Before + public void before() { +clearIndex(); +assertU(commit()); + } + + @Test + public void testMergeChildDoc() throws Exception { +SolrInputDocument doc = new SolrInputDocument(); +doc.setField("id", "1"); +doc.setField("cat_ss", new String[]{"aaa", "ccc"}); +doc.setField("child", Collections.singletonList(sdoc("id", "2", "cat_ss", "child"))); +addDoc(adoc(doc), "nested-rtg"); + +BytesRef rootDocId = new BytesRef("1"); +SolrCore core = h.getCore(); +SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +// assert block doc has child docs +assertTrue(block.containsKey("child")); + +assertJQ(req("q","id:1") +,"/response/numFound==0" +); + +// commit the changes +assertU(commit()); + +SolrInputDocument newChildDoc = sdoc("id", "3", "cat_ss", 
"child"); +SolrInputDocument addedDoc = sdoc("id", "1", +"cat_ss", Collections.singletonMap("add", "bbb"), +"child", Collections.singletonMap("add", sdocs(newChildDoc))); +block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +block.removeField(VERSION); +SolrInputDocument preMergeDoc = new SolrInputDocument(block); +AtomicUpdateDocumentMerger docMerger = new AtomicUpdateDocumentMerger(req()); +docMerger.merge(addedDoc, block); +assertEquals("merged document should have the same id", preMergeDoc.getFieldValue("id"), block.getFieldValue("id")); +assertDocContainsSubset(preMergeDoc, block); +assertDocContainsSubset(addedDoc, block); +assertDocContainsSubset(newChildDoc, (SolrInputDocument) ((List) block.getFieldValues("child")).get(1)); +assertEquals(doc.getFieldValue("id"), block.getFieldValue("id")); + } + + @Test + public void testBlockAtom
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224127559 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.update.processor; + +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +import org.apache.lucene.util.BytesRef; +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.SolrInputField; +import org.apache.solr.core.SolrCore; +import org.apache.solr.handler.component.RealTimeGetComponent; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +public class AtomicUpdateBlockTest extends SolrTestCaseJ4 { + + private final static String VERSION = "_version_"; + private static String PREVIOUS_ENABLE_UPDATE_LOG_VALUE; + + @BeforeClass + public static void beforeTests() throws Exception { +PREVIOUS_ENABLE_UPDATE_LOG_VALUE = System.getProperty("enable.update.log"); +System.setProperty("enable.update.log", "true"); --- End diff -- The `solrconfig-block-atomic-update.xml` file is not toggled by this system property (perhaps others are). Why set this system property? --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r223747918 --- Diff: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java --- @@ -442,5 +442,58 @@ protected void doRemoveRegex(SolrInputDocument toDoc, SolrInputField sif, Object } return patterns; } + + private Object getNativeFieldValue(String fieldName, Object val) { +if(isChildDoc(val)) { + return val; +} +SchemaField sf = schema.getField(fieldName); +return sf.getType().toNativeType(val); + } + + private static boolean isChildDoc(Object obj) { +if(!(obj instanceof Collection)) { + return obj instanceof SolrDocumentBase; +} +Collection objValues = (Collection) obj; +if(objValues.size() == 0) { + return false; +} +return objValues.iterator().next() instanceof SolrDocumentBase; + } + + private void removeObj(Collection original, Object toRemove, String fieldName) { +if(isChildDoc(toRemove)) { + removeChildDoc(original, (SolrInputDocument) toRemove); +} else { + original.remove(getNativeFieldValue(fieldName, toRemove)); +} + } + + private static void removeChildDoc(Collection original, SolrInputDocument docToRemove) { +for(SolrInputDocument doc: (Collection) original) { + if(isDerivedFromDoc(doc, docToRemove)) { +original.remove(doc); +return; + } +} + } + + /** + * + * @param fullDoc the document to be tested + * @param subDoc the sub document that should be a subset of fullDoc + * @return whether subDoc is a subset of fullDoc + */ + private static boolean isDerivedFromDoc(SolrInputDocument fullDoc, SolrInputDocument subDoc) { +for(SolrInputField subSif: subDoc) { + String fieldName = subSif.getName(); + if(!fullDoc.containsKey(fieldName)) return false; --- End diff -- This results in a double-lookup of the values with the next line. Remove this line and after the next one simply do a null-check. 
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224306158

--- Diff: solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java ---
@@ -639,12 +650,30 @@ public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes
         sid = new SolrInputDocument();
       } else {
         Document luceneDocument = docFetcher.doc(docid);
-        sid = toSolrInputDocument(luceneDocument, core.getLatestSchema());
+        sid = toSolrInputDocument(luceneDocument, schema);
       }
-      if (onlyTheseNonStoredDVs != null) {
-        docFetcher.decorateDocValueFields(sid, docid, onlyTheseNonStoredDVs);
-      } else {
-        docFetcher.decorateDocValueFields(sid, docid, docFetcher.getNonStoredDVsWithoutCopyTargets());
+      ensureDocDecorated(onlyTheseNonStoredDVs, sid, docid, docFetcher, resolveBlock || schema.hasExplicitField(IndexSchema.NEST_PATH_FIELD_NAME));
+      SolrInputField rootField;
+      if(resolveBlock && schema.isUsableForChildDocs() && (rootField = sid.getField(IndexSchema.ROOT_FIELD_NAME))!=null) {
+        // doc is part of a nested structure
+        ModifiableSolrParams params = new ModifiableSolrParams()
+          .set("q", core.getLatestSchema().getUniqueKeyField().getName()+ ":" +rootField.getFirstValue())
--- End diff --

It seems the LocalSolrQueryRequest here is a dummy needed to satisfy some of the methods below. This threw me; there should be comments and/or choice of var names (e.g. dummyReq) to reflect this. "q" isn't needed; just the "fl". It seems we don't even need the "fl" here, since that can be supplied as the first parameter to SolrReturnFields, which seems better if it works.
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224306892 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java --- @@ -1360,24 +1385,47 @@ boolean getUpdatedDocument(AddUpdateCommand cmd, long versionOnUpdate) throws IO } // full (non-inplace) atomic update +final boolean isNestedSchema = req.getSchema().isUsableForChildDocs(); SolrInputDocument sdoc = cmd.getSolrInputDocument(); BytesRef id = cmd.getIndexedId(); -SolrInputDocument oldDoc = RealTimeGetComponent.getInputDocument(cmd.getReq().getCore(), id); +SolrInputDocument blockDoc = RealTimeGetComponent.getInputDocument(cmd.getReq().getCore(), id, null, +false, null, true, true); -if (oldDoc == null) { - // create a new doc by default if an old one wasn't found - if (versionOnUpdate <= 0) { -oldDoc = new SolrInputDocument(); - } else { +if (blockDoc == null) { + if (versionOnUpdate > 0) { // could just let the optimistic locking throw the error throw new SolrException(ErrorCode.CONFLICT, "Document not found for update. id=" + cmd.getPrintableId()); } } else { - oldDoc.remove(CommonParams.VERSION_FIELD); + blockDoc.remove(CommonParams.VERSION_FIELD); } -cmd.solrDoc = docMerger.merge(sdoc, oldDoc); +SolrInputDocument mergedDoc; +if(idField == null || blockDoc == null) { + // create a new doc by default if an old one wasn't found + mergedDoc = docMerger.merge(sdoc, new SolrInputDocument()); +} else { + if(isNestedSchema && req.getSchema().hasExplicitField(IndexSchema.NEST_PATH_FIELD_NAME) && + blockDoc.containsKey(IndexSchema.ROOT_FIELD_NAME) && !id.utf8ToString().equals(blockDoc.getFieldValue(IndexSchema.ROOT_FIELD_NAME))) { --- End diff -- I don't think we can assume id.utf8ToString() is correct. I think we have to consult the corresponding FieldType to get the "external value". Also, cast blockDoc.getFieldValue as a String to make it clear we expected it to be one. 
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r223747455 --- Diff: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java --- @@ -442,5 +442,58 @@ protected void doRemoveRegex(SolrInputDocument toDoc, SolrInputField sif, Object } return patterns; } + + private Object getNativeFieldValue(String fieldName, Object val) { +if(isChildDoc(val)) { + return val; +} +SchemaField sf = schema.getField(fieldName); +return sf.getType().toNativeType(val); + } + + private static boolean isChildDoc(Object obj) { +if(!(obj instanceof Collection)) { + return obj instanceof SolrDocumentBase; +} +Collection objValues = (Collection) obj; +if(objValues.size() == 0) { + return false; +} +return objValues.iterator().next() instanceof SolrDocumentBase; + } + + private void removeObj(Collection original, Object toRemove, String fieldName) { +if(isChildDoc(toRemove)) { + removeChildDoc(original, (SolrInputDocument) toRemove); +} else { + original.remove(getNativeFieldValue(fieldName, toRemove)); +} + } + + private static void removeChildDoc(Collection original, SolrInputDocument docToRemove) { +for(SolrInputDocument doc: (Collection) original) { + if(isDerivedFromDoc(doc, docToRemove)) { +original.remove(doc); +return; + } +} + } + + /** + * + * @param fullDoc the document to be tested + * @param subDoc the sub document that should be a subset of fullDoc + * @return whether subDoc is a subset of fullDoc + */ + private static boolean isDerivedFromDoc(SolrInputDocument fullDoc, SolrInputDocument subDoc) { +for(SolrInputField subSif: subDoc) { + String fieldName = subSif.getName(); + if(!fullDoc.containsKey(fieldName)) return false; + Collection fieldValues = fullDoc.getFieldValues(fieldName); + if(fieldValues.size() < subSif.getValueCount()) return false; + if(!fullDoc.getFieldValues(fieldName).containsAll(subSif.getValues())) return false; --- End diff -- `fullDoc.getFieldValues(fieldName)` 
on this line can be replaced by `fieldValues` since we already have the values. And the previous line on the count is unnecessary since the containsAll check on this line would fail.
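Taken together with the earlier double-lookup comment on the same method, the suggested shape of the subset check looks roughly like the sketch below. Plain Maps stand in for SolrInputDocument; note that dropping the size comparison changes behavior when the sub-document holds duplicate values (containsAll treats containment set-wise), so this follows the review's suggestion as stated rather than claiming strict equivalence:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SubsetCheckDemo {
    // Sketch of the simplified check: one map lookup per field (a null check
    // replaces containsKey), and the fetched collection is reused for the
    // containsAll test, which also subsumes the explicit size comparison.
    static boolean isDerivedFrom(Map<String, List<Object>> fullDoc,
                                 Map<String, List<Object>> subDoc) {
        for (Map.Entry<String, List<Object>> e : subDoc.entrySet()) {
            List<Object> fieldValues = fullDoc.get(e.getKey()); // single lookup
            if (fieldValues == null) return false;              // replaces containsKey
            if (!fieldValues.containsAll(e.getValue())) return false; // reuses fieldValues
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> full = new HashMap<>();
        full.put("cat_ss", Arrays.asList("aaa", "bbb", "ccc"));
        Map<String, List<Object>> sub = new HashMap<>();
        sub.put("cat_ss", Arrays.asList("aaa", "ccc"));
        System.out.println(isDerivedFrom(full, sub)); // true: all values present
        sub.put("missing_s", Arrays.asList("x"));
        System.out.println(isDerivedFrom(full, sub)); // false: field absent in fullDoc
    }
}
```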
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224307768 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java --- @@ -1360,24 +1385,47 @@ boolean getUpdatedDocument(AddUpdateCommand cmd, long versionOnUpdate) throws IO } // full (non-inplace) atomic update +final boolean isNestedSchema = req.getSchema().isUsableForChildDocs(); SolrInputDocument sdoc = cmd.getSolrInputDocument(); BytesRef id = cmd.getIndexedId(); -SolrInputDocument oldDoc = RealTimeGetComponent.getInputDocument(cmd.getReq().getCore(), id); +SolrInputDocument blockDoc = RealTimeGetComponent.getInputDocument(cmd.getReq().getCore(), id, null, +false, null, true, true); -if (oldDoc == null) { - // create a new doc by default if an old one wasn't found - if (versionOnUpdate <= 0) { -oldDoc = new SolrInputDocument(); - } else { +if (blockDoc == null) { + if (versionOnUpdate > 0) { // could just let the optimistic locking throw the error throw new SolrException(ErrorCode.CONFLICT, "Document not found for update. 
id=" + cmd.getPrintableId()); } } else { - oldDoc.remove(CommonParams.VERSION_FIELD); + blockDoc.remove(CommonParams.VERSION_FIELD); } -cmd.solrDoc = docMerger.merge(sdoc, oldDoc); +SolrInputDocument mergedDoc; +if(idField == null || blockDoc == null) { + // create a new doc by default if an old one wasn't found + mergedDoc = docMerger.merge(sdoc, new SolrInputDocument()); +} else { + if(isNestedSchema && req.getSchema().hasExplicitField(IndexSchema.NEST_PATH_FIELD_NAME) && + blockDoc.containsKey(IndexSchema.ROOT_FIELD_NAME) && !id.utf8ToString().equals(blockDoc.getFieldValue(IndexSchema.ROOT_FIELD_NAME))) { +SolrInputDocument oldDoc = RealTimeGetComponent.getInputDocument(cmd.getReq().getCore(), id, null, +false, null, true, false); +mergedDoc = docMerger.merge(sdoc, oldDoc); +String docPath = (String) mergedDoc.getFieldValue(IndexSchema.NEST_PATH_FIELD_NAME); +List docPaths = StrUtils.splitSmart(docPath, PATH_SEP_CHAR); +SolrInputField replaceDoc = blockDoc.getField(docPaths.remove(0).replaceAll(PATH_SEP_CHAR + "|" + NUM_SEP_CHAR, "")); --- End diff -- The logic here (not just this line) is non-obvious to me. There are no comments. Please add comments and maybe refactor out a method. The use of replaceAll with a regexp is suspicious to me. None of the tests you added triggered a breakpoint within the docPaths loop below. Needs more testing or possible error. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224306195

--- Diff: solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java ---
@@ -661,6 +690,21 @@ public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes
     return sid;
   }

+  private static void ensureDocDecorated(Set onlyTheseNonStoredDVs, SolrDocumentBase doc, int docid, SolrDocumentFetcher docFetcher) throws IOException {
--- End diff --

I suggest renaming these methods `ensureDocDecorated` since that's what it calls.
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224209409 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.update.processor; + +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +import org.apache.lucene.util.BytesRef; +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.SolrInputField; +import org.apache.solr.core.SolrCore; +import org.apache.solr.handler.component.RealTimeGetComponent; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +public class AtomicUpdateBlockTest extends SolrTestCaseJ4 { + + private final static String VERSION = "_version_"; + private static String PREVIOUS_ENABLE_UPDATE_LOG_VALUE; + + @BeforeClass + public static void beforeTests() throws Exception { +PREVIOUS_ENABLE_UPDATE_LOG_VALUE = System.getProperty("enable.update.log"); +System.setProperty("enable.update.log", "true"); +initCore("solrconfig-block-atomic-update.xml", "schema-nest.xml"); // use "nest" schema + } + + @AfterClass + public static void afterTests() throws Exception { +// restore enable.update.log +System.setProperty("enable.update.log", PREVIOUS_ENABLE_UPDATE_LOG_VALUE); + } + + @Before + public void before() { +clearIndex(); +assertU(commit()); + } + + @Test + public void testMergeChildDoc() throws Exception { +SolrInputDocument doc = new SolrInputDocument(); +doc.setField("id", "1"); +doc.setField("cat_ss", new String[]{"aaa", "ccc"}); +doc.setField("child", Collections.singletonList(sdoc("id", "2", "cat_ss", "child"))); +addDoc(adoc(doc), "nested-rtg"); + +BytesRef rootDocId = new BytesRef("1"); +SolrCore core = h.getCore(); +SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +// assert block doc has child docs +assertTrue(block.containsKey("child")); + +assertJQ(req("q","id:1") +,"/response/numFound==0" +); + +// commit the changes +assertU(commit()); + +SolrInputDocument newChildDoc = sdoc("id", "3", "cat_ss", 
"child"); +SolrInputDocument addedDoc = sdoc("id", "1", +"cat_ss", Collections.singletonMap("add", "bbb"), +"child", Collections.singletonMap("add", sdocs(newChildDoc))); +block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +block.removeField(VERSION); +SolrInputDocument preMergeDoc = new SolrInputDocument(block); +AtomicUpdateDocumentMerger docMerger = new AtomicUpdateDocumentMerger(req()); +docMerger.merge(addedDoc, block); +assertEquals("merged document should have the same id", preMergeDoc.getFieldValue("id"), block.getFieldValue("id")); +assertDocContainsSubset(preMergeDoc, block); +assertDocContainsSubset(addedDoc, block); +assertDocContainsSubset(newChildDoc, (SolrInputDocument) ((List) block.getFieldValues("child")).get(1)); +assertEquals(doc.getFieldValue("id"), block.getFieldValue("id")); + } + + @Test + public void testBlockAtomicAdd() throws Exception { + +SolrInputDocument
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224283996

--- Diff: solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java ---
@@ -609,9 +618,10 @@ public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes
    * @param resolveFullDocument In case the document is fetched from the tlog, it could only be a partial document if the last update
    *                            was an in-place update. In that case, should this partial document be resolved to a full document (by following
    *                            back prevPointer/prevVersion)?
+   * @param resolveBlock Check whether the document is part of a block. If so, return the whole block.
    */
   public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes, AtomicLong versionReturned, boolean avoidRetrievingStoredFields,
-                                                   Set onlyTheseNonStoredDVs, boolean resolveFullDocument) throws IOException {
+                                                   Set onlyTheseNonStoredDVs, boolean resolveFullDocument, boolean resolveBlock) throws IOException {
     SolrInputDocument sid = null;
--- End diff --

It would be helpful to add a javadoc comment to say that if the id refers to a nested document (which isn't known up-front), then it'll never be found in the tlog (at least if you follow the rules of nested docs). Also, perhaps the parameter "resolveBlock" should be "resolveToRootDocument", since I think the "root" terminology may be more widely used as it's even in the schema, whereas "block" is I think not so much. If you disagree, a compromise may be to use both "root" and "Block" together -- "resolveRootBlock".
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224300831

--- Diff: solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java ---
@@ -639,12 +650,30 @@ public static SolrInputDocument getInputDocument(SolrCore core, BytesRef idBytes
         sid = new SolrInputDocument();
       } else {
         Document luceneDocument = docFetcher.doc(docid);
-        sid = toSolrInputDocument(luceneDocument, core.getLatestSchema());
+        sid = toSolrInputDocument(luceneDocument, schema);
       }
-      if (onlyTheseNonStoredDVs != null) {
-        docFetcher.decorateDocValueFields(sid, docid, onlyTheseNonStoredDVs);
-      } else {
-        docFetcher.decorateDocValueFields(sid, docid, docFetcher.getNonStoredDVsWithoutCopyTargets());
+      ensureDocDecorated(onlyTheseNonStoredDVs, sid, docid, docFetcher, resolveBlock || schema.hasExplicitField(IndexSchema.NEST_PATH_FIELD_NAME));
+      SolrInputField rootField;
--- End diff --

It's no big deal to simply initialize rootField up front. You are doing it as an expression with a side effect below, which is needlessly awkward.
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224130315 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.update.processor; + +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +import org.apache.lucene.util.BytesRef; +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.SolrInputField; +import org.apache.solr.core.SolrCore; +import org.apache.solr.handler.component.RealTimeGetComponent; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +public class AtomicUpdateBlockTest extends SolrTestCaseJ4 { + + private final static String VERSION = "_version_"; + private static String PREVIOUS_ENABLE_UPDATE_LOG_VALUE; + + @BeforeClass + public static void beforeTests() throws Exception { +PREVIOUS_ENABLE_UPDATE_LOG_VALUE = System.getProperty("enable.update.log"); +System.setProperty("enable.update.log", "true"); +initCore("solrconfig-block-atomic-update.xml", "schema-nest.xml"); // use "nest" schema + } + + @AfterClass + public static void afterTests() throws Exception { +// restore enable.update.log +System.setProperty("enable.update.log", PREVIOUS_ENABLE_UPDATE_LOG_VALUE); + } + + @Before + public void before() { +clearIndex(); +assertU(commit()); + } + + @Test + public void testMergeChildDoc() throws Exception { +SolrInputDocument doc = new SolrInputDocument(); +doc.setField("id", "1"); +doc.setField("cat_ss", new String[]{"aaa", "ccc"}); +doc.setField("child", Collections.singletonList(sdoc("id", "2", "cat_ss", "child"))); +addDoc(adoc(doc), "nested-rtg"); + +BytesRef rootDocId = new BytesRef("1"); +SolrCore core = h.getCore(); +SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +// assert block doc has child docs +assertTrue(block.containsKey("child")); + +assertJQ(req("q","id:1") +,"/response/numFound==0" +); + +// commit the changes +assertU(commit()); + +SolrInputDocument newChildDoc = sdoc("id", "3", "cat_ss", 
"child"); +SolrInputDocument addedDoc = sdoc("id", "1", +"cat_ss", Collections.singletonMap("add", "bbb"), +"child", Collections.singletonMap("add", sdocs(newChildDoc))); +block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +block.removeField(VERSION); +SolrInputDocument preMergeDoc = new SolrInputDocument(block); +AtomicUpdateDocumentMerger docMerger = new AtomicUpdateDocumentMerger(req()); +docMerger.merge(addedDoc, block); --- End diff -- It seems the point of this test is to test AtomicUpdateDocumentMerger.merge? Then why even index anything at all (the first half of this test)? Simply create the documents directly and call the merge method and test the result.
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r223749656 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java --- @@ -111,6 +114,8 @@ public static final String DISTRIB_FROM = "distrib.from"; public static final String DISTRIB_INPLACE_PREVVERSION = "distrib.inplace.prevversion"; private static final String TEST_DISTRIB_SKIP_SERVERS = "test.distrib.skip.servers"; + private static final char PATH_SEP_CHAR = '/'; --- End diff -- Please don't create frivolous constants like this.
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224131263 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.update.processor; + +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +import org.apache.lucene.util.BytesRef; +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.SolrInputField; +import org.apache.solr.core.SolrCore; +import org.apache.solr.handler.component.RealTimeGetComponent; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +public class AtomicUpdateBlockTest extends SolrTestCaseJ4 { + + private final static String VERSION = "_version_"; + private static String PREVIOUS_ENABLE_UPDATE_LOG_VALUE; + + @BeforeClass + public static void beforeTests() throws Exception { +PREVIOUS_ENABLE_UPDATE_LOG_VALUE = System.getProperty("enable.update.log"); +System.setProperty("enable.update.log", "true"); +initCore("solrconfig-block-atomic-update.xml", "schema-nest.xml"); // use "nest" schema + } + + @AfterClass + public static void afterTests() throws Exception { +// restore enable.update.log +System.setProperty("enable.update.log", PREVIOUS_ENABLE_UPDATE_LOG_VALUE); + } + + @Before + public void before() { +clearIndex(); +assertU(commit()); + } + + @Test + public void testMergeChildDoc() throws Exception { +SolrInputDocument doc = new SolrInputDocument(); +doc.setField("id", "1"); +doc.setField("cat_ss", new String[]{"aaa", "ccc"}); +doc.setField("child", Collections.singletonList(sdoc("id", "2", "cat_ss", "child"))); +addDoc(adoc(doc), "nested-rtg"); + +BytesRef rootDocId = new BytesRef("1"); +SolrCore core = h.getCore(); +SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +// assert block doc has child docs +assertTrue(block.containsKey("child")); + +assertJQ(req("q","id:1") +,"/response/numFound==0" +); + +// commit the changes +assertU(commit()); + +SolrInputDocument newChildDoc = sdoc("id", "3", "cat_ss", 
"child"); +SolrInputDocument addedDoc = sdoc("id", "1", +"cat_ss", Collections.singletonMap("add", "bbb"), +"child", Collections.singletonMap("add", sdocs(newChildDoc))); +block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +block.removeField(VERSION); +SolrInputDocument preMergeDoc = new SolrInputDocument(block); +AtomicUpdateDocumentMerger docMerger = new AtomicUpdateDocumentMerger(req()); +docMerger.merge(addedDoc, block); +assertEquals("merged document should have the same id", preMergeDoc.getFieldValue("id"), block.getFieldValue("id")); +assertDocContainsSubset(preMergeDoc, block); +assertDocContainsSubset(addedDoc, block); +assertDocContainsSubset(newChildDoc, (SolrInputDocument) ((List) block.getFieldValues("child")).get(1)); +assertEquals(doc.getFieldValue("id"), block.getFieldValue("id")); + } + + @Test + public void testBlockAtomicAdd() throws Exception { + +
[GitHub] lucene-solr pull request #455: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r224131571 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.update.processor; + +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +import org.apache.lucene.util.BytesRef; +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.common.SolrInputDocument; +import org.apache.solr.common.SolrInputField; +import org.apache.solr.core.SolrCore; +import org.apache.solr.handler.component.RealTimeGetComponent; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +public class AtomicUpdateBlockTest extends SolrTestCaseJ4 { + + private final static String VERSION = "_version_"; + private static String PREVIOUS_ENABLE_UPDATE_LOG_VALUE; + + @BeforeClass + public static void beforeTests() throws Exception { +PREVIOUS_ENABLE_UPDATE_LOG_VALUE = System.getProperty("enable.update.log"); +System.setProperty("enable.update.log", "true"); +initCore("solrconfig-block-atomic-update.xml", "schema-nest.xml"); // use "nest" schema + } + + @AfterClass + public static void afterTests() throws Exception { +// restore enable.update.log +System.setProperty("enable.update.log", PREVIOUS_ENABLE_UPDATE_LOG_VALUE); + } + + @Before + public void before() { +clearIndex(); +assertU(commit()); + } + + @Test + public void testMergeChildDoc() throws Exception { +SolrInputDocument doc = new SolrInputDocument(); +doc.setField("id", "1"); +doc.setField("cat_ss", new String[]{"aaa", "ccc"}); +doc.setField("child", Collections.singletonList(sdoc("id", "2", "cat_ss", "child"))); +addDoc(adoc(doc), "nested-rtg"); + +BytesRef rootDocId = new BytesRef("1"); +SolrCore core = h.getCore(); +SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +// assert block doc has child docs +assertTrue(block.containsKey("child")); + +assertJQ(req("q","id:1") +,"/response/numFound==0" +); + +// commit the changes +assertU(commit()); + +SolrInputDocument newChildDoc = sdoc("id", "3", "cat_ss", 
"child"); +SolrInputDocument addedDoc = sdoc("id", "1", +"cat_ss", Collections.singletonMap("add", "bbb"), +"child", Collections.singletonMap("add", sdocs(newChildDoc))); +block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +block.removeField(VERSION); +SolrInputDocument preMergeDoc = new SolrInputDocument(block); +AtomicUpdateDocumentMerger docMerger = new AtomicUpdateDocumentMerger(req()); +docMerger.merge(addedDoc, block); +assertEquals("merged document should have the same id", preMergeDoc.getFieldValue("id"), block.getFieldValue("id")); +assertDocContainsSubset(preMergeDoc, block); +assertDocContainsSubset(addedDoc, block); +assertDocContainsSubset(newChildDoc, (SolrInputDocument) ((List) block.getFieldValues("child")).get(1)); +assertEquals(doc.getFieldValue("id"), block.getFieldValue("id")); + } + + @Test + public void testBlockAtomicAdd() throws Exception { + +SolrInputDocument doc = sdoc("id", "1", +"cat_ss", new String[] {"aaa", "ccc"}, +"child1", sdoc("id", "2", "cat_ss", "child") +); +json(doc); --- End diff -- accidentally added this json line?
[GitHub] lucene-solr pull request #455: WIP: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r223561565 --- Diff: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java --- @@ -461,5 +466,33 @@ private static boolean isChildDoc(Object obj) { } return objValues.iterator().next() instanceof SolrDocumentBase; } + + private void removeObj(Collection original, Object toRemove, String fieldName) { +if(isChildDoc(toRemove)) { + removeChildDoc(original, (SolrInputDocument) toRemove); +} else { + original.remove(getNativeFieldValue(fieldName, toRemove)); +} + } + + private static void removeChildDoc(Collection original, SolrInputDocument docToRemove) { +for(SolrInputDocument doc: (Collection) original) { + if(isDerivedFromDoc(doc, docToRemove)) { +original.remove(doc); +return; + } +} + } + + private static boolean isDerivedFromDoc(SolrInputDocument fullDoc, SolrInputDocument subDoc) { --- End diff -- I see the new docs; still it's unclear without the context of our conversation. Should "subDoc" be "partialDoc"? How is it that the fullDoc is "the document to be tested" when the return value is "wether subDoc is a subset of fullDoc"? Isn't the subject of the test subDoc?
[GitHub] lucene-solr pull request #455: WIP: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r223244019 --- Diff: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java --- @@ -461,5 +466,33 @@ private static boolean isChildDoc(Object obj) { } return objValues.iterator().next() instanceof SolrDocumentBase; } + + private void removeObj(Collection original, Object toRemove, String fieldName) { +if(isChildDoc(toRemove)) { + removeChildDoc(original, (SolrInputDocument) toRemove); +} else { + original.remove(getNativeFieldValue(fieldName, toRemove)); +} + } + + private static void removeChildDoc(Collection original, SolrInputDocument docToRemove) { +for(SolrInputDocument doc: (Collection) original) { + if(isDerivedFromDoc(doc, docToRemove)) { +original.remove(doc); +return; + } +} + } + + private static boolean isDerivedFromDoc(SolrInputDocument fullDoc, SolrInputDocument subDoc) { --- End diff -- What does it mean to be "derived" here? Perhaps do you mean that subDoc is a partial update? Javadocs would help.
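A javadoc'd sketch of what such a subset predicate might look like, with plain Maps standing in for SolrInputDocument. All names here are illustrative stand-ins, not the actual patch.

```java
import java.util.Collection;
import java.util.Map;

public class SubsetSketch {
    /**
     * Returns true if {@code partial} is a subset of {@code full}: every field
     * of {@code partial} exists in {@code full}, with all of its values present
     * there. The subject of the test is {@code partial}, so it comes first.
     */
    static boolean isSubsetOf(Map<String, ? extends Collection<?>> partial,
                              Map<String, ? extends Collection<?>> full) {
        for (Map.Entry<String, ? extends Collection<?>> field : partial.entrySet()) {
            Collection<?> fullValues = full.get(field.getKey());
            if (fullValues == null || !fullValues.containsAll(field.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

Naming the parameters for their roles (partial vs. full) and putting the tested document first makes the javadoc and the signature agree, which is the ambiguity the review flags.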
[GitHub] lucene-solr pull request #455: WIP: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r223243871 --- Diff: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java --- @@ -461,5 +466,33 @@ private static boolean isChildDoc(Object obj) { } return objValues.iterator().next() instanceof SolrDocumentBase; } + + private void removeObj(Collection original, Object toRemove, String fieldName) { +if(isChildDoc(toRemove)) { + removeChildDoc(original, (SolrInputDocument) toRemove); +} else { + original.remove(getNativeFieldValue(fieldName, toRemove)); +} + } + + private static void removeChildDoc(Collection original, SolrInputDocument docToRemove) { +for(SolrInputDocument doc: (Collection) original) { + if(isDerivedFromDoc(doc, docToRemove)) { +original.remove(doc); --- End diff -- eh... I'm not sure if it's as simple as `original.remove(doc)` since if it was, we wouldn't need the method `isDerivedFromDoc`. Assuming we do need that "derived" predicate method, then we probably need to call remove on an iterator here.
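The iterator point can be sketched in isolation. This is plain Java; the `BiPredicate` is a hypothetical stand-in for `isDerivedFromDoc`, and Strings stand in for documents.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;

public class IteratorRemoveSketch {
    // Removes the first element matching the predicate. Calling it.remove()
    // deletes exactly the element the iterator just returned. By contrast,
    // list.remove(doc) re-scans the list using equals(), so it is both a
    // second O(n) pass and could remove an earlier, merely-equal element.
    static <T> boolean removeFirstMatch(List<T> docs, T probe, BiPredicate<T, T> matches) {
        for (Iterator<T> it = docs.iterator(); it.hasNext(); ) {
            if (matches.test(it.next(), probe)) {
                it.remove();
                return true;
            }
        }
        return false;
    }
}
```

With a predicate-based match rather than equals(), the iterator form is the one that is guaranteed to remove the element that actually matched.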
[GitHub] lucene-solr pull request #455: WIP: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r223244909 --- Diff: solr/core/src/test/org/apache/solr/update/processor/AtomicUpdateBlockTest.java --- @@ -181,6 +182,180 @@ public void testBlockRealTimeGet() throws Exception { ); } + @Test + public void testBlockAtomicSet() throws Exception { +SolrInputDocument doc = sdoc("id", "1", +"cat_ss", new String[] {"aaa", "ccc"}, +"child1", Collections.singleton(sdoc("id", "2", "cat_ss", "child")) +); +json(doc); +addDoc(adoc(doc), "nested-rtg"); + +BytesRef rootDocId = new BytesRef("1"); +SolrCore core = h.getCore(); +SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +// assert block doc has child docs +assertTrue(block.containsKey("child1")); + +assertJQ(req("q","id:1") +,"/response/numFound==0" +); + +// commit the changes +assertU(commit()); + +SolrInputDocument committedBlock = RealTimeGetComponent.getInputDocument(core, rootDocId, true); +BytesRef childDocId = new BytesRef("2"); +// ensure the whole block is returned when resolveBlock is true and id of a child doc is provided +assertEquals(committedBlock.toString(), RealTimeGetComponent.getInputDocument(core, childDocId, true).toString()); + +assertJQ(req("q","id:1") +,"/response/numFound==1" +); + +assertJQ(req("qt","/get", "id","1", "fl","id, cat_ss, child1, [child]") +,"=={\"doc\":{'id':\"1\"" + +", cat_ss:[\"aaa\",\"ccc\"], child1:[{\"id\":\"2\",\"cat_ss\":[\"child\"]}]" + +" }}" +); + +assertU(commit()); + +assertJQ(req("qt","/get", "id","1", "fl","id, cat_ss, child1, [child]") +,"=={\"doc\":{'id':\"1\"" + +", cat_ss:[\"aaa\",\"ccc\"], child1:[{\"id\":\"2\",\"cat_ss\":[\"child\"]}]" + +" }}" +); + +doc = sdoc("id", "1", +"cat_ss", ImmutableMap.of("set", Arrays.asList("aaa", "bbb")), --- End diff -- Minor: in cases where the JDK has alternatives to Guava, use them. Here, it's `Collections.singletonMap` I recall. 
[GitHub] lucene-solr pull request #455: WIP: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r223244186 --- Diff: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java --- @@ -461,5 +466,33 @@ private static boolean isChildDoc(Object obj) { } return objValues.iterator().next() instanceof SolrDocumentBase; } + + private void removeObj(Collection original, Object toRemove, String fieldName) { +if(isChildDoc(toRemove)) { + removeChildDoc(original, (SolrInputDocument) toRemove); +} else { + original.remove(getNativeFieldValue(fieldName, toRemove)); +} + } + + private static void removeChildDoc(Collection original, SolrInputDocument docToRemove) { +for(SolrInputDocument doc: (Collection) original) { + if(isDerivedFromDoc(doc, docToRemove)) { +original.remove(doc); +return; + } +} + } + + private static boolean isDerivedFromDoc(SolrInputDocument fullDoc, SolrInputDocument subDoc) { +for(SolrInputField subSif: subDoc) { + String fieldName = subSif.getName(); + if(!fullDoc.containsKey(fieldName)) return false; + Collection fieldValues = subDoc.getFieldValues(fieldName); --- End diff -- Can't we get this directly off the subSif via subSif.getValues()? And why would subSif.getValueCount ever be different than fieldValues.size()? Comments would help.
[GitHub] lucene-solr pull request #455: WIP: SOLR-12638
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/455#discussion_r218293921 --- Diff: solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java --- @@ -1184,7 +1196,16 @@ protected boolean versionAdd(AddUpdateCommand cmd) throws IOException { // TODO: possibly set checkDeleteByQueries as a flag on the command? doLocalAdd(cmd); - + +if(lastKnownVersion != null && req.getSchema().isUsableForChildDocs() && --- End diff -- This may very well be right but can you tell me why you added this delete here at this line and why the delete is version-dependent?
[GitHub] lucene-solr issue #455: WIP: SOLR-12638
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/455 ConvertedLegacyTest fails in part because line 306 adds with overwrite=false -- a bit of a dubious Solr feature that is probably not properly compatible with the UpdateLog which makes various assumptions about the uniqueKey being unique. I'll email the dev list to see what others think but I'm inclined to think overwrite=false ought to be explicitly forbidden with an UpdateLog in place. That ancient legacy test can use a config that doesn't have an updateLog.
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214773746 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java --- @@ -132,6 +134,12 @@ public void transform(SolrDocument rootDoc, int rootDocId) { // load the doc SolrDocument doc = searcher.getDocFetcher().solrDoc(docId, childReturnFields); + if(childReturnFields.getTransformer() != null) { +if(childReturnFields.getTransformer().context != this.context) { --- End diff -- maybe this line should only check if context is null? By checking if the context is different, it's suggestive there may already be a context but if that were true then we ought to reinstate the former context when done. I suspect it's always going to be null. Can you look?
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214773781 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java --- @@ -106,9 +123,20 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r } } +String childReturnFields = params.get("fl"); +SolrReturnFields childSolrReturnFields; +if(childReturnFields != null) { + childSolrReturnFields = new SolrReturnFields(childReturnFields, req); +} else if(req.getSchema().getDefaultLuceneMatchVersion().major < 8) { --- End diff -- Nice
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214541964 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java --- @@ -57,12 +58,34 @@ static final char PATH_SEP_CHAR = '/'; static final char NUM_SEP_CHAR = '#'; + private static final ThreadLocal transformerInitialized = new ThreadLocal(){ --- End diff -- This is the classic but more verbose way to declare a ThreadLocal. In Java 8 this can be a one-liner: ```ThreadLocal recursionCheckThreadLocal = ThreadLocal.withInitial(() -> Boolean.FALSE);``` And please consider another field name, not "transformerInitialized"? I suggest "recursionCheckThreadLocal" since it declares its purpose (to detect recursion) and its type (it's a ThreadLocal). "initialized" is dubious to me because it suggests an instance field of a class but this one is global.
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214542527 --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java --- @@ -243,11 +250,39 @@ private void testChildDocNonStoredDVFields() throws Exception { "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1); assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ", -"fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2); +"fl", "intDvoDefault, subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2); assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ", -"fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:bar\" limit=2]"), test3); +"fl", "intDvoDefault, subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:bar\" limit=2]"), test3); + + } + + private void testChildReturnFields() throws Exception { +assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ", +"fl", "*,[child parentFilter=\"subject:parentDocument\" fl=\"intDvoDefault,child_fl:[value v='child_fl_test']\"]"), +"/response/docs/[0]/intDefault==42", +"/response/docs/[0]/_childDocuments_/[0]/intDvoDefault==42", + "/response/docs/[0]/_childDocuments_/[0]/child_fl=='child_fl_test'"); + +try(SolrQueryRequest req = req("q", "*:*", "fq", "subject:\"parentDocument\" ", --- End diff -- Do you know XPath? assertQ takes XPath expressions, which are powerful enough to do everything you are asserting here in less code. (assertQ is used in a *ton* of tests; plenty of examples to learn from.) The assertJQ thing is less powerful. You don't *have* to rewrite it, but it's at least worth being aware of for future assertions.
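The XPath matching that assertQ relies on can be demonstrated with plain JAXP from the JDK. The sample response XML below is invented for illustration, not taken from the test.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathAssertSketch {
    // Evaluates an XPath expression as a boolean test against an XML response
    // body. Under XPath 1.0 rules, a node-set expression converts to true
    // iff it selects at least one node -- which is why a single expression
    // like //result[@numFound='1'] can serve as an assertion.
    static boolean matches(String xml, String xpathExpr) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    xpathExpr,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

One XPath expression can combine a structural condition and a value condition, which is what makes assertQ-style tests more compact than path-per-value assertions.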
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214542110 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java --- @@ -57,12 +58,34 @@ static final char PATH_SEP_CHAR = '/'; static final char NUM_SEP_CHAR = '#'; + private static final ThreadLocal transformerInitialized = new ThreadLocal(){ +@Override +protected Boolean initialValue() { + this.set(false); + return false; +} + }; private static final BooleanQuery rootFilter = new BooleanQuery.Builder() .add(new BooleanClause(new MatchAllDocsQuery(), BooleanClause.Occur.MUST)) .add(new BooleanClause(new DocValuesFieldExistsQuery(NEST_PATH_FIELD_NAME), BooleanClause.Occur.MUST_NOT)).build(); @Override public DocTransformer create(String field, SolrParams params, SolrQueryRequest req) { +if(transformerInitialized.get()) { + // this is a recursive call by SolrReturnFields see #createChildDocTransformer + return new DocTransformer.NoopFieldTransformer(); +} else { + try { +// transformer is yet to be initialized in this thread, create it +transformerInitialized.set(true); +return createChildDocTransformer(field, params, req); + } finally { +transformerInitialized.remove(); --- End diff -- Hmm; maybe clearer to set(false)
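A standalone sketch of this recursion-guard pattern, using the `ThreadLocal.withInitial` one-liner suggested earlier. The names and the `Supplier` shape are invented stand-ins for the transformer factory; this is not the patch itself.

```java
import java.util.function.Supplier;

public class RecursionGuardSketch {
    // One-liner form of the ThreadLocal quoted in the diff.
    private static final ThreadLocal<Boolean> recursionCheck =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Runs body unless this thread is already inside a guarded call, in which
    // case the fallback is returned (mirroring the NoopFieldTransformer branch).
    static <T> T guarded(Supplier<T> body, T fallback) {
        if (recursionCheck.get()) {
            return fallback;                 // re-entrant call: short-circuit
        }
        recursionCheck.set(Boolean.TRUE);
        try {
            return body.get();
        } finally {
            // set(false) states the intent directly; remove() behaves the same
            // here because the initializer re-supplies FALSE on the next get().
            recursionCheck.set(Boolean.FALSE);
        }
    }
}
```

Either reset works functionally; set(false) reads as "clear the flag", while remove() also discards the per-thread map entry so the initializer runs again on the next access.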
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214541721 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java --- @@ -54,24 +54,26 @@ private final DocSet childDocSet; private final int limit; private final boolean isNestedSchema; + private final SolrReturnFields childReturnFields; - //TODO ought to be provided/configurable - private final SolrReturnFields childReturnFields = new SolrReturnFields(); - - ChildDocTransformer(String name, BitSetProducer parentsFilter, - DocSet childDocSet, boolean isNestedSchema, int limit) { + ChildDocTransformer(String name, BitSetProducer parentsFilter, DocSet childDocSet, + SolrReturnFields returnFields, boolean isNestedSchema, int limit) { this.name = name; this.parentsFilter = parentsFilter; this.childDocSet = childDocSet; this.limit = limit; this.isNestedSchema = isNestedSchema; +this.childReturnFields = returnFields!=null? returnFields: new SolrReturnFields(); } @Override public String getName() { return name; } + @Override + public boolean needsSolrIndexSearcher() { return true; } --- End diff -- +1 nice catch!
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214542721 --- Diff: solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java --- @@ -1831,7 +1831,7 @@ public void testChildDoctransformer() throws IOException, SolrServerException { q = new SolrQuery("q", "*:*", "indent", "true"); q.setFilterQueries(parentFilter); - q.setFields("id,[child parentFilter=\"" + parentFilter + + q.setFields("id, level_i, [child parentFilter=\"" + parentFilter + --- End diff -- I'm confused; this change suggests there is a backwards-compatibility change; is there? I think you said that the child "fl" is effectively constrained by whatever the top "fl" is; which if totally true means (I think) we needn't worry by defaulting to the top "fl". If this isn't totally correct (hence your change) then we need to be careful about changing the default behavior in a minor release; we'd have to tweak the default choice (absence of specific "fl") using the luceneMatchVersion.
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214542262 --- Diff: solr/core/src/java/org/apache/solr/response/transform/DocTransformer.java --- @@ -114,4 +114,27 @@ public void transform(SolrDocument doc, int docid, float score) throws IOExcepti public String toString() { return getName(); } + + /** + * Trivial Impl that ensure that the specified field is requested as an "extra" field, + * but then does nothing during the transformation phase. + */ + public static final class NoopFieldTransformer extends DocTransformer { --- End diff -- Oh whoops; I recommended you do this but overlooked that it's not *just* a no-op; it's doing this extra field-requesting task. Well, I suppose it's okay; not a big deal. Alternatively, this could be changed to be *just* a no-op, and then the former impl in RawValueTransformerFactory.java could exist as a subclass that is a bit simpler.
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214542325 --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java --- @@ -91,10 +98,10 @@ private void testChildDoctransformerXML() throws Exception { "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1); assertQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ", -"fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2); +"fl", "id, subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2); --- End diff -- nitpick: remove that extra space after the comma here and in the other lines you modified.
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214541692 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java --- @@ -131,6 +131,12 @@ public void transform(SolrDocument rootDoc, int rootDocId) { // load the doc SolrDocument doc = searcher.getDocFetcher().solrDoc(docId, childReturnFields); --- End diff -- I was thinking the same thing! But yeah; out of scope for this PR. CC @ErickErickson
[GitHub] lucene-solr pull request #433: SOLR-12357 Premptive creation of collections ...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/433#discussion_r214121353 --- Diff: solr/core/src/java/org/apache/solr/update/processor/TimeRoutedAliasUpdateProcessor.java --- @@ -198,32 +209,32 @@ private String createCollectionsIfRequired(Instant docTimestamp, String targetCo // probably don't write to the same alias. As such, we have deferred any solution to the "many clients causing // collection creation simultaneously" problem until such time as someone actually has that problem in a // real world use case that isn't just an anti-pattern. +Map.Entry candidateCollectionDesc = findCandidateGivenTimestamp(docTimestamp, cmd.getPrintableId()); +String candidateCollectionName = candidateCollectionDesc.getValue(); try { - CreationType creationType = requiresCreateCollection(docTimestamp, timeRoutedAlias.getPreemptiveCreateWindow()); - switch (creationType) { + switch (typeOfCreationRequired(docTimestamp, candidateCollectionDesc.getKey())) { case SYNCHRONOUS: // This next line blocks until all collections required by the current document have been created - return maintain(targetCollection, docTimestamp, printableId, false); + return createAllRequiredCollections(docTimestamp, printableId, candidateCollectionDesc); case ASYNC_PREEMPTIVE: - // Note: creating an executor and throwing it away is slightly expensive, but this is only likely to happen - // once per hour/day/week (depending on time slice size for the TRA). If the executor were retained, it - // would need to be shut down in a close hook to avoid test failures due to thread leaks which is slightly - // more complicated from a code maintenance and readability stand point. An executor must used instead of a - // thread to ensure we pick up the proper MDC logging stuff from ExecutorUtil. 
- if (preemptiveCreationExecutor == null) { -DefaultSolrThreadFactory threadFactory = new DefaultSolrThreadFactory("TRA-preemptive-creation"); -preemptiveCreationExecutor = newMDCAwareSingleThreadExecutor(threadFactory); -preemptiveCreationExecutor.execute(() -> { - maintain(targetCollection, docTimestamp, printableId, true); - preemptiveCreationExecutor.shutdown(); - preemptiveCreationExecutor = null; -}); +// It's important not to add code between here and the prior call to findCandidateGivenTimestamp() +// in processAdd() that invokes updateParsedCollectionAliases(). Doing so would update parsedCollectionsDesc +// and create a race condition. We are relying on the fact that get(0) is returning the head of the parsed +// collections that existed when candidateCollectionDesc was created. If this class updates it's notion of +// parsedCollectionsDesc since candidateCollectionDesc was chosen, we could create collection n+2 +// instead of collection n+1. + +// This line does not block and the document can be added immediately +preemptiveAsync(() -> createNextCollection(this.parsedCollectionsDesc.get(0).getValue())); --- End diff -- bug: do not refer to this.parsedCollectionsDesc from the lambda runnable, as it is not thread-safe (and I don't think we should try to make it so). The input should be gathered in the immediate line prior (thus in the main thread) so that it's already resolved. To make this bug harder to make (you did it twice), you could change preemptiveAsync() to take the collection name instead of taking a lambda. It could then be named createNextCollectionAsync.
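The fix described here — resolve the input on the calling thread so the lambda captures a fixed value rather than reading shared mutable state when it eventually runs — can be sketched with a toy model (this is not the actual TimeRoutedAliasUpdateProcessor; the class, the single-slot `pending` "queue", and the string collection names are invented for illustration):

```java
import java.util.concurrent.atomic.AtomicReference;

public class CaptureBeforeAsync {
    // Stands in for the processor's parsedCollectionsDesc head, which may be
    // updated between submitting the async task and the task actually running.
    private final AtomicReference<String> parsedHead = new AtomicReference<>("collection_n");
    private Runnable pending;   // stands in for the executor's task queue
    String lastCreated;

    // RISKY: the lambda dereferences the mutable field only when it finally runs,
    // so it can observe a newer head and create collection n+2 instead of n+1.
    void preemptiveAsyncRisky() {
        pending = () -> createNextCollection(parsedHead.get());
    }

    // SAFE: resolve the input on the calling thread; the lambda captures an
    // effectively-final local whose value is fixed at submission time.
    void preemptiveAsyncSafe() {
        final String head = parsedHead.get();
        pending = () -> createNextCollection(head);
    }

    void createNextCollection(String mostRecent) { lastCreated = "successor-of-" + mostRecent; }
    void updateHead(String newHead) { parsedHead.set(newHead); }
    void runPending() { pending.run(); }
}
```

Making the method take the collection name as a parameter (the suggested `createNextCollectionAsync`) removes the temptation entirely, since every caller is then forced to resolve the value before the call.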
[GitHub] lucene-solr pull request #433: SOLR-12357 Premptive creation of collections ...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/433#discussion_r214122537 --- Diff: solr/core/src/java/org/apache/solr/update/processor/TimeRoutedAliasUpdateProcessor.java --- @@ -165,31 +165,42 @@ private String getAliasName() { @Override public void processAdd(AddUpdateCommand cmd) throws IOException { -SolrInputDocument solrInputDocument = cmd.getSolrInputDocument(); -final Object routeValue = solrInputDocument.getFieldValue(timeRoutedAlias.getRouteField()); -final Instant docTimestampToRoute = parseRouteKey(routeValue); -updateParsedCollectionAliases(); -String candidateCollection = findCandidateCollectionGivenTimestamp(docTimestampToRoute, cmd.getPrintableId()); -final Instant maxFutureTime = Instant.now().plusMillis(timeRoutedAlias.getMaxFutureMs()); +final Instant docTimestamp = + parseRouteKey(cmd.getSolrInputDocument().getFieldValue(timeRoutedAlias.getRouteField())); + // TODO: maybe in some cases the user would want to ignore/warn instead? 
-if (docTimestampToRoute.isAfter(maxFutureTime)) { +if (docTimestamp.isAfter(Instant.now().plusMillis(timeRoutedAlias.getMaxFutureMs( { throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, - "The document's time routed key of " + docTimestampToRoute + " is too far in the future given " + + "The document's time routed key of " + docTimestamp + " is too far in the future given " + TimeRoutedAlias.ROUTER_MAX_FUTURE + "=" + timeRoutedAlias.getMaxFutureMs()); } -String targetCollection = createCollectionsIfRequired(docTimestampToRoute, candidateCollection, cmd.getPrintableId()); + +// to avoid potential for race conditions, this next method should not get called again unless +// we have created a collection synchronously +updateParsedCollectionAliases(); + +String targetCollection = createCollectionsIfRequired(docTimestamp, cmd.getPrintableId(), cmd); + if (thisCollection.equals(targetCollection)) { // pass on through; we've reached the right collection super.processAdd(cmd); } else { // send to the right collection - SolrCmdDistributor.Node targetLeaderNode = routeDocToSlice(targetCollection, solrInputDocument); + SolrCmdDistributor.Node targetLeaderNode = routeDocToSlice(targetCollection, cmd.getSolrInputDocument()); cmdDistrib.distribAdd(cmd, Collections.singletonList(targetLeaderNode), new ModifiableSolrParams(outParamsToLeader)); } } - - private String createCollectionsIfRequired(Instant docTimestamp, String targetCollection, String printableId) { + /** + * Create any required collections and return the name of the collection to which the current document should be sent. + * + * @param docTimestamp the date for the document taken from the field specified in the TRA config --- End diff -- These parameter-level docs are fine but FYI don't feel that you have to do this, especially for private methods, and especially for obvious parameters (i.e. you don't *have* to document all; you could just do some at your discretion, or none). 
This is subjective of course; I'm sharing my opinion and not insisting on anything. My preference leans towards... "if you have something helpful to say then say it, but avoid writing the obvious" (for various reasons). Oh look... you don't need to pass printableId since you could get it from AddUpdateCommand if needed. printableId is a wart on these method signatures IMO.
[GitHub] lucene-solr pull request #433: SOLR-12357 Premptive creation of collections ...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/433#discussion_r214120233 --- Diff: solr/core/src/java/org/apache/solr/update/processor/TimeRoutedAliasUpdateProcessor.java --- @@ -437,45 +461,46 @@ protected void doClose() { /** - * Create as many collections as required. This method loops to allow for the possibility that the routeTimestamp + * Create as many collections as required. This method loops to allow for the possibility that the docTimestamp * requires more than one collection to be created. Since multiple threads may be invoking maintain on separate * requests to the same alias, we must pass in the name of the collection that this thread believes to be the most * recent collection. This assumption is checked when the command is executed in the overseer. When this method * finds that all collections required have been created it returns the (possibly new) most recent collection. * The return value is ignored by the calling code in the async preemptive case. * - * @param targetCollection the initial notion of the latest collection available. * @param docTimestamp the timestamp from the document that determines routing * @param printableId an identifier for the add command used in error messages + * @param targetCollectionDesc the descriptor for the presently selected collection which should also be + * the most recent collection in all cases where this method is invoked. * @return The latest collection, including collections created during maintenance */ - public String maintain(String targetCollection, Instant docTimestamp, String printableId, boolean asyncSinglePassOnly) { -do { // typically we don't loop; it's only when we need to create a collection - - // Note: This code no longer short circuits immediately when it sees that the expected latest - // collection is the current latest collection. With the advent of preemptive collection creation - // we always need to do the time based checks. 
Otherwise, we cannot handle the case where the - preemptive window is larger than our TRA's time slices - - // Check the doc isn't too far in the future - - if (NONE == requiresCreateCollection(docTimestamp, timeRoutedAlias.getPreemptiveCreateWindow())) -return targetCollection; // thus we don't need another collection + private String createAllRequiredCollections( Instant docTimestamp, String printableId, + Map.Entry targetCollectionDesc) { +do { + switch(typeOfCreationRequired(docTimestamp, targetCollectionDesc.getKey())) { +case NONE: + return targetCollectionDesc.getValue(); // we don't need another collection +case ASYNC_PREEMPTIVE: + // can happen when preemptive interval is longer than one time slice + preemptiveAsync(() -> createNextCollection(this.parsedCollectionsDesc.get(0).getValue())); --- End diff -- bug: do not refer to this.parsedCollectionsDesc from the lambda runnable, as it is not thread-safe (and I don't think we should try to make it so). The input should be gathered in the immediate line prior (thus in the main thread) so that it's already resolved.
[GitHub] lucene-solr pull request #433: SOLR-12357 Premptive creation of collections ...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/433#discussion_r214118388 --- Diff: solr/core/src/java/org/apache/solr/update/processor/TimeRoutedAliasUpdateProcessor.java --- @@ -437,45 +461,46 @@ protected void doClose() { /** - * Create as many collections as required. This method loops to allow for the possibility that the routeTimestamp + * Create as many collections as required. This method loops to allow for the possibility that the docTimestamp * requires more than one collection to be created. Since multiple threads may be invoking maintain on separate * requests to the same alias, we must pass in the name of the collection that this thread believes to be the most * recent collection. This assumption is checked when the command is executed in the overseer. When this method * finds that all collections required have been created it returns the (possibly new) most recent collection. * The return value is ignored by the calling code in the async preemptive case. * - * @param targetCollection the initial notion of the latest collection available. * @param docTimestamp the timestamp from the document that determines routing * @param printableId an identifier for the add command used in error messages + * @param targetCollectionDesc the descriptor for the presently selected collection which should also be + * the most recent collection in all cases where this method is invoked. * @return The latest collection, including collections created during maintenance */ - public String maintain(String targetCollection, Instant docTimestamp, String printableId, boolean asyncSinglePassOnly) { -do { // typically we don't loop; it's only when we need to create a collection - - // Note: This code no longer short circuits immediately when it sees that the expected latest - // collection is the current latest collection. With the advent of preemptive collection creation - // we always need to do the time based checks. 
Otherwise, we cannot handle the case where the - preemptive window is larger than our TRA's time slices - - // Check the doc isn't too far in the future - - if (NONE == requiresCreateCollection(docTimestamp, timeRoutedAlias.getPreemptiveCreateWindow())) -return targetCollection; // thus we don't need another collection + private String createAllRequiredCollections( Instant docTimestamp, String printableId, + Map.Entry targetCollectionDesc) { +do { + switch(typeOfCreationRequired(docTimestamp, targetCollectionDesc.getKey())) { +case NONE: + return targetCollectionDesc.getValue(); // we don't need another collection +case ASYNC_PREEMPTIVE: + // can happen when preemptive interval is longer than one time slice + preemptiveAsync(() -> createNextCollection(this.parsedCollectionsDesc.get(0).getValue())); + return targetCollectionDesc.getValue(); +case SYNCHRONOUS: + createNextCollection(targetCollectionDesc.getValue()); // *should* throw if fails for some reason but... + if (!updateParsedCollectionAliases()) { // thus we didn't make progress... +// this is not expected, even in known failure cases, but we check just in case +throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, +"We need to create a new time routed collection but for unknown reasons were unable to do so."); + } + // then retry the loop ... have to do find again in case other requests also added collections + // that were made visible when we called updateParsedCollectionAliases() + targetCollectionDesc = findCandidateGivenTimestamp(docTimestamp, printableId); + break; +default: + throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unknown creation type while adding " + --- End diff -- I wouldn't waste your virtual breath (lines of code with thoughtful explanation) on a default case of an enum switch. Another approach I like is to `assert ENUMNAME.values().length == 3;` before the switch, which will be caught at test time.
I forget, but if Java still insists we have a default, then throw IllegalStateException or something like that in a one-liner without explanation (keep it short & sweet).
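The suggested pattern — an exhaustiveness `assert` on `values().length` before the switch, plus a terse default — might look like this (the `CreationType` constants mirror the patch; the handler body and return strings are made up for the sketch):

```java
public class EnumSwitchGuard {
    enum CreationType { NONE, ASYNC_PREEMPTIVE, SYNCHRONOUS }

    static String handle(CreationType type) {
        // Trips at test time (tests run with -ea) if someone adds a constant
        // without revisiting this switch.
        assert CreationType.values().length == 3;
        switch (type) {
            case NONE:             return "no-op";
            case ASYNC_PREEMPTIVE: return "async";
            case SYNCHRONOUS:      return "sync";
            default: throw new IllegalStateException(type.name()); // short & sweet
        }
    }
}
```

The assert documents the switch's assumption cheaply: it costs nothing in production (assertions disabled) but fails loudly in any test that reaches the method after the enum grows.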
[GitHub] lucene-solr issue #443: SOLR-12722: add fl param to ChildDocTransformer
Github user dsmiley commented on the issue: https://github.com/apache/lucene-solr/pull/443 Oh right; we'd get an infinite loop as the ChildDocTransformerFactory is already being called by SolrReturnFields! Ouch! Sorry, but I don't like your solution of trying to parse/strip the child transformer "fl"; it's very error-prone to do such things. Let's define a ThreadLocal that allows us to know if we're being asked to create this transformer from within a recursive call to SolrReturnFields. When we see this happening, we can return a no-op transformer (see RawValueTransformerFactory.NoopFieldTransformer, which could be made public and moved to an inner class of DocTransformer). This thread local would be set & cleared in a try-finally, and this logic would be in some utility method to keep it from being distracting to the caller. In general I avoid ThreadLocals as it's a singleton design pattern, which is usually bad, but it can be used sparingly/judiciously. BTW it appears that doc transformers will not be executed on the child docs. The transformer is not invoking them, and I suspect they won't be invoked on the way out to the client. Can you check/test if this is true? If I'm correct, this is a pre-existing bug/limitation that would be good to mark as a TODO; or you can try to address it here since it's related (since before you couldn't specify the 'fl', it wasn't possible to specify a transformer until now).
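The ThreadLocal guard described above — set and cleared in a try-finally, returning a no-op when the factory re-enters itself via SolrReturnFields — could be sketched roughly like this (a standalone toy, not Solr code; the string return values stand in for real DocTransformer instances):

```java
public class RecursionGuard {
    // True while the current thread is already inside create(); a nested call
    // means the child "fl" parsing has re-entered the factory.
    private static final ThreadLocal<Boolean> CREATING = ThreadLocal.withInitial(() -> false);

    static String create(boolean childFlHasTransformer) {
        if (CREATING.get()) {
            return "noop";                 // the NoopFieldTransformer case
        }
        CREATING.set(true);
        try {
            // Parsing the child fl may call back into this factory; simulate
            // that recursion when the child fl itself names a transformer.
            String nested = childFlHasTransformer ? create(false) : "none";
            return "child(" + nested + ")";
        } finally {
            CREATING.set(false);           // always cleared, even on exception
        }
    }
}
```

The try-finally is the important part: without it, an exception while parsing the child "fl" would leave the flag set and silently turn every later transformer on that thread into a no-op.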
[GitHub] lucene-solr pull request #:
Github user dsmiley commented on the pull request: https://github.com/apache/lucene-solr/commit/da0856f1dc41b7ea85ce71801900d63c06422db2#commitcomment-30344425 In solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java: In solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java on line 111: BTW the pre-parsed returnFields is on the SolrQueryResponse. If it's inconvenient to get it then you can just do what you have here.
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214044533 --- Diff: solr/solr-ref-guide/src/transforming-result-documents.adoc --- @@ -137,6 +137,7 @@ When using this transformer, the `parentFilter` parameter must be specified, and * `childFilter` - query to filter which child documents should be included, this can be particularly useful when you have multiple levels of hierarchical documents (default: all children) * `limit` - the maximum number of child documents to be returned per parent document (default: 10) +* `fl` - the field list which the transformer is to return. --- End diff -- ... "There is a further limitation in which the fields here should be a subset of those specified by the top level fl param." And say it defaults to the top level "fl"?
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214044911 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java --- @@ -106,9 +107,12 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r } } +String childReturnFields = params.get("fl"); +SolrReturnFields childSolrReturnFields = childReturnFields==null? new SolrReturnFields(): new SolrReturnFields(childReturnFields ,req); --- End diff -- Shouldn't this default to the top level "fl"? I think so. Even if it didn't do this before, it in effect behaves this way, and it makes sense (to me) that it would.
[GitHub] lucene-solr pull request #443: SOLR-12722: add fl param to ChildDocTransform...
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/443#discussion_r214018618 --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java --- @@ -297,6 +297,35 @@ public void testNoChildren() throws Exception { "/response/docs/[0]/type_s==cake"); } + @Test + public void testChildReturnFields() throws Exception { +indexSampleData(numberOfDocsPerNestedTest); + +assertJQ(req("q", "test_s:testing", +"sort", "id asc", +"fl", "*, [child childFilter='lonely/lonelyGrandChild/test2_s:secondTest' fl='*, _nest_path_']", --- End diff -- Let's leave `_nest_path_` as internal and not mention it, since testing "fl" has nothing to do with this field or nested documents. So instead use some other field, foo_s or whatever. And can you move this test to TestChildDocTransformer.java, as it's not related to hierarchy?
[GitHub] lucene-solr pull request #416: WIP: SOLR-12519
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/416#discussion_r213541579 --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java --- @@ -124,10 +124,11 @@ public void testParentFilterLimitJSON() throws Exception { assertJQ(req("q", "type_s:donut", "sort", "id asc", -"fl", "id, type_s, toppings, _nest_path_, [child limit=1]", +"fl", "id, type_s, lonely, lonelyGrandChild, test_s, test2_s, _nest_path_, [child limit=1]", --- End diff -- To my point I wrote in JIRA: It's sad that when I see this I have no idea if it's right/wrong without having to go look at indexSampleData and then think about it. No? (this isn't a critique of you in particular; lots of tests, including some I've written, look like the current tests here). Imagine one doc with some nested docs, all of which only have their ID. Since they only have their ID, it's not a lot of literal text in JSON. The BeforeClass unmatched docs could use negative IDs to easily know who's who. Anyway, if you would rather leave this as a "TODO" for another day then I understand.
[GitHub] lucene-solr pull request #416: WIP: SOLR-12519
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/416#discussion_r213540255 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java --- @@ -123,6 +124,16 @@ public void transform(SolrDocument rootDoc, int rootDocId) { // Do we need to do anything with this doc (either ancestor or matched the child query) if (isAncestor || childDocSet == null || childDocSet.exists(docId)) { + + if(limit != -1) { +if(!isAncestor) { + if(matches == limit) { +continue; + } + ++matches; --- End diff -- I think matches should be incremented if it's in childDocSet (which includes childDocSet being null). Whether it's an ancestor or not doesn't matter, I think. You could pull out a new variable isInChildDocSet. Or I suppose simply consider all a match, which I see you just did as I write this; that's fine too.
[GitHub] lucene-solr pull request #416: WIP: SOLR-12519
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/416#discussion_r213295374 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java --- @@ -99,6 +96,9 @@ public void transform(SolrDocument rootDoc, int rootDocId) { // we'll need this soon... final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME); + // passing a different SortedDocValues obj since the child documents which come after are of smaller docIDs, + // and the iterator can not be reversed. + final String transformedDocPath = getPathByDocId(segRootId, DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME)); --- End diff -- Can you call this rootDocPath since we refer to this doc as the "root doc" elsewhere here? I can see why you chose this name. You could add a comment that the "root doc" is the input doc we are adding information to, and is usually but not necessarily the root of the block of documents (i.e. the root doc may itself be a child doc of another doc).
[GitHub] lucene-solr pull request #416: WIP: SOLR-12519
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/416#discussion_r213291939 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java --- @@ -0,0 +1,263 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.solr.response.transform; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.ArrayList; +import java.util.Collection; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import com.google.common.collect.ArrayListMultimap; +import com.google.common.collect.Multimap; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.ReaderUtil; +import org.apache.lucene.index.SortedDocValues; +import org.apache.lucene.search.join.BitSetProducer; +import org.apache.lucene.util.BitSet; +import org.apache.solr.common.SolrDocument; +import org.apache.solr.search.DocSet; +import org.apache.solr.search.SolrDocumentFetcher; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.search.SolrReturnFields; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR; +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR; +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME; + +class ChildDocTransformer extends DocTransformer { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + private static final String ANON_CHILD_KEY = "_childDocuments_"; + + private final String name; + private final BitSetProducer parentsFilter; + private final DocSet childDocSet; + private final int limit; + private final boolean isNestedSchema; + + private final SolrReturnFields childReturnFields = new SolrReturnFields(); + + ChildDocTransformer(String name, BitSetProducer parentsFilter, + DocSet childDocSet, boolean isNestedSchema, int limit) { +this.name = name; +this.parentsFilter = parentsFilter; +this.childDocSet = childDocSet; +this.limit = limit; +this.isNestedSchema = isNestedSchema; + } + + @Override + public String getName() { +return 
name; + } + + @Override + public void transform(SolrDocument rootDoc, int rootDocId) { +// note: this algorithm works if both if we have have _nest_path_ and also if we don't! + +try { + + // lookup what the *previous* rootDocId is, and figure which segment this is + final SolrIndexSearcher searcher = context.getSearcher(); + final List leaves = searcher.getIndexReader().leaves(); + final int seg = ReaderUtil.subIndex(rootDocId, leaves); + final LeafReaderContext leafReaderContext = leaves.get(seg); + final int segBaseId = leafReaderContext.docBase; + final int segRootId = rootDocId - segBaseId; + final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext); + + final int segPrevRootId = segRootId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay + + if(segPrevRootId == (segRootId - 1)) { +// doc has no children, return fast +return; + } + + // we'll need this soon... + final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME); + // passing a different SortedDocValues obj since the child documents which come after are of smaller docIDs, + // and the iterator can not be reversed. + final String transformedDocPath = getPathByDocId(segRootId, DocValues.getSorted(leafReaderContext.
[GitHub] lucene-solr pull request #416: WIP: SOLR-12519
Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/416#discussion_r213292329 --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java --- @@ -109,9 +109,14 @@ public void transform(SolrDocument rootDoc, int rootDocId) { // Loop each child ID up to the parent (exclusive). for (int docId = calcDocIdToIterateFrom(lastChildId, rootDocId); docId < rootDocId; ++docId) { -// get the path. (note will default to ANON_CHILD_KEY if not in schema or oddly blank) +// get the path. (note will default to ANON_CHILD_KEY if schema is not nested or empty string if blank) String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues); +if(isNestedSchema && !fullDocPath.contains(transformedDocPath)) { + // is not a descendant of the transformed doc, return fast. + return; --- End diff -- woah; shouldn't this be "continue"? We should have a test that would fail on this bug. All it would take would be an additional child doc that is not underneath the input root/transformed doc but follows it (as provided on input). Some of the first docs we iterate over here might not descend from rootDocId
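Why `continue` matters here can be shown with a toy scan over the doc block (the paths and string-prefix comparison are invented for the sketch; the real code compares `_nest_path_` values and iterates doc IDs, not strings):

```java
import java.util.ArrayList;
import java.util.List;

public class DescendantScan {
    // Scan the docs preceding the root of the block, keeping only descendants
    // of rootPath. Non-descendant siblings can appear *before* valid
    // descendants, so bailing out with 'return' on the first miss would
    // silently drop the rest of the root's children.
    static List<String> descendants(List<String> blockPaths, String rootPath) {
        List<String> kept = new ArrayList<>();
        for (String path : blockPaths) {
            if (!path.startsWith(rootPath)) {
                continue;   // skip this doc but keep scanning (a 'return' here is the bug)
            }
            kept.add(path);
        }
        return kept;
    }
}
```

This is also the shape of the test the review asks for: a block whose first docs are not under the transformed doc's path, followed by docs that are.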