[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603185#comment-16603185 ]

mosh commented on SOLR-12685:
-----------------------------

{quote}In light of that, perhaps mark this issue closed?{quote}
I will close this issue. Since we have agreed that the API should stay consistent, the small changes needed in the RTG component will be submitted in SOLR-12638.

> RTG should return the whole block if schema is nested
> -----------------------------------------------------
>
>                 Key: SOLR-12685
>                 URL: https://issues.apache.org/jira/browse/SOLR-12685
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: mosh
>            Priority: Major
>         Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a
> docId when consulting the index. For AtomicUpdates for child documents, RTG
> should return the whole block when dealing with a nested schema.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602699#comment-16602699 ]

mosh commented on SOLR-12685:
-----------------------------

{quote}A ticket for the RTG handler(RealTimeGetComponent#process).{quote}
I have been thinking about this one, and it has been a tough one. Wouldn't this create a major difference, from an API perspective, between the search component and the RTG component?

When querying a document by id, only the matching document is returned unless ChildDocTransformer is specified in the fl. The same applies to the RTG handler: when a request supplies ChildDocTransformer, the whole block is returned. This makes sense to me.

Unless there is a specific need for the "/get" handler to return the whole block when one exists, I cannot foresee any reason for this change. The exception, of course, would be a concrete requirement for this logic in the replication process.
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602682#comment-16602682 ]

mosh commented on SOLR-12685:
-----------------------------

{quote}I thought that's what you set out to do in this issue, which subsumes SOLR-9006; right? Are you changing your mind?{quote}
Perhaps this ticket's title is a little misleading. There is a need for the RTG component to return the whole block if the schema is nested, but in the scope of child documents I do not foresee a need for the RTG handler to return the block.

I could be mistaken, so to simplify debate and development I propose we open two separate tickets:
 - A ticket for the RTG public methods (e.g. RealTimeGetComponent#getInputDocument).
 - A ticket for the RTG handler (RealTimeGetComponent#process).

WDYT?
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602580#comment-16602580 ]

David Smiley commented on SOLR-12685:
-------------------------------------

bq. I was thinking RTG should not return the whole block when queried directly by the RTG handler ...

I thought that's what you set out to do in this issue, which subsumes SOLR-9006; right? Are you changing your mind?

bq. ... but rather should explicitly perform these checks when running RealTimeGetComponent#getInputDocument, which is used by AtomicUpdateDocumentMerger

If you want to work on that, then wouldn't that be SOLR-12638?
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601844#comment-16601844 ]

mosh commented on SOLR-12685:
-----------------------------

Please correct me if I am mistaken. I was thinking RTG should not return the whole block when queried directly by the RTG handler, but rather should explicitly perform these checks when running RealTimeGetComponent#getInputDocument, which is used by AtomicUpdateDocumentMerger.
{code:java}
SolrInputDocument oldDocument = RealTimeGetComponent.getInputDocument(
    cmd.getReq().getCore(), idBytes,
    null,           // don't want the version to be returned
    true,           // avoid stored fields from index
    updatedFields,
    true);          // resolve the full document
{code}
The exception, of course, would be if RTG block lookup is needed by the replication process, which unfortunately I am unfamiliar with.

Reading through the code, it seems the transaction log lookup is written twice: once in RealTimeGetComponent#getInputDocumentFromTlog and once in process. We could leverage that to ensure AtomicUpdateDocumentMerger gets the block when needed, avoiding further collision and interference with the RealTimeGetHandler.
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601759#comment-16601759 ]

mosh commented on SOLR-12685:
-----------------------------

Please correct me if I am mistaken. I was thinking RTG should not return the whole block when queried directly by the RTG handler, but rather should explicitly perform these checks when running RealTimeGetComponent#getInputDocument, which is used by AtomicUpdateDocumentMerger.
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601227#comment-16601227 ]

David Smiley commented on SOLR-12685:
-------------------------------------

I think I see what's going on. This is a bit tricky. ChildDocTransformer _does_ use the IndexSearcher, so it should declare itself as doing so (as you pointed out). But it's only there to return child docs. If the parent doc is in the update log, then its children ought to be there too (it's a requirement of nested docs, after all), and in that event we don't actually need the child doc transformer, since the child docs will already be exactly where they need to be, in place.

Perhaps the use of ChildDocTransformer should become automatic in RTG, so that RTG can know when it's needed or not? Hmmm. I'll think about this some more; I have to go for the day.
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601188#comment-16601188 ]

mosh commented on SOLR-12685:
-----------------------------

{quote}Perhaps there is a need though to only get that specific doc, and not the whole block, even though the document is in the block, perhaps in place update(I am not very familiar with those features, just to make sure)?{quote}
Now that SOLR-12519 has been committed to master, ChildDocTransformer requires a SolrIndexSearcher. This requirement causes documents to skip the transaction log lookup and instead use a SolrIndexSearcher for the docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather than returning docs
// directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
    // if we have filters, we need to check those against the indexed form of the doc
    (rb.getFilters() != null)
    || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
  // close handles to current searchers & result context
  searcherInfo.clear();
  resultContext = null;
  ulog.openRealtimeSearcher();  // force open a new realtime searcher
  o = null;  // pretend we never found this record and fall through to use the searcher
  break;
}
{code}
I am not sure of the performance implications of this requirement. If they are not deemed a limiting factor, the trigger for block lookups could be determined purely by IndexSchema#isUsableForChildDocs, removing the need for an additional flag.

[~dsmiley], your insight would be a great help.
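Editorial sketch: the condition quoted in the comment above can be modelled in isolation to show which requests bypass the update log. This is a simplified illustration, not Solr code. The class RtgLookupSketch, the Transformer interface, the Source enum, and chooseSource are all hypothetical names; only the boolean expression follows the quoted snippet.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified, hypothetical model of the lookup decision quoted above; not Solr code. */
public class RtgLookupSketch {

    /** Stand-in for a document transformer; only the one property that matters here. */
    interface Transformer {
        boolean needsSolrIndexSearcher();
    }

    /** Where a real-time get would read the document from in this model. */
    enum Source { UPDATE_LOG, REALTIME_SEARCHER }

    /**
     * Mirrors the quoted condition: any filter, or a transformer that needs a
     * searcher, forces the realtime searcher; otherwise the update log is used.
     */
    static Source chooseSource(List<String> filters, Transformer transformer) {
        boolean mustUseRealtimeSearcher =
            (filters != null)
                || (transformer != null && transformer.needsSolrIndexSearcher());
        return mustUseRealtimeSearcher ? Source.REALTIME_SEARCHER : Source.UPDATE_LOG;
    }

    public static void main(String[] args) {
        Transformer childDocs = () -> true;  // assumed to behave like ChildDocTransformer after SOLR-12519
        Transformer plain = () -> false;     // a transformer that needs no searcher

        System.out.println(chooseSource(null, null));
        System.out.println(chooseSource(null, plain));
        System.out.println(chooseSource(null, childDocs));
        System.out.println(chooseSource(Arrays.asList("type:child"), null));
    }
}
```

In this model, a transformer that reports needsSolrIndexSearcher() as true always routes the request to the realtime searcher, which is the behaviour the comment raises as a possible performance concern.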
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589802#comment-16589802 ]

mosh commented on SOLR-12685:
-----------------------------

{quote}in RTG we need the realtime searcher{quote}
Oh right, I guess I missed that part.
{quote}I'm also dubious you needed to change the method signature in RTG to take SolrQueryRequest; we'll see.{quote}
If there is no need for req#getSearcher, I do not foresee a need to pass req as a parameter.
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589082#comment-16589082 ]

David Smiley commented on SOLR-12685:
-------------------------------------

bq. This ticket incorporates it, since it should return child docs from the transaction log in addition to the index, in case the document was not found there.

Okay; it's debatable, as it seems the scope of that issue ought to be both tlog & index (why do one and not the other?). Anyway, we can just do it here and close that one when this is done.

bq. In my opinion adding extra flags only complicates this, and should instead check if the schema is usable for child docs...

I agree, as you might have guessed based on my feedback on related issues.

bq. Perhaps there is a need though to only get that specific doc, and not the whole block, even though the document is in the block, perhaps in place update(I am not very familiar with those features, just to make sure)?

Ehh; I dunno. It hasn't been possible before; we don't have to. It'd be more work to handle this scenario. I suggest tabling it until some day someone can express a need for this.

I looked at your rough POC. I see what you're trying to do. Be aware that not all "SolrIndexSearchers" are created equal... in RTG we need the *realtime* searcher (and thus will see non-committed docs), not req.getSearcher(). I'm also dubious you needed to change the method signature in RTG to take SolrQueryRequest; we'll see.

bq. ... which is needed for the atomic update to be merged in the right path inside the block(AtomicUpdateDocumentMerger#doAdd) ...

Let's not deal with updates in this issue. I think it's separate, perhaps SOLR-12638.
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588388#comment-16588388 ]

mosh commented on SOLR-12685:
-----------------------------

I have uploaded a very rough patch, which should only be viewed as a POC, to get my hands dirty before opening sub-tasks. The main changes are in RealTimeGetComponent, specifically in lines 655 to 669.

I added a new test, AtomicUpdateNestedTest#testAddChild, which does not yet pass. After using the ChildDocTransformer in RealTimeGetComponent, I have yet to figure out a clean way to get the child document's _nest_path_, which is needed for the atomic update to be merged in the right path inside the block (AtomicUpdateDocumentMerger#doAdd).
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588379#comment-16588379 ]

mosh commented on SOLR-12685:
-----------------------------

{quote}Is this a duplicate of SOLR-9006?{quote}
This ticket incorporates it, since it should return child docs from the transaction log in addition to the index, in case the document was not found there.

Quoting [~ariel_lieber...@hotmail.com] in SOLR-9006:
{quote}Therefore, I think the capability (e.g. additional flag) to get parent with all its children is very important..{quote}
In my opinion, adding extra flags only complicates this; RTG should instead check whether the schema is usable for child docs.

Perhaps there is a need, though, to get only that specific doc and not the whole block even though the document is in the block, for example for in-place updates? (I am not very familiar with those features; I am asking just to make sure.)
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587640#comment-16587640 ]

David Smiley commented on SOLR-12685:
-------------------------------------

Is this a duplicate of SOLR-9006? If so, it's not quite clear to me how atomic updates of child documents are related (which I don't even think works yet; see SOLR-12638).

BTW ChildDocTransformer really ought to take an "fl"! (definitely its own issue, if you want to tackle that)

Please provide the nocommit patch; thanks.
[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested
[ https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587388#comment-16587388 ]

mosh commented on SOLR-12685:
-----------------------------

I have started working on this, and the biggest obstacle I have encountered so far is that there is no way to do any of the following:
 - set the return fields for the ChildDocTransformer,
 - get the document's _nest_path_ (docValue) without having to build a new SolrInputDocument, or
 - get a DocValue (_nest_path_) by docId.

The path of the parent being updated is needed to locate, inside the block (a nested SolrInputDocument), the document that the atomic update will change.

For example, when the atomic update
{code:javascript}
{"id": "2", "children": {"add": {"id":4, "test_s": "test"}}}
{code}
is made, and the doc with id:2 is itself a child doc of doc id:1, we need the path of doc id:2 in order to add the new child doc to it.

I could upload a very rough no-commit patch, if needed.
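Editorial sketch: the example in the comment above can be illustrated with plain Java maps standing in for a nested document block. This is not the approach discussed in the thread, which relies on the child's _nest_path_. As a substitute, it assumes the whole block is already in memory and locates the target document by a depth-first search on id. The class NestedAddSketch and the helpers findById, addChild, and doc are hypothetical names and not part of Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration using plain maps; it does not use Solr's SolrInputDocument. */
public class NestedAddSketch {

    /** Depth-first search for the document with the given id inside a nested block. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> findById(Map<String, Object> doc, String id) {
        if (id.equals(String.valueOf(doc.get("id")))) {
            return doc;
        }
        for (Object value : doc.values()) {
            if (value instanceof List) {
                for (Object item : (List<Object>) value) {
                    if (item instanceof Map) {
                        Map<String, Object> hit = findById((Map<String, Object>) item, id);
                        if (hit != null) {
                            return hit;
                        }
                    }
                }
            }
        }
        return null;
    }

    /**
     * Appends a child to the given field of the target document.
     * Returns false when no document with targetId exists in the block.
     * Assumes the field, if present, already holds a list of child documents.
     */
    @SuppressWarnings("unchecked")
    static boolean addChild(Map<String, Object> root, String targetId,
                            String field, Map<String, Object> child) {
        Map<String, Object> target = findById(root, targetId);
        if (target == null) {
            return false;
        }
        List<Object> children =
            (List<Object>) target.computeIfAbsent(field, k -> new ArrayList<>());
        children.add(child);
        return true;
    }

    /** Creates a minimal document holding only an id. */
    static Map<String, Object> doc(String id) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("id", id);
        return d;
    }

    public static void main(String[] args) {
        // Block from the example: doc 1 is the root and doc 2 is its child.
        Map<String, Object> root = doc("1");
        List<Object> rootChildren = new ArrayList<>();
        rootChildren.add(doc("2"));
        root.put("children", rootChildren);

        // The atomic update adds doc 4 under doc 2.
        Map<String, Object> four = doc("4");
        four.put("test_s", "test");
        addChild(root, "2", "children", four);

        System.out.println(root);
    }
}
```

Searching by id visits the whole block in the worst case. A stored path such as _nest_path_ would let the merger go straight to the target, which is presumably why the comment asks for a cheap way to read it.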