[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603185#comment-16603185
 ] 

mosh commented on SOLR-12685:
-

{quote}In light of that, perhaps mark this issue closed?{quote}
I will close this issue.
Since we agree that the API should stay consistent, the small changes needed 
in the RTG component will be submitted in SOLR-12638.

> RTG should return the whole block if schema is nested
> -
>
> Key: SOLR-12685
> URL: https://issues.apache.org/jira/browse/SOLR-12685
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: mosh
>Priority: Major
> Attachments: SOLR-12638-no-commit.patch
>
>
> Currently Solr's RealTimeGet component returns the document if provided a 
> docId when consulting the index. For AtomicUpdates for child documents, RTG 
> should return the whole block when dealing with a nested schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602699#comment-16602699
 ] 

mosh commented on SOLR-12685:
-

{quote}A ticket for the RTG handler(RealTimeGetComponent#process).
{quote}
I was thinking about this one, and it has been a tough one!
Wouldn't this result in a major difference, from an API perspective, between 
the search component and the RTG component?
When querying a document by id, only the matching document is returned unless 
ChildDocTransformer is specified in the fl.
The same applies to the RTG handler: when a request supplies 
ChildDocTransformer, the whole block is returned.
This makes sense to me. Unless there is a specific need for the "/get" handler 
to return the whole block when one exists, I cannot foresee any reason for this 
change.
Unless, of course, there is a concrete requirement for this logic in the 
replication process.




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-04 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602682#comment-16602682
 ] 

mosh commented on SOLR-12685:
-

{quote}I thought that's what you set out to do in this issue, which subsumes 
SOLR-9006; right? Are you changing your mind?
{quote}
Perhaps this ticket's title is a little misleading. There is a need for the RTG 
component to return the whole block if the schema is nested, but in the scope of 
child documents I do not foresee a need for the RTG handler to return the block.
I could be mistaken, so to simplify debate and development I propose we open 
two separate tickets:
- A ticket for the RTG public methods (e.g. RealTimeGetComponent#getInputDocument)
- A ticket for the RTG handler (RealTimeGetComponent#process).
WDYT?




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-03 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602580#comment-16602580
 ] 

David Smiley commented on SOLR-12685:
-

bq. I was thinking RTG should not return the whole block when queried directly 
by the RTG handler ...

I thought that's what you set out to do in this issue, which subsumes 
SOLR-9006; right?  Are you changing your mind?

bq. ...  but rather should explicitly perform these checks when running 
RealTimeGetComponent#getInputDocument, which is used by 
AtomicUpdateDocumentMerger

If you want to work on that, then wouldn't that be SOLR-12638?




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-03 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601844#comment-16601844
 ] 

mosh commented on SOLR-12685:
-

Please correct me if mistaken.
 I was thinking RTG should not return the whole block when queried directly by 
the RTG handler, but rather should explicitly perform these checks when running 
RealTimeGetComponent#getInputDocument, which is used by 
AtomicUpdateDocumentMerger.
{code:java}
SolrInputDocument oldDocument = RealTimeGetComponent.getInputDocument
  (cmd.getReq().getCore(), idBytes,
   null, // don't want the version to be returned
   true, // avoid stored fields from index
   updatedFields,
   true); // resolve the full document{code}
Unless, of course, RTG block lookup is needed by the replication process, 
which, unfortunately, I am unfamiliar with.
Reading through the code, it seems the transaction log lookup is implemented 
twice: in RealTimeGetComponent#getInputDocumentFromTlog and in process.
We could leverage that to ensure AtomicUpdateDocumentMerger gets the block when 
needed, without interfering with the RealTimeGetHandler.




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601759#comment-16601759
 ] 

mosh commented on SOLR-12685:
-

Please correct me if mistaken.
I was thinking RTG should not return the whole block when queried directly by 
the RTG handler, but rather should explicitly perform these checks when running 
RealTimeGetComponent#getInputDocument, which is used by 
AtomicUpdateDocumentMerger.




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601227#comment-16601227
 ] 

David Smiley commented on SOLR-12685:
-

I think I see what's going on.  This is a bit tricky.  ChildDocTransformer 
_does_ use the IndexSearcher so should declare itself as doing so (as you 
pointed out).  But it's only there to return child docs.  If the parent doc is 
in the update log, then its children ought to be there too (it's a requirement 
of nested docs after all), and in that event we actually don't need the child 
doc transformer since the child docs will already be exactly where they need to 
be in-place.

Perhaps the use of ChildDocTransformer should become automatic in RTG, thus RTG 
can know when it's needed or not?  Hmmm.  I'll think about this some more; I 
have to go for the day.
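
To make the reasoning above concrete, here is a toy model of the lookup order, not Solr code. Every name in it (BlockLookupSketch, Doc, updateLog, indexedParents, attachChildren) is hypothetical. It assumes the update log holds whole blocks with children already in place, while the index holds parents whose children must be attached by something playing the role of the child doc transformer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these names exist in Solr. */
final class BlockLookupSketch {
    /** Minimal stand-in for a nested document. */
    static final class Doc {
        final String id;
        final List<Doc> children = new ArrayList<>();
        Doc(String id) { this.id = id; }
    }

    // Assumption: uncommitted blocks are stored whole, keyed by root id.
    final Map<String, Doc> updateLog = new HashMap<>();
    // Assumption: committed parents are stored without their children attached.
    final Map<String, Doc> indexedParents = new HashMap<>();
    final Map<String, List<String>> indexedChildIds = new HashMap<>();
    // Counts how often the transformer-like step ran.
    int transformerCalls = 0;

    /** Returns the block for rootId, or null if it is in neither store. */
    Doc get(String rootId) {
        Doc fromLog = updateLog.get(rootId);
        if (fromLog != null) {
            return fromLog; // children are already in place, no transformer needed
        }
        Doc parent = indexedParents.get(rootId);
        return parent == null ? null : attachChildren(parent);
    }

    /** Plays the role of the child doc transformer for index hits. */
    private Doc attachChildren(Doc parent) {
        transformerCalls++;
        Doc copy = new Doc(parent.id);
        List<String> ids = indexedChildIds.getOrDefault(parent.id, Collections.<String>emptyList());
        for (String childId : ids) {
            copy.children.add(new Doc(childId));
        }
        return copy;
    }

    public static void main(String[] args) {
        BlockLookupSketch rtg = new BlockLookupSketch();
        Doc block = new Doc("1");
        block.children.add(new Doc("2"));
        rtg.updateLog.put("1", block);
        System.out.println(rtg.get("1").children.size() + " child(ren), transformer calls: "
            + rtg.transformerCalls);
    }
}
```

In this model the transformer step runs only for documents that come from the index, which is the "RTG can know when it's needed or not" idea.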




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-09-02 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16601188#comment-16601188
 ] 

mosh commented on SOLR-12685:
-

{quote}Perhaps there is a need though to only get that specific doc, and not 
the whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?
{quote}
Now that SOLR-12519 was committed to master, ChildDocTransformer requires 
SolrIndexSearcher.
 This requirement causes documents to skip transaction log lookup, instead 
using a SolrIndexSearcher for docId lookup.
{code:java}
// true in any situation where we have to use a realtime searcher rather than
// returning docs directly from the UpdateLog
final boolean mustUseRealtimeSearcher =
  // if we have filters, we need to check those against the indexed form of the doc
  (rb.getFilters() != null)
  || ((null != transformer) && transformer.needsSolrIndexSearcher());
{code}
{code:java}
if (mustUseRealtimeSearcher) {
  // close handles to current searchers & result context
  searcherInfo.clear();
  resultContext = null;
  ulog.openRealtimeSearcher();  // force open a new realtime searcher
  o = null;  // pretend we never found this record and fall through to use the searcher
  break;
}{code}
I am not quite sure of the performance implications of this requirement.
If these implications are not considered a limiting factor, the trigger for 
block lookups could be determined purely by IndexSchema#isUsableForChildDocs, 
removing the need for an additional flag.
[~dsmiley], your insights would be very helpful.
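
As a rough illustration of the trade-off described above, the sketch below models the decision with plain booleans. It is not the RealTimeGetComponent code: RtgDecisionSketch and its method names are hypothetical, and it assumes that fetching a block always goes through a transformer that needs the index searcher.

```java
/** Toy model of the decision logic; hypothetical names, not Solr API. */
final class RtgDecisionSketch {
    /** Same shape as the quoted condition: filters or a searcher-dependent transformer. */
    static boolean mustUseRealtimeSearcher(boolean hasFilters, boolean transformerNeedsSearcher) {
        return hasFilters || transformerNeedsSearcher;
    }

    /** Proposed trigger: the schema alone decides, with no extra request flag. */
    static boolean shouldFetchBlock(boolean schemaUsableForChildDocs) {
        return schemaUsableForChildDocs;
    }

    /**
     * Assumption: a block fetch implies a transformer that needs the searcher,
     * so documents can come straight from the update log only when no block
     * is requested and no filters are present.
     */
    static boolean canServeFromUpdateLog(boolean hasFilters, boolean schemaUsableForChildDocs) {
        boolean transformerNeedsSearcher = shouldFetchBlock(schemaUsableForChildDocs);
        return !mustUseRealtimeSearcher(hasFilters, transformerNeedsSearcher);
    }

    public static void main(String[] args) {
        System.out.println("flat schema, no filters: "
            + canServeFromUpdateLog(false, false));
        System.out.println("nested schema, no filters: "
            + canServeFromUpdateLog(false, true));
    }
}
```

Under these assumptions, every lookup against a nested schema would skip the direct update-log path, which is the performance question raised above.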




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-08-23 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589802#comment-16589802
 ] 

mosh commented on SOLR-12685:
-

{quote}in RTG we need the realtime searcher{quote}
Oh right I guess I sort of missed that part...
{quote} I'm also dubious you needed to change the method signature in RTG to 
take SolrQueryRequest; we'll see.{quote}
I guess if there is no need for req#getSearcher, I do not foresee a need to 
pass req as a parameter.




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-08-22 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589082#comment-16589082
 ] 

David Smiley commented on SOLR-12685:
-

bq. This ticket incorporates it, since it should return child docs from the 
transaction log in addition to the index,
in case the document was not found there.

Okay; it's debatable, as it seems the scope of that issue ought to be both tlog 
& index (why do one and not the other?).  Anyway, we can just do it here and 
close that one when this is done.

bq. In my opinion adding extra flags only complicates this, and should instead 
check if the schema is usable for child docs...

I agree, as you might have guessed based on my feedback on related issues.

bq. Perhaps there is a need though to only get that specific doc, and not the 
whole block, even though the document is in the block, perhaps in place 
update(I am not very familiar with those features, just to make sure)?

Ehh; I dunno.  It hasn't been possible before; we don't have to.  It'd be more 
work to handle this scenario.  I suggest tabling it until some day someone can 
express a need for this.

I looked at your rough POC.  I see what you're trying to do.  Be aware that not 
all "SolrIndexSearchers" are created equal... in RTG we need the *realtime* 
searcher (thus will see non-committed docs), not req.getSearcher().  I'm also 
dubious you needed to change the method signature in RTG to take 
SolrQueryRequest; we'll see.

bq. ... which is needed for the atomic update to be merged in the right path 
inside the block(AtomicUpdateDocumentMerger#doAdd) ...

Let's not deal with updates in this issue.  I think it's separate, perhaps 
SOLR-12638.





[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-08-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588388#comment-16588388
 ] 

mosh commented on SOLR-12685:
-

I have uploaded a very rough patch, which should only be viewed as a POC, to 
get my hands dirty before opening sub tasks.
The main changes are in RealTimeGetComponent, specifically in lines 655 to 669.
I added a new test AtomicUpdateNestedTest#testAddChild, which does not yet pass.
After using the ChildDocTransformer in RealTimeGetComponent, I have yet to 
figure out a clean way to get the child document's _nest_path_,
which is needed for the atomic update to be merged in the right path inside the 
block(AtomicUpdateDocumentMerger#doAdd).




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-08-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588379#comment-16588379
 ] 

mosh commented on SOLR-12685:
-

{quote}Is this a duplicate of SOLR-9006?{quote}
This ticket incorporates it, since it should return child docs from the 
transaction log in addition to the index,
in case the document was not found there.

Quoting [~ariel_lieber...@hotmail.com] in SOLR-9006:
{quote}Therefore, I think the capability (e.g. additional flag) to get parent 
with all its children is very important..{quote}
In my opinion, adding extra flags only complicates this; RTG should instead 
check whether the schema is usable for child docs.
Perhaps there is a need, though, to get only that specific doc and not the 
whole block even though the document is in a block, perhaps for in-place 
updates (I am not very familiar with those features; just making sure)?




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-08-21 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587640#comment-16587640
 ] 

David Smiley commented on SOLR-12685:
-

Is this a duplicate of SOLR-9006?  If so, it's not quite clear to me how atomic 
updates of child documents are related (which I don't even think work yet – 
SOLR-12638).

BTW ChildDocTransformer really ought to take an "fl"!  (definitely its own 
issue, if you want to tackle that)

Please provide the nocommit patch; thanks.




[jira] [Commented] (SOLR-12685) RTG should return the whole block if schema is nested

2018-08-21 Thread mosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587388#comment-16587388
 ] 

mosh commented on SOLR-12685:
-

I have started working on this,
and currently the biggest hurdle I have encountered is the lack of a way to set 
the return fields for the ChildDocTransformer, or a way to get the document's 
_nest_path_ (a docValue) without having to build a new SolrInputDocument, or a 
way to get the _nest_path_ docValue by docId.
The path of the parent being updated is needed to locate, within the block 
(a nested SolrInputDocument), the document that the atomic update changes.
E.g. when an atomic update such as
{code:javascript}
{"id": "2", "children": {"add": {"id": 4, "test_s": "test"}}}{code}
is made, and the doc with id:2 is itself a child doc of doc id:1, we need to 
get the path of doc id:2 in order to add the new child doc to it.
I could upload a very rough no-commit patch, if needed.
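
A minimal sketch of the example above, using plain Java objects rather than SolrInputDocument: given the whole block rooted at id:1, find doc id:2 inside it and add the new child there. NestedAddSketch, Doc, pathTo and addChild are hypothetical names, and the list of ids returned by pathTo is only a stand-in for the path; it is not the _nest_path_ format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an atomic "add child" inside a nested block; not Solr code. */
final class NestedAddSketch {
    static final class Doc {
        final String id;
        final Map<String, Object> fields = new LinkedHashMap<>();
        final List<Doc> children = new ArrayList<>();
        Doc(String id) { this.id = id; }
    }

    /** Returns the ids from the root down to targetId, or an empty list if absent. */
    static List<String> pathTo(Doc root, String targetId) {
        List<String> path = new ArrayList<>();
        find(root, targetId, path);
        return path;
    }

    /** Depth-first search that records the ids on the way to the match. */
    static Doc find(Doc node, String targetId, List<String> path) {
        path.add(node.id);
        if (node.id.equals(targetId)) {
            return node;
        }
        for (Doc child : node.children) {
            Doc hit = find(child, targetId, path);
            if (hit != null) {
                return hit;
            }
        }
        path.remove(path.size() - 1); // dead end: undo this step
        return null;
    }

    /** Adds newChild under the doc with parentId; false if that doc is not in the block. */
    static boolean addChild(Doc root, String parentId, Doc newChild) {
        Doc target = find(root, parentId, new ArrayList<String>());
        if (target == null) {
            return false;
        }
        target.children.add(newChild);
        return true;
    }

    public static void main(String[] args) {
        Doc root = new Doc("1");
        root.children.add(new Doc("2"));
        Doc added = new Doc("4");
        added.fields.put("test_s", "test");
        addChild(root, "2", added);
        System.out.println(pathTo(root, "4"));
    }
}
```

The point of the sketch is that the update can only be applied once the whole block is in hand, which is why the lookup has to start from the root document.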
