[jira] [Commented] (SOLR-12094) JsonRecordReader ignores root record fields after the split point

2018-04-06 Thread Andrzej Wislowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428205#comment-16428205
 ] 

Andrzej Wislowski commented on SOLR-12094:
--

[~dweiss] I think it is a good idea. I will take a look at this code and try to 
create such a patch.

> JsonRecordReader ignores root record fields after the split point
> -
>
> Key: SOLR-12094
> URL: https://issues.apache.org/jira/browse/SOLR-12094
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: master (8.0)
>Reporter: Przemysław Szeremiota
>Priority: Major
> Attachments: SOLR-12094.patch, SOLR-12094.patch, 
> json-record-reader-bug.patch
>
>
> JsonRecordReader, when configured with a split other than the top level, 
> ignores all top-level JSON nodes that come after the split ends, for example:
> {code}
> {
>   "first": "John",
>   "last": "Doe",
>   "grade": 8,
>   "exams": [
>     {
>       "subject": "Maths",
>       "test": "term1",
>       "marks": 90
>     },
>     {
>       "subject": "Biology",
>       "test": "term1",
>       "marks": 86
>     }
>   ],
>   "after": "456"
> }
> {code}
> The "after" node won't be visible in the SolrInputDocument constructed from 
> /update/json/docs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12094) JsonRecordReader ignores root record fields after the split point

2018-04-06 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428202#comment-16428202
 ] 

Noble Paul commented on SOLR-12094:
---

I agree that we should be able to handle this use case as well. But the 
primary objective is to handle streaming input well. Non-streaming parsing 
should be optional.




[jira] [Commented] (SOLR-12094) JsonRecordReader ignores root record fields after the split point

2018-04-06 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428179#comment-16428179
 ] 

Dawid Weiss commented on SOLR-12094:


I understand the concept of "streaming" imports, but this just seems wrong to 
me here. An analogy would be XSLT or other technologies where the 
implementation can use an efficient "streaming" mode in certain cases, unless 
the input makes that impossible. 

I perceive a similar situation here: the parser should handle the input 
efficiently where possible, but it should also be able to process any kind of 
input, even input that cannot be processed without keeping some history. Sure, 
an abuse case of millions of split nodes awaiting a single attribute is 
possible, but even then it'd be simpler to just say "yeah, buffer up until you 
can emit the output" than to modify the structure of such JSON (write a 
converter so that the nested nodes are always placed at the end of the 
parent).
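For comparison, the converter alternative can be sketched in Java as below, assuming the document has already been parsed into insertion-ordered {{LinkedHashMap}}s by some JSON library. {{MoveSplitLastSketch}} and {{moveSplitLast}} are hypothetical names, not part of Solr or SolrJ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side converter (not part of Solr): reorders one parsed
// JSON object so that the split array comes last, so that a streaming reader
// has already seen every sibling field when the split records are emitted.
class MoveSplitLastSketch {

    // Returns a copy of `parent` with `splitKey` (if present) moved to the end.
    // Other keys keep their original order; LinkedHashMap preserves insertion
    // order, which a JSON serializer would follow when writing the document.
    static Map<String, Object> moveSplitLast(Map<String, Object> parent, String splitKey) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : parent.entrySet()) {
            if (!e.getKey().equals(splitKey)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        if (parent.containsKey(splitKey)) {
            out.put(splitKey, parent.get(splitKey));
        }
        return out;
    }

    // The example document from the issue, already parsed into ordered maps.
    static Map<String, Object> exampleDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("first", "John");
        doc.put("last", "Doe");
        doc.put("grade", 8);
        doc.put("exams", List.of(
                Map.of("subject", "Maths", "test", "term1", "marks", 90),
                Map.of("subject", "Biology", "test", "term1", "marks", 86)));
        doc.put("after", "456");
        return doc;
    }

    public static void main(String[] args) {
        // Prints the new key order: [first, last, grade, after, exams]
        System.out.println(moveSplitLast(exampleDocument(), "exams").keySet());
    }
}
```

This handles only a split directly under the root; a nested split path would need the same reordering applied recursively. It also needs the whole document in memory before anything is sent, which is the same kind of buffering the parser itself could do.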

[~awislowski] do you think you'd be able to modify the patch so that it accepts 
an argument and switches between the 'strict streaming' mode and 'relaxed' 
mode? In 'strict streaming' mode there should be no buffering and the parser 
should complain with an exception if it encounters extra nodes after the split. 
In the 'relaxed mode' the parser should buffer up the information until it's 
complete and can be emitted.




[jira] [Commented] (SOLR-12094) JsonRecordReader ignores root record fields after the split point

2018-03-28 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417008#comment-16417008
 ] 

Noble Paul commented on SOLR-12094:
---

You are right, it's not a good idea to ignore this. It should probably throw an 
exception if it encounters such JSON. 

 

It's possible to implement a non-streaming solution. The user may pass an 
optional parameter to switch to that mode.




[jira] [Commented] (SOLR-12094) JsonRecordReader ignores root record fields after the split point

2018-03-28 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417003#comment-16417003
 ] 

Dawid Weiss commented on SOLR-12094:


I understand, but I also believe it's really likely that people have such 
nested JSON documents and will want to use them. Right now it silently discards 
those trailing entries, and I don't think that's good either: it should either 
signal an exception (probably pointing at a non-streaming solution, if there is 
any) or work correctly. What do you think?




[jira] [Commented] (SOLR-12094) JsonRecordReader ignores root record fields after the split point

2018-03-27 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416475#comment-16416475
 ] 

Noble Paul commented on SOLR-12094:
---

Before going into the patch: I can see that it is not designed to work like 
that. The reason is that {{JsonRecordReader}} is a streaming parser. To 
include {{'after'}} in the document, it must hold all the data in 
{{'exams'}} in memory. So it is going to seriously affect the performance of 
the parser for the normal use case. 
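The streaming behaviour described here can be illustrated with a small Java model. It is a sketch under simplifying assumptions, not {{JsonRecordReader}}'s real code: the example document is pre-flattened into a hypothetical {{Event}} list (scalar fields plus start/end markers for each {{exams}} child), and each child record is emitted as soon as it closes, merged with the root fields read so far.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a streaming split; not JsonRecordReader's actual code.
class StreamingSplitSketch {

    // Hypothetical flattened parse event: a scalar field, or the start/end of one split child.
    static final class Event {
        final String kind;  // "FIELD", "START_CHILD" or "END_CHILD"
        final String name;  // field name (FIELD only)
        final Object value; // field value (FIELD only)

        Event(String kind, String name, Object value) {
            this.kind = kind;
            this.name = name;
            this.value = value;
        }

        static Event field(String name, Object value) { return new Event("FIELD", name, value); }
        static Event startChild() { return new Event("START_CHILD", null, null); }
        static Event endChild() { return new Event("END_CHILD", null, null); }
    }

    // Emits one record per split child as soon as that child closes,
    // merged with the root fields read so far.
    static List<Map<String, Object>> stream(List<Event> events) {
        List<Map<String, Object>> emitted = new ArrayList<>();
        Map<String, Object> rootSoFar = new LinkedHashMap<>();
        Map<String, Object> child = null;
        for (Event e : events) {
            if (e.kind.equals("START_CHILD")) {
                child = new LinkedHashMap<>();
            } else if (e.kind.equals("END_CHILD")) {
                Map<String, Object> doc = new LinkedHashMap<>(rootSoFar);
                doc.putAll(child);
                emitted.add(doc); // already emitted: later root fields never reach it
                child = null;
            } else if (child != null) {
                child.put(e.name, e.value);
            } else {
                rootSoFar.put(e.name, e.value); // "after" ends up here, too late to be used
            }
        }
        return emitted;
    }

    // The example document from the issue, flattened in document order.
    static List<Event> exampleEvents() {
        return List.of(
                Event.field("first", "John"), Event.field("last", "Doe"), Event.field("grade", 8),
                Event.startChild(), Event.field("subject", "Maths"), Event.field("test", "term1"),
                Event.field("marks", 90), Event.endChild(),
                Event.startChild(), Event.field("subject", "Biology"), Event.field("test", "term1"),
                Event.field("marks", 86), Event.endChild(),
                Event.field("after", "456"));
    }

    public static void main(String[] args) {
        for (Map<String, Object> doc : stream(exampleEvents())) {
            System.out.println(doc); // neither record contains "after"
        }
    }
}
```

Running {{main}} prints two records, one for Maths and one for Biology, and neither contains {{after}}: by the time {{"after": "456"}} is read, both records have already been emitted. Including it would require holding the {{exams}} children until the root object closes.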




[jira] [Commented] (SOLR-12094) JsonRecordReader ignores root record fields after the split point

2018-03-27 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416136#comment-16416136
 ] 

Dawid Weiss commented on SOLR-12094:


I looked at the code of that streaming parser; it is quite complex. It seems 
like all this node copying and record trickery could be avoided, but that would 
be a significantly more complex patch. [~noble.paul] - you seem to be much more 
involved in the parser development; would you like to take a look before I 
commit it?
