Re: Mixing simple and nested docs in same update?

2018-02-06 Thread Jan Høydahl
Hi Mikhail,

Lacking clarity on this in the Ref Guide, I'm trying to understand all the
requirements for block join here.
I have noticed that if I index the blocks in one ADD request and afterwards
index the "other" single documents in another request, the results look OK.
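
A minimal sketch of that workaround, assuming the documents are held as Python dicts with a hypothetical "_children" key marking nested docs (the key and the helper names are illustrative only, not part of Solr). It only builds the two <add> payloads; whether a separate ADD is actually sufficient is exactly the open question in this message.

```python
import xml.etree.ElementTree as ET
from typing import List


def to_doc(doc: dict) -> ET.Element:
    """Convert a dict into a Solr XML <doc>, nesting any "_children"."""
    el = ET.Element("doc")
    for name, value in doc.items():
        if name == "_children":  # hypothetical key, see lead-in
            for child in value:
                el.append(to_doc(child))
        else:
            field = ET.SubElement(el, "field", {"name": name})
            field.text = str(value)
    return el


def build_add(docs: List[dict]) -> str:
    """Wrap a list of docs in a single <add> command."""
    add = ET.Element("add")
    for doc in docs:
        add.append(to_doc(doc))
    return ET.tostring(add, encoding="unicode")


def split_requests(docs: List[dict]) -> List[str]:
    """Return one payload for the blocks and one for the plain docs."""
    nested = [d for d in docs if d.get("_children")]
    plain = [d for d in docs if not d.get("_children")]
    return [build_add(group) for group in (nested, plain) if group]


docs = [
    {"id": "friend", "type": "other"},
    {"id": "mother", "type": "parent",
     "_children": [{"id": "daughter", "type": "child"}]},
]
payloads = split_requests(docs)
for payload in payloads:
    print(payload)
```

Each payload could then be posted to /update as its own request, for example with curl as in the original report.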

But is a new ADD enough to separate the two, or do we actually need a
COMMIT in between?

I'm also worried that the docs may get mixed up again after some index merges.
Is there some structure in the segments that prevents that from happening?

Finally, when we some day need to SPLITSHARD, is the SPLIT API aware of blocks,
so that it will never split in the middle of a block?

I hope to update the Ref Guide documentation to clarify all of these
constraints and pitfalls.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Mixing simple and nested docs in same update?

2018-02-05 Thread Mikhail Khludnev
Jan, mixing plain docs and blocks is not supported.


-- 
Sincerely yours
Mikhail Khludnev


Re: Mixing simple and nested docs in same update?

2018-01-31 Thread Jan Høydahl
Thanks for the reply.

I see that the child doc transformer
(https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html#TransformingResultDocuments-_child_-ChildDocTransformerFactory)
has a childFilter= option which, when used, solves the issue/bug.
But such a childFilter does not exist for the BlockJoin QParsers.
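
For reference, a sketch of how such a request could be built, assuming the collection name "nested" and the "type" field from the original report. The childFilter value type:child is an assumption about what one would filter on here.

```python
from urllib.parse import parse_qs, urlencode

# Child doc transformer with both parentFilter and childFilter.
# The childFilter value is an assumption based on the example data.
params = {
    "q": "id:mother",
    "fl": "*,[child parentFilter=type:parent childFilter=type:child]",
}

query_string = urlencode(params)
url = "http://localhost:8983/solr/nested/query?" + query_string
print(url)

# Decode again to check that the transformer spec survived URL encoding.
decoded = parse_qs(query_string)
print(decoded["fl"][0])
```

With childFilter in place, only docs that match it are returned under the parent, so a stray plain doc in the same block would be left out of the transformer output.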

Still not sure whether it is a bug or not...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Mixing simple and nested docs in same update?

2018-01-30 Thread Tomas Fernandez Lobbe
I believe the problem is that:
* BlockJoin queries do not know about your “types”. In the BlockJoin query
world, everything that is not a parent (a parent being whatever matches the
parentFilter) is a child.
* All docs indexed before a parent are considered children of that doc.
That’s why, in your first case, it considers “friend” (not a parent, hence a
child) to be a child of the first parent it can find in the segment (“mother”).
In the second case, the “friend” doc has no parent: no parent document
matches the filter after it, so it is not considered a match.
Maybe this particular example works if you try your query with
parentFilter=-type:child (I haven’t tried it)?

Note that when you send docs with children to Solr, Solr will make sure the
children are indexed before the parent. Also note that there are some other
open bugs related to child docs, in particular with mixing child docs and
non-child docs. Depending on which features you need, this may be a problem.
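
The two rules above can be illustrated with a small simulation. This is a simplified model of the behavior described in this message, not Lucene's actual implementation; the "_children" key and both function names are made up for the example.

```python
from typing import Callable, Dict, List


def index_order(docs: List[dict]) -> List[Dict[str, str]]:
    """Flatten a request into index order: children before their parent."""
    flat = []
    for doc in docs:
        flat.extend(index_order(doc.get("_children", [])))
        flat.append({k: v for k, v in doc.items() if k != "_children"})
    return flat


def children_of(segment: List[Dict[str, str]], parent_id: str,
                is_parent: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Ids of all docs between the previous parent and the given parent."""
    pos = next(i for i, d in enumerate(segment) if d["id"] == parent_id)
    kids = []
    for doc in reversed(segment[:pos]):
        if is_parent(doc):  # the previous parent ends the block
            break
        kids.append(doc["id"])
    return kids[::-1]


friend = {"id": "friend", "type": "other"}
mother = {"id": "mother", "type": "parent",
          "_children": [{"id": "daughter", "type": "child"}]}

friend_first = index_order([friend, mother])  # friend, daughter, mother
friend_last = index_order([mother, friend])   # daughter, mother, friend


def type_parent(doc):  # models parentFilter=type:parent
    return doc["type"] == "parent"


def not_child(doc):  # models parentFilter=-type:child
    return doc["type"] != "child"


print(children_of(friend_first, "mother", type_parent))  # ['friend', 'daughter']
print(children_of(friend_last, "mother", type_parent))   # ['daughter']
print(children_of(friend_first, "mother", not_child))    # ['daughter']
```

In this model the third call shows why parentFilter=-type:child might help: “friend” then counts as a parent itself and closes the block before “mother”.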

Tomás



Re: Mixing simple and nested docs in same update?

2018-01-30 Thread Jan Høydahl
Pasting the GIST link :-) 
https://gist.github.com/45640fe3bad696d53ef8a0930a35d163 

Does anyone know if this is expected behavior?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Mixing simple and nested docs in same update?

2018-01-15 Thread Jan Høydahl
Radio silence…

Here is a GIST for easy reproduction. Is this by design?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Mixing simple and nested docs in same update?

2018-01-10 Thread Jan Høydahl
Hi,

We index several large nested documents. We found that querying the data 
behaves differently depending on how the documents are indexed.

To reproduce:

solr start
solr create -c nested
# Index one plain document, "friend", and a nested one, "mother" with child
# "daughter", in the same request:
curl localhost:8983/solr/nested/update -d '
<add>
  <doc>
    <field name="id">friend</field>
    <field name="type">other</field>
  </doc>
  <doc>
    <field name="id">mother</field>
    <field name="type">parent</field>
    <doc>
      <field name="id">daughter</field>
      <field name="type">child</field>
    </doc>
  </doc>
</add>'

# Query for mother's children using either child transformer or child query
# parser
curl "localhost:8983/solr/nested/query?q=id:mother&fl=%2A%2C%5Bchild%20parentFilter%3Dtype%3Aparent%5D"
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":4,
    "params":{
      "q":"id:mother",
      "fl":"*,[child parentFilter=type:parent]"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"mother",
        "type":["parent"],
        "_version_":1589249812802306048,
        "type_str":["parent"],
        "_childDocuments_":[
        {
          "id":"friend",
          "type":["other"],
          "_version_":1589249812729954304,
          "type_str":["other"]},
        {
          "id":"daughter",
          "type":["child"],
          "_version_":1589249812802306048,
          "type_str":["child"]}]}]
  }}

As you can see, “friend” got included as a child of “mother”.
If you index the exact same request but put “friend” after “mother” in the XML,
the query works as expected.

Inspecting the index, everything looks correct, and only “daughter” and
“mother” have _root_=mother.
Is there a rule that you should start a new update request for each type of
parent/child relationship that you need to index, and not mix them in the
same request?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com