Re: Routing a subquery directly to the shard a document came from

2018-03-29 Thread Jeff Wartes

This gets really close:

q=
fl=id,subquery:[subquery],[shard]
subquery.q=
subquery.fq={!cache=false} +{!terms f=_root_ v=$row.id}
subquery.shards=$row.[shard]

The issue here is that local params (and the "$param" dereferencing they 
provide) only exist inside a query parser. The "shards=" param isn't a query, 
so it is never parsed, and I have no way to dereference the "$row.[shard]".


Routing a subquery directly to the shard a document came from

2018-03-27 Thread Jeff Wartes

I have a large Solr 7.2 index with nested documents and many shards.
For each result (parent doc) in a query, I want to gather a relevance-ranked 
subset of the child documents. The subquery transformer seemed ideal: 
https://lucene.apache.org/solr/guide/7_2/transforming-result-documents.html#TransformingResultDocuments-_subquery_
(The [child] transformer allows for a filter, but its results come back in an 
effectively random order.)

So maybe something like this:
q=
fl=id,subquery:[subquery]
subquery.q=
subquery.fq={!cache=false} +{!terms f=_root_ v=$row.id}

This actually works fine, but there’s a lot more work going on than necessary. 
Say we have X shards and get N documents back:

Query HTTP requests = 1 top-level query + X distributed shard-requests
Subquery HTTP requests = N (one per row) + N * X distributed shard-requests
So with N=10 results and X=50 shards, that is: 1+50+10+500 = 561 HTTP requests 
through the cluster.
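
For anyone who wants to plug in their own numbers, the arithmetic above can be sketched in Python. The function and key names are made up for illustration; this is only the counting model described in this message, not anything Solr reports.

```python
def subquery_request_count(rows: int, shards: int) -> dict:
    """Count HTTP requests for one query using the [subquery] transformer.

    Model from the message: every query fans out to every shard,
    and one subquery is issued per returned row.
    """
    counts = {
        "top_level": 1,                    # the client's original request
        "top_level_fanout": shards,        # one shard-request per shard
        "subqueries": rows,                # one subquery per result row
        "subquery_fanout": rows * shards,  # each subquery hits every shard
    }
    counts["total"] = sum(counts.values())
    return counts


counts = subquery_request_count(rows=10, shards=50)
print(counts["total"])  # 561
```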

Some of that is unavoidable, of course, but it occurs to me that all the child 
docs are indexed in the same shard (segment) as the parent doc. That means 
that if you know the parent doc id (and I do), you can use the document routing 
to know exactly which shard to send the subquery request to. Each subquery 
would then go to 1 shard instead of all 50, which would save 490 of the HTTP 
requests in the scenario above.
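
As a quick check on that 490 figure, here is a small Python sketch comparing the subquery fan-out with and without routing. It assumes a routed subquery touches exactly one shard; the function name is hypothetical.

```python
def requests_saved_by_routing(rows: int, shards: int) -> int:
    """Shard-requests avoided if each subquery went to a single shard."""
    fanout_to_all_shards = rows * shards  # current behaviour: N * X
    fanout_when_routed = rows * 1         # assumed: one shard per subquery
    return fanout_to_all_shards - fanout_when_routed


print(requests_saved_by_routing(rows=10, shards=50))  # 490
```

That would leave 561 - 490 = 71 requests in the same scenario.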

Is there any form of query that allows for explicitly following the document 
routing rules for a given document ID?

I’m aware of the “distrib=false” and “shards=foo” parameters, but using those 
would require me to recreate the document routing in the client.
There’s also the “fl=[shard]” thing, but that would still require me to handle 
the subqueries in the client.
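
For reference, the client-side fallback I'd like to avoid would look roughly like the Python sketch below. It only builds the request parameters and sends nothing. It assumes the top-level query asked for fl=id,[shard], and that the [shard] value can be passed back unchanged as "shards=" (I haven't verified that for every setup). The function name, the "parent_id" param and the example shard addresses are made up for illustration.

```python
from typing import Dict, List


def build_child_requests(parent_docs: List[Dict[str, str]],
                         child_query: str,
                         rows: int = 5) -> List[Dict[str, str]]:
    """Build one set of Solr params per parent doc, pinned to its shard."""
    requests = []
    for doc in parent_docs:
        requests.append({
            "q": child_query,
            # Same filter as the subquery version, but dereferencing a
            # plain request param instead of $row.id.
            "fq": "{!cache=false} +{!terms f=_root_ v=$parent_id}",
            "parent_id": doc["id"],
            # Assumption: the [shard] value is usable as-is here.
            "shards": doc["[shard]"],
            "rows": str(rows),
        })
    return requests


# Hypothetical rows from a top-level query with fl=id,[shard]
parents = [
    {"id": "doc1", "[shard]": "host1:8983/solr/coll_shard1_replica_n1"},
    {"id": "doc2", "[shard]": "host2:8983/solr/coll_shard7_replica_n3"},
]
for params in build_child_requests(parents, child_query="text:foo"):
    print(params["parent_id"], "->", params["shards"])
```

That is N extra round trips from the client, which is exactly the part I was hoping the subquery transformer could do for me.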