This gets really close:
q=
fl=id,subquery:[subquery],[shard]
subquery.q=
subquery.fq={!cache=false} +{!terms f=_root_ v=$row.id}
subquery.shards=$row.[shard]
The issue here is that local params (and the "$" parameter dereferencing that
comes with them) are only handled by a query parser, and the "shards=" param
isn't a query, so it never gets parsed. That leaves me no way to dereference
the "$row.[shard]".
On 3/27/18, 4:00 PM, "Jeff Wartes" wrote:
I have a large 7.2 index with nested documents and many shards.
For each result (parent doc) in a query, I want to gather a
relevance-ranked subset of the child documents. It seemed like the subquery
transformer would be ideal:
https://lucene.apache.org/solr/guide/7_2/transforming-result-documents.html#TransformingResultDocuments-_subquery_
(the [child] transformer allows for a filter, but the results have an
effectively random sort)
So maybe something like this:
q=
fl=id,subquery:[subquery]
subquery.q=
subquery.fq={!cache=false} +{!terms f=_root_ v=$row.id}
This actually works fine, but there’s a lot more work going on than
necessary. Say we have X shards and get N documents back:
Query http requests = 1 top-level query + X distributed shard-requests
Subquery http requests = N rows + N * X distributed shard-requests
So with N=10 results and X=50 shards, that is: 1+50+10+500 = 561 http
requests through the cluster.
Some of that is unavoidable, of course, but it occurs to me that all the
child docs are indexed in the same shard (segment) that the parent doc is.
That means that if you know the parent doc id (and I do), you can use the
document routing to know exactly which shard to send each subquery request to.
Each of the N subqueries would then need 1 shard-request instead of X, which
would save N * (X - 1) = 490 of the http requests in the scenario above.
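To make the request arithmetic above easy to check for other values of N and X, here is a small sketch in Python. The function name and the dictionary keys are made up for illustration. It assumes, as described above, that a routed subquery still costs one request to a coordinating node plus one shard-request.

```python
def subquery_request_counts(n_rows: int, n_shards: int) -> dict:
    """Count HTTP requests for a top-level query plus per-row subqueries."""
    # Top-level query: 1 request from the client + 1 per shard.
    top_level = 1 + n_shards
    # Unrouted subqueries: 1 request per row, each fanned out to every shard.
    fan_out = n_rows + n_rows * n_shards
    # Routed subqueries (assumption): 1 request per row, each hitting 1 shard.
    routed = n_rows + n_rows * 1
    return {
        "fan_out_total": top_level + fan_out,
        "routed_total": top_level + routed,
        "saved": fan_out - routed,
    }


counts = subquery_request_counts(n_rows=10, n_shards=50)
print(counts)
```

With N=10 and X=50 this gives 561 requests for the unrouted case and a saving of 490 for the routed one, matching the numbers above.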
Is there any form of query that allows for explicitly following the
document routing rules for a given document ID?
I’m aware of the “distrib=false” and “shards=foo” parameters, but using
those would require me to recreate the document routing in the client.
There’s also the “fl=[shard]” thing, but that would still require me to
handle the subqueries in the client.
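For reference, the client-side fallback mentioned above might look roughly like the following Python sketch. It only builds the query string for one per-parent child request and sends nothing. The function name, the example document, the host name, and the child query "text:foo" are all hypothetical. It also assumes that the value returned by fl=[shard] can be passed back unchanged as the "shards" parameter, which I have not verified for every setup.

```python
from urllib.parse import parse_qs, urlencode


def child_query_string(parent_doc: dict, child_q: str, rows: int = 5) -> str:
    """Build the query string for one child-document request that is sent
    only to the shard holding the given parent document."""
    params = {
        "q": child_q,
        # Restrict to the children of this parent via the _root_ field.
        "fq": "{!cache=false}+{!terms f=_root_ v='%s'}" % parent_doc["id"],
        # Assumption: the [shard] value is usable as-is for "shards=".
        "shards": parent_doc["[shard]"],
        "rows": rows,
    }
    return urlencode(params)


# Hypothetical parent row, as it might come back with fl=id,[shard].
parent = {
    "id": "parent-1",
    "[shard]": "http://host1:8983/solr/coll_shard3_replica_n1/",
}
qs = child_query_string(parent, "text:foo")
print(parse_qs(qs))
```

This is exactly the per-row bookkeeping I would rather have the subquery transformer do for me, but it does keep each child request on a single shard.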