Shalin,

Thanks for the response and explanation! I logged a JIRA per your request here:
https://issues.apache.org/jira/browse/SOLR-10695

Chris

On Mon, May 15, 2017 at 3:40 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> On Sun, May 14, 2017 at 7:40 PM, Chris Troullis <cptroul...@gmail.com> wrote:
> > Hi,
> >
> > I've been experimenting with various sharding strategies with Solr cloud
> > (6.5.1), and am seeing some odd behavior when using the implicit router. I
> > am probably either doing something wrong or misinterpreting what I am
> > seeing in the logs, but if someone could help clarify that would be awesome.
> >
> > I created a collection using the implicit router, created 10 shards, named
> > shard1, shard2, etc. I indexed 3000 documents to each shard, routed by
> > setting the _route_ field on the documents in my schema. All works fine, I
> > verified there are 3000 documents in each shard.
> >
> > The odd behavior I am seeing is when I try to route a query to a specific
> > shard. I submitted a simple query to shard1 using the request parameter
> > _route_=shard1. The query comes back fine, but when I looked in the logs,
> > it looked like it was issuing 3 separate requests:
> >
> > 1. The original query to shard1
> > 2. A 2nd query to shard1 with the parameter ids=a bunch of document ids
> > 3. The original query to a random shard (changes every time I run the query)
> >
> > It looks like the first query is getting back a list of ids, and the 2nd
> > query is retrieving the documents for those ids? I assume this is some solr
> > cloud implementation detail.
> >
> > What I don't understand is the 3rd query. Why is it issuing the original
> > query to a random shard every time, when I am specifying the _route_? The
> > _route_ parameter is definitely doing something, because if I remove it, it
> > is querying all shards (which I would expect).
> >
> > Any ideas? I can provide the actual queries from the logs if required.
>
> How many nodes is this collection distributed across? I suspect that
> you are using a single node for experimentation?
>
> What happens with the _route_=shard1 parameter and implicit routing is
> that the _route_ parameter is resolved to a list of replicas of
> shard1. But SolrJ uses only the node name of the replica along with
> the collection name to make the request (this is important, we'll come
> back to this later). So, ordinarily, that node hosts a single shard
> (shard1) and when it receives the request, it will optimize the search
> to take the non-distributed code path (since the replica has all the
> data needed to satisfy the search).
>
> But interesting things happen when the node hosts more than one shard
> (say shard1 and shard3 both). When we query such a node using just the
> collection name, the collection name can be resolved to either shard1
> or shard3 -- this is picked randomly without looking at the _route_
> parameter at all. If shard3 is picked, it looks at the request, sees
> that it doesn't have all the necessary data and decides to follow the
> two-phase distributed search path, where phase 1 is to get the ids and
> scores of the documents matching the query from all participating
> shards (the list of such shards is limited by the _route_ parameter, which
> in our case will be only shard1) and phase 2 is to get the
> actual stored fields to be returned to the user. So you get three
> queries in the log: 1) phase 1 of distributed search hitting shard1,
> 2) phase 2 of distributed search hitting shard1 and 3) the
> distributed scatter-gather search run by shard3.
>
> So to recap, this is happening because you have more than one shard
> hosted on a node. An easy workaround is to have each shard hosted on a
> unique node. But we can improve things on the Solr side as well by 1)
> having SolrJ resolve requests down to node name and core name, 2)
> having the collection name to core name resolution take the _route_ param
> into account. Either 1 or 2 can solve the problem. Can you please open
> a Jira issue?
>
> > Thanks,
> >
> > Chris
>
> --
> Regards,
> Shalin Shekhar Mangar.
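
For anyone reading this thread later, the request flow Shalin describes can be sketched as a toy Python model. This is not Solr or SolrJ code: the function name simulate_routed_query, the pick parameter, and the log strings are all hypothetical and exist only to illustrate why a node hosting several shards can log either one request or three for the same _route_=shard1 query.

```python
import random


def simulate_routed_query(cores_on_node, route, pick=random.choice):
    """Toy model (not Solr code) of the log entries for one routed query.

    cores_on_node: shards hosted on the node that receives the request
    route:         value of the _route_ parameter, e.g. "shard1"
    pick:          how the node chooses a local core (random by default)
    """
    # The request names only the node and the collection, so the node
    # chooses one of its local cores without looking at _route_.
    receiving = pick(cores_on_node)

    if receiving == route:
        # The chosen core holds all the data: non-distributed code path.
        return [f"{receiving}: non-distributed query"]

    # The chosen core lacks the data, so it coordinates a two-phase
    # distributed search limited by _route_ to the target shard.
    return [
        f"{receiving}: original query (scatter-gather coordinator)",
        f"{route}: phase 1, ids and scores",
        f"{route}: phase 2, stored fields for ids",
    ]


if __name__ == "__main__":
    # One shard per node: always a single request.
    print(simulate_routed_query(["shard1"], "shard1"))
    # Several shards on one node: one or three requests, depending on the pick.
    print(simulate_routed_query(["shard1", "shard3"], "shard1"))
```

In this model the workaround from the thread (one shard per node) corresponds to passing a single-element cores_on_node list, which always takes the non-distributed path.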