: I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
: indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
: MorelikeThisHandler with content stream I'm getting different results per
: shard.

I haven't looked at the code recently, but I'm 99% certain that the MLT
handler in general doesn't work with distributed (i.e. sharded) queries
(unlike the MLT component and the recently added MLT qparser).

I suspect that in the specific case of stream.body, what you are seeing is
that the interesting terms are being computed relative to the local tf/idf
stats for that shard, and then only local results from that shard are being
returned.

: I also looked at using a standard MLT query, but I need to be able to
: stream in a fairly large block of text for comparison that is not in the
: index (different type of document). A standard MLT query

Until/unless the MLT parser supports arbitrary text (there's some mention of
this in SOLR-7639, but I'm not sure what the status of that is), you might
find that just POSTing all of your text as a regular query (q) using dismax
or edismax is suitable for your needs -- that's essentially the equivalent
of what MLTHandler does with a stream.body, except that MLTHandler tries to
focus only on "interesting terms" based on tf/idf. If your fields are all
configured with stopword files anyway, the results and performance may be
similar.

-Hoss
http://www.lucidworks.com/
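
PS: a minimal sketch of the "POST all of your text as a regular query"
idea, in case it helps. It only builds the request. The URL, collection
name, and field names used with it are placeholders (assumptions, not
anything from your setup), and escaping the query-syntax characters is my
own precaution so that a large block of free text isn't parsed as
operators or field:value clauses.

```python
import re
import urllib.parse
import urllib.request

# Characters with special meaning in the Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&;/])')


def escape_query_text(text):
    """Backslash-escape query-syntax characters in free text."""
    return _SPECIAL.sub(r'\\\1', text)


def build_edismax_request(select_url, text, fields, rows=10):
    """Build a POST request that sends `text` as an edismax query.

    select_url and fields are hypothetical values supplied by the caller,
    e.g. the /select handler of your collection and the fields you would
    otherwise have listed in mlt.fl.
    """
    params = {
        "defType": "edismax",
        "q": escape_query_text(text),
        "qf": " ".join(fields),
        "rows": str(rows),
        "wt": "json",
    }
    # Form-encode the parameters into the body so a large block of text
    # does not have to fit into the URL.
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(
        select_url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
        },
    )
```

To actually run it you would pass the result to urllib.request.urlopen(),
with select_url pointing at something like
http://localhost:8983/solr/collection1/select (again, a placeholder).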