: I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
: indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
: MorelikeThisHandler with content stream I'm getting different results per
: shard.

I haven't looked at the code recently but i'm 99% certain that the MLT 
handler in general doesn't work with distributed (ie: sharded) queries.  
(unlike the MLT component and the recently added MLT qparser)

I suspect that in the specific case of stream.body, what you are seeing is 
that the interesting terms are being computed relative the local tf/idf 
stats for that shard, and then only local results from that shard are 
being returned.

: I also looked at using a standard MLT query, but I need to be able to
: stream in a fairly large block of text for comparison that is not in the
: index (different type of document). A standard MLT  query

Until/unless the MLT parser supports arbitrary text (there's some mention 
of this in SOLR-7639 but i'm not sure what the status of that is) you 
might find that just POSTing all of your text as a regular query (q) using 
dismax or edismax is suitable for your needs -- that's essentially the 
equivilent of what MLTHandler does with a stream.body, except it tries to 
only focus on "interesting terms" based on tf/idf, but if your fields 
are all configured with stopword files anyway, then the results and 
performance may be similar.


-Hoss
http://www.lucidworks.com/

Reply via email to