Chris Hostetter wrote:
: + If termVectors are not stored, !MoreLikeThis will generate terms from
: stored fields. If multiple fields are used for similarity, solr will
: use the default Analyzer -- NOTE: this may or ''may not'' match the
: Analyzer used to index the field. If only one field is used for
: similarity, solr will use the Analyzer defined in schema.xml
what do you mean by the "default Analyzer" .. is that StandardAnalyzer,
IndexSchema.getAnalyzer(), or IndexSchema.getQueryAnalyzer() ? ... in the
case of the latter two they will automatically pick the correct Analyzer for
the FieldType.
Ahhh! I didn't realize that is how those worked. Currently I am only
setting the analyzer if there is only one field and using
fieldType.getAnalyzer() -- a better solution is to use:
searcher.getSchema().getAnalyzer()
In that case, the comment should read something like:
"If termVectors are not stored, !MoreLikeThis will generate terms from
stored fields using the Analyzer defined in schema.xml."
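
To make the idea concrete, here is a minimal, self-contained sketch of a schema-level analyzer that picks the right per-field analyzer from the field name. It is not Solr or Lucene code: SketchAnalyzer, LowercaseWhitespaceAnalyzer, KeywordAnalyzer and SchemaAnalyzer are hypothetical names made up for illustration, and throwing on an unknown field is just this sketch's choice, not a claim about what IndexSchema does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an analyzer: turns a field's text into terms.
interface SketchAnalyzer {
    List<String> analyze(String fieldName, String text);
}

// Splits on whitespace and lowercases each token.
final class LowercaseWhitespaceAnalyzer implements SketchAnalyzer {
    @Override
    public List<String> analyze(String fieldName, String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}

// Keeps the whole value as a single term, like an untokenized string field.
final class KeywordAnalyzer implements SketchAnalyzer {
    @Override
    public List<String> analyze(String fieldName, String text) {
        return Collections.singletonList(text);
    }
}

// Schema-level analyzer: callers pass any field name and it delegates to
// the analyzer registered for that field, so no per-field special casing
// is needed at the call site.
final class SchemaAnalyzer implements SketchAnalyzer {
    private final Map<String, SketchAnalyzer> perField = new HashMap<>();

    void register(String fieldName, SketchAnalyzer analyzer) {
        perField.put(fieldName, analyzer);
    }

    @Override
    public List<String> analyze(String fieldName, String text) {
        SketchAnalyzer delegate = perField.get(fieldName);
        if (delegate == null) {
            // Assumption of this sketch only: unknown fields are an error.
            throw new IllegalArgumentException("no analyzer for field: " + fieldName);
        }
        return delegate.analyze(fieldName, text);
    }
}

final class SchemaAnalyzerDemo {
    public static void main(String[] args) {
        SchemaAnalyzer schema = new SchemaAnalyzer();
        schema.register("title", new LowercaseWhitespaceAnalyzer());
        schema.register("id", new KeywordAnalyzer());

        // The same analyzer object handles several fields, each with its own rules.
        System.out.println(schema.analyze("title", "More Like This"));
        System.out.println(schema.analyze("id", "Doc 42"));
    }
}
```

With a wrapper like this the handler can hand one analyzer to the term generation code no matter how many similarity fields are requested, which is why the single-field special case goes away.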
(although an interesting question is what happens if I want to find
similar docs based on a field that is stored but not indexed so it *really*
has no analyzer)
I think the MLT implementation would need some modification to support
that -- what you are suggesting is to get the top tf/idf terms for a
stored but not indexed field then query against a different field (that
is indexed). As is, it compares like fields to one another...