Have you already checked Solr's MoreLikeThis? See
http://wiki.apache.org/solr/MoreLikeThisHandler and
http://wiki.apache.org/solr/MoreLikeThis. You describe a problem similar to
the use case of that component, so if anything is worth hacking on, it is
Solr's MoreLikeThis.
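Roughly speaking, MoreLikeThis picks the most significant terms of a source document (weighted by tf-idf) and turns them into a query. The stdlib-only Java sketch below illustrates that vector-space idea. It is not Lucene's or Solr's code, and the class and method names (TfIdfSketch, termFreqs, docFreqs, tfidf, topTerms, cosine) are hypothetical. It also leaves out the length norms, boosts and other factors that real Lucene scoring applies. The smoothed idf, 1 + ln(N / (df + 1)), is an assumption chosen for illustration.

```java
import java.util.*;

// Hypothetical illustration only: plain tf-idf vectors plus cosine similarity.
// This is NOT Lucene's scoring formula (no length norms, boosts, etc.).
public class TfIdfSketch {

    // Lowercase, split on anything that is not a letter or digit, count terms.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Number of documents in the corpus that contain each term.
    static Map<String, Integer> docFreqs(List<Map<String, Integer>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : corpus) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Weight = tf * idf, with an assumed smoothed idf = 1 + ln(N / (df + 1)).
    static Map<String, Double> tfidf(Map<String, Integer> tf, Map<String, Integer> df, int numDocs) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int d = df.getOrDefault(e.getKey(), 0);
            double idf = 1.0 + Math.log((double) numDocs / (d + 1));
            w.put(e.getKey(), e.getValue() * idf);
        }
        return w;
    }

    // The k highest-weighted terms (ties broken alphabetically): a rough
    // analogue of the "interesting terms" a MoreLikeThis-style query would use.
    static List<String> topTerms(Map<String, Double> weights, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weights.entrySet());
        entries.sort((a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue()); // descending weight
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }

    // Cosine similarity of two sparse weight vectors; 0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            nb += v * v;
        }
        if (na == 0 || nb == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        List<String> texts = List.of(
                "lucene is a search library",
                "solr is a search server built on lucene",
                "the cat sat on the mat");
        List<Map<String, Integer>> tfs = new ArrayList<>();
        for (String t : texts) {
            tfs.add(termFreqs(t));
        }
        Map<String, Integer> df = docFreqs(tfs);
        List<Map<String, Double>> vecs = new ArrayList<>();
        for (Map<String, Integer> tf : tfs) {
            vecs.add(tfidf(tf, df, texts.size()));
        }
        System.out.printf("doc0 vs doc1: %.3f%n", cosine(vecs.get(0), vecs.get(1)));
        System.out.printf("doc0 vs doc2: %.3f%n", cosine(vecs.get(0), vecs.get(2)));
        System.out.println("top terms of doc1: " + topTerms(vecs.get(1), 3));
    }
}
```

Here topTerms loosely mirrors the "pick the interesting terms" step, while cosine gives a direct document-to-document score, which is closer to what the question asks for. Against a real index, term and document frequencies would come from the index rather than from this toy tokenizer.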
Lucene's similarity is a low le
Is there an API in Lucene for finding the similarity score for two
documents that have been randomly pulled from an index? What about for a
query and a randomly selected document?
I realize this isn't the standard purpose of Lucene, but I was given a task
to compare similarity scores for the Simil