Cao Manh Dat created LUCENE-6968:
------------------------------------

             Summary: LSH Filter
                 Key: LUCENE-6968
                 URL: https://issues.apache.org/jira/browse/LUCENE-6968
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Cao Manh Dat


I'm planning to implement LSH (locality-sensitive hashing), which would support queries like this:
{quote}
Find documents whose similarity score with a given document is 0.8 or 
higher. The similarity measure can be cosine, Jaccard, Euclidean, etc.
{quote}
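
As a rough illustration of how such a query could be approximated, here is a minimal MinHash sketch in Python. MinHash is one common LSH scheme for Jaccard similarity; the issue does not say which scheme is planned, so treat this as an assumption. The function names (tokens, minhash_signature, estimate_jaccard), the whitespace tokenization, and the sample sentences are all hypothetical. It shows only the signature comparison, not the bucketing step a full LSH index would add.

```python
import hashlib


def tokens(text):
    """Lowercased whitespace tokens (an assumption; a real analyzer may differ)."""
    return set(text.lower().split())


def minhash_signature(token_set, num_hashes=128):
    """Build a MinHash signature: for each seeded hash function,
    keep the smallest hash value seen over the (non-empty) token set."""
    if not token_set:
        raise ValueError("token_set must not be empty")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(tok.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for tok in token_set
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates
    the Jaccard similarity of the underlying sets."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical sample sentences; their exact Jaccard similarity is 7/9.
sig_a = minhash_signature(tokens("the quick brown fox jumps over the lazy dog"))
sig_b = minhash_signature(tokens("the quick brown fox leaps over the lazy dog"))
print(estimate_jaccard(sig_a, sig_b))
```

The estimate is approximate: with 128 hash functions it typically lands within a few hundredths of the exact value, and more hash functions tighten it at the cost of larger signatures.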
For example, given the following corpus:
{quote}
1. Solr is an open source search engine based on Lucene
2. Solr is an open source enterprise search engine based on Lucene
3. Solr is an popular open source enterprise search engine based on Lucene
4. Apache Lucene is a high-performance, full-featured text search engine 
library written entirely in Java
{quote}
We want to find documents that have a Jaccard score of at least 0.6 with this doc:
{quote}
Solr is an open source search engine
{quote}
It will return only docs 1, 2 and 3 (MoreLikeThis would also return doc 4).
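
For reference, the exact Jaccard scores for this example can be computed with a short Python sketch. It assumes plain lowercased whitespace tokens, which is only one possible tokenization; the issue does not specify one, and the jaccard helper is hypothetical.

```python
def jaccard(a, b):
    """Exact Jaccard similarity over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


corpus = {
    1: "Solr is an open source search engine based on Lucene",
    2: "Solr is an open source enterprise search engine based on Lucene",
    3: "Solr is an popular open source enterprise search engine based on Lucene",
    4: "Apache Lucene is a high-performance, full-featured text search engine "
       "library written entirely in Java",
}
query = "Solr is an open source search engine"

scores = {doc_id: jaccard(query, text) for doc_id, text in corpus.items()}
for doc_id, score in scores.items():
    print(doc_id, round(score, 3))
# prints:
# 1 0.7
# 2 0.636
# 3 0.583
# 4 0.167
```

Under this particular tokenization doc 3 scores 7/12, slightly below 0.6, while doc 4 is far below at 3/18. Which side of the threshold a borderline document such as doc 3 falls on therefore depends on the tokenization or shingling used and on the approximation inherent in LSH.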



