Term vectors are, to some extent, the opposite of the inverted index. The inverted index maps each term to the documents that contain it; term vectors store the terms of each document, optionally with their positions and offsets, so that you can say "give me the terms, positions and offsets for document X". For MLT (MoreLikeThis), they are used to figure out which terms are the most "important" in a document, so that a new query can be formed to find other documents that are "more like this" one. They are also useful for highlighting and for non-search activities such as clustering.
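
To make the contrast concrete, here is a minimal Python sketch. It is not Lucene's implementation: the function names (tokenize, build_indexes, important_terms, more_like_this), the sample documents, and the simple tf-idf weighting are all assumptions chosen for illustration. It builds both structures from the same text, uses one document's term vector to pick its highest-scoring terms, and then looks those terms up in the inverted index to find candidate similar documents.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Yield (term, position, start_offset, end_offset) for each word."""
    for pos, m in enumerate(re.finditer(r"\w+", text)):
        yield m.group().lower(), pos, m.start(), m.end()


def build_indexes(docs):
    """Build an inverted index and per-document term vectors.

    inverted: term   -> set of doc ids containing the term
    vectors:  doc id -> {term: [(position, start, end), ...]}
    """
    inverted = defaultdict(set)
    vectors = {}
    for doc_id, text in docs.items():
        vector = defaultdict(list)
        for term, pos, start, end in tokenize(text):
            inverted[term].add(doc_id)
            vector[term].append((pos, start, end))
        vectors[doc_id] = dict(vector)
    return dict(inverted), vectors


def important_terms(doc_id, inverted, vectors, num_docs, top_n=3):
    """Rank one document's terms by a simple tf-idf score (an assumption,
    not the exact weighting any particular engine uses)."""
    scores = {}
    for term, occurrences in vectors[doc_id].items():
        tf = len(occurrences)                      # term frequency in this doc
        idf = math.log(num_docs / len(inverted[term])) + 1.0
        scores[term] = tf * idf
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def more_like_this(doc_id, inverted, vectors, num_docs, top_n=3):
    """Find other documents sharing the source document's top terms."""
    matches = defaultdict(int)
    for term in important_terms(doc_id, inverted, vectors, num_docs, top_n):
        for other in inverted[term]:
            if other != doc_id:
                matches[other] += 1
    # Most shared terms first; ties broken by doc id.
    return sorted(matches, key=lambda d: (-matches[d], d))


# Hypothetical sample documents.
docs = {
    "a": "term vectors store term positions and term offsets per document",
    "b": "the inverted index maps each term to matching documents",
    "c": "highlighting can use term vectors and offsets",
}
inverted, vectors = build_indexes(docs)
similar = more_like_this("a", inverted, vectors, len(docs))
```

Here vectors["a"] answers "what is in document a, and where?", while inverted["term"] answers "which documents contain this term?". A real MLT implementation typically adds filters such as minimum term and document frequencies before building the query.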

For more info, see my talk at ApacheCon:
http://cnlp.org/presentations/slides/AdvancedLucene.pdf
Also, search for "term vectors" on the Lucene user mailing list (you can do this via Nabble).

-Grant

On Jan 20, 2008, at 10:04 PM, anuvenk wrote:


What are term vectors? How do they help with MLT?
--
View this message in context: 
http://www.nabble.com/Term-vector-tp14990408p14990408.html
Sent from the Solr - User mailing list archive at Nabble.com.


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



