Re: TFIDF Implementation

2004-12-15 Thread Christoph Kiefer
David, Bruce, Otis, Thank you all for the quick replies. I looked through the BooksLikeThis example. I also agree, it's a very good and effective way to find similar docs in the index. Nevertheless, what I need is really a similarity matrix holding all TF*IDF values. For illustration I quick and

Re: TFIDF Implementation

2004-12-15 Thread David Spencer
Christoph Kiefer wrote: David, Bruce, Otis, Thank you all for the quick replies. I looked through the BooksLikeThis example. I also agree, it's a very good and effective way to find similar docs in the index. Nevertheless, what I need is really a similarity matrix holding all TF*IDF values. For

Re: TFIDF Implementation

2004-12-14 Thread David Spencer
- From: Christoph Kiefer [mailto:[EMAIL PROTECTED] Sent: December 14, 2004 11:45 AM To: Lucene Users List Subject: TFIDF Implementation Hi, My current task/problem is the following: I need to implement TFIDF document term ranking using Jakarta Lucene to compute a similarity rank between arbitrary

Re: TFIDF Implementation

2004-12-14 Thread David Spencer
://www.jivesoftware.com/ -Original Message- From: Christoph Kiefer [mailto:[EMAIL PROTECTED] Sent: December 14, 2004 11:45 AM To: Lucene Users List Subject: TFIDF Implementation Hi, My current task/problem is the following: I need to implement TFIDF document term ranking using Jakarta Lucene

RE: TFIDF Implementation

2004-12-14 Thread Otis Gospodnetic
[mailto:[EMAIL PROTECTED] Sent: December 14, 2004 11:45 AM To: Lucene Users List Subject: TFIDF Implementation Hi, My current task/problem is the following: I need to implement TFIDF document term ranking using Jakarta Lucene to compute a similarity rank between arbitrary documents

RE: TFIDF Implementation

2004-12-14 Thread Bruce Ritchie
You can also see 'Books like this' example from here https://secure.manning.com/catalog/view.php?book=hatcher2&item=source Well done, uses a term vector, instead of reparsing the orig doc, to form the similarity query. Also I like the way you exclude the source doc in the query, I

RE: TFIDF Implementation

2004-12-14 Thread Bruce Ritchie
From the code I looked at, those calls don't recalculate on every call. I was referring to this fragment below from BooksLikeThis.docsLike(), and was mentioning it as the javadoc http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html does not say that

Re: TFIDF Implementation

2004-12-14 Thread David Spencer
Bruce Ritchie wrote: From the code I looked at, those calls don't recalculate on every call. I was referring to this fragment below from BooksLikeThis.docsLike(), and was mentioning it as the javadoc http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html does

Re: TFIDF Implementation

2004-12-14 Thread David Spencer
Bruce Ritchie wrote: You can also see 'Books like this' example from here https://secure.manning.com/catalog/view.php?book=hatcher2&item=source Well done, uses a term vector, instead of reparsing the orig doc, to form the similarity query. Also I like the way you exclude the source doc in

RE: TFIDF Implementation

2004-12-14 Thread Bruce Ritchie
. Regards, Bruce Ritchie http://www.jivesoftware.com/ -Original Message- From: Christoph Kiefer [mailto:[EMAIL PROTECTED] Sent: December 14, 2004 11:45 AM To: Lucene Users List Subject: TFIDF Implementation Hi, My current task/problem is the following: I need to implement

TFIDF Implementation

2004-12-14 Thread Christoph Kiefer
Hi, My current task/problem is the following: I need to implement TFIDF document term ranking using Jakarta Lucene to compute a similarity rank between arbitrary documents in the constructed index. I saw from the API that there are similar functions already implemented in the class Similarity and
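
For readers following the thread, here is a rough illustration of the kind of TF*IDF similarity matrix the original question asks about. It is a minimal sketch in plain Python and does not use Lucene or its Similarity class. The function names (tfidf_vectors, cosine, similarity_matrix), the sample documents, and the weighting (raw term count times log(N / df), compared with cosine similarity) are all assumptions made for illustration, not anything posted in the thread.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf*idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term count times log(N / df).
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Pairwise cosine similarity of the TF*IDF vectors of all documents."""
    vecs = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]


# Hypothetical sample documents, already tokenized.
docs = [
    "lucene index search".split(),
    "lucene term vector".split(),
    "cooking pasta recipe".split(),
]
matrix = similarity_matrix(docs)
```

In this sketch, matrix[i][j] holds the similarity between documents i and j, so the matrix is symmetric and documents with no shared terms score 0. A term that occurs in every document gets weight 0 under this weighting; smoothed IDF variants avoid that.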