Re: Calculate Term Frequency

2014-08-22 Thread Bianca Pereira
Hi, Thank you for the answers. In the end I calculated the term frequency in Java: I get the text, break it into tokens, and count from there. It turns out to be around 6 times faster in my case (using a cache). I still calculate only the document frequency using Lucene. Regards, Bian
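
For readers following the thread, here is a minimal sketch of the kind of approach described above: tokenize the text in plain Java, count the terms, and cache the counts per document. It is not Bianca's actual code. The class name TermFrequencySketch, the docId key, and the simple regex tokenizer are all assumptions made for illustration; a real setup would likely need to tokenize the same way as the Lucene analyzer used at index time for the counts to agree.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts terms in plain Java and caches per document.
class TermFrequencySketch {
    // Assumed cache layout: document id -> (term -> number of occurrences).
    private final Map<String, Map<String, Integer>> cache = new HashMap<>();

    // Lower-cases the text, splits on anything that is not a letter or digit, and counts tokens.
    // This is a stand-in tokenizer and will not match every Lucene analyzer.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the count of `term` in the document, tokenizing only on the first request per docId.
    int termFrequency(String docId, String text, String term) {
        Map<String, Integer> counts = cache.computeIfAbsent(docId, id -> termFrequencies(text));
        return counts.getOrDefault(term.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        TermFrequencySketch tf = new TermFrequencySketch();
        String text = "Lucene indexes text; Lucene also scores text.";
        System.out.println(tf.termFrequency("doc-1", text, "lucene"));
        System.out.println(tf.termFrequency("doc-1", text, "indexes"));
    }
}
```

The cache means each document is tokenized once, which is presumably where most of a speedup of this kind would come from when many terms are looked up per document.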

Re: Calculate Term Frequency

2014-08-19 Thread Tri Cao
Erick, Solr's termfreq implementation also uses DocsEnum, with the assumption that freq is called on ascending doc IDs, which is valid when scoring from the hit list. If freq is requested for an out-of-order doc, a new DocsEnum has to be created. Bianca, can you explain your use case in more
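
To illustrate the point about ascending doc IDs, here is a small plain-Java simulation of a forward-only cursor over a postings list. It does not use the Lucene API; ForwardOnlyFreqSketch and all of its fields are hypothetical names, and the "reset" here merely stands in for creating a new DocsEnum. It only shows why ascending lookups are cheap and out-of-order lookups force a restart.

```java
// Hypothetical simulation, not Lucene code: a cursor that can only move forward.
class ForwardOnlyFreqSketch {
    private final int[] docIds;   // sorted doc IDs that contain the term
    private final int[] freqs;    // freqs[i] is the term frequency in docIds[i]
    private int pos = 0;          // current cursor position
    private int lastRequested = -1;
    private int resets = 0;       // how often the cursor had to be recreated

    ForwardOnlyFreqSketch(int[] docIds, int[] freqs) {
        this.docIds = docIds;
        this.freqs = freqs;
    }

    // Returns the frequency for docId, or 0 if the term does not occur in that doc.
    int freq(int docId) {
        if (docId < lastRequested) {
            // Out-of-order request: start over, analogous to creating a new enumerator.
            pos = 0;
            resets++;
        }
        lastRequested = docId;
        // Advance the cursor until it reaches or passes the requested doc.
        while (pos < docIds.length && docIds[pos] < docId) {
            pos++;
        }
        return (pos < docIds.length && docIds[pos] == docId) ? freqs[pos] : 0;
    }

    int resets() {
        return resets;
    }

    public static void main(String[] args) {
        ForwardOnlyFreqSketch cursor =
            new ForwardOnlyFreqSketch(new int[] {1, 3, 5, 8}, new int[] {2, 1, 4, 1});
        System.out.println(cursor.freq(1));    // ascending: no reset
        System.out.println(cursor.freq(5));    // ascending: no reset
        System.out.println(cursor.freq(3));    // goes backwards: one reset
        System.out.println(cursor.resets());
    }
}
```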

Re: Calculate Term Frequency

2014-08-19 Thread Michael Sokolov
Have you looked into term vectors? I think they should fit your bill pretty neatly. Here's a nice blog post with helpful background info: http://blog.jpountz.net/post/41301889664/putting-term-vectors-on-a-diet -Mike On 8/19/2014 10:04 AM, Bianca Pereira wrote: Hi everybody, I would like

Re: Calculate Term Frequency

2014-08-19 Thread Erick Erickson
Hmmm, I'm not at all an expert here, but Solr has a function query "termfreq" that does what you're doing I think? I wonder if the code for that function query would be a good place to copy (or even make use of)? See TermFreqValueSource... Maybe not helpful at all, but... Erick On Tue, Aug 19, 20