Hi,
Thank you for the answers. In the end I calculated the term frequency
in plain Java: I get the text, break it into tokens, and count from
there. It turns out to be around 6 times faster in my case (using a cache).
The only thing I still calculate with Lucene is the document frequency.
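
For anyone reading along, a minimal sketch of that kind of approach might
look like the following. It is not the actual code from this thread: the
class name TermFrequencyCache, the per-document cache keyed by a document id,
and the simple regex tokenizer are all assumptions made for illustration.
A real setup would need to tokenize the same way as the Lucene analyzer used
at index time, otherwise the counts will not match the index.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts term occurrences per document in plain Java
// and caches the counts so each document is tokenized only once.
public class TermFrequencyCache {
    // Cache: document id -> (term -> number of occurrences in that document).
    private final Map<String, Map<String, Integer>> cache = new HashMap<>();

    // Lowercases the text, splits on anything that is not a letter or digit
    // (an assumption, not a Lucene analyzer), and counts each token.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns how often the term occurs in the document. The text is only
    // tokenized the first time a given docId is seen.
    public int termFrequency(String docId, String text, String term) {
        Map<String, Integer> counts = cache.computeIfAbsent(docId, id -> countTerms(text));
        return counts.getOrDefault(term.toLowerCase(Locale.ROOT), 0);
    }
}
```

The cache is what makes repeated lookups on the same document cheap: after
the first call only a map lookup remains.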
Regards,
Bian
Erick, Solr's termfreq implementation also uses DocsEnum, with the
assumption that freq is called on ascending doc IDs, which is valid when
scoring from the hit list. If freq is requested for an out-of-order doc,
a new DocsEnum has to be created.
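
To illustrate why out-of-order requests are costly, here is a small
plain-Java model of a forward-only postings cursor. It is only a sketch and
does not use the Lucene API: the class ForwardOnlyFreqLookup, its arrays, and
the resets counter are hypothetical stand-ins for a DocsEnum over one term.

```java
// Hypothetical model of a forward-only cursor over one term's postings.
// docIds must be sorted ascending; freqs[i] is the frequency in docIds[i].
public class ForwardOnlyFreqLookup {
    private final int[] docIds;
    private final int[] freqs;
    private int pos = 0;          // current cursor position
    private int lastTarget = -1;  // last doc id that was requested
    private int resets = 0;       // how often the cursor had to be recreated

    public ForwardOnlyFreqLookup(int[] docIds, int[] freqs) {
        this.docIds = docIds;
        this.freqs = freqs;
    }

    // Returns the term frequency for docId, or 0 if the doc has no posting.
    public int freq(int docId) {
        if (docId < lastTarget) {
            // Out-of-order request: the cursor cannot move backwards, so
            // start again from the beginning (models creating a new enum).
            pos = 0;
            resets++;
        }
        lastTarget = docId;
        // Move forward until the cursor is at or past the requested doc.
        while (pos < docIds.length && docIds[pos] < docId) {
            pos++;
        }
        return (pos < docIds.length && docIds[pos] == docId) ? freqs[pos] : 0;
    }

    public int resets() {
        return resets;
    }
}
```

With ascending requests the cursor only ever moves forward, so resets stays
at zero. Every backwards request restarts the scan.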
Bianca, can you explain your use case in more
Have you looked into term vectors? I think they should fit your bill
pretty neatly. Here's a nice blog post with helpful background info:
http://blog.jpountz.net/post/41301889664/putting-term-vectors-on-a-diet
-Mike
On 8/19/2014 10:04 AM, Bianca Pereira wrote:
Hi everybody,
I would like
Hmmm, I'm not at all an expert here, but Solr has a function
query "termfreq" that does what you're doing I think? I wonder
if the code for that function query would be a good place to
copy (or even make use of)? See TermFreqValueSource...
Maybe not helpful at all, but...
Erick
On Tue, Aug 19, 20