Do you expect to have relatively large or relatively small result sets? For
the former, are you willing to accept slow performance? I mean, your logic
will have to scan all of the documents and fetch and check their term
frequencies to count up df for each desired term. Maybe at least some of
t
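A minimal sketch of the cost described above. It assumes each hit's term frequencies are already available as a map. In Lucene they might come from stored term vectors, but that source and the class and method names below are assumptions for illustration. The loop visits every document in the result set, so the work grows linearly with the number of hits. That is why a large result set would be slow.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class ResultSetDfSketch {
    /**
     * Counts, for each desired term, how many documents in the result set contain it
     * (a df restricted to the hits). Every hit must be visited, so the cost is linear
     * in the size of the result set.
     *
     * @param hits         hypothetical per-document term-frequency maps, one per matching document
     * @param desiredTerms terms whose result-set df we want
     */
    static Map<String, Integer> resultSetDf(List<Map<String, Integer>> hits, Set<String> desiredTerms) {
        Map<String, Integer> df = new HashMap<>();
        for (String term : desiredTerms) {
            df.put(term, 0); // terms that never occur still get an explicit 0
        }
        for (Map<String, Integer> termFreqs : hits) { // one pass over every hit
            for (String term : desiredTerms) {
                Integer tf = termFreqs.get(term);
                if (tf != null && tf > 0) { // the term occurs in this document
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> hits = List.of(
                Map.of("lucene", 3, "index", 1),
                Map.of("lucene", 1),
                Map.of("mahout", 2, "index", 4));
        System.out.println(resultSetDf(hits, Set.of("lucene", "index", "solr")));
    }
}
```

Fetching the frequencies is the expensive part in a real index. The counting itself is trivial, which is why the size of the result set dominates.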
Hi Tommaso,
Thank you for your reply and tweet!
> Some useful points / suggestions come out of it, let's see if we can follow
> up :)
Let's start with the simple one. :-) Why don't we consider adding an Analyzer parameter
to assignClass()?
koji
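A hypothetical sketch of what the suggestion could look like, not the actual Lucene classification API. It assumes, as in the Lucene 4.x classification module, that the analyzer is normally supplied at training time. The idea is to keep assignClass(text) defaulting to that analyzer and add an overload that takes an analyzer for the query text. TextAnalyzer and ToyClassifier are made-up stand-ins, and the token-overlap scoring is only a placeholder for a real classifier.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public final class AnalyzerParamSketch {
    // Hypothetical stand-in for an Analyzer: turns raw text into tokens.
    interface TextAnalyzer extends Function<String, List<String>> {}

    // Toy classifier: scores each class by how often the query tokens appeared in its training text.
    static final class ToyClassifier {
        private final Map<String, Map<String, Integer>> tokenCountsByClass = new HashMap<>();
        private final TextAnalyzer trainAnalyzer;

        ToyClassifier(TextAnalyzer trainAnalyzer) {
            this.trainAnalyzer = trainAnalyzer;
        }

        void train(String text, String label) {
            Map<String, Integer> counts = tokenCountsByClass.computeIfAbsent(label, k -> new HashMap<>());
            for (String token : trainAnalyzer.apply(text)) {
                counts.merge(token, 1, Integer::sum);
            }
        }

        // Current style: reuse the analyzer given at training time.
        String assignClass(String text) {
            return assignClass(text, trainAnalyzer);
        }

        // Proposed style: the caller chooses the analyzer for this query text.
        String assignClass(String text, TextAnalyzer queryAnalyzer) {
            List<String> tokens = queryAnalyzer.apply(text);
            String best = null;
            int bestScore = -1;
            for (Map.Entry<String, Map<String, Integer>> e : tokenCountsByClass.entrySet()) {
                int score = 0;
                for (String t : tokens) {
                    score += e.getValue().getOrDefault(t, 0);
                }
                if (score > bestScore) { // ties keep the first class seen
                    bestScore = score;
                    best = e.getKey();
                }
            }
            return best;
        }
    }

    public static void main(String[] args) {
        TextAnalyzer lowercase = s -> Arrays.asList(s.toLowerCase(Locale.ROOT).split("\\s+"));
        TextAnalyzer verbatim = s -> Arrays.asList(s.split("\\s+"));
        ToyClassifier c = new ToyClassifier(lowercase);
        c.train("lucene index search", "search");
        c.train("mahout cluster vector", "ml");
        System.out.println(c.assignClass("Lucene Search"));         // search
        System.out.println(c.assignClass("MAHOUT vector", verbatim)); // ml (only "vector" matches)
    }
}
```

Keeping the one-argument method as a delegate preserves existing callers while letting new code pick a query-time analyzer.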
(14/03/07 17:18), Tommaso Teofili wrote:
cool Koji, tha
On Thu, Mar 6, 2014 at 6:28 PM, Furkan KAMACI wrote:
> Hi;
>
> The tf-idf explanation says that:
>
> *idf(t)* appears for *t* in both the query and the document, hence it is
> squared in the equation.
>
> DefaultSimilarity does not square it. What is the explanation for this?
I think you explained it
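A small stand-alone sketch of where the square comes from. It assumes the Lucene 4.x TFIDFSimilarity weight computation. The query weight is idf times the query boost, then multiplied by queryNorm. The per-term value used at scoring time multiplies that normalized query weight by idf a second time. The idf formula is the one DefaultSimilarity uses, 1 + ln(numDocs / (docFreq + 1)). The class and method names are hypothetical.

```java
public final class IdfSquaredSketch {
    // Same formula as DefaultSimilarity.idf: 1 + ln(numDocs / (docFreq + 1)), not squared.
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Query side: idf times the query boost, then scaled by queryNorm (first factor of idf).
    static float queryWeight(float idf, float boost, float queryNorm) {
        return idf * boost * queryNorm;
    }

    // Scoring side: the normalized query weight is multiplied by idf again (second factor).
    static float termValue(float queryWeight, float idf) {
        return queryWeight * idf;
    }

    // Contribution of one term to one document's score, ignoring coord().
    static float termScore(float tf, float termValue, float norm) {
        return tf * termValue * norm;
    }

    public static void main(String[] args) {
        float idf = idf(9, 1000); // 1 + ln(100)
        float value = termValue(queryWeight(idf, 1.0f, 1.0f), idf);
        System.out.println("idf = " + idf + ", term value = " + value + ", idf * idf = " + (idf * idf));
    }
}
```

So the idf() method itself returns the unsquared value. The square only appears once the query weight and the document-side term value are combined. With a boost and queryNorm of 1, the term value equals idf * idf.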
Thanks for bringing closure, Jason!
Mike McCandless
http://blog.mikemccandless.com
On Fri, Mar 7, 2014 at 12:30 AM, Jason Wee wrote:
> Hello Mike,
>
> Thank you, and you were right in your first comment: the expected field,
> Lucene46FieldInfos, is within the file _0.cfs. We have taken a closer l
cool Koji, thanks a lot for sharing.
Some useful points / suggestions come out of it, let's see if we can follow
up :)
Regards,
Tommaso
2014-03-07 3:30 GMT+01:00 Koji Sekiguchi:
> Hello,
>
> I just posted an article on Comparing Document Classification Functions
> of Lucene and Mahout.
>
>
> h