Re: mahout tf-idf vs lucene tf-idf

Dmitriy Lyubimov Mon, 06 Jun 2016 10:02:43 -0700

To add to Ted's reply, Mahout has traditionally offered bigram/trigram
analysis as part of its tf-idf conversion (a step away from the bag-of-words
model, so that ordered, statistically stable combinations of 2 or 3 words are
treated as terms of their own). However, this has not been ported to the
Spark/H2O/Flink engines and is available only as a legacy MapReduce
algorithm.
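For readers new to the idea, here is a minimal Python sketch (not Mahout code) of what folding bigrams into tf-idf looks like: every contiguous word pair is counted as a term alongside the single words, so "new york" gets its own weight. The whitespace tokenizer, raw-count tf, idf = log(N/df), and the function names are assumptions for illustration only. Mahout's collocation step also scores candidate n-grams with a log-likelihood ratio test and keeps only the significant ones, which this sketch skips, and both Mahout and Lucene apply their own smoothing and normalization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def terms(text, max_n=2):
    """Lowercase, split on whitespace (assumed tokenizer), emit 1..max_n-grams."""
    tokens = text.lower().split()
    out = []
    for n in range(1, max_n + 1):
        out.extend(ngrams(tokens, n))
    return out


def tfidf_vectors(docs, max_n=2):
    """Return one sparse {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), df = number of documents containing the term
    (no smoothing or length normalization, unlike Mahout/Lucene)
    """
    term_counts = [Counter(terms(d, max_n)) for d in docs]
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())  # each document counts once per term
    n_docs = len(docs)
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        for counts in term_counts
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage: "new york" appears in only one of three documents, so as a
# bigram term it outweighs the more common unigram "new".
docs = ["new york is big", "york is old", "new ideas are big"]
vecs = tfidf_vectors(docs)
print(round(vecs[0]["new york"], 3))  # log(3) ≈ 1.099
print(round(vecs[0]["new"], 3))       # log(3/2) ≈ 0.405
print(cosine(vecs[0], vecs[1]))       # similarity of the first two texts
```

Comparing two texts then reduces to taking the cosine of their weighted vectors, with the bigrams contributing matches that a pure bag-of-words model would not distinguish.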


On Sat, Jun 4, 2016 at 2:14 AM, forme book <forbookm...@gmail.com> wrote:

> Hi,
>
> I'm starting to study text processing, and I see that to compare two texts
> it is possible to obtain a vector model through the TF-IDF technique.
>
> With Mahout it is possible to create vectors from text using
> lucene.vector, which, if I understand correctly, takes a Lucene index and
> maps it to tf-idf vectors.
>
> Lucene already provides this implementation by default, so what I
> struggle to understand is the advantage of having lucene.vector in
> Mahout when Lucene offers that feature out of the box.
>
> Maybe I'm missing something big, but what's the connection between them?
> Could you please explain a possible use case?
>
> Thanks for the help
>
> Richard
>
