Hi mc tell,

For featurizing the description, if you have all the descriptions on hand,
you could use LDA to extract a set of topics and then, for each description,
determine which topics are relevant to it and to what extent. The current
implementation of LDA provides this in the form of a topic-probability
distribution per document (see
MAHOUT-458 <https://issues.apache.org/jira/browse/MAHOUT-458>).
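As a rough sketch, suppose you already have the per-document topic distributions from an LDA run. Each description's distribution can then serve directly as a dense feature vector. The example below is in Python rather than Mahout's Java, and the topic numbers are made up for illustration. The helpers `hellinger_similarity` and `cosine_similarity` are hypothetical functions, not Mahout APIs. Hellinger distance is one common choice for comparing probability distributions; cosine similarity is shown alongside it for comparison.

```python
import math


def hellinger_similarity(p, q):
    """Return 1 - Hellinger distance between two topic distributions.

    Hypothetical helper. The result lies in [0, 1]; 1 means identical
    distributions.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return 1.0 - math.sqrt(s / 2.0)


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Made-up topic distributions (3 topics) for three item descriptions,
# standing in for what an LDA run might produce per document.
item_topics = {
    "item_a": [0.70, 0.20, 0.10],
    "item_b": [0.65, 0.25, 0.10],
    "item_c": [0.05, 0.15, 0.80],
}

if __name__ == "__main__":
    for other in ("item_b", "item_c"):
        h = hellinger_similarity(item_topics["item_a"], item_topics[other])
        c = cosine_similarity(item_topics["item_a"], item_topics[other])
        print(f"item_a vs {other}: hellinger={h:.3f} cosine={c:.3f}")
```

Here item_a and item_b share a dominant topic and come out as more similar than item_a and item_c. The resulting topic similarity could then be combined with similarities computed on the other attributes (genre, year, and so on).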

Regards, Vasil

On Tue, Jun 7, 2011 at 3:23 PM, mc tell <[email protected]> wrote:

> Hi,
>
> I would like to build a system able to say how similar two items are from a
> set of attributes including: title, genre, ratings, year, description and
> more.
> So I guess I could build a feature vector for each item and then come up
> with some similarity measures.
>
> However, I have no clue which methods I could use to:
> - determine a weight to put on each feature (other than intuitively)
> - deal with the 'description' attribute (i.e. a more or less long
> free text) and transform it into a relevant set of features
> - find which algorithms in Mahout could be adapted to build such things
>
> Thanks a lot in advance for any insights, links, or anything related to
> this.
>
