Hi, I would like to build a system that can tell how similar two items are, based on a set of attributes including title, genre, ratings, year, description and more. So I guess I could build a feature vector for each item and then come up with some similarity measure.
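To make the idea concrete, here is a rough sketch of what I have in mind, in Python rather than Mahout just to keep it short. Everything in it is an assumption for illustration: the `WEIGHTS` values are hand-picked, the `GENRES` list and the rating/year ranges are made up, cosine similarity is only one possible measure, and the 'description' attribute is left out entirely.

```python
import math

# Hypothetical per-feature weights, picked by hand (purely intuitive).
WEIGHTS = {"genre": 2.0, "rating": 1.0, "year": 0.5}

# Assumed closed set of genres, used for one-hot encoding.
GENRES = ["action", "comedy", "drama"]


def to_vector(item):
    """Turn an item dict into a weighted numeric feature vector."""
    # One-hot encode the genre.
    genre = [1.0 if item["genre"] == g else 0.0 for g in GENRES]
    # Scale rating (assumed 0-10) and year (assumed 1900-2100) into [0, 1].
    rating = item["rating"] / 10.0
    year = (item["year"] - 1900) / 200.0
    return (
        [WEIGHTS["genre"] * x for x in genre]
        + [WEIGHTS["rating"] * rating, WEIGHTS["year"] * year]
    )


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(item_a, item_b):
    """Similarity of two items via their weighted feature vectors."""
    return cosine_similarity(to_vector(item_a), to_vector(item_b))
```

With these made-up weights, two dramas from around the same year come out far more similar than a drama and a comedy, simply because I gave genre the largest weight.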
However, I have no clue about:
- how to determine the weight to put on each feature (other than by intuition)
- how to deal with the 'description' attribute (i.e. a more or less long free text) and transform it into a relevant set of features
- which algorithms in Mahout could be adapted to build such a thing

Thanks a lot in advance for any insights, links or anything related to that.
