It's sometimes difficult to define the recommendation problem, given a lot of possible data to work with.
I take it that you are trying to recommend activities to people here. But you're also talking a lot about computing similarities between people (to cluster them) and between activities. Do you need these as output, or do you think you need them as part of a recommendation process? (You don't necessarily.)

I think you mean that you want a similarity function between activities rather than a distance function. Just create a DataModel with your user-activity data, and then use any ItemSimilarity implementation to find the similarity between any pair of activities.

Clustering users is a different problem entirely. If you just want user-user similarity, you can use the same data and a UserSimilarity implementation to find the similarity between any pair of users. But you are talking about using user features to compute this similarity. This isn't a collaborative filtering problem anymore, then.

Maybe you can clarify first what you want to do.

On Fri, Sep 21, 2012 at 1:48 PM, kostas_new <[email protected]> wrote:
> Hello,
>
> I am programming a recommendation system in terms of a course project in
> order to propose activities for a specific person.
> I have installed Mahout and Hadoop in order to succeed that.
>
> The attributes which enroll important role in the recommendation system are
> the followings:
> 1) all the attributes for each one person (e.g. age, gender, his/her
> preferences in different types of activity)
> 2) The core activities (type of activities, target_group(numerical
> attribute))
> The numeric attribute it is not a problem because is only a number. As a
> result I would like to declare a "distance function" between the
> different activities, for example the relationship distance between the
> football and basketball should be strong, because are parts of the sports
> category. Otherwise the distance between basketball and opera should be
> larger.
> Except of the unique characterization of each one of the activities,
> I would prefer to characterize each activity by many types, for example
> activity X -> is an opera with education character. That is a multi
> dimension of nominal attributes.
>
> <http://lucene.472066.n3.nabble.com/file/n4009403/MULTI_nominal.jpg>
>
> *Question*
> How do you recommend me to implement the recommendation algorithm process?!
> (*Q1*)
> On the one hand, I am thinking to do the clustering for the users. The
> clustering must take under consideration for example the age, e.g. 33 years
> old, and the preferences, for example the user 1023 prefers to go to B1
> (activity type = B2), in that point the creation of the vector is a headache
> for me because I have to measure the distance between the different
> activities, counting as well the user’s preferences. (*Q2*)
>
> On the other hand because I know the exact number and features of the
> activities I don’t think that it is needed to implement a clustering for
> them.
>
> As the last step, I want to use the collaborative filtering for my
> recommendations. For example the input table will follow the format:
> User_id   activity_id   Preference (0..1)
> 100       500           0.5
> 200       300           0.9
>
> I know that I could use this table only for my recommendations, but I want
> to take advantage on the user preferences and of the dependencies between
> the different types of activities which hypothetically could be siblings.
>
> Thank you very much for your time. Unfortunately I have spent many months in
> order to find a solution.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-create-a-recommended-system-tp4009403.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
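
To make the suggestion in the reply above concrete (user-activity preference data in, activity-activity similarity out), here is a minimal plain-Java sketch. It deliberately does not use Mahout: the class name ActivitySimilaritySketch, its methods, and the choice of cosine similarity are all assumptions made for illustration, standing in for whatever DataModel and ItemSimilarity implementation you would actually pick. The first two data rows come from the table in the quoted message; the third is hypothetical, added only so that the two activities share a user.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Mahout's API.
public class ActivitySimilaritySketch {

    // activityId -> (userId -> preference in 0..1)
    private final Map<Long, Map<Long, Double>> prefsByActivity = new HashMap<>();

    // Records one row of the (User_id, activity_id, Preference) table.
    public void addPreference(long userId, long activityId, double preference) {
        prefsByActivity
            .computeIfAbsent(activityId, k -> new HashMap<>())
            .put(userId, preference);
    }

    // Cosine similarity between two activities, treating each activity as a
    // vector of user preferences. Returns 0.0 when either activity has no data.
    public double activitySimilarity(long a, long b) {
        Map<Long, Double> pa = prefsByActivity.getOrDefault(a, Collections.emptyMap());
        Map<Long, Double> pb = prefsByActivity.getOrDefault(b, Collections.emptyMap());

        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<Long, Double> e : pa.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = pb.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only shared users contribute
            }
        }
        for (double v : pb.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        ActivitySimilaritySketch sketch = new ActivitySimilaritySketch();
        sketch.addPreference(100L, 500L, 0.5); // row from the quoted table
        sketch.addPreference(200L, 300L, 0.9); // row from the quoted table
        sketch.addPreference(100L, 300L, 0.5); // hypothetical extra row

        System.out.println("similarity(500, 300) = "
            + sketch.activitySimilarity(500L, 300L));
    }
}
```

Note that the similarity here comes purely from which users preferred which activities, with no age, gender, or activity-type features involved. That is the sense in which this is collaborative filtering; once user or activity attributes feed the similarity, it becomes the different problem mentioned in the reply. User-user similarity follows the same pattern with the roles of users and activities swapped.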
