Hi all, I'm curious what approaches are recommended for generating user-user similarity, when I've got two (or more) distinct types of item data, both of which are fairly large.
E.g. let's say I had a set of users where I knew both (a) what books they had bought on Amazon, and (b) what YouTube videos they had watched. For each user, I want to find the 10 most similar other users.

- I could create two separate models, find the nearest 30 users for each user, and combine (maybe with weighting) the results.
- I could toss all of the data into one model, and I could use a value of < 1.0 for whichever type of preference is less important.

Any other suggestions? Input on the above two approaches?

Thanks!

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
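
P.S. To make the two options concrete, here is a rough Python sketch of what I have in mind. It is a toy, not Mahout code: the function names, the choice of cosine similarity, the `video_weight` parameter, and the sample data are all assumptions made up for illustration, and a real data set of this size would need a distributed or approximate nearest-neighbor step instead of the brute-force loop shown here.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(user, prefs, k):
    """Return the k most similar other users as (score, other_user) pairs."""
    mine = prefs.get(user, {})
    scored = [(cosine(mine, prefs[o]), o) for o in prefs if o != user]
    # Sort by descending score, breaking ties by user id for determinism.
    return sorted(scored, key=lambda t: (-t[0], t[1]))[:k]


def combine_two_models(user, books, videos, k=10, pool=30, video_weight=0.5):
    """Option 1: query each model separately, then merge weighted scores."""
    merged = defaultdict(float)
    for score, other in top_k(user, books, pool):
        merged[other] += score
    for score, other in top_k(user, videos, pool):
        merged[other] += video_weight * score
    ranked = sorted(merged.items(), key=lambda t: (-t[1], t[0]))
    return [other for other, _ in ranked[:k]]


def single_model(user, books, videos, k=10, video_weight=0.5):
    """Option 2: one vector per user, with video preferences scaled down."""
    prefs = {}
    for u in set(books) | set(videos):
        vec = {("book", i): w for i, w in books.get(u, {}).items()}
        for i, w in videos.get(u, {}).items():
            vec[("video", i)] = video_weight * w
        prefs[u] = vec
    return [other for _, other in top_k(user, prefs, k)]


# Hypothetical sample data: user -> {item id -> preference value}.
books = {"ann": {"b1": 1, "b2": 1}, "bob": {"b1": 1, "b2": 1}, "cat": {"b3": 1}}
videos = {"ann": {"v1": 1}, "bob": {"v2": 1}, "cat": {"v1": 1}}

print(combine_two_models("ann", books, videos, k=2))
print(single_model("ann", books, videos, k=2))
```

One difference the sketch makes visible: in option 1 the weight scales each model's similarity score after the fact, while in option 2 it changes the vectors themselves, so the less important item type also contributes less to each user's vector norm. I'd expect the two to rank neighbors differently in some cases, which is part of why I'm asking.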
