Actually, I want to run Slope One on the KDD-MUSIC dataset. As you know, it's really big, and the diff matrix is 160GB. Though I have a machine with 120GB of RAM, that's not enough. Now I want to predict ratings for the users in the test set, so I think I need to load the user profiles.
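To make concrete why the prediction step needs both inputs, here is a minimal sketch of weighted Slope One for one user and one target item. It assumes "user profile" means the user's existing ratings, and that each diff entry holds the average of (target rating minus other rating) and the number of co-raters. The function name and data layout are illustrative only and are not Mahout's API.

```python
def predict(user_ratings, diffs, target_item):
    """Weighted Slope One prediction for a single user and target item.

    user_ratings: {item: rating}, the user's known ratings (assumed layout).
    diffs: {(target, other): (avg_diff, count)}, where avg_diff is the
           average of rating[target] - rating[other] over `count` co-raters.
    Returns None when no rated item shares a diff entry with the target.
    """
    numerator = 0.0
    denominator = 0
    for item, rating in user_ratings.items():
        entry = diffs.get((target_item, item))
        if entry is None:
            continue  # no users co-rated these two items
        avg_diff, count = entry
        # Each rated item votes (rating + avg_diff), weighted by its support.
        numerator += (rating + avg_diff) * count
        denominator += count
    if denominator == 0:
        return None
    return numerator / denominator


# Toy data: the user rated "a" and "b"; we want a prediction for "c".
example_ratings = {"a": 4.0, "b": 2.0}
example_diffs = {("c", "a"): (0.5, 2), ("c", "b"): (1.0, 1)}
example_prediction = predict(example_ratings, example_diffs, "c")
```

Only the diff entries pairing the target item with items the user has already rated are touched, so a single prediction needs a small slice of the matrix rather than all 160GB.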
Can I use a map-reduce program to calculate the predictions? If so, could you please give me some hints?

Thank you.

On Mon, Apr 11, 2011 at 3:31 PM, Sean Owen <sro...@gmail.com> wrote:

> There is no distributed slope-one implementation at this time. You
> need to copy the resulting diffs output off HDFS to a local disk. Then
> you simply use it as input to a MemoryDiffStorage for
> SlopeOneRecommender.
>
> However, if you have computed diffs over a large number of items, it
> may not fit in memory. You can try JDBCDiffStorage and put diffs in a
> database, but you may find it's just too slow. Or you can set
> MemoryDiffStorage to cap the number of diffs it stores.
>
> None of these algorithms involve a user profile.
>
> On Mon, Apr 11, 2011 at 8:20 AM, ke xie <oed...@gmail.com> wrote:
> > Hi there:
> >
> > I've successfully used a hadoop program to calculate the diff-matrix, and
> > stored the data in my HDFS...
> >
> > But now I'm confused: how can I read the users' profiles as well as the
> > diff-matrix at the same time (they are at different locations in my HDFS)
> > to predict a specific user's ratings?
> >
> > I've already checked the mahout implementation of Slopeone with hadoop,
> > but that one just did the calculation of diff-matrix.. and no prediction
> > part is included...
> >
> > Can anyone help me? How do I read two kinds of data in a Hadoop program
> > at the same time?
> >
> > --
> > Name: Ke Xie Eddy
> > Research Group of Information Retrieval
> > State Key Laboratory of Intelligent Technology and Systems
> > Tsinghua University

--
Name: Ke Xie Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University
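One possible shape for a map-reduce prediction step is a reduce-side join followed by a sum. The sketch below simulates it in memory with plain Python. It is not a Mahout or Hadoop implementation, and all function names and record layouts are hypothetical. Ratings and diffs are both keyed by the item the user has rated, so they meet in the same reducer. A second pass sums the weighted votes per (user, target item) pair.

```python
from collections import defaultdict


def map_ratings(user, item, rating):
    # Key by the rated item so the rating meets the diffs for that item.
    yield item, ("R", user, rating)


def map_diffs(target, other, avg_diff, count):
    # avg_diff is the average of rating[target] - rating[other];
    # key by `other`, the item a user would already have rated.
    yield other, ("D", target, avg_diff, count)


def shuffle(pairs):
    # Stand-in for the framework's group-by-key step.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def join_reduce(values):
    # Pair every rating of this item with every diff involving this item.
    ratings = [v for v in values if v[0] == "R"]
    diffs = [v for v in values if v[0] == "D"]
    for _, user, rating in ratings:
        for _, target, avg_diff, count in diffs:
            yield (user, target), ((rating + avg_diff) * count, count)


def run(ratings, diffs, wanted):
    # Stage 1: join ratings with diffs on the rated item.
    stage1 = []
    for record in ratings:
        stage1.extend(map_ratings(*record))
    for record in diffs:
        stage1.extend(map_diffs(*record))
    partial = []
    for values in shuffle(stage1).values():
        partial.extend(join_reduce(values))
    # Stage 2: sum the weighted votes for each wanted (user, target) pair.
    predictions = {}
    for key, votes in shuffle(partial).items():
        if key in wanted:
            total = sum(v[0] for v in votes)
            weight = sum(v[1] for v in votes)
            predictions[key] = total / weight
    return predictions


example = run(
    ratings=[("u1", "a", 4.0), ("u1", "b", 2.0)],
    diffs=[("c", "a", 0.5, 2), ("c", "b", 1.0, 1)],
    wanted={("u1", "c")},
)
```

In a real job, the join reducer would see every rating and every diff for a popular item at once. To keep the intermediate output manageable, the (user, target) pairs from the test set would likely need to be filtered as early as possible rather than at the end as done here. For reading two HDFS locations in one job, Hadoop's MultipleInputs class is one way to attach a different mapper to each input path.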