Actually, I want to run Slope One on the KDD-MUSIC dataset...

As you know, it's really big: the diff matrix is 160GB, and even though I
have a machine with 120GB of RAM, that's not enough. Now I want to predict
the ratings for the users in the test set, so I think I need to load the
user profiles as well.
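(For context on why the user profiles matter at prediction time: the weighted Slope One estimate combines the user's own ratings with the precomputed average diffs. A minimal local sketch in Java — the map layout and method name here are my own illustration, not Mahout's API:)

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SlopeOnePrediction {

  // Weighted Slope One: predict user u's rating of targetItem as
  //   sum_i (r_ui + diff[target][i]) * count[target][i] / sum_i count[target][i]
  // over the items i the user has already rated.
  // diffs: targetItem -> (ratedItem -> {avgDiff, coRatingCount})
  static double predict(Map<String, Double> userRatings,
                        Map<String, Map<String, double[]>> diffs,
                        String targetItem) {
    Map<String, double[]> diffRow =
        diffs.getOrDefault(targetItem, Collections.emptyMap());
    double num = 0.0;
    double den = 0.0;
    for (Map.Entry<String, Double> e : userRatings.entrySet()) {
      double[] d = diffRow.get(e.getKey());
      if (d == null) {
        continue; // no co-rating data for this item pair
      }
      double avgDiff = d[0];
      double count = d[1];
      num += (e.getValue() + avgDiff) * count; // weight by co-rating count
      den += count;
    }
    return den == 0.0 ? Double.NaN : num / den;
  }

  public static void main(String[] args) {
    Map<String, Double> user = new HashMap<>();
    user.put("A", 2.0);
    user.put("B", 4.0);

    Map<String, double[]> diffRowC = new HashMap<>();
    diffRowC.put("A", new double[] {1.0, 2});  // avg(r_C - r_A) = 1.0 over 2 co-raters
    diffRowC.put("B", new double[] {-1.0, 1}); // avg(r_C - r_B) = -1.0 over 1 co-rater
    Map<String, Map<String, double[]>> diffs = new HashMap<>();
    diffs.put("C", diffRowC);

    System.out.println(predict(user, diffs, "C")); // ((2+1)*2 + (4-1)*1) / 3 = 3.0
  }
}
```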


I wonder: could I use a map-reduce program to calculate the predictions? If
so, would you please give me some hints?
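(The rough plan I have in mind is a two-job pipeline: key both the user
profiles and the diff rows by item ID — Hadoop's
org.apache.hadoop.mapreduce.lib.input.MultipleInputs can feed two datasets
with different mappers into one job — join them in the first reducer to emit
partial numerator/denominator sums keyed by (user, target item), then sum in
a second job. Here's an in-memory sketch of the two reduce steps; the names
and record layout are mine, not Mahout's:)

```java
import java.util.HashMap;
import java.util.Map;

public class MapReducePredictionSketch {

  // Job 1 reducer ("join on itemId"): after keying both datasets by item,
  // all ratings of one item and all diff entries involving it arrive
  // together. Pair each user rating r_ui with each diff entry
  // diff[target][i] and accumulate partial sums keyed by (user, target).
  static void joinReduce(Map<String, Double> ratingsForItem,  // userId -> r_ui
                         Map<String, double[]> diffsForItem,  // target -> {avgDiff, count}
                         Map<String, double[]> partials) {    // "user\ttarget" -> {weightedSum, countSum}
    for (Map.Entry<String, Double> r : ratingsForItem.entrySet()) {
      for (Map.Entry<String, double[]> d : diffsForItem.entrySet()) {
        String key = r.getKey() + "\t" + d.getKey();
        double avgDiff = d.getValue()[0];
        double count = d.getValue()[1];
        double[] acc = partials.computeIfAbsent(key, k -> new double[2]);
        acc[0] += (r.getValue() + avgDiff) * count;
        acc[1] += count;
      }
    }
  }

  // Job 2 reducer: the summed partials per (user, target) become the prediction.
  static double aggregate(double[] partial) {
    return partial[1] == 0.0 ? Double.NaN : partial[0] / partial[1];
  }

  public static void main(String[] args) {
    Map<String, double[]> partials = new HashMap<>();
    // itemId "A": user u1 rated it 2.0; avg(r_C - r_A) = 1.0 over 2 co-raters
    joinReduce(Map.of("u1", 2.0), Map.of("C", new double[] {1.0, 2}), partials);
    // itemId "B": user u1 rated it 4.0; avg(r_C - r_B) = -1.0 over 1 co-rater
    joinReduce(Map.of("u1", 4.0), Map.of("C", new double[] {-1.0, 1}), partials);
    System.out.println(aggregate(partials.get("u1\tC"))); // ((2+1)*2 + (4-1)*1) / 3 = 3.0
  }
}
```

(The join never needs the whole 160GB diff matrix in memory at once — each
reduce call only sees one item's row — which is the point of doing it this
way. Does this sound workable?)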

Thank you.


On Mon, Apr 11, 2011 at 3:31 PM, Sean Owen <sro...@gmail.com> wrote:

> There is no distributed slope-one implementation at this time. You
> need to copy the resulting diffs output off HDFS to a local disk. Then
> you simply use it as input to a MemoryDiffStorage for
> SlopeOneRecommender.
>
> However, if you have computed diffs over a large number of items, it
> may not fit in memory. You can try JDBCDiffStorage and put diffs in a
> database, but you may find it's just too slow. Or you can set
> MemoryDiffStorage to cap the number of diffs it stores.
>
> None of these algorithms involve a user profile.
>
> On Mon, Apr 11, 2011 at 8:20 AM, ke xie <oed...@gmail.com> wrote:
> > Hi there:
> >
> > I've successfully used a Hadoop program to calculate the diff matrix, and
> > stored the data in HDFS...
> >
> > But now I'm confused: how can I read the user profiles as well as the
> > diff matrix at the same time (they are at different locations in HDFS)
> > to predict a specific user's ratings?
> >
> > I've already checked the Mahout implementation of Slope One with Hadoop,
> > but that one only does the calculation of the diff matrix; no prediction
> > part is included...
> >
> > Can anyone help me? How can I read two kinds of data in a Hadoop program
> > at the same time?
> >
> >
> > --
> > Name: Ke Xie   Eddy
> > Research Group of Information Retrieval
> > State Key Laboratory of Intelligent Technology and Systems
> > Tsinghua University
> >
>



-- 
Name: Ke Xie   Eddy
Research Group of Information Retrieval
State Key Laboratory of Intelligent Technology and Systems
Tsinghua University
