Hey Vignesh,
You can refer to the book Mahout in Action. It will give you a deeper
understanding of the available algorithms and help you choose the best one
for your data set. If your user IDs and item IDs are numeric, the distributed
implementation of item-based similarity is straightforward: just use the
packaged job jar available with Mahout and provide the appropriate inputs. The
IBM developerWorks article is really good. In addition, you can refer to the
following as well:
https://cwiki.apache.org/MAHOUT/itembased-collaborative-filtering.html
http://kickstarthadoop.blogspot.com/2011/05/mahout-recommendations-in-distributed.html
A couple of points to take care of:
- I'm not sure why you chose Hive to store the final result. If you are
looking for a distributed database within the Hadoop ecosystem, it is HBase,
not Hive, that gives you low-latency access. You need such a distributed
database only if the recommendation results are too large, running to a few
terabytes, and hence not scalable with a traditional RDBMS. If the result size
is manageable, get it from HDFS and store it in an RDBMS, which would be a
better choice for legacy/existing applications to consume.
- On top of that, if the user IDs or item IDs in your input data set are
alphanumeric, it would be better to have an input and output formatter wrapping
your distributed recommender. It would in turn convert alphanumeric IDs to
numeric ones on the data consumed by the recommender, and numeric IDs back to
alphanumeric ones on the data it produces.
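To make the formatter idea concrete, here is a minimal sketch in Java. It is
only an illustration under assumptions: the class name IdMapper, its methods,
and the comma-separated "userId,itemId,preference" line layout are hypothetical
and not part of Mahout. It also keeps the mapping in memory, so for millions of
users you would probably need to persist the mapping somewhere instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: maps alphanumeric IDs to dense numeric IDs and back. */
class IdMapper {
    // Forward lookup: original alphanumeric ID -> assigned numeric ID.
    private final Map<String, Long> toNumeric = new HashMap<>();
    // Reverse lookup: the list index is the numeric ID.
    private final List<String> toOriginal = new ArrayList<>();

    /** Returns the numeric ID for an original ID, assigning the next free one if unseen. */
    long encode(String originalId) {
        Long existing = toNumeric.get(originalId);
        if (existing != null) {
            return existing;
        }
        long assigned = toOriginal.size();
        toNumeric.put(originalId, assigned);
        toOriginal.add(originalId);
        return assigned;
    }

    /** Returns the original ID for a numeric ID produced earlier by encode(). */
    String decode(long numericId) {
        if (numericId < 0 || numericId >= toOriginal.size()) {
            throw new IllegalArgumentException("Unknown numeric ID: " + numericId);
        }
        return toOriginal.get((int) numericId);
    }

    /**
     * Input formatter step: rewrites one "userId,itemId,preference" line
     * (assumed layout) so that both IDs are numeric.
     */
    static String encodeLine(String line, IdMapper users, IdMapper items) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return users.encode(fields[0].trim()) + ","
                + items.encode(fields[1].trim()) + ","
                + fields[2].trim();
    }
}
```

You would run encodeLine over every input record before handing the file to
the recommender, keeping one IdMapper for users and one for items, and then
call decode on the numeric IDs in the recommender output to get the original
IDs back.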
Hope it helps!
Thanks and Regards
Bejoy.K.S
> Date: Thu, 10 Nov 2011 15:16:11 +0100
> Subject: Re: Help needed for Recommendation engine
> From: [email protected]
> To: [email protected]
>
> This new article by Grant Ingersoll is really good to get started in my
> opinion.
>
> http://www.ibm.com/developerworks/java/library/j-mahout-scaling/index.html?ca=drs-
>
> Pascal
>
>
> 2011/11/10 VIGNESH PRAJAPATI <[email protected]>
>
> > Yes Ted,
> > I am new to the Mahout world, with Java knowledge and the basics of
> > Hadoop and Mahout.
> > I want to develop a recommender system with the k-means clustering
> > algorithm.
> > It should take little time to generate item recommendations for
> > millions of users based on their past activity, such as users' ratings of
> > particular products and the total number of product page visits by
> > users. So I have three inputs:
> > 1. userid
> > 2. productid
> > 3. rating or number of visits by the userid given in item 1
> >
> > Based on this, it will provide a number of items, or the best item
> > appropriate to the user, as a recommendation. After all this, the
> > information is stored in Hadoop.
> >
> > -----Original message-----
> > From: Ted Dunning
> > Sent: 10/11/2011, 7:09 pm
> > To: [email protected]
> > Cc: user
> > Subject: Re: Help needed for Recommendation engine
> >
> >
> > Hive is not a database.
> >
> > You should test different algorithms. If you want suggestions you
> > should say a lot more about what you are doing.
> >
> > Sent from my iPhone
> >
> > On Nov 10, 2011, at 5:39, VIGNESH PRAJAPATI <[email protected]> wrote:
> >
> > > So how do I get started, and which Mahout algorithm is suitable for this?
> >