Hey Vignesh
          You can refer to the book Mahout in Action. It gives you a deeper 
understanding of the available algorithms and helps you choose the best one for 
your data set. If your user IDs and item IDs are numeric, then the distributed 
implementation of item-based similarity is straightforward: just use the 
packaged job jar available with Mahout and provide the appropriate inputs. The 
IBM developerWorks article is really good. In addition, you can refer to the 
following as well:
https://cwiki.apache.org/MAHOUT/itembased-collaborative-filtering.html
http://kickstarthadoop.blogspot.com/2011/05/mahout-recommendations-in-distributed.html

A couple of points to take care of:
- I'm not sure why you chose Hive to store the final result. If you are 
looking for a distributed database within the Hadoop ecosystem for low-latency 
access, it is HBase, not Hive. You need to go in for such a distributed 
database only if the recommendation results are too large, ranging to a few 
terabytes, and hence not scalable with a traditional RDBMS. If the result size 
is manageable, get it from HDFS and store it in an RDBMS, which would be a 
better choice for legacy/existing applications to consume.
- On top of that, if the user IDs or item IDs in your input data set are 
alphanumeric, it would be better to have an input and output formatter wrapping 
your distributed recommender. It would in turn convert alphanumeric IDs to 
numeric ones on the data consumed by the recommender, and numeric IDs back to 
alphanumeric ones on the data it produces.
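
As a rough sketch of such a formatter (my own illustration, not something 
shipped with Mahout): the hypothetical IdMapper class below hands out a numeric 
ID for each alphanumeric ID and remembers the reverse mapping. It assumes input 
lines of the form userId,itemId,preference and recommender output lines of the 
form userId<TAB>[itemId:score,...]; please check both against the Mahout 
version you run. The mapping is kept in memory, so for very large ID sets you 
would need to persist it somewhere instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: maps alphanumeric IDs to
// dense numeric IDs (0, 1, 2, ...) and back again.
class IdMapper {
    private final Map<String, Long> numericIds = new HashMap<>();
    private final List<String> originals = new ArrayList<>();

    // Returns the numeric ID for an alphanumeric ID, assigning a new one if needed.
    long toLong(String id) {
        Long existing = numericIds.get(id);
        if (existing != null) {
            return existing;
        }
        long next = originals.size();
        numericIds.put(id, next);
        originals.add(id);
        return next;
    }

    // Returns the alphanumeric ID that was registered for a numeric ID.
    String toOriginal(long numericId) {
        return originals.get((int) numericId);
    }

    // Input side: "userId,itemId,pref" with alphanumeric IDs -> numeric IDs.
    // The comma-separated layout is an assumption about the recommender input.
    static String encodeLine(String line, IdMapper users, IdMapper items) {
        String[] f = line.split(",");
        return users.toLong(f[0].trim()) + "," + items.toLong(f[1].trim())
                + "," + f[2].trim();
    }

    // Output side: "userId<TAB>[itemId:score,...]" with numeric IDs ->
    // the same line with the original alphanumeric IDs.
    // The output layout is likewise an assumption.
    static String decodeLine(String line, IdMapper users, IdMapper items) {
        String[] parts = line.split("\t");
        String user = users.toOriginal(Long.parseLong(parts[0].trim()));
        String body = parts[1].trim();
        body = body.substring(1, body.length() - 1); // strip '[' and ']'
        StringBuilder out = new StringBuilder(user).append("\t[");
        if (!body.isEmpty()) {
            String[] recs = body.split(",");
            for (int i = 0; i < recs.length; i++) {
                String[] kv = recs[i].split(":");
                if (i > 0) {
                    out.append(',');
                }
                out.append(items.toOriginal(Long.parseLong(kv[0].trim())))
                   .append(':').append(kv[1].trim());
            }
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        IdMapper users = new IdMapper();
        IdMapper items = new IdMapper();
        // Made-up sample rows, only to show the round trip.
        System.out.println(encodeLine("alice,book-7,4.5", users, items));
        System.out.println(encodeLine("bob,pen-2,3", users, items));
        System.out.println(decodeLine("1\t[0:4.2,1:3.9]", users, items));
    }
}
```

You would run encodeLine over the input before handing it to the recommender 
job and decodeLine over the job output, using the same two mapper instances for 
both steps.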

Hope it helps!

Thanks and Regards
        Bejoy.K.S


> Date: Thu, 10 Nov 2011 15:16:11 +0100
> Subject: Re: Help needed for Recommendation engine
> From: [email protected]
> To: [email protected]
> 
> This new article by Grant Ingersoll is, in my opinion, really good for
> getting started.
> 
> http://www.ibm.com/developerworks/java/library/j-mahout-scaling/index.html?ca=drs-
> 
> Pascal
> 
> 
> 2011/11/10 VIGNESH PRAJAPATI <[email protected]>
> 
> > Ya Ted,
> > I am new to the Mahout world, with Java knowledge and the basics of
> > Hadoop and Mahout.
> > I want to develop a recommender system with the k-means clustering
> > algorithm.
> > It should take little time to generate item recommendations for
> > millions of users based on their past activity, such as users' ratings of
> > particular products and the total number of product page visits by
> > users. So I have three inputs:
> > 1. userid
> > 2. productid
> > 3. rating or number of visits by the userid given in 1.
> >
> > Based on this, it will provide a number of items, or the best item
> > appropriate to the user, as a recommendation. After all this, the
> > information is stored in Hadoop.
> >
> > -----Original message-----
> > From: Ted Dunning
> > Sent:  10/11/2011, 7:09  pm
> > To: [email protected]
> > Cc: user
> > Subject: Re: Help needed for Recommendation engine
> >
> >
> > Hive is not a database.
> >
> > You should test different algorithms. If you want suggestions you
> > should say a lot more about what you are doing.
> >
> > Sent from my iPhone
> >
> > On Nov 10, 2011, at 5:39, VIGNESH PRAJAPATI <[email protected]> wrote:
> >
> > > So how do I start on it? Which Mahout algorithm is suitable for this?
> >
                                          
