I agree with your suggestion. I have already implemented a Java recommender, and it performed better. We had thought of moving to Mahout because of scalability problems we expect in the future, but for now it seems better to go with the single-machine implementation.
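In case it helps anyone else who runs into the same question, this is roughly what the single-machine version looks like with the Taste API. It is only a minimal sketch: it assumes ratings.csv is in the usual userID,itemID,value format, and 123 is just a placeholder user ID. Since our ratings are implicit, the boolean-preference variants of the data model and recommender might be an even better fit, but the plain item-based version is:

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class SingleMachineRecommender {

  public static void main(String[] args) throws Exception {
    // ratings.csv in the usual Taste format: userID,itemID,value
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // same similarity measure as SIMILARITY_LOGLIKELIHOOD in the Hadoop job
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);

    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    // top-3 recommendations for one user; 123L is a placeholder user ID
    List<RecommendedItem> recommendations = recommender.recommend(123L, 3);
    for (RecommendedItem item : recommendations) {
      System.out.println(item.getItemID() + " : " + item.getValue());
    }
  }
}

With a few million ratings the whole data set fits in memory, so this should answer in well under a second instead of going through the MapReduce jobs.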
Thanks for your suggestions,
Warunika

On Fri, Jun 6, 2014 at 3:36 PM, Sebastian Schelter <[email protected]> wrote:

> 1M ratings take up something like 20 megabytes. This is a data size where
> it does not make any sense to use Hadoop. Just try the single machine
> implementation.
>
> --sebastian
>
> On 06/06/2014 12:01 PM, Warunika Ranaweera wrote:
>
>> Hi Sebastian,
>>
>> Thanks for your prompt response. It's just a sample data set from our
>> database and it may expand up to 6 million ratings. Since the performance
>> was low for a smaller data set, I thought it would be even worse for a
>> larger data set. As per your suggestion, I also applied the same command
>> on 1 million user ratings for approx. 6000 users and got the same
>> performance level.
>>
>> What is the average running time for the Mahout distributed
>> recommendation job on 1 million ratings? Does it usually take more than
>> 1 minute?
>>
>> Thanks in advance,
>> Warunika
>>
>> On Fri, Jun 6, 2014 at 2:42 PM, Sebastian Schelter <[email protected]>
>> wrote:
>>
>>> You should not use Hadoop for such a tiny dataset. Use the
>>> GenericItemBasedRecommender on a single machine in Java.
>>>
>>> --sebastian
>>>
>>> On 06/06/2014 11:10 AM, Warunika Ranaweera wrote:
>>>
>>>> Hi,
>>>>
>>>> I am using Mahout's recommenditembased algorithm on a data set with
>>>> nearly 10,000 (implicit) user ratings. This is the command I used:
>>>>
>>>> mahout recommenditembased --input ratings.csv --output recommendation
>>>>   --usersFile users.dat --tempDir temp --similarityClassname
>>>>   SIMILARITY_LOGLIKELIHOOD --numRecommendations 3
>>>>
>>>> Although the output is successfully generated, this process takes
>>>> nearly 7 minutes to produce recommendations for a single user. The
>>>> Hadoop cluster has 8 nodes and the machine on which Mahout is invoked
>>>> is an AWS EC2 c3.2xlarge server. When I tracked the mapreduce jobs, I
>>>> noticed that no more than one machine is utilized at a time, and the
>>>> recommenditembased command takes 9 mapreduce jobs altogether with
>>>> approx. 45 seconds per job.
>>>>
>>>> Since the performance is too slow for real-time recommendations, it
>>>> would be really helpful to know whether I'm missing any additional
>>>> commands or configurations that enable faster performance.
>>>>
>>>> Thanks,
>>>> Warunika
