Hi Sebastian,

Thanks for your prompt response. It's just a sample data set from our database, and it may expand to up to 6 million ratings. Since the performance was already low for the smaller data set, I assumed it would be even worse for a larger one. As per your suggestion, I also ran the same command on 1 million ratings from approximately 6,000 users and saw the same level of performance.
What is the average running time for the Mahout distributed recommendation job on 1 million ratings? Does it usually take more than 1 minute?

Thanks in advance,
Warunika

On Fri, Jun 6, 2014 at 2:42 PM, Sebastian Schelter <[email protected]> wrote:

> You should not use Hadoop for such a tiny dataset. Use the
> GenericItemBasedRecommender on a single machine in Java.
>
> --sebastian
>
>
> On 06/06/2014 11:10 AM, Warunika Ranaweera wrote:
>
>> Hi,
>>
>> I am using Mahout's recommenditembased algorithm on a data set with nearly
>> 10,000 (implicit) user ratings. This is the command I used:
>>
>> mahout recommenditembased --input ratings.csv --output recommendation
>> --usersFile users.dat --tempDir temp --similarityClassname
>> SIMILARITY_LOGLIKELIHOOD --numRecommendations 3
>>
>> Although the output is successfully generated, this process takes nearly 7
>> minutes to produce recommendations for a single user. The Hadoop cluster
>> has 8 nodes and the machine on which Mahout is invoked is an AWS EC2
>> c3.2xlarge server. When I tracked the mapreduce jobs, I noticed that more
>> than one machine is *not* utilized at a time, and the *recommenditembased*
>> command takes 9 mapreduce jobs altogether with approx. 45 seconds taken
>> per job.
>>
>> Since the performance is too slow for real-time recommendations, it would
>> be really helpful to know whether I'm missing out on any additional
>> commands or configurations that enable faster performance.
>>
>> Thanks,
>> Warunika
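P.S. For anyone following the thread, here is my rough understanding of the single-machine approach Sebastian mentioned, as an untested sketch only. The class name SingleMachineRecommender and the user ID 123 are placeholders of mine; ratings.csv is the same file from the command above, assumed to be in userID,itemID[,rating] format.

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class SingleMachineRecommender {
  public static void main(String[] args) throws Exception {
    // Load the same ratings file used for the Hadoop job (userID,itemID[,rating])
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // Same log-likelihood similarity as SIMILARITY_LOGLIKELIHOOD in the CLI job
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);

    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    // Top 3 recommendations for one user, computed in memory; 123L is only a placeholder ID
    List<RecommendedItem> recommendations = recommender.recommend(123L, 3);
    for (RecommendedItem item : recommendations) {
      System.out.println(item.getItemID() + " : " + item.getValue());
    }
  }
}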
