Mahout has single machine and distributed recommenders.


On 06/06/2014 02:31 PM, Warunika Ranaweera wrote:
I agree with your suggestion. I have already implemented a Java
recommender, and it performed better. But, because of the scalability
problems we expect in the future, we thought of moving to Mahout. For
now, however, it seems better to go with the single-machine
implementation.

Thanks for your suggestions,
Warunika



On Fri, Jun 6, 2014 at 3:36 PM, Sebastian Schelter <[email protected]> wrote:

1M ratings take up something like 20 megabytes. This is a data size where
it does not make any sense to use Hadoop. Just try the single-machine
implementation.

--sebastian




On 06/06/2014 12:01 PM, Warunika Ranaweera wrote:

Hi Sebastian,

Thanks for your prompt response. It's just a sample data set from our
database, and it may expand up to 6 million ratings. Since the performance
was low for a smaller data set, I thought it would be even worse for a
larger one. As per your suggestion, I also ran the same command on
1 million ratings for approx. 6,000 users and saw the same performance.

What is the average running time for the Mahout distributed recommendation
job on 1 million ratings? Does it usually take more than 1 minute?

Thanks in advance,
Warunika


On Fri, Jun 6, 2014 at 2:42 PM, Sebastian Schelter <[email protected]>
wrote:

You should not use Hadoop for such a tiny dataset. Use the
GenericItemBasedRecommender on a single machine in Java.

--sebastian
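
For anyone following along, here is a minimal single-machine sketch of what
Sebastian suggests, using Mahout's Taste API (class names are from the
Mahout 0.9-era API; the file name "ratings.csv" and user ID 42 are
placeholders, not from the thread):

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class SingleMachineRecommender {
  public static void main(String[] args) throws Exception {
    // Same CSV format the recommenditembased job reads: userID,itemID[,value]
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // Counterpart of --similarityClassname SIMILARITY_LOGLIKELIHOOD
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);

    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    // Top-3 recommendations for one user, like --numRecommendations 3
    List<RecommendedItem> recs = recommender.recommend(42L, 3);
    for (RecommendedItem item : recs) {
      System.out.println(item.getItemID() + "\t" + item.getValue());
    }
  }
}
```

Since the ratings in this thread are implicit, the boolean-preference
variant (GenericBooleanPrefItemBasedRecommender) may be a better fit, but
the structure is the same.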


On 06/06/2014 11:10 AM, Warunika Ranaweera wrote:

  Hi,

I am using Mahout's recommenditembased algorithm on a data set with nearly
10,000 (implicit) user ratings. This is the command I used:

mahout recommenditembased --input ratings.csv --output recommendation \
  --usersFile users.dat --tempDir temp \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD \
  --numRecommendations 3


Although the output is successfully generated, this process takes nearly
7 minutes to produce recommendations for a single user. The Hadoop cluster
has 8 nodes, and the machine on which Mahout is invoked is an AWS EC2
c3.2xlarge server. When I tracked the MapReduce jobs, I noticed that no
more than one machine is utilized at a time, and the recommenditembased
command runs 9 MapReduce jobs altogether, with approx. 45 seconds taken
per job.

Since the performance is too slow for real-time recommendations, it would
be really helpful to know whether I'm missing any additional commands or
configurations that would enable faster performance.

Thanks,
Warunika
