Re: Extra low speed mahout distribution with hadoop

Maksim Areshkau Thu, 10 May 2012 02:54:32 -0700

On 10.05.12 12:31, "Sean Owen" <[email protected]> wrote:


>Hadoop just about always makes things slower, in terms of total
>resources needed. It adds a lot of overhead; such is the price of
>parallelism. My rule of thumb is that Hadoop-based algorithms will,
>all else equal, take 4x more CPU hours. But of course Hadoop lets you
>distribute.
>
>However, I doubt that's the total explanation here. What did you do in
>2 minutes? I can't believe even the Mahout non-distributed recommender
>would build its model and make recs for *all* users in that time.
>Really?
No. I need a recommendation for just one user, but I need it quickly.
As input for Hadoop I used two files: input.txt (the Wikipedia links
database) and users.txt (a file with the user IDs for which I need
recommendations).
So do I need a parameter to specify that I want a recommendation for only
one user?
>  Remember that RecommenderJob is computing all recommendations
>for all users. The non-distributed recommender doesn't do anything
>like that until you ask it.
>
>On Thu, May 10, 2012 at 10:29 AM,  <[email protected]> wrote:
>> Hi, I am studying Mahout with the Mahout in Action book.
>> I downloaded the Wikipedia links database and tried to run a
>> recommendation on it using Mahout and Hadoop.
>> I used the following command:
>> hadoop jar mahout-core-0.6-job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output
>> --usersFile input/users.txt --booleanData true -s SIMILARITY_LOGLIKELIHOOD
>>
>> The command took about 4 hours to run on my MacBook Pro. On the same
>> machine, the recommendation without Hadoop took about 2 minutes.
>> So why is Mahout with Hadoop so slow?
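
To illustrate the point made in the thread that a non-distributed recommender does no work until it is asked about a specific user, here is a minimal plain-Java sketch. It is not Mahout's API: the class name OnDemandRecommender, the method recommendFor, and the simple overlap-count score (standing in for a real similarity such as log-likelihood) are all assumptions made for illustration. It works on boolean data, that is, a map from user ID to the set of item IDs that user has.

```java
import java.util.*;

// Hypothetical illustration only; this is NOT part of Mahout.
class OnDemandRecommender {

    // Ranks unseen items for ONE user. Each other user "votes" for the items
    // the target user lacks, weighted by how many items the two users share.
    // No recommendations are computed for any other user.
    static List<Long> recommendFor(Map<Long, Set<Long>> itemsByUser,
                                   long userId, int howMany) {
        Set<Long> seen = itemsByUser.getOrDefault(userId, Collections.emptySet());
        Map<Long, Integer> scores = new HashMap<>();

        for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
            if (entry.getKey() == userId) {
                continue; // skip the target user
            }
            // Overlap count: a crude stand-in for a real similarity measure.
            int overlap = 0;
            for (Long item : entry.getValue()) {
                if (seen.contains(item)) {
                    overlap++;
                }
            }
            if (overlap == 0) {
                continue; // nothing in common, so this user contributes nothing
            }
            for (Long item : entry.getValue()) {
                if (!seen.contains(item)) {
                    scores.merge(item, overlap, Integer::sum);
                }
            }
        }

        // Highest score first; ties broken by item ID for a stable order.
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

A call such as OnDemandRecommender.recommendFor(data, 1L, 10) touches the data once for that single user, whereas a batch job like the one in the command above produces output for every user it is given before any single result can be read.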
