Re: Extra low speed mahout distribution with hadoop

Sean Owen Thu, 10 May 2012 03:06:31 -0700

If you need anything for "one user" and "quickly" then you don't want
to use Hadoop. :) It is extremely inefficient to try to use Hadoop
this way since by running it you are recomputing every single
similarity, then producing one recommendation. At best, use Hadoop to
precompute similarities, then use them to recommend in real-time.


This is really the right way, I think, to use Hadoop with
recommendations. You use Hadoop to compute, offline and periodically,
some underlying model. Then you load that into a server that can
quickly make any recommendation from it at run-time. This is the
architecture I'm building in the (Mahout-based) Myrrix recommender
engine (myrrix.com).
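
A minimal sketch of that split, in plain Python rather than Mahout's
API. It assumes boolean preference data, and it uses Jaccard similarity
as a simple stand-in for the log-likelihood similarity in the command
quoted below. The function names and the sample data are hypothetical.
The offline function is the part a periodic Hadoop job would do; the
online function only reads the stored model.

```python
from collections import defaultdict
from itertools import combinations


def precompute_similarities(prefs):
    """Offline step: item-item Jaccard similarity from boolean preferences.

    prefs maps user id -> set of item ids. This is the expensive part,
    which would run periodically as a batch job.
    """
    users_by_item = defaultdict(set)
    for user, items in prefs.items():
        for item in items:
            users_by_item[item].add(user)

    sims = defaultdict(dict)
    for a, b in combinations(sorted(users_by_item), 2):
        common = len(users_by_item[a] & users_by_item[b])
        if common:
            union = len(users_by_item[a] | users_by_item[b])
            sims[a][b] = sims[b][a] = common / union
    return dict(sims)


def recommend(user_items, sims, how_many=3):
    """Online step: score unseen items for ONE user from the stored model."""
    scores = defaultdict(float)
    for item in user_items:
        for other, sim in sims.get(item, {}).items():
            if other not in user_items:
                scores[other] += sim
    # Highest score first; ties broken by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


# Hypothetical sample data: four users, four items.
prefs = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C", "D"},
    "u4": {"A", "D"},
}
model = precompute_similarities(prefs)   # slow, done rarely
print(recommend(prefs["u1"], model))     # fast, done per request
```

The point of the design is that a request for one user touches only
that user's items and their stored neighbours, so none of the
similarities are recomputed at request time.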

Sean

On Thu, May 10, 2012 at 10:53 AM, Maksim Areshkau
<[email protected]> wrote:
> On 10.05.12 12:31, "Sean Owen" <[email protected]> wrote:
>
>
>>Hadoop just about always makes things slower, in terms of total
>>resources needed. It adds a lot of overhead; such is the price of
>>parallelism. My rule of thumb is that Hadoop-based algorithms will,
>>all else equal, take 4x more CPU hours. But of course Hadoop lets you
>>distribute.
>>
>>However, I doubt that's the total explanation here. What did you do in
>>2 minutes? I can't believe even the Mahout non-distributed recommender
>>would build its model and make recs for *all* users in that time.
>>Really?
> Nope. I just need a recommendation for one user. But I need this
> recommendation quickly.
> As input for Hadoop I used two files: input.txt (the Wikipedia
> database) and users.txt (a file with the user ID for which I need a
> recommendation).
> So do I need a parameter to specify that I need a recommendation for
> one user?
>>  Remember that RecommenderJob is computing all recommendations
>>for all users. The non-distributed recommender doesn't do anything
>>like that until you ask it.
>>
>>On Thu, May 10, 2012 at 10:29 AM,  <[email protected]> wrote:
>>> Hi, I am studying Mahout with the Mahout in Action book.
>>> I have downloaded the Wikipedia links database and tried to run
>>> recommendations on it using Mahout and Hadoop.
>>> I used the following command:
>>> hadoop jar mahout-core-0.6-job.jar
>>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>>> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output
>>> --usersFile input/users.txt --booleanData true -s SIMILARITY_LOGLIKELIHOOD
>>>
>>> The command took about 4 hours to run on my MacBook Pro. On the same
>>> machine, the recommendation without Hadoop took about 2 minutes.
>>> So why is Mahout + Hadoop so slow?
>
>
