Re: Mahout Amazon EMR usage cost

Koobas Sun, 02 Dec 2012 19:07:28 -0800

Thank you very much.
The pointer to Myrrix is a very useful piece of information.
Myrrix, however, relies on an iterative sparse matrix factorization to do
PCA.
I want to produce Amazon-like recommendations.
I.e., "70% of users who bought this, also bought that."
So, I specifically want the direct kNN algorithm.
Any clue what Mahout + Hadoop can deliver on that one?
Thanks,
Jacob
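
To make the "users who bought this also bought that" figure concrete, here is a
minimal single-machine sketch in Python. It is only an illustration of direct
co-occurrence counting on boolean (userID, itemID) pairs, not how Mahout's
ItemSimilarityJob is implemented. The function name also_bought and the sample
data are hypothetical, and everything is assumed to fit in memory, which the
500M-record data set would not.

```python
from collections import Counter, defaultdict
from itertools import combinations


def also_bought(pairs, k=50):
    """Return, for each item, its top-k co-purchased items.

    pairs: iterable of (user_id, item_id) tuples (boolean implicit feedback).
    The result maps item -> list of (other_item, fraction), where fraction is
    the share of the item's buyers who also bought other_item.
    """
    # Collect the distinct set of items per user.
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)

    buyers = Counter()    # item -> number of distinct buyers
    together = Counter()  # (item_a, item_b), a < b -> users who bought both
    for items in items_by_user.values():
        buyers.update(items)
        together.update(combinations(sorted(items), 2))

    # The fraction is asymmetric: P(b | a) is generally not P(a | b).
    neighbors = defaultdict(list)
    for (a, b), n in together.items():
        neighbors[a].append((b, n / buyers[a]))
        neighbors[b].append((a, n / buyers[b]))

    # Keep the k highest fractions per item; ties are broken by item id.
    return {
        item: sorted(cands, key=lambda c: (-c[1], c[0]))[:k]
        for item, cands in neighbors.items()
    }


if __name__ == "__main__":
    sample = [
        ("u1", "A"), ("u1", "B"),
        ("u2", "A"), ("u2", "B"), ("u2", "C"),
        ("u3", "A"),
        ("u4", "B"), ("u4", "C"),
    ]
    for item, top in sorted(also_bought(sample, k=2).items()):
        print(item, top)
```

In the sample, both buyers of C also bought B, so B is reported for C with a
fraction of 1.0, while only two of the three buyers of B bought C.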



On Sun, Dec 2, 2012 at 5:25 PM, Sean Owen <[email protected]> wrote:

> My guess is: less than $10. Little enough that I wouldn't worry about
> it. But I have not tried it directly.
>
> You just have 10K items, so it ought to be relatively quick to find
> similar items for them. You will want to look at ItemSimilarityJob.
> Setting some parameters like --maxSimilaritiesPerItem and --threshold
> will be important to speed. On EMR, I suggest using 2-4 m1.xlarge
> instances and using spot instances. For the master, use on-demand and
> use m1.large. The usual Hadoop tunings like mapred.reduce.tasks matter
> a lot too. When set up well it should be quite economical.
>
> Since you mentioned implicit feedback and EMR, you may benefit from a
> look at Myrrix (http://myrrix.com). It can compute recommendations or
> item-item similarities, on Hadoop / EMR if desired, and is built for
> this implicit feedback model. The scale is no problem. It's
> pre-packaged and tuned to run by itself, so, might save you time and
> money versus trying to configure, run and tune it from scratch
> (http://myrrix.com/purchase-computation-layer/).  For what it may be
> worth I do have one recent benchmark on EMR
> (http://myrrix.com/example-wikipedia-links/) computing a model over
> 13M Wikipedia articles for about $7.
>
> On Sun, Dec 2, 2012 at 9:12 PM, Koobas <[email protected]> wrote:
> > I was wondering if somebody could give me a rough estimate of the cost of
> > running Mahout on Amazon's Elastic MapReduce for a specific problem.
> > I am working with a common case of implicit feedback.
> > I have a simple, boolean input, i.e., user-item pairs (userID, itemID).
> > I would like to find 50 nearest neighbors for each item.
> > I have 10M users, 10K items, and 500M records.
> > If anybody has any ballpark idea of the kind of cost it would take to solve
> > the problem using EMR, I would appreciate it very much.
> > Jacob
>
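
For the figures in the original question (10M users, 10K items, 500M records),
a back-of-envelope sizing of the co-occurrence work can be sketched as below.
The function name cooccurrence_estimate is hypothetical, and the calculation
assumes every user has the same number of records. Real purchase data is
long-tailed, so treat the result as an order of magnitude only.

```python
def cooccurrence_estimate(users, items, records):
    """Rough sizing for item-item co-occurrence counting.

    Assumes records are spread evenly across users (an assumption, not a
    property of the data described in the thread).
    """
    per_user = records / users                      # average items per user
    pairs_per_user = per_user * (per_user - 1) / 2  # unordered pairs from one user
    return {
        "avg_items_per_user": per_user,
        "distinct_item_pairs": items * (items - 1) // 2,
        "pair_emissions": users * pairs_per_user,
    }


if __name__ == "__main__":
    est = cooccurrence_estimate(
        users=10_000_000, items=10_000, records=500_000_000
    )
    for name, value in est.items():
        print(f"{name}: {value:,.0f}")
```

Under that even-spread assumption each user contributes about 50 items, giving
roughly 1.2e10 pair emissions against about 5e7 distinct item pairs. Heavy
users push the emission count up quadratically, which is why caps and
thresholds on the similarity computation affect run time so strongly.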
