My guess is: less than $10. Little enough that I wouldn't worry about it. But I have not tried it directly.
You have just 10K items, so it ought to be relatively quick to find similar items for them. You will want to look at ItemSimilarityJob; setting parameters like --maxSimilaritiesPerRow and --threshold will matter a lot for speed.

On EMR, I suggest using 2-4 m1.xlarge instances, run as spot instances. For the master, use an on-demand m1.large. The usual Hadoop tunings like mapred.reduce.tasks matter a lot too. Set up well, it should be quite economical. (A rough sketch of the invocation and cluster setup follows the quoted message below.)

Since you mentioned implicit feedback and EMR, you may benefit from a look at Myrrix (http://myrrix.com). It can compute recommendations or item-item similarities, on Hadoop / EMR if desired, and is built for this implicit-feedback model. The scale is no problem. It's pre-packaged and tuned to run by itself, so it might save you time and money versus configuring, running and tuning everything from scratch (http://myrrix.com/purchase-computation-layer/).

For what it may be worth, I do have one recent benchmark on EMR (http://myrrix.com/example-wikipedia-links/): computing a model over 13M Wikipedia articles cost about $7.

On Sun, Dec 2, 2012 at 9:12 PM, Koobas <[email protected]> wrote:

> I was wondering if somebody could give me a rough estimate of the cost of
> running Mahout on Amazon's Elastic MapReduce for a specific problem.
> I am working with a common case of implicit feedback.
> I have a simple, boolean input, i.e., user-item pairs (userID, itemID).
> I would like to find 50 nearest neighbors for each item.
> I have 10M users, 10K items, and 500M records.
> If anybody has any ballpark idea of the kind of cost it would take to solve
> the problem using EMR, I would appreciate it very much.
> Jacob
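To make that concrete, here is a rough, untested sketch of the invocation. The S3 paths, reducer count, threshold value, and the choice of log-likelihood similarity are all placeholders/assumptions (log-likelihood is a common choice for boolean, implicit data). Note that on ItemSimilarityJob itself the cap is spelled --maxSimilaritiesPerItem; --maxSimilaritiesPerRow is the flag on the underlying RowSimilarityJob it delegates to:

  # A sketch, not a tested command: paths and values below are
  # placeholders to tune for your data and cluster size.
  mahout itemsimilarity \
    -Dmapred.reduce.tasks=32 \
    --input s3://your-bucket/input/user-item.csv \
    --output s3://your-bucket/output/similar-items \
    --booleanData true \
    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
    --maxSimilaritiesPerItem 50 \
    --threshold 0.1

The --threshold value is the knob to experiment with: pruning weak item-item pairs early is where most of the speed (and cost) savings come from.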
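And for the cluster itself, a sketch using today's AWS CLI (at the time you would use the elastic-mapreduce client or the console, but the shape is the same); the AMI version, instance count, and bid price here are assumptions to adjust:

  # A sketch: AMI version, worker count and spot bid are assumptions.
  aws emr create-cluster \
    --name mahout-item-similarity \
    --ami-version 2.4 \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.large \
      InstanceGroupType=CORE,InstanceCount=4,InstanceType=m1.xlarge,BidPrice=0.10

Spot instances for the workers cut the cost substantially, and keeping the master on-demand means a spot reclaim can slow the job down but can't take the whole cluster with it.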
