My guess is: less than $10. Little enough that I wouldn't worry about it. But I have not tried it directly.
You have just 10K items, so it ought to be relatively quick to find similar items for them. You will want to look at ItemSimilarityJob; setting parameters like --maxSimilaritiesPerRow and --threshold will matter a lot for speed.

On EMR, I suggest using 2-4 m1.xlarge instances, run as spot instances. For the master, use an on-demand m1.large. The usual Hadoop tunings like mapred.reduce.tasks matter a lot too. Set up well, it should be quite economical. (A rough sketch of the invocation and cluster setup follows the quoted message below.)

Since you mentioned implicit feedback and EMR, you may benefit from a look at Myrrix (http://myrrix.com). It can compute recommendations or item-item similarities, on Hadoop / EMR if desired, and is built for this implicit-feedback model. The scale is no problem. It's pre-packaged and tuned to run by itself, so it might save you time and money versus configuring, running and tuning everything from scratch (http://myrrix.com/purchase-computation-layer/).

For what it may be worth, I do have one recent benchmark on EMR (http://myrrix.com/example-wikipedia-links/): computing a model over 13M Wikipedia articles cost about $7.

On Sun, Dec 2, 2012 at 9:12 PM, Koobas <[email protected]> wrote:

> I was wondering if somebody could give me a rough estimate of the cost of
> running Mahout on Amazon's Elastic MapReduce for a specific problem.
> I am working with a common case of implicit feedback.
> I have a simple, boolean input, i.e., user-item pairs (userID, itemID).
> I would like to find 50 nearest neighbors for each item.
> I have 10M users, 10K items, and 500M records.
> If anybody has any ballpark idea of the kind of cost it would take to solve
> the problem using EMR, I would appreciate it very much.
> Jacob
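To make that concrete, here is a rough, untested sketch of the invocation. The S3 paths, reducer count, threshold value, and the choice of log-likelihood similarity are all placeholders/assumptions (log-likelihood is a common choice for boolean, implicit data). Note that on ItemSimilarityJob itself the cap is spelled --maxSimilaritiesPerItem; --maxSimilaritiesPerRow is the flag on the underlying RowSimilarityJob it delegates to:

  # A sketch, not a tested command: paths and values below are
  # placeholders to tune for your data and cluster size.
  mahout itemsimilarity \
    -Dmapred.reduce.tasks=32 \
    --input s3://your-bucket/input/user-item.csv \
    --output s3://your-bucket/output/similar-items \
    --booleanData true \
    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
    --maxSimilaritiesPerItem 50 \
    --threshold 0.1

The --threshold value is the knob to experiment with: pruning weak item-item pairs early is where most of the speed (and cost) savings come from.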
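And for the cluster itself, a sketch using today's AWS CLI (at the time you would use the elastic-mapreduce client or the console, but the shape is the same); the AMI version, instance count, and bid price here are assumptions to adjust:

  # A sketch: AMI version, worker count and spot bid are assumptions.
  aws emr create-cluster \
    --name mahout-item-similarity \
    --ami-version 2.4 \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.large \
      InstanceGroupType=CORE,InstanceCount=4,InstanceType=m1.xlarge,BidPrice=0.10

Spot instances for the workers cut the cost substantially, and keeping the master on-demand means a spot reclaim can slow the job down but can't take the whole cluster with it.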
