On Jun 25, 2011, at 9:15am, Ted Dunning wrote:

> I have had best results with somewhat beefier machines because you pay
> less VM overhead.

Definitely matches my experience. For example, with m1.small instances the
I/O performance is dreadful. So we typically run with m1.large instances,
and spot pricing (for all slaves, not the master, as Sean notes - a launch
sketch follows at the end of this message).

> Typical Hadoop configuration advice lately is 4GB per core and 1 disk
> spindle per two cores. For higher performance systems like MapR, the
> number of spindles can go up.

The 4GB per core is a bit higher than what I typically use. E.g. most
configs I've seen oversubscribe cores by 50% (12 cores => 18 total map +
reduce slots), and 2GB per task is plenty - often you can get away with much
less (see the config sketch below). Though I know of one config where a
well-known Hadoop consulting company set up a 24-core box (12 physical
cores) with 40 mappers and 4 reducers. It depends on the target use case.

-- Ken

> On Sat, Jun 25, 2011 at 2:21 AM, Sean Owen <[email protected]> wrote:
>
>> I think EMR is well worth using. I just think you do want to throw more,
>> and smaller, machines at the task than you imagine. I used the 'small'
>> instance, but you might get away with a fleet of micro instances even.
>> And do most certainly request spot instances for your workers (but pay
>> full rate for your master to ensure it's not killed). It stays reasonably
>> economical this way, even if I wouldn't call it "dirt cheap".
>>
>> On Sat, Jun 25, 2011 at 9:06 AM, Chris Schilling <[email protected]>
>> wrote:
>>
>>> Hey Sean,
>>>
>>> Just curious about your AWS comment. I am only in the very early testing
>>> phases with AWS EMR. So, would you say that you generally recommend
>>> manually setting up an EC2 cluster to run Mahout, rather than using EMR?
>>> I guess the question is: for those of us without the resources to set up
>>> an in-house Hadoop cluster, what is the best setup we can hope to
>>> achieve?

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions
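
For concreteness, here is a minimal sketch of the kind of slot layout Ken
describes, using Hadoop 0.20-era property names in mapred-site.xml. The
12 map / 6 reduce split is just one illustrative way to reach 18 slots on a
12-core box, and both the split and the 2GB child heap are starting points
to tune for your workload, not recommendations:

    <!-- mapred-site.xml: one possible slot layout for a 12-core box,
         oversubscribing cores by 50% (12 map + 6 reduce = 18 slots).
         Property names are from the Hadoop 0.20.x line. -->
    <configuration>
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>12</value>
      </property>
      <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>6</value>
      </property>
      <property>
        <!-- ~2GB heap per task; often much less is enough -->
        <name>mapred.child.java.opts</name>
        <value>-Xmx2048m</value>
      </property>
    </configuration>

One sanity check when picking these numbers: total slots times the child
JVM heap should stay comfortably under the machine's physical RAM, or the
box will start swapping under a full load.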

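And a sketch of the launch setup Sean describes: spot instances for the
workers, with an on-demand master so the job flow can't lose its JobTracker
to a spot reclaim. This assumes a boto version with EMR spot-instance
support; the instance counts, bid price, and S3 bucket below are
illustrative placeholders, and you would add your own Mahout job steps:

    # Sketch: EMR job flow with an on-demand master and spot-priced
    # core nodes, per Sean's advice. Counts, bid price, and S3 paths
    # are placeholders.
    from boto.emr.connection import EmrConnection
    from boto.emr.instance_group import InstanceGroup

    conn = EmrConnection()  # reads AWS credentials from the environment

    groups = [
        # Master at the full on-demand rate, so it can't be reclaimed.
        InstanceGroup(1, 'MASTER', 'm1.large', 'ON_DEMAND', 'master'),
        # Slaves as spot instances; they may be reclaimed if the market
        # price exceeds the bid, in which case Hadoop re-runs lost tasks.
        InstanceGroup(4, 'CORE', 'm1.large', 'SPOT', 'core',
                      bidprice='0.12'),
    ]

    jobflow_id = conn.run_jobflow(
        name='mahout-job',
        log_uri='s3://my-bucket/emr-logs',  # placeholder bucket
        instance_groups=groups,
        keep_alive=False,
        steps=[],  # add the JarStep(s) that run your Mahout job here
    )
    print(jobflow_id)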