I have had the best results with somewhat beefier machines because you pay
less VM overhead.

Typical Hadoop configuration advice lately is 4 GB of RAM per core and one
disk spindle per two cores.  For higher-performance systems like MapR, the
number of spindles can go up.
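
As a rough sketch of that rule of thumb (4 GB per core, one spindle per two
cores), something like the following works out the numbers per node.  The
function name, the parameter names, and the choice to round spindles up are
my own assumptions for illustration, not anything Hadoop itself provides.

```python
import math


def sizing_for_cores(cores, gb_per_core=4, cores_per_spindle=2):
    """Return (ram_gb, spindles) for one node under the rule of thumb.

    Hypothetical helper: the defaults encode 4 GB of RAM per core and
    one disk spindle per two cores. Spindles are rounded up so an odd
    core count never ends up short a disk (an assumption, not a rule).
    """
    ram_gb = cores * gb_per_core
    spindles = math.ceil(cores / cores_per_spindle)
    return ram_gb, spindles


if __name__ == "__main__":
    for cores in (4, 8, 16):
        ram_gb, spindles = sizing_for_cores(cores)
        print(f"{cores} cores: {ram_gb} GB RAM, {spindles} disk spindles")
```

For a system that benefits from more spindles, lower cores_per_spindle
(for example to 1) and compare the result against the instance types you
are considering.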

On Sat, Jun 25, 2011 at 2:21 AM, Sean Owen <[email protected]> wrote:

> I think EMR is well worth using. I just think you do want to throw more,
> and smaller, machines at the task than you imagine. I used the 'small' instance
> but you might get away with a fleet of micro instances even. And do most
> certainly request spot instances for your workers (but pay full rate for
> your master to ensure it's not killed). It stays reasonably economical this
> way, even if I wouldn't call this "dirt cheap".
>
> On Sat, Jun 25, 2011 at 9:06 AM, Chris Schilling <[email protected]>
> wrote:
>
> > Hey Sean,
> >
> > Just curious about your AWS comment.  I am only in very early testing
> > phases with AWS EMR.  So, would you say that you generally recommend
> > manually setting up an EC2 cluster to run Mahout rather than using EMR?  I
> > guess the question is: for those of us without the resources to set up an
> > in-house Hadoop cluster, what is the best setup we can hope to achieve?
> >
> >
>
