I have had the best results with somewhat beefier machines because you pay less VM overhead.

Typical Hadoop configuration advice lately is 4GB of RAM per core and one disk spindle per two cores. For higher-performance systems like MapR, the number of spindles can go up.

On Sat, Jun 25, 2011 at 2:21 AM, Sean Owen wrote:
> I think EMR is well worth using. I just think you do want to throw more,
> and smaller, machines at the task than you imagine. I used the 'small'
> instance but you might get away with a fleet of micro instances even. And
> do most certainly request spot instances for your workers (but pay full
> rate for your master to ensure it's not killed). It stays reasonably
> economical this way, even if I wouldn't call this "dirt cheap".
>
> On Sat, Jun 25, 2011 at 9:06 AM, Chris Schilling wrote:
>
> > Hey Sean,
> >
> > Just curious about your AWS comment. I am only in very early testing
> > phases with AWS EMR. So, would you say that you generally recommend
> > manually setting up an EC2 cluster to run Mahout over EMR? I guess the
> > question is: for those of us without the resources to set up an in-house
> > Hadoop cluster, what is the best setup we can hope to achieve?
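To make the per-core rule of thumb from the reply above concrete (4GB of RAM per core, one disk spindle per two cores), here is a minimal sizing sketch in Python. The function name `size_node` and the choice to round the spindle count up for odd core counts are my own assumptions for illustration, not part of any Hadoop or MapR tooling.

```python
import math

# Rule-of-thumb ratios from the reply above. Higher-performance systems
# (e.g. MapR) may justify more spindles per core than this.
GB_PER_CORE = 4
CORES_PER_SPINDLE = 2


def size_node(cores: int) -> tuple[int, int]:
    """Return (ram_gb, spindles) for a worker node with `cores` cores.

    Hypothetical helper: applies the 4GB-per-core and
    one-spindle-per-two-cores ratios directly.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    ram_gb = cores * GB_PER_CORE
    # Assumption: round up so an odd core count still gets enough disks.
    spindles = math.ceil(cores / CORES_PER_SPINDLE)
    return ram_gb, spindles


if __name__ == "__main__":
    for cores in (2, 8, 12):
        ram_gb, spindles = size_node(cores)
        print(f"{cores} cores -> {ram_gb} GB RAM, {spindles} disk spindles")
```

For example, an 8-core worker comes out at 32GB of RAM and 4 spindles under these ratios.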
