> Yes we're going to be running jobs on a continuous basis
>

I understand. Managing long running Hadoop clusters in Amazon is tricky due to
namenode availability issues and inconsistent network & disk performance.
Have you looked at this from a cost perspective? Maybe it's cheaper to buy a
bunch of servers for this cluster that needs to be on all the time.

>>> Also, how can I specify ebs volumes for these machines?
>>
>> Unfortunately there is no easy way to do this with the current
>> implementation. Do you want to take the lead on this?
>>
>> See https://issues.apache.org/jira/browse/WHIRR-290
>>
>
> I may not have the bandwidth to ramp up but would appreciate if you could
> send me some pointers on getting started!
>

We have a wiki page that describes how to build Whirr and contribute changes:
https://cwiki.apache.org/confluence/display/WHIRR/How+To+Contribute

-- Andrei