It depends on a couple of factors. First, are you developing a product where customers need the freedom to choose their cloud provider, or something in-house where you can standardize on one provider (like AWS)? Second, do you only need to spin up Hadoop resources, or do you need other resources on your cloud-based cluster, like Cassandra, Mongo, SQL-whatever?
If you only need Hadoop and can standardize on AWS (which has the best prices), then EMR is definitely the way to go. AWS has a full set of APIs in most languages that let you do everything you need.

If you need other resources deployed and the flexibility to use different cloud providers, you'll have to go another route. I really like a combination of jclouds and Whirr. I have used this previously to deploy Hadoop, Cassandra, HAProxy, Solr, and Tomcat clusters, set up all ingress security rules, and run custom bash scripts. And it runs on most cloud providers.

The only problem with Whirr is that its development seems to have slowed down in the past year or so, as one of the primary guys has moved on to his own product idea.

John

On Mon, Mar 4, 2013 at 7:08 AM, Christian Schneider <[email protected]> wrote:

> Hi,
> what is the best way to realize this?
> In our current scenario we need the cluster only for some overnight
> processing.
> Therefore it would be good to shut down the cluster overnight and store the
> results on S3.
>
> Could you suggest some libraries or services for that? Like Whirr?
> Or is Amazon's EMR what we need (but the prices are high...)?
>
> Best Regards,
> Christian.

--
Thanks,
John C
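
P.S. For the EMR route, a transient cluster is probably the closest fit for overnight processing: it starts, runs its steps, writes the results to S3, and terminates on its own. Below is a rough sketch of that idea, not a tested deployment. It assumes the boto3 Python SDK and its `run_job_flow` call; the bucket names, jar path, instance type, release label, and the default role names are all placeholders I made up for illustration, so adjust them to your account.

```python
def build_job_flow_request(name, jar_uri, input_uri, output_uri, log_uri,
                           instance_type="m5.xlarge", instance_count=3,
                           release_label="emr-6.15.0"):
    """Build the keyword arguments for an EMR run_job_flow call.

    All default values here are illustrative assumptions, not recommendations.
    """
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # False makes the cluster shut itself down once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [{
            "Name": "overnight-processing",
            # Also tear the cluster down if the step fails.
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": jar_uri,
                # The job is assumed to take input and output paths as args.
                "Args": [input_uri, output_uri],
            },
        }],
        # Role names assumed to be the EMR defaults created in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, region="us-east-1"):
    """Submit the request to EMR and return the new cluster (job flow) ID."""
    import boto3  # imported here so the request can be built without the SDK
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    req = build_job_flow_request(
        name="nightly-batch",
        jar_uri="s3://example-bucket/jobs/nightly-job.jar",  # hypothetical
        input_uri="s3://example-bucket/input/",              # hypothetical
        output_uri="s3://example-bucket/output/",            # hypothetical
        log_uri="s3://example-bucket/logs/",                 # hypothetical
    )
    print(req["Name"], "with", len(req["Steps"]), "step(s)")
    # Uncomment to actually start a (billable) cluster:
    # print(launch(req))
```

You would trigger `launch` from whatever scheduler you already have (cron, for example), so you only pay for the hours the job actually runs.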
