Paolo - I am so glad you are part of the community. Thanks for sharing your work with us!
I am dreaming that by the time we get to 1.0.0, Apache Whirr will be a
drop-in replacement for Amazon EMR for common scenarios and, at the same
time, a strong foundation for building more complicated workflows that
involve multiple services (including custom ones) and work in a similar
fashion on different clouds.

> But, the cluster size issue can be a big problem in terms of adoption and
> in my opinion it should be addressed (if at all possible).

I agree that this is an important requirement. There is also some work
happening in jclouds for this:
http://www.jclouds.org/documentation/reference/pool-design

> I hope we are going to be able to get this in for 0.8.0.
>
> Ack.

It's now on the roadmap for 0.7.0: http://s.apache.org/whirr-0.7.0-roadmap

BTW, it would be great if you want to help with some of the remaining
issues.

> Indeed, I was thinking of doing the opposite: use twice as many (or more)
> m1.small instances. My MapReduce jobs are typically very simple and do
> not require a lot of RAM.
> I am aware that this might not be the right thing to do... but I am
> curious and I want to experience it myself. IO might be poor... I know.

You need testing to answer these questions. AFAIK, for Hadoop, "commodity
hardware" means systems with two processors, each with 4 cores, plenty of
RAM, fast disks and fast networking.

> So far I've used Whirr for Hadoop clusters only, but I am really happy to
> see that there is support for Cassandra, HBase, ElasticSearch and
> ZooKeeper. I might use these as well in the not too distant future.

Great!

> Paolo
>
> [1] https://github.com/castagna/tdbloader3

Cheers,
Andrei
