John - I'm really happy to hear that you have been using Whirr for a while to deploy complex workflows and that it has matched your needs.
I think that before going into supporting more complex deployments with multiple clusters we need to do a really good job at deploying and managing a single cluster, and there is still a nontrivial amount of work that we need to do for this (e.g. deterministic cluster configuration behavior, good error reporting, adding/removing nodes from running clusters, improved support for setting firewall rules, overall user-experience improvements, etc.). I also think that this matches the vision of Whirr as a library that you can use to deploy more complex scenarios as part of your application.

Thanks,

-- Andrei Savu / andreisavu.ro

On Thu, Oct 6, 2011 at 12:58 AM, John Conwell <[email protected]> wrote:
> Hey guys,
>
> Here are some thoughts I've been kicking around lately about Whirr.
>
> I've been using Whirr fairly extensively since 0.4.0. At first my needs
> started off fairly simple, requiring only a single Hadoop cluster. Then
> things got a bit more complex and I needed three different clusters (Hadoop,
> Solr, Cassandra), so I started using Whirr's API and built a bit of
> automation around it. And now my requirements have gotten fairly complex:
> I have 7 different kinds of clusters being created, and three times that
> many post-launch steps to authorize ingress from one cluster to another,
> run custom configuration scripts, copy required files to the clusters, etc.
>
> And this has brought me to the question: what do you think Whirr's role
> should be when it comes to complex, interdependent cloud-based architecture
> deployment? Whirr is really good at creating a single cluster of
> non-dependent resources, meaning it's good at creating a cluster of VMs that
> don't require any upstream dependencies in order to be used. And this is
> fine as long as there are no external dependencies.
> But what about deployment scenarios where there are N different types of
> clusters, and where the configuration of one cluster depends on the makeup
> of a previous cluster? Also, what about other kinds of deployment steps,
> like configuring custom firewall rules or executing custom setup scripts?
>
> For example, the scenario that I'm in the process of automating creates the
> following clusters: Hadoop, Cassandra, Solr, ZooKeeper, ActiveMQ, HAProxy,
> and two different Tomcat clusters. Then there are cluster-to-cluster
> ingress rules I need to set, as well as a few IP-address-to-cluster rules.
> But that's not the worst of it. In order to fully configure our Tomcat
> servers, for example, I need to know things like the IP addresses of the
> Cassandra, Hadoop, Solr, and ActiveMQ nodes. So I've got custom steps that
> gather this info and call runScriptOnNodesMatching on the Tomcat cluster.
> Then there are external files that need to be put on certain clusters,
> like custom Solr config and schema files. These I download from a
> blobstore, again triggered from a script executed
> by runScriptOnNodesMatching.
>
> So in order to fully support complex cloud-based deployments, there is a set
> of actions that needs to be stitched together and executed in a specified
> order, so that downstream dependencies can get info about upstream
> deployment actions: launch-cluster action, remote-script action,
> cluster-ingress action, IP-ingress action, file-upload action,
> blob-file-upload action, etc., all hopefully driven by one configuration
> file that can define the entire set of complex interdependent deployment
> actions.
>
> Thoughts?
>
> --
>
> Thanks,
> John C
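The dependency-ordered action pipeline John describes could be sketched roughly as below. This is a minimal illustration, not Whirr or jclouds code: `Deployment` and `DeploymentAction` are hypothetical names, and a real implementation would invoke the actual launch/ingress/upload calls inside each action's body.

```java
import java.util.*;

public class Deployment {

    // A named deployment step plus the names of the steps it depends on.
    // (Hypothetical type; not part of the Whirr API.)
    static final class DeploymentAction {
        final String name;
        final List<String> dependsOn;
        final Runnable body;

        DeploymentAction(String name, List<String> dependsOn, Runnable body) {
            this.name = name;
            this.dependsOn = dependsOn;
            this.body = body;
        }
    }

    // Execute actions in dependency order (a simple topological sort):
    // repeatedly run any action whose dependencies are all done.
    static List<String> execute(List<DeploymentAction> actions) {
        List<String> done = new ArrayList<>();
        List<DeploymentAction> pending = new ArrayList<>(actions);
        while (!pending.isEmpty()) {
            boolean progressed = false;
            Iterator<DeploymentAction> it = pending.iterator();
            while (it.hasNext()) {
                DeploymentAction a = it.next();
                if (done.containsAll(a.dependsOn)) {
                    a.body.run();
                    done.add(a.name);
                    it.remove();
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("dependency cycle among: " + pending);
            }
        }
        return done;
    }

    public static void main(String[] args) {
        // The cluster bodies here are stubs; in practice they would launch
        // clusters, authorize ingress, or run remote scripts.
        List<DeploymentAction> plan = Arrays.asList(
            new DeploymentAction("launch-cassandra", List.of(), () -> {}),
            new DeploymentAction("launch-tomcat", List.of(), () -> {}),
            new DeploymentAction("authorize-ingress",
                List.of("launch-cassandra", "launch-tomcat"), () -> {}),
            new DeploymentAction("run-config-script",
                List.of("authorize-ingress"), () -> {}));
        System.out.println(execute(plan));
        // → [launch-cassandra, launch-tomcat, authorize-ingress, run-config-script]
    }
}
```

A single configuration file, as John suggests, would then just be a serialized form of such a plan: each entry names an action type, its parameters, and the actions it depends on.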
