Hey guys,

Here are some thoughts I've been kicking around lately about Whirr.

I've been using Whirr fairly extensively since 0.4.0. At first my needs were fairly simple, requiring only a single Hadoop cluster. Then things got a bit more complex and I needed three different clusters (Hadoop, Solr, Cassandra), so I started using Whirr's API and built a bit of automation around it. Now my requirements have gotten fairly complex: I have 7 different kinds of clusters being created, and 3 times that many post-launch steps to authorize ingress from one cluster to another, run custom configuration scripts, copy required files to the clusters, etc.

This has brought me to the question: what do you think Whirr's role should be when it comes to complex, interdependent cloud-based architecture deployment?

Whirr is really good at creating a single cluster of non-dependent resources, meaning it's good at creating a cluster of VMs that don't require any upstream dependencies in order to be used. That is fine as long as there are no external dependencies. But what about deployment scenarios where there are N different types of clusters, and where the configuration of one cluster depends on the makeup of a previous cluster? And what about other kinds of deployment steps, like configuring custom firewall rules or executing custom setup scripts?

For example, the scenario that I'm in the process of automating creates the following clusters: Hadoop, Cassandra, Solr, ZooKeeper, ActiveMQ, HAProxy, and two different Tomcat clusters. Then there are cluster-to-cluster ingress rules I need to set, as well as a few IP-address-to-cluster rules. But that's not the worst of it. In order to fully configure our Tomcat servers, for example, I need to know things like the IP addresses of the Cassandra, Hadoop, Solr, and ActiveMQ nodes. So I've got custom steps that gather this info and call runScriptOnNodesMatching on the Tomcat cluster.
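As a rough illustration of that gather-and-configure step, here is a minimal Python sketch. It is not Whirr code: the function name, the properties file path, the property keys, and the addresses are all made up for the example. It only shows how the addresses collected from upstream clusters could be turned into a script body that would then be handed to something like runScriptOnNodesMatching.

```python
# Hypothetical sketch: nothing here is part of Whirr's API.
PROPS_FILE = "/etc/myapp/cluster.properties"  # assumed path, for illustration only


def render_tomcat_setup_script(upstream):
    """Build a shell script that records upstream node addresses on a Tomcat node.

    `upstream` maps a cluster name to the list of IP addresses of its nodes.
    """
    lines = [
        "#!/bin/bash",
        "set -e",
        "mkdir -p /etc/myapp",
        f": > {PROPS_FILE}",  # start from an empty properties file
    ]
    for cluster_name in sorted(upstream):
        hosts = ",".join(upstream[cluster_name])
        lines.append(f"echo '{cluster_name}.hosts={hosts}' >> {PROPS_FILE}")
    return "\n".join(lines) + "\n"


# Addresses that would normally be read from the already-launched clusters.
upstream = {
    "cassandra": ["10.0.0.11", "10.0.0.12"],
    "solr": ["10.0.0.21"],
    "activemq": ["10.0.0.31"],
}
script = render_tomcat_setup_script(upstream)
print(script)
```

The point is only that the script cannot be rendered until every upstream cluster exists, which is exactly the ordering problem.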
Then there are external files that need to be put on certain clusters, like custom Solr config and schema files. These I download from a blobstore, again triggered from a script executed by runScriptOnNodesMatching.

So in order to fully support complex cloud-based deployments, there is a set of actions that need to be stitched together and executed in a specified order, so that downstream steps can get info about upstream deployment actions: launch cluster action, remote script action, cluster ingress action, IP ingress action, file upload action, blob file upload, etc. Ideally all of this would be driven by one configuration file that defines the entire set of complex, interdependent deployment actions.

Thoughts?

--
Thanks,
John C
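To make the idea of ordered, interdependent actions a bit more concrete, here is a rough Python sketch of how such a plan could be ordered and executed. Everything in it is an assumption for illustration: the step names, the `type` and `needs` keys, and the handler functions are hypothetical and do not correspond to any existing Whirr feature. The ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical plan, as it might look after parsing a single config file.
# Each step has an action type and the names of the steps it depends on.
PLAN = {
    "launch-cassandra": {"type": "launch_cluster", "needs": []},
    "launch-solr": {"type": "launch_cluster", "needs": []},
    "launch-tomcat": {"type": "launch_cluster", "needs": []},
    "upload-solr-config": {"type": "blob_file_upload", "needs": ["launch-solr"]},
    "tomcat-to-cassandra": {
        "type": "cluster_ingress",
        "needs": ["launch-tomcat", "launch-cassandra"],
    },
    "configure-tomcat": {
        "type": "remote_script",
        "needs": ["launch-cassandra", "launch-solr", "launch-tomcat"],
    },
}


def execution_order(plan):
    """Return step names so that every step comes after the steps it needs."""
    graph = {name: set(step["needs"]) for name, step in plan.items()}
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


def run(plan, handlers):
    """Run each step in dependency order, passing earlier results downstream."""
    context = {}
    for name in execution_order(plan):
        step = plan[name]
        # The handler sees the results of all earlier steps via `context`,
        # e.g. the node addresses produced by an upstream launch.
        context[name] = handlers[step["type"]](name, step, context)
    return context


def record(name, step, context):
    """Stand-in handler: a real one would launch clusters, open ports, etc."""
    return {"type": step["type"], "saw": sorted(step["needs"])}


handlers = {step["type"]: record for step in PLAN.values()}
results = run(PLAN, handlers)
print(execution_order(PLAN))
```

In this shape, each action type (launch cluster, remote script, cluster ingress, and so on) is one handler, and the config file only has to declare the steps and what they depend on.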
