I almost always want to pin the exact release of everything that gets
installed. I have been using whirr.hadoop.tarball.url (and similar
properties) to specify versions, with the side effect that I also specify
where each tarball is downloaded from. Building a cluster for interactive,
exploratory use might work fine with the latest of everything, but I want
Whirr to help me spawn a cluster that I have previously characterized, so
that its performance is predictable.
I thought that using:
whirr.hadoop.tarball.url=http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u1.tar.gz
would ensure that I get cdh3u1, but since hadoop-0.20.2-cdh3u2.tar.gz was
released on Oct. 20, many cdh3u2 components are being installed, as the
names of the jar files show. Aside from being untested for my purposes,
this also breaks some of my post-processing scripts for setting up
Hive + HBase.
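To make the mismatch concrete, here is a minimal sketch of the kind of check that flags it. It assumes the update level shows up in each jar's file name as cdh3uN; the function names and the sample file names are hypothetical and are not part of Whirr or CDH.

```python
import re
from collections import Counter

# Assumption: the CDH3 update level is embedded in the jar file name as "cdh3uN".
CDH_UPDATE = re.compile(r"cdh3u(\d+)")


def count_cdh_updates(jar_names):
    """Count jar file names per CDH3 update level found in the name."""
    counts = Counter()
    for name in jar_names:
        match = CDH_UPDATE.search(name)
        if match:
            counts["cdh3u" + match.group(1)] += 1
    return counts


def unexpected_jars(jar_names, expected="cdh3u1"):
    """Return jar names tagged with a CDH3 update other than `expected`."""
    flagged = []
    for name in jar_names:
        match = CDH_UPDATE.search(name)
        if match and "cdh3u" + match.group(1) != expected:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Illustrative file names only; on a real node these could come from
    # os.listdir() on whichever lib directory the install uses.
    sample = [
        "example-core-0.20.2-cdh3u1.jar",
        "example-tools-0.20.2-cdh3u2.jar",
        "log4j-1.2.15.jar",
    ]
    print(count_cdh_updates(sample))
    print(unexpected_jars(sample))
```

Jars without a cdh3uN tag are ignored rather than flagged, since third-party libraries do not carry one.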
Is the above property insufficient for specifying exact releases? What
else should I do or not do?
Paul