On Wed, Nov 30, 2011 at 9:19 AM, Periya.Data <[email protected]> wrote:
> Hi,
>
> I am exploring options to deploy a small hadoop cluster on EC2. I read
> about Whirr and would like to try it out. I have a few questions before I
> dive into this:
>
> 1. My laptop currently runs Ubuntu 11.10 (Oneiric Ocelot). I am
>    running Hadoop 0.20.2+923.142 - CDH3u2 on my laptop. Is there a compatible
>    Whirr for this Hadoop release? I read this:
>    http://ashenfad.blogspot.com/2011/01/hadoop-cluster-on-ec2-using-cloudera.html.

That should be fine. You can get the latest Whirr release from here:

http://www.apache.org/dyn/closer.cgi/incubator/whirr/

To start a CDH3u2 Hadoop cluster, customise recipes/hadoop-cdh.properties as
needed. Make sure you uncomment the following lines:

# Uncomment out these lines to run CDH
#whirr.hadoop.install-function=install_cdh_hadoop
#whirr.hadoop.configure-function=configure_cdh_hadoop

> 2. Is it really necessary to have the same Hadoop version running on
>    my laptop as what the Whirr instance is using?

Nope. You can use the cluster by ssh-ing into the remote nodes with no extra
software on your local machine.

> 3. Is there an example whirr config file that shows how to create an
>    EC-2 instance with Hadoop 20.2, Hive, Sqoop and Flume? I guess I can
>    configure Whirr to download and install all the latest hadoop ecosystem
>    tools and then create a custom AMI out of that first instance.

We don't support Hive, Sqoop and Flume as Whirr services yet. Having a custom
AMI is not really that useful for Whirr - the scripts are designed to work
with a vanilla OS install.

> Please let me know the caveats and other fine points I need to know to use
> all the latest packages and Whirr. What should I keep in mind while I begin
> to use Whirr?

Before doing more advanced things, I recommend taking a quick look at the
following doc pages:

* http://whirr.apache.org/docs/0.6.0/whirr-in-5-minutes.html
* http://whirr.apache.org/docs/0.6.0/quick-start-guide.html
* http://whirr.apache.org/docs/0.6.0/configuration-guide.html
* http://www.oscon.com/oscon2011/public/schedule/detail/19214

The following GitHub repos may also be of interest:

* https://github.com/tomwhite/whirr-service-example (experimental support for
  flume)
* https://github.com/tomwhite/whirr-scm (adds the ability to use Cloudera SCM
  to set up the cluster)

> many thanks,
>
> PD/

Feel free to ask any questions. We can assist you as needed.

Cheers,
Andrei
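
As a sketch of the uncommenting step mentioned in the reply: after editing, the two lines in recipes/hadoop-cdh.properties would presumably look like the fragment below. This assumes the recipe ships them commented out exactly as quoted; every other setting in the file is left as shipped and is not shown here.

```properties
# Uncommented so that Whirr runs the CDH install and configure functions
whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop
```

The Whirr docs linked above launch a cluster from a recipe with bin/whirr launch-cluster --config <file>. Pointing that at the edited recipe should then bring up a CDH cluster, assuming cloud credentials are set up as the quick start guide describes.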
