I am still new, but I had similar questions and went through a lot of pain 
getting started.

If you want to get programming rather than spend time learning how to install, 
configure, and administer the Hadoop tools, I recommend using Amazon Elastic 
MapReduce (EMR).
This will very quickly get you to a stage where you are able to submit and run 
MapReduce jobs (and Pig, Hive, etc.).

It's a very cheap option for learning the platform, especially if you use the 
Ruby command-line tool, which allows you to reuse your Hadoop instances for 
multiple jobs rather than the more expensive default of starting and stopping 
a new cluster each time. It has some pretty decent tutorials, although (as with 
everything Hadoop, it seems) the area is so large that inevitably you'll end up 
googling some things or asking questions here.
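To give a flavour of the reuse workflow, here's a rough sketch from memory of 
the Ruby CLI commands (the job flow ID and S3 paths are placeholders, and you 
should double-check the exact flags against the EMR docs):

```shell
# Create a cluster that stays running after each step finishes;
# --alive is what lets you reuse it for multiple jobs.
elastic-mapreduce --create --alive --name "learning-cluster" \
    --num-instances 3 --instance-type m1.small
# Prints a job flow ID, something like: j-ABC123EXAMPLE

# Submit further jobs to the same running cluster by job flow ID,
# instead of paying to spin up a fresh cluster each time.
# (The jar and S3 paths below are hypothetical placeholders.)
elastic-mapreduce --jobflow j-ABC123EXAMPLE \
    --jar s3://my-bucket/wordcount.jar \
    --arg s3://my-bucket/input/ --arg s3://my-bucket/output/

# Remember to shut the cluster down when you're done, or it keeps billing.
elastic-mapreduce --terminate j-ABC123EXAMPLE
```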

Also, I found the book "Hadoop in Action" very readable and informative, even 
as someone who has only sporadically used Java throughout my career. It 
actually takes you through different use cases based on test data downloadable 
from the web. The only issue is that it's written against the older (though 
still fully supported) Hadoop 0.20 API, and since it's aimed at someone with a 
local Hadoop cluster, you have a small amount of work to translate to the 
Amazon EMR way of doing things. Still very useful though.
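If the Java side is a barrier, one quick way to get a first job running on EMR 
is Hadoop Streaming, which lets you write the mapper and reducer as plain 
scripts that read stdin and write stdout. A minimal word-count pair in Python 
might look like this (my own sketch, not from the book):

```python
#!/usr/bin/env python
# Hadoop Streaming word count: the mapper emits "word\t1" per word,
# the reducer sums counts per word. Hadoop sorts mapper output by key
# before the reduce phase, so identical words arrive on consecutive lines.
import sys

def map_lines(lines):
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word.lower()

def reduce_lines(lines):
    current, count = None, 0
    for line in lines:
        word, n = line.rsplit("\t", 1)
        if word == current:
            count += int(n)
        else:
            if current is not None:
                yield "%s\t%d" % (current, count)
            current, count = word, int(n)
    if current is not None:
        yield "%s\t%d" % (current, count)

if __name__ == "__main__":
    # Run the same script as either stage, e.g.
    # `wordcount.py map` as the mapper, `wordcount.py reduce` as the reducer.
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    fn = map_lines if stage == "map" else reduce_lines
    for out in fn(sys.stdin):
        print(out)
```

On EMR you'd point the streaming step's -mapper and -reducer arguments at the 
two invocations, which sidesteps the Java API differences entirely while you 
learn the rest of the stack.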

Cheers
Mike

From: John Lilley [mailto:[email protected]]
Sent: 11 January 2013 10:29
To: [email protected]
Subject: Getting started recommendations

We are somewhat new to Hadoop and are looking to run some experiments with 
HDFS, Pig, and HBase.
With that in mind, I have a few questions:
What is the easiest (preferably free) Hadoop distro to get started with?  
Cloudera?
What host OS distro/release is recommended?
What is the easiest environment to get started with?  Amazon EC2?  Is there 
anyone offering virtual/hosted prebuilt Hadoop instances?
Where would we find some "big data" files that people have used for testing 
purposes?
Feel free to RTFM me to the right place ;-)
Thanks, john
