http://my.safaribooksonline.com/book/databases/hadoop/9780596521974
I loved this book. Very well defined.

On Fri, Jan 11, 2013 at 3:22 AM, Michael Forage <[email protected]> wrote:

> I am still new but had similar questions and went through a lot of pain
> getting started.
>
> If you want to get programming rather than spend time learning how to
> install, configure and administer the Hadoop tools, I recommend using Amazon
> Elastic MapReduce.
>
> This will very quickly get you to a stage where you are able to submit and
> run MapReduce jobs (and Pig, Hive, etc.).
>
> It’s a very cheap option for learning the platform, especially if you use
> the Ruby command-line tool, which allows you to re-use your Hadoop instances
> for multiple jobs rather than the more expensive default of starting and
> stopping new clusters each time. It’s got some pretty decent tutorials,
> although (as with everything Hadoop, it seems) the area is so large that
> inevitably you’ll be googling some things or asking questions here.
>
> Also, I found the book “Hadoop in Action” very readable and informative,
> even as someone who has only sporadically used Java throughout my career.
> This actually takes you through different use cases based on test data
> downloadable from the web. The only issue is that it’s written based on the
> older (though fully supported) Hadoop 0.20 API, and since it’s written for
> someone with a local Hadoop cluster, you have a small effort to translate to
> the Amazon EMR way of doing things. Still very useful, though.
>
> Cheers
> Mike
>
> *From:* John Lilley [mailto:[email protected]]
> *Sent:* 11 January 2013 10:29
> *To:* [email protected]
> *Subject:* Getting started recommendations
>
> We are somewhat new to Hadoop and are looking to run some experiments with
> HDFS, Pig, and HBase.
>
> With that in mind, I have a few questions:
>
> What is the easiest (preferably free) Hadoop distro to get started with?
> Cloudera?
>
> What host OS distro/release is recommended?
>
> What is the easiest environment to get started with? Amazon EC2? Is
> there anyone offering virtual/hosted prebuilt Hadoop instances?
>
> Where would we find some “big data” files that people have used for
> testing purposes?
>
> Feel free to RTFM me to the right place ;-)
>
> Thanks, john

--
Nitin Pawar
