Thanks for your answer, Anders.

- The amount of data I'm going to process is roughly the size of Wikipedia (I will use a dump).
- I already have the basics of Hadoop (I hope): I have a local multi-node cluster set up and I have already run some algorithms on it.
- Because the amount of data is considerable, I believe I should use several nodes.
Another point to consider is that I'm running Giraph on top of whichever Hadoop distribution/EC2 setup I choose.

Bye!
Jose

2015-10-18 18:53 GMT-03:00 Anders Nielsen <[email protected]>:

> Dear Jose,
>
> It will help people answer your question if you specify your goals:
>
> - If you are doing it to learn how to USE a running Hadoop, then go for one
> of the prebuilt distributions (Amazon or MapR).
> - If you are doing it to learn more about setting up and administering
> Hadoop, then you are better off setting everything up from scratch on EC2.
> - Do you need to run on many nodes, or just a single node to test some
> MapReduce scripts on a small data set?
>
> Regards,
>
> Anders
>
> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <
> [email protected]> wrote:
>
>> Hi all!
>>
>> I started using Hadoop on AWS, and a big question has come up!
>>
>> I'm using a MapR distribution for Hadoop 2.4.0 on AWS. I have already
>> tried some trivial examples, and before moving forward I have one question.
>>
>> What is the best option for using Hadoop on AWS?
>> - Build it from scratch on an EC2 instance
>> - Use the MapR distribution of Hadoop
>> - Use the Amazon distribution of Hadoop
>>
>> Sorry if my question is too broad.
>>
>> Bye!
>> Jose
