Hey Jose,

Have you looked at Amazon EMR (Elastic MapReduce)? Where I work we have used it, and when you provision the EMR cluster you can use custom JARs like the one you mentioned.
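
As a rough illustration of the custom JAR route, here is a minimal sketch in Python, assuming boto3 (the AWS SDK for Python) and its EMR `run_job_flow` call. The bucket, JAR path, main class, release label, instance type, and region are all placeholders, not values from this thread, and the role names are the EMR defaults, which have to exist in your account. The cluster is set to shut down once its steps finish, so nothing keeps running afterwards.

```python
def custom_jar_step(name, jar_uri, main_class=None, args=()):
    """Build one EMR step that runs a user-supplied JAR (placeholder values)."""
    hadoop_jar_step = {"Jar": jar_uri, "Args": list(args)}
    if main_class:
        # Only needed when the JAR manifest does not name a main class.
        hadoop_jar_step["MainClass"] = main_class
    return {
        "Name": name,
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": hadoop_jar_step,
    }


def transient_cluster_request(name, log_uri, steps,
                              release_label="emr-4.1.0",   # placeholder
                              instance_type="m3.xlarge",   # placeholder
                              instance_count=3):
    """Build keyword arguments for run_job_flow describing a temporary cluster."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # False: the cluster terminates once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": list(steps),
        # Default EMR role names; assumed to exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, region="us-east-1"):
    """Submit the request to EMR. Needs boto3 and AWS credentials; creates billable resources."""
    import boto3  # imported here so the builders above work without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    step = custom_jar_step(
        "my-custom-jar",
        "s3://my-bucket/jars/my-job.jar",  # hypothetical S3 location
        args=["s3://my-bucket/input/", "s3://my-bucket/output/"],
    )
    request = transient_cluster_request("test-cluster", "s3://my-bucket/logs/", [step])
    print(request)  # inspect first; call launch(request) only when ready
```

Building the request as a plain dictionary first makes it easy to check before anything is actually started on AWS.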
In terms of storage, you can use HDFS if you are going to keep a persistent cluster. If not, you can store your data in an Amazon S3 bucket. The documentation for EMR is really good. When we did this, at the beginning of this year, they supported Hadoop 2.6.

In my honest opinion, you are giving yourself a lot of extra work for nothing just to get started with Hadoop. Try out EMR with a temporary cluster and go from there. I managed to tool up and learn how to work with EMR in a week.

> On 19 Oct 2015, at 02:10, José Luis Larroque wrote:
>
> Thanks for your answer, Anders.
>
> - The amount of data that I'm going to manipulate is about the size of
>   Wikipedia (I will use a dump).
> - I already have the basics of Hadoop (I hope): I have a local multinode
>   cluster set up and I have already executed some algorithms.
> - Because the amount of data is large, I believe that I should use
>   several nodes.
>
> Another factor to consider may be that I'm running Giraph on top
> of the selected Hadoop distribution/EC2.
>
> Bye!
> Jose
>
> 2015-10-18 18:53 GMT-03:00 Anders Nielsen:
>> Dear Jose,
>>
>> It will help people answer your question if you specify your goals:
>>
>> - If you are doing it to learn how to USE a running Hadoop, then go for
>>   one of the prebuilt distributions (Amazon or MapR).
>> - If you are doing it to learn more about setting up and administering
>>   Hadoop, then you are better off setting everything up from scratch on EC2.
>> - Do you need to run on many nodes, or just one node to test some MapReduce
>>   scripts on a small data set?
>>
>> Regards,
>>
>> Anders
>>
>>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque wrote:
>>> Hi all!
>>>
>>> I started to use Hadoop with AWS, and a big question has come up!
>>>
>>> I'm using a MapR distribution for Hadoop 2.4.0 in AWS. I have already tried
>>> some trivial examples, and before moving forward I have one question.
>>>
>>> What is the best option for using Hadoop on AWS?
>>> - Build it from scratch on an EC2 instance
>>> - Use the MapR distribution of Hadoop
>>> - Use the Amazon distribution of Hadoop
>>>
>>> Sorry if my question is too broad.
>>>
>>> Bye!
>>> Jose
