Thanks for your answers!

@Jonathan: Yes! I looked at AWS EMR already, but I was trying to compare the benefits of using it against building Hadoop from scratch on an EC2 instance (I found tutorials using all of these options alike).

@jay vyas: Thanks Jay, but I need to use AWS, and Bigtop doesn't seem the right option for that; I'm trying to keep things simple, because I don't have much experience with these technologies.
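To make the EMR-versus-scratch comparison concrete: a temporary EMR cluster with a custom jar step boils down to a single `run_job_flow` request. Below is a minimal sketch of such a request built for boto3; the bucket names, jar path, instance types, and release label are placeholders I'm assuming, not details from this thread:

```python
# Sketch: the request body for a small, auto-terminating EMR cluster that
# runs one custom-jar step. All names here are illustrative assumptions.
def make_cluster_config(log_uri, jar_uri):
    """Build a run_job_flow request dict for a temporary 3-node cluster."""
    return {
        "Name": "hadoop-test-cluster",       # placeholder cluster name
        "ReleaseLabel": "emr-4.1.0",         # an EMR release current in 2015
        "LogUri": log_uri,                   # e.g. "s3://my-bucket/logs/"
        "Instances": {
            "MasterInstanceType": "m3.xlarge",
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 3,
            # False => cluster terminates when the steps finish (temporary)
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "custom-jar-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            # The custom jar Jonathan mentions, uploaded to S3 beforehand
            "HadoopJarStep": {"Jar": jar_uri, "Args": []},
        }],
        # Default EMR roles, assumed to exist in the account already
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

config = make_cluster_config("s3://my-bucket/logs/", "s3://my-bucket/jars/my-job.jar")
# To actually launch it (requires AWS credentials):
# import boto3
# boto3.client("emr", region_name="us-east-1").run_job_flow(**config)
```

Keeping the request as a plain dict and commenting out the launch call lets you inspect the configuration before paying for instances.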
Any other answer will be welcome! Bye!
Jose

2015-10-19 12:37 GMT-03:00 jay vyas <[email protected]>:

> Also, ASF Bigtop packages Hadoop for you.
>
> You can always grab our releases:
> http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/
>
> We package Pig, Spark, Hive, HBase, ....
>
> It's not hard to set up a Bigtop build server, as we have dockerized the
> packaging of both RPM and Deb packages, and you can experiment locally
> with this stuff using the Vagrant recipes.
>
> On Mon, Oct 19, 2015 at 6:26 AM, Jonathan Aquilina <[email protected]> wrote:
>
>> Hey Jose,
>>
>> Have you looked at Amazon EMR (Elastic MapReduce)? Where I work we have
>> used it, and when you provision the EMR cluster you can use custom jars
>> like the one you mentioned.
>>
>> In terms of storage, you can use HDFS if you are going to keep a
>> persistent cluster; if not, you can store your data in an Amazon S3 bucket.
>>
>> The documentation for EMR is really good. At the time when we did this,
>> at the beginning of this year, it supported Hadoop 2.6.
>>
>> In my honest opinion, you are giving yourself a lot of extra work for
>> nothing to get going with Hadoop. Try out EMR with a temporary cluster
>> and go from there. I managed to tool up and learn how to work with EMR
>> in a week.
>>
>> Sent from my iPhone
>>
>> On 19 Oct 2015, at 02:10, José Luis Larroque <[email protected]> wrote:
>>
>> Thanks for your answer, Anders.
>>
>> - The amount of data that I'm going to manipulate is about the size of
>>   Wikipedia (I will use a dump).
>> - I already have the basics of Hadoop (I hope): I have a local multi-node
>>   cluster set up and I have already executed some algorithms.
>> - Because the amount of data is significant, I believe that I should use
>>   several nodes.
>>
>> Maybe another option to consider is that I'm running Giraph on top of
>> the selected Hadoop distribution/EC2.
>>
>> Bye!
>> Jose
>>
>> 2015-10-18 18:53 GMT-03:00 Anders Nielsen <[email protected]>:
>>
>>> Dear Jose,
>>>
>>> It will help people answer your question if you specify your goals:
>>>
>>> - If you are doing it to learn how to USE a running Hadoop, then go for
>>>   one of the prebuilt distributions (Amazon or MapR).
>>> - If you are doing it to learn more about setting up and administering
>>>   Hadoop, then you are better off setting everything up from scratch on EC2.
>>> - Do you need to run on many nodes, or just one node to test some
>>>   MapReduce scripts on a small data set?
>>>
>>> Regards,
>>>
>>> Anders
>>>
>>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <[email protected]> wrote:
>>>
>>>> Hi all!
>>>>
>>>> I started to use Hadoop with AWS, and a big question appeared in front
>>>> of me!
>>>>
>>>> I'm using a MapR distribution of Hadoop 2.4.0 on AWS. I have already
>>>> tried some trivial examples, and before moving forward I have one
>>>> question.
>>>>
>>>> What is the better option for using Hadoop on AWS?
>>>> - Build it from scratch on an EC2 instance
>>>> - Use the MapR distribution of Hadoop
>>>> - Use the Amazon distribution of Hadoop
>>>>
>>>> Sorry if my question is too broad.
>>>>
>>>> Bye!
>>>> Jose
>
> --
> jay vyas
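Since the thread mentions running Giraph on top of whichever Hadoop distribution is chosen, here is a sketch of how a Giraph job is typically launched through `GiraphRunner` once the jar is on the cluster. The computation class, input/output paths, and worker count below are illustrative assumptions, not details from this thread:

```python
# Sketch: composing the `hadoop jar` command line for a Giraph job.
# All paths and the worker count are hypothetical placeholders.
def giraph_command(jar, computation, input_path, output_path, workers):
    """Return the argv list for launching a Giraph computation via GiraphRunner."""
    return [
        "hadoop", "jar", jar,
        "org.apache.giraph.GiraphRunner", computation,
        # Vertex input format and path (-vif / -vip)
        "-vif", "org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat",
        "-vip", input_path,
        # Vertex output format and output path (-vof / -op)
        "-vof", "org.apache.giraph.io.formats.IdWithValueTextOutputFormat",
        "-op", output_path,
        # Number of Giraph workers (-w)
        "-w", str(workers),
    ]

cmd = giraph_command(
    "giraph-examples.jar",                                     # placeholder jar
    "org.apache.giraph.examples.SimpleShortestPathsComputation",
    "/user/jose/input/graph.json",                             # placeholder path
    "/user/jose/output/sssp",                                  # placeholder path
    3,
)
print(" ".join(cmd))
# On the cluster's master node this could be executed with subprocess.run(cmd).
```

On EMR, the same invocation could instead be submitted as a custom-jar step, which is why the "custom jars" support Jonathan mentions covers the Giraph use case.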
