Thanks for your answers!

@Jonathan: Yes! I looked at AWS EMR already, but I was trying to compare the benefits of using it against building Hadoop from scratch on an EC2 instance (I found tutorials using all of these options alike).

@jay vyas: Thanks Jay, but I need to use AWS, and Bigtop doesn't seem the right option for that; I'm trying to keep things simple, because I don't have much experience with these technologies.
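To make the EMR-versus-scratch comparison concrete: a temporary EMR cluster with a custom jar step boils down to a single `run_job_flow` request. Below is a minimal sketch of such a request built for boto3; the bucket names, jar path, instance types, and release label are placeholders I'm assuming, not details from this thread:

```python
# Sketch: the request body for a small, auto-terminating EMR cluster that
# runs one custom-jar step. All names here are illustrative assumptions.
def make_cluster_config(log_uri, jar_uri):
    """Build a run_job_flow request dict for a temporary 3-node cluster."""
    return {
        "Name": "hadoop-test-cluster",       # placeholder cluster name
        "ReleaseLabel": "emr-4.1.0",         # an EMR release current in 2015
        "LogUri": log_uri,                   # e.g. "s3://my-bucket/logs/"
        "Instances": {
            "MasterInstanceType": "m3.xlarge",
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 3,
            # False => cluster terminates when the steps finish (temporary)
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "custom-jar-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            # The custom jar Jonathan mentions, uploaded to S3 beforehand
            "HadoopJarStep": {"Jar": jar_uri, "Args": []},
        }],
        # Default EMR roles, assumed to exist in the account already
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

config = make_cluster_config("s3://my-bucket/logs/", "s3://my-bucket/jars/my-job.jar")
# To actually launch it (requires AWS credentials):
# import boto3
# boto3.client("emr", region_name="us-east-1").run_job_flow(**config)
```

Keeping the request as a plain dict and commenting out the launch call lets you inspect the configuration before paying for instances.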
Any other answer will be welcome! Bye!
Jose

2015-10-19 12:37 GMT-03:00 jay vyas <[email protected]>:

> Also, ASF Bigtop packages Hadoop for you.
>
> You can always grab our releases:
> http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/
>
> We package Pig, Spark, Hive, HBase, ....
>
> It's not hard to set up a Bigtop build server, as we have dockerized the
> packaging of both RPM and Deb packages, and you can experiment locally
> with this stuff using the Vagrant recipes.
>
> On Mon, Oct 19, 2015 at 6:26 AM, Jonathan Aquilina <[email protected]> wrote:
>
>> Hey Jose,
>>
>> Have you looked at Amazon EMR (Elastic MapReduce)? Where I work we have
>> used it, and when you provision the EMR cluster you can use custom jars
>> like the one you mentioned.
>>
>> In terms of storage, you can use HDFS if you are going to keep a
>> persistent cluster; if not, you can store your data in an Amazon S3 bucket.
>>
>> The documentation for EMR is really good. At the time when we did this,
>> at the beginning of this year, it supported Hadoop 2.6.
>>
>> In my honest opinion, you are giving yourself a lot of extra work for
>> nothing to get going with Hadoop. Try out EMR with a temporary cluster
>> and go from there. I managed to tool up and learn how to work with EMR
>> in a week.
>>
>> Sent from my iPhone
>>
>> On 19 Oct 2015, at 02:10, José Luis Larroque <[email protected]> wrote:
>>
>> Thanks for your answer, Anders.
>>
>> - The amount of data that I'm going to manipulate is about the size of
>>   Wikipedia (I will use a dump).
>> - I already have the basics of Hadoop (I hope): I have a local multi-node
>>   cluster set up and I have already executed some algorithms.
>> - Because the amount of data is significant, I believe that I should use
>>   several nodes.
>>
>> Maybe another option to consider is that I'm running Giraph on top of
>> the selected Hadoop distribution/EC2.
>>
>> Bye!
>> Jose
>>
>> 2015-10-18 18:53 GMT-03:00 Anders Nielsen <[email protected]>:
>>
>>> Dear Jose,
>>>
>>> It will help people answer your question if you specify your goals:
>>>
>>> - If you are doing it to learn how to USE a running Hadoop, then go for
>>>   one of the prebuilt distributions (Amazon or MapR).
>>> - If you are doing it to learn more about setting up and administering
>>>   Hadoop, then you are better off setting everything up from scratch on EC2.
>>> - Do you need to run on many nodes, or just one node to test some
>>>   MapReduce scripts on a small data set?
>>>
>>> Regards,
>>>
>>> Anders
>>>
>>> On Sun, Oct 18, 2015 at 10:03 PM, José Luis Larroque <[email protected]> wrote:
>>>
>>>> Hi all!
>>>>
>>>> I started to use Hadoop with AWS, and a big question appeared in front
>>>> of me!
>>>>
>>>> I'm using a MapR distribution of Hadoop 2.4.0 on AWS. I have already
>>>> tried some trivial examples, and before moving forward I have one
>>>> question.
>>>>
>>>> What is the better option for using Hadoop on AWS?
>>>> - Build it from scratch on an EC2 instance
>>>> - Use the MapR distribution of Hadoop
>>>> - Use the Amazon distribution of Hadoop
>>>>
>>>> Sorry if my question is too broad.
>>>>
>>>> Bye!
>>>> Jose
>
> --
> jay vyas
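Since the thread mentions running Giraph on top of whichever Hadoop distribution is chosen, here is a sketch of how a Giraph job is typically launched through `GiraphRunner` once the jar is on the cluster. The computation class, input/output paths, and worker count below are illustrative assumptions, not details from this thread:

```python
# Sketch: composing the `hadoop jar` command line for a Giraph job.
# All paths and the worker count are hypothetical placeholders.
def giraph_command(jar, computation, input_path, output_path, workers):
    """Return the argv list for launching a Giraph computation via GiraphRunner."""
    return [
        "hadoop", "jar", jar,
        "org.apache.giraph.GiraphRunner", computation,
        # Vertex input format and path (-vif / -vip)
        "-vif", "org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat",
        "-vip", input_path,
        # Vertex output format and output path (-vof / -op)
        "-vof", "org.apache.giraph.io.formats.IdWithValueTextOutputFormat",
        "-op", output_path,
        # Number of Giraph workers (-w)
        "-w", str(workers),
    ]

cmd = giraph_command(
    "giraph-examples.jar",                                     # placeholder jar
    "org.apache.giraph.examples.SimpleShortestPathsComputation",
    "/user/jose/input/graph.json",                             # placeholder path
    "/user/jose/output/sssp",                                  # placeholder path
    3,
)
print(" ".join(cmd))
# On the cluster's master node this could be executed with subprocess.run(cmd).
```

On EMR, the same invocation could instead be submitted as a custom-jar step, which is why the "custom jars" support Jonathan mentions covers the Giraph use case.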
