I'm sorry :D, when I said this:

"I had to do all those steps you talked about, especially at bootstrap I run a Bash script stored at s3 like this:

--core-key-value, giraph.zkList=localhost:2181, --mapred-key-value, mapreduce.job.counters.limit=1200"

that translates to: I had to do all those steps you talked about, especially at bootstrap I run:

--core-key-value, giraph.zkList=localhost:2181, --mapred-key-value, mapreduce.job.counters.limit=1200

Cheers
Gustavo

On Mon, Nov 11, 2013 at 1:05 PM, Gustavo Enrique Salazar Torres <gsala...@ime.usp.br> wrote:

> Hi Rob:
>
> I had to do all those steps you talked about, especially at bootstrap I run a Bash script stored at s3 like this:
>
> --core-key-value, giraph.zkList=localhost:2181, --mapred-key-value, mapreduce.job.counters.limit=1200
>
> Then at the steps configuration I start by setting up Giraph and Zookeeper by calling two Bash scripts (two separate steps):
>
> s3://elasticmapreduce/libs/script-runner/script-runner.jar s3://mybucket/install_giraph.sh
> s3://elasticmapreduce/libs/script-runner/script-runner.jar s3://mybucket/install_zookeeper.sh
>
> In the case of install_giraph.sh I do this:
>
> hadoop dfs -copyToLocal s3://mybucket/giraph.tar.gz /home/hadoop
> tar -xzvf /home/hadoop/giraph.tar.gz -C /home/hadoop
>
> and install_zookeeper.sh does this:
>
> hadoop dfs -copyToLocal s3://data.clipesebandas/binaries/zookeeper.tar.gz /home/hadoop
> tar -xzvf /home/hadoop/zookeeper.tar.gz -C /home/hadoop
> /home/hadoop/zookeeper/bin/zkServer.sh start
>
> And finally I run my Giraph algorithm in another step like this:
>
> /home/hadoop/giraph.jar org.giraph.MyGraphAlgorithm /user/hadoop/input_graph /user/hadoop/built_graph 20 1
>
> Perhaps some steps, like the Zookeeper configuration, are not needed, since this configuration is based on Giraph 0.1.
> Hope this helps.
>
> Cheers
> Gustavo
>
> On Mon, Nov 11, 2013 at 12:43 PM, Rob Vesse <rve...@dotnetrdf.org> wrote:
>
>> Hi All
>>
>> I've been looking around for documentation about running Giraph on Amazon Elastic MapReduce (EMR) and didn't turn up anything particularly useful.
>>
>> It looks like the only real requirements to run on EMR are to add bootstrap actions to the job flow configuration that apply the relevant Hadoop configuration settings, e.g. increasing the maximum number of map tasks. After that, it looks like I should just need a standard custom JAR launch step to launch the Giraph Runner with appropriate arguments for my Giraph program.
>>
>> Before I start trying this and incurring EC2 costs, does anyone have experience of running Giraph applications on EMR that they are willing to share? Any suggestions, tips, common pitfalls etc. I should be aware of?
>>
>> Cheers,
>>
>> Rob
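Putting Gustavo's pieces together, the bootstrap action and the three steps can be created in a single job flow. The following is an untested sketch against the legacy elastic-mapreduce Ruby CLI: the job flow name, instance count, and instance type are placeholder choices, and whether --jar accepts the master-local path /home/hadoop/giraph.jar (as in Gustavo's final step) or requires an S3 path may depend on the CLI version.

# Bootstrap applies the two Hadoop properties from the thread; the three
# --jar entries correspond to Gustavo's three steps, in order.
elastic-mapreduce --create --alive --name "giraph-on-emr" \
  --num-instances 5 --instance-type m1.large \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "--core-key-value,giraph.zkList=localhost:2181,--mapred-key-value,mapreduce.job.counters.limit=1200" \
  --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
  --args s3://mybucket/install_giraph.sh \
  --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
  --args s3://mybucket/install_zookeeper.sh \
  --jar /home/hadoop/giraph.jar \
  --main-class org.giraph.MyGraphAlgorithm \
  --args /user/hadoop/input_graph,/user/hadoop/built_graph,20,1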
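On Rob's question about launching the Giraph Runner from a custom JAR step: with Giraph 1.0 the usual entry point is org.apache.giraph.GiraphRunner, invoked on the master roughly as below. The input/output format classes here are illustrative stand-ins (both ship with Giraph, but your computation may need different ones), and -w sets the worker count.

# Launch a Giraph computation through GiraphRunner; paths match the thread.
hadoop jar /home/hadoop/giraph.jar org.apache.giraph.GiraphRunner \
  org.giraph.MyGraphAlgorithm \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hadoop/input_graph \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hadoop/built_graph \
  -w 20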
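On whether the external ZooKeeper is still needed: when giraph.zkList is left unset, Giraph starts a ZooKeeper instance of its own, so on versions after 0.1 the install_zookeeper.sh step can probably be dropped, as Gustavo suspects. If you do keep it, a cheap liveness check at the end of the script makes that step fail fast instead of surfacing later as a confusing Giraph error (this assumes nc is available on the EMR AMI; ruok/imok is ZooKeeper's standard four-letter health check).

# Append to install_zookeeper.sh: fail this step early if ZooKeeper is not answering.
echo ruok | nc localhost 2181 | grep -q imok || exit 1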