I'm sorry :D, when I said this:

"I had to do all those steps you talked about, especially at bootstrap I run a Bash script stored at s3 like this:

--core-key-value, giraph.zkList=localhost:2181, --mapred-key-value, mapreduce.job.counters.limit=1200"

that translates to: I had to do all those steps you talked about, especially at bootstrap I run:

--core-key-value, giraph.zkList=localhost:2181, --mapred-key-value, mapreduce.job.counters.limit=1200

Cheers
Gustavo

On Mon, Nov 11, 2013 at 1:05 PM, Gustavo Enrique Salazar Torres <gsala...@ime.usp.br> wrote:

> Hi Rob:
>
> I had to do all those steps you talked about, especially at bootstrap I run a Bash script stored at s3 like this:
>
> --core-key-value, giraph.zkList=localhost:2181, --mapred-key-value, mapreduce.job.counters.limit=1200
>
> Then at the steps configuration I start by setting up Giraph and Zookeeper by calling two Bash scripts (two separate steps):
>
> s3://elasticmapreduce/libs/script-runner/script-runner.jar s3://mybucket/install_giraph.sh
> s3://elasticmapreduce/libs/script-runner/script-runner.jar s3://mybucket/install_zookeeper.sh
>
> In the case of install_giraph.sh I do this:
>
> hadoop dfs -copyToLocal s3://mybucket/giraph.tar.gz /home/hadoop
> tar -xzvf /home/hadoop/giraph.tar.gz -C /home/hadoop
>
> and install_zookeeper.sh does this:
>
> hadoop dfs -copyToLocal s3://data.clipesebandas/binaries/zookeeper.tar.gz /home/hadoop
> tar -xzvf /home/hadoop/zookeeper.tar.gz -C /home/hadoop
> /home/hadoop/zookeeper/bin/zkServer.sh start
>
> And finally I run my Giraph algorithm in another step like this:
>
> /home/hadoop/giraph.jar org.giraph.MyGraphAlgorithm /user/hadoop/input_graph /user/hadoop/built_graph 20 1
>
> Perhaps some steps, like the Zookeeper configuration, are not needed, since this configuration is based on Giraph 0.1.
> Hope this helps.
>
> Cheers
> Gustavo
>
> On Mon, Nov 11, 2013 at 12:43 PM, Rob Vesse <rve...@dotnetrdf.org> wrote:
>
>> Hi All
>>
>> I've been looking around for documentation about running Giraph on Amazon Elastic MapReduce (EMR) and didn't turn up anything particularly useful.
>>
>> It looks like the only real requirements to run on EMR are to add bootstrap actions to the job flow configuration that apply the relevant Hadoop configuration settings, e.g. increasing the maximum number of map tasks. After that, it looks like I should just need a standard custom JAR launch step to launch the Giraph Runner with appropriate arguments for my Giraph program.
>>
>> Before I start trying this and incurring EC2 costs, does anyone have experience of running Giraph applications on EMR that they are willing to share? Any suggestions, tips, common pitfalls etc. I should be aware of?
>>
>> Cheers,
>>
>> Rob
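Putting Gustavo's pieces together, the bootstrap action and the three steps can be created in a single job flow. The following is an untested sketch against the legacy elastic-mapreduce Ruby CLI: the job flow name, instance count, and instance type are placeholder choices, and whether --jar accepts the master-local path /home/hadoop/giraph.jar (as in Gustavo's final step) or requires an S3 path may depend on the CLI version.

# Bootstrap applies the two Hadoop properties from the thread; the three
# --jar entries correspond to Gustavo's three steps, in order.
elastic-mapreduce --create --alive --name "giraph-on-emr" \
  --num-instances 5 --instance-type m1.large \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "--core-key-value,giraph.zkList=localhost:2181,--mapred-key-value,mapreduce.job.counters.limit=1200" \
  --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
  --args s3://mybucket/install_giraph.sh \
  --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
  --args s3://mybucket/install_zookeeper.sh \
  --jar /home/hadoop/giraph.jar \
  --main-class org.giraph.MyGraphAlgorithm \
  --args /user/hadoop/input_graph,/user/hadoop/built_graph,20,1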
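On Rob's question about launching the Giraph Runner from a custom JAR step: with Giraph 1.0 the usual entry point is org.apache.giraph.GiraphRunner, invoked on the master roughly as below. The input/output format classes here are illustrative stand-ins (both ship with Giraph, but your computation may need different ones), and -w sets the worker count.

# Launch a Giraph computation through GiraphRunner; paths match the thread.
hadoop jar /home/hadoop/giraph.jar org.apache.giraph.GiraphRunner \
  org.giraph.MyGraphAlgorithm \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hadoop/input_graph \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hadoop/built_graph \
  -w 20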
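On whether the external ZooKeeper is still needed: when giraph.zkList is left unset, Giraph starts a ZooKeeper instance of its own, so on versions after 0.1 the install_zookeeper.sh step can probably be dropped, as Gustavo suspects. If you do keep it, a cheap liveness check at the end of the script makes that step fail fast instead of surfacing later as a confusing Giraph error (this assumes nc is available on the EMR AMI; ruok/imok is ZooKeeper's standard four-letter health check).

# Append to install_zookeeper.sh: fail this step early if ZooKeeper is not answering.
echo ruok | nc localhost 2181 | grep -q imok || exit 1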