PageRankBenchmark on Yarn

Chuan Lei Mon, 08 Jul 2013 15:07:50 -0700

Hello everyone,

I have a few questions about running PageRankBenchmark on a YARN
(2.0.5-alpha) cluster. I ran PageRankBenchmark with the following command:


=====
hadoop jar
/export/home/clei/giraph-1.0.0/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar
org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000000 -w 3
=====

1. It seems to me that this is still submitted as a GiraphJob rather than
as a job submitted through GiraphYarnClient. If so, PageRankBenchmark is
still hosted in mappers rather than in containers. Am I correct? If so,
how can I actually run the benchmark as a YARN application?

2. PageRankBenchmark takes neither an input path nor an output path on the
command line. How does Giraph generate the 5 million vertices requested by
the above command (-V 5000000)? Moreover, from the log files, it seems that
each worker tries to load all 5 million vertices at the beginning instead
of 1/3 of them. Why does each worker consume the whole input instead of
taking only one split of it? This is not the case in the
SimpleShortestPath example.
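
For concreteness, the even split expected in question 2 (-V 5000000 with
-w 3) can be sketched as below. This only illustrates the arithmetic of
dividing the vertices among workers; it is not Giraph's actual input-split
logic, and the class and method names are hypothetical.

```java
// Hypothetical illustration only: not part of Giraph.
public class VertexSplitSketch {

    /**
     * Number of vertices a worker would load if the total were divided
     * as evenly as possible, with the remainder going to the
     * lowest-numbered workers.
     */
    static long shareFor(long totalVertices, int workers, int workerIndex) {
        long base = totalVertices / workers;       // whole share per worker
        long remainder = totalVertices % workers;  // leftover vertices
        return base + (workerIndex < remainder ? 1 : 0);
    }

    public static void main(String[] args) {
        long totalVertices = 5_000_000L; // from -V 5000000
        int workers = 3;                 // from -w 3
        for (int i = 0; i < workers; i++) {
            System.out.println("worker " + i + ": "
                + shareFor(totalVertices, workers, i) + " vertices");
        }
    }
}
```

Under this assumption the three workers would load 1666667, 1666667 and
1666666 vertices, rather than 5000000 each.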

Any input on the above questions would be greatly appreciated.

Regards,
Chuan
