Hello everyone,

I have a few questions about running PageRankBenchmark on a Yarn (2.0.5-alpha) cluster. I ran PageRankBenchmark with the following command.
=====
hadoop jar /export/home/clei/giraph-1.0.0/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000000 -w 3
=====

1. It seems to me that this is still submitted as a GiraphJob rather than as a job submitted from GiraphYarnClient. If so, the PageRankBenchmark is still hosted in mappers rather than in containers. Am I correct? If so, how can I actually run the benchmark as a Yarn application?

2. PageRankBenchmark takes neither an input path nor an output path on the command line. I was wondering how Giraph generates all 5 million vertices for the above command (-V 5000000). Moreover, from the log files, it seems that each worker tries to load all 5 million vertices at the beginning instead of 1/3 of them. Why does each worker consume the whole input instead of only one split of it? This is not the case in the SimpleShortestPath example.

Any input on the above questions would be greatly appreciated.

Regards,
Chuan
