I suggest you start by trying the ByteArrayPartition, and then continue with out-of-core messages and/or the out-of-core graph. Also, make sure the map tasks can get enough heap memory in the Hadoop cluster configuration.
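Roughly the kind of settings I have in mind (a minimal sketch; the property names and the ByteArrayPartition package are from memory and can differ between Giraph releases, so double-check them against your version):

    import org.apache.hadoop.conf.Configuration;

    // Sketch of the settings to try -- verify the exact keys for your Giraph release.
    Configuration conf = new Configuration();

    // Byte-array backed partitions cut per-vertex object overhead on the heap.
    conf.set("giraph.partitionClass",
        "org.apache.giraph.partition.ByteArrayPartition");

    // Spill messages (and optionally the graph itself) to disk when memory is tight.
    conf.setBoolean("giraph.useOutOfCoreMessages", true);
    conf.setBoolean("giraph.useOutOfCoreGraph", true);

    // Give each map task a bigger heap; adjust -Xmx to what your nodes can afford.
    conf.set("mapred.child.java.opts", "-Xmx4g");

If I remember correctly, the benchmark goes through ToolRunner, so you should also be able to pass the same properties on the command line with -D when you launch it.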
On Thu, Jan 10, 2013 at 9:23 PM, Pradeep Gollakota <[email protected]> wrote:
> Hi All,
>
> I'm trying to run some benchmarks using the PageRankBenchmark tool on my
> cluster. However, I'm seeing some scaling issues.
>
> My cluster has 4 nodes, configured to run 24 map tasks. I'm running the
> benchmark with 23 workers. I've been able to get it to scale up to 256 million
> edges (16m vertices with 16 edges per vertex). However, when I try to scale
> higher than that, I've been getting GC overhead limit exceeded errors. I
> tried to modify the PageRankComputation class to use object reuse,
> but to no avail.
>
> Does anyone have any thoughts on how I can scale this higher on my cluster?
> I'm trying to get to about 50 million vertices with 150 edges per vertex
> (7.5 billion edges).
>
> Thanks
> Pradeep

--
Claudio Martella
[email protected]
