Hello,

I am struggling to run PageRank on a graph of 75M nodes, with each node having between 1 and 75,000 edges.
I constantly get ZooKeeper timeouts, no matter what configuration I use.

- I have a 21-node Hadoop cluster; each node has 4 cores and 4GB of memory.
- The data is stored in HBase as an adjacency matrix.
- I am running 21 regionservers and 3 ZooKeepers.
- I am using the standard PageRankComputation class; my vertex ID is a long.

I am setting only these parameters:

    GiraphConfiguration.SPLIT_MASTER_WORKER.set(giraphConf, false);
    GiraphConfiguration.USE_SUPERSTEP_COUNTERS.set(giraphConf, false);
    GiraphConfiguration.CHECKPOINT_FREQUENCY.set(giraphConf, 0);

Most of the other configuration options are left at their default values.

Thanks,
--Puneet
