Just a follow-up note: my master is timing out because my other mappers are taking too long to finish.
2013-07-24 22:13:18,874 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@5f337f6f
2013-07-24 22:13:18,895 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 62377ms for sessionid 0x240129e443a011a, closing socket connection and attempting reconnect
2013-07-24 22:13:18,895 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 63825ms for sessionid 0x240129e443a0118, closing socket connection and attempting reconnect
2013-07-24 22:13:19,123 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2013-07-24 22:13:19,137 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2013-07-24 22:13:19,491 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server XXXXXX:2181
2013-07-24 22:13:19,492 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to XXXXXX:2181, initiating session
2013-07-24 22:13:19,546 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x240129e443a011a has expired, closing socket connection
2013-07-24 22:13:19,549 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
2013-07-24 22:13:19,549 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-07-24 22:13:20,045 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server XXXXXX:2181
2013-07-24 22:13:20,046 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to XXXXXX:2181, initiating session
2013-07-24 22:13:20,056 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x240129e443a0118 has expired, closing socket connection
2013-07-24 22:13:20,056 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
2013-07-24 22:13:20,056 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-07-24 22:13:20,169 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with IllegalStateException
java.lang.IllegalStateException: Failed to create job state path due to KeeperException
    at org.apache.giraph.bsp.BspService.getJobState(BspService.java:676)
    at org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:835)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:97)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201307241738_0004/_masterJobState
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
    at org.apache.giraph.bsp.BspService.getJobState(BspService.java:667)
    ... 2 more
2013-07-24 22:13:20,314 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.IllegalStateException: Failed to create job state path due to KeeperException, exiting...
java.lang.IllegalStateException: java.lang.IllegalStateException: Failed to create job state path due to KeeperException
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:180)
Caused by: java.lang.IllegalStateException: Failed to create job state path due to KeeperException
    at org.apache.giraph.bsp.BspService.getJobState(BspService.java:676)
    at org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:835)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:97)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201307241738_0004/_masterJobState
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
    at org.apache.giraph.bsp.BspService.getJobState(BspService.java:667)

On Wed, Jul 24, 2013 at 6:30 PM, Puneet Jain <[email protected]> wrote:

> Hello:
>
> I am struggling to make PageRank run on 75M nodes with each node having
> 1-75000 edges.
>
> I am constantly getting zookeeper timeouts irrespective of my
> configuration.
>
> - I have 21 node hadoop cluster, each node having 4 cores, 4GB memory.
> - Data is stored in hbase as adjacency matrix
> - I am running 21 regionservers, 3 zookeepers.
> - I am using standard PageRankComputation class, my vertexID is a long.
>
> I am setting only these parameters:
> GiraphConfiguration.SPLIT_MASTER_WORKER.set(giraphConf, false);
> GiraphConfiguration.USE_SUPERSTEP_COUNTERS.set(giraphConf, false);
> GiraphConfiguration.CHECKPOINT_FREQUENCY.set(giraphConf, 0);
>
> Most of other configurations are set to default value.
>
> Thanks
> --
> --Puneet
>

--
--Puneet
