I think you should look for the actual problem in the logs of the other workers: the master log below only shows that two workers went missing on superstep 3, not why.
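The master log quoted below names the workers it gave up on in its `Missing chosen worker` errors. The `ArrayIndexOutOfBoundsException` in `getLastGoodCheckpoint` comes right after superstep 3 ended in `WORKER_FAILURE`, so it looks like a follow-on error rather than the cause. As a hedged aid for picking which worker logs to read, here is a small hypothetical Java helper (not part of Giraph). It extracts hostname and `MRtaskID` from those errors, assuming the message format shown in this log. The task logs of the matching map tasks (for example, through the JobTracker web UI) are the likely place to look.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists the workers a Giraph master reported as missing. */
public final class FailedWorkerFinder {

  // Assumes the format seen in this thread, e.g.
  // "Missing chosen worker Worker(hostname=node7.cluster.net, MRtaskID=1, port=30001)".
  // \s* tolerates whitespace that mail wrapping may insert after "hostname=".
  private static final Pattern MISSING_WORKER = Pattern.compile(
      "Missing chosen worker Worker\\(hostname=\\s*([^,\\s]+),\\s*MRtaskID=(\\d+)");

  private FailedWorkerFinder() {}

  /** Returns "hostname (MRtaskID=n)" for each missing worker, in log order. */
  public static List<String> missingWorkers(String masterLog) {
    List<String> result = new ArrayList<>();
    Matcher m = MISSING_WORKER.matcher(masterLog);
    while (m.find()) {
      result.add(m.group(1) + " (MRtaskID=" + m.group(2) + ")");
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Reads a whole master task log from stdin, e.g. `java FailedWorkerFinder < master-task.log`
    // (the file name is only an example).
    StringBuilder log = new StringBuilder();
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    String line;
    while ((line = in.readLine()) != null) {
      log.append(line).append('\n');
    }
    for (String worker : missingWorkers(log.toString())) {
      System.out.println(worker);
    }
  }
}
```

Run against the master log below, it would print `node7.cluster.net (MRtaskID=1)` and `node9.cluster.net (MRtaskID=4)`.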
On Fri, Sep 6, 2013 at 8:06 PM, Bu Xiao <[email protected]> wrote:

> Thanks Claudio and Gustavo for your answers. I have another question. I run
> my algorithm on a cluster that has 20 nodes. When I set the number of
> workers to 10 (or more), the algorithm works well and produces the
> expected output. But if the number of workers is less than 10, I get the
> following exception in ZooKeeper.
>
> 2013-09-06 10:39:04,313 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 0 connections, (0 total connected) 0 failed, 0 failures total.
> 2013-09-06 10:39:04,313 INFO org.apache.giraph.partition.PartitionBalancer: balancePartitionsAcrossWorkers: Using algorithm static
> 2013-09-06 10:39:04,314 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Vertices - Mean: 200000, Min: Worker(hostname=node1.cluster.net, MRtaskID=5, port=30005) - 200000, Max: Worker(hostname=node7.cluster.net, MRtaskID=1, port=30001) - 200000
> 2013-09-06 10:39:04,314 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Edges - Mean: 10019985, Min: Worker(hostname=node9.cluster.net, MRtaskID=4, port=30004) - 10000354, Max: Worker(hostname=node5.cluster.net, MRtaskID=2, port=30002) - 10088901
> 2013-09-06 10:39:04,339 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 5 workers finished on superstep 2 on path /_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
> 2013-09-06 10:39:04,340 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [node8.cluster.net_3, node1.cluster.net_5, node9.cluster.net_4, node5.cluster.net_2, node7.cluster.net_1]
> 2013-09-06 10:40:15,255 INFO org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window metrics MBytes/sec sent = 0, MBytes/sec received = 0, MBytesSent = 0, MBytesReceived = 0, ave sent req MBytes = 0, ave received req MBytes = 0, secs waited = 71.241
> 2013-09-06 10:40:15,291 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 3 out of 5 workers finished on superstep 2 on path /_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
> 2013-09-06 10:40:15,291 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [node1.cluster.net_5, node7.cluster.net_1]
> 2013-09-06 10:40:15,388 INFO org.apache.giraph.master.BspServiceMaster: aggregateWorkerStats: Aggregation found (vtx=1000000,finVtx=0,edges=50099927,msgCount=0,msgBytesCount=0,haltComputation=false) on superstep = 2
> 2013-09-06 10:40:15,394 INFO org.apache.giraph.master.BspServiceMaster: coordinateSuperstep: Cleaning up old Superstep /_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/1
> 2013-09-06 10:40:15,531 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 2 took 71.313 seconds ended with state THIS_SUPERSTEP_DONE and is now on superstep 3
> 2013-09-06 10:40:15,563 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 0 connections, (0 total connected) 0 failed, 0 failures total.
> 2013-09-06 10:40:15,563 INFO org.apache.giraph.partition.PartitionBalancer: balancePartitionsAcrossWorkers: Using algorithm static
> 2013-09-06 10:40:15,564 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Vertices - Mean: 200000, Min: Worker(hostname=node1.cluster.net, MRtaskID=5, port=30005) - 200000, Max: Worker(hostname=node7.cluster.net, MRtaskID=1, port=30001) - 200000
> 2013-09-06 10:40:15,564 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Edges - Mean: 10019985, Min: Worker(hostname=node9.cluster.net, MRtaskID=4, port=30004) - 10000354, Max: Worker(hostname=node5.cluster.net, MRtaskID=2, port=30002) - 10088901
> 2013-09-06 10:40:15,587 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 5 workers finished on superstep 3 on path /_hadoopBsp/job_201309060934_0013/_applicationAttemptsDir/0/_superstepDir/3/_workerFinishedDir
> 2013-09-06 10:40:15,587 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [node8.cluster.net_3, node1.cluster.net_5, node9.cluster.net_4, node5.cluster.net_2, node7.cluster.net_1]
> 2013-09-06 10:50:18,111 ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=node7.cluster.net, MRtaskID=1, port=30001) on superstep 3
> 2013-09-06 10:50:18,111 ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=node9.cluster.net, MRtaskID=4, port=30004) on superstep 3
> 2013-09-06 10:50:18,111 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 3 took 602.58 seconds ended with state WORKER_FAILURE and is now on superstep 3
> 2013-09-06 10:50:18,118 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with ArrayIndexOutOfBoundsException
> java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1272)
>         at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
> 2013-09-06 10:50:18,119 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.ArrayIndexOutOfBoundsException: -1, exiting...
> java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1272)
>         at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
> 2013-09-06 10:50:18,122 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
> 2013-09-06 10:50:18,122 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
> 2013-09-06 10:50:18,495 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x140f459adcd0000, likely server has closed socket, closing socket connection and attempting reconnect
> 2013-09-06 10:50:18,496 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
>
> Thank you.
>
>
> On Fri, Sep 6, 2013 at 3:51 AM, Gustavo Enrique Salazar Torres <[email protected]> wrote:
>
>> Hi Bu:
>> Until the interface with Gora is available, you could use Apache Sqoop to
>> import your MySQL table into HDFS and then run your Giraph job.
>>
>> Cheers
>> Gustavo
>> On 06/09/2013 04:43, "Claudio Martella" <[email protected]> wrote:
>>
>>> Hi Bu,
>>>
>>> no, currently we do not have a DBInputFormat.
>>> We have an open issue with a Google Summer of Code student working on a
>>> GoraInputFormat, which also supports reading from RDBMSs through Gora.
>>> However, if and when it gets in, it will not provide semantics as rich as
>>> DBInputFormat's: you will only be able to issue scan-like/range queries,
>>> instead of ANY query as with DBInputFormat.
>>>
>>> I think that creating a DB[Vertex|Edge]InputFormat starting from the
>>> Hadoop DBInputFormat should not be too hard and could prove to be a very
>>> useful contribution. If you think about providing an implementation, I can
>>> provide guidance.
>>>
>>> Best,
>>> Claudio
>>>
>>>
>>> On Fri, Sep 6, 2013 at 1:45 AM, Bu Xiao <[email protected]> wrote:
>>>
>>>> Hi Giraphers,
>>>>
>>>> I am currently working on an algorithm that requires reading the
>>>> vertices from a MySQL table and not from HDFS. I thought there had to be
>>>> a way of reading data from an SQL table, since Giraph is built on top of
>>>> Hadoop, but I cannot figure this part out. Do you have a class
>>>> similar to the DBInputFormat in Hadoop? Thank you very much for your help.
>>>
>>>
>>> --
>>> Claudio Martella
>>> [email protected]

--
Claudio Martella
[email protected]
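On the original MySQL question: Gustavo's Sqoop route leaves the table in HDFS as delimited text, and that text still has to match one of Giraph's vertex input formats. Below is a minimal hypothetical converter, not part of Giraph or Sqoop. It assumes an edge table with columns (source id, target id, weight) that Sqoop imported with its default comma field delimiter. It groups the rows by source and writes one line per vertex in the `[vertexId,vertexValue,[[targetId,edgeValue],...]]` layout that Giraph's `JsonLongDoubleFloatDoubleVertexInputFormat` reads. The table layout, the class name and the initial vertex value are all assumptions. A DB[Vertex|Edge]InputFormat along the lines Claudio describes would make this step unnecessary.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical converter: comma-delimited edge rows to JSON adjacency lines. */
public final class EdgeRowsToJsonVertices {

  private EdgeRowsToJsonVertices() {}

  /**
   * Groups "sourceId,targetId,weight" rows by source and emits
   * [sourceId,initialValue,[[targetId,weight],...]] per source vertex.
   */
  public static List<String> convert(List<String> rows, double initialValue) {
    // LinkedHashMap keeps vertices in the order their first edge appears.
    Map<Long, StringBuilder> edgesBySource = new LinkedHashMap<>();
    for (String row : rows) {
      String trimmed = row.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank lines
      }
      String[] fields = trimmed.split(",");
      if (fields.length != 3) {
        throw new IllegalArgumentException("Expected source,target,weight but got: " + row);
      }
      long source = Long.parseLong(fields[0].trim());
      long target = Long.parseLong(fields[1].trim());
      float weight = Float.parseFloat(fields[2].trim());
      StringBuilder edges = edgesBySource.computeIfAbsent(source, k -> new StringBuilder());
      if (edges.length() > 0) {
        edges.append(',');
      }
      edges.append('[').append(target).append(',').append(weight).append(']');
    }
    List<String> lines = new ArrayList<>();
    for (Map.Entry<Long, StringBuilder> entry : edgesBySource.entrySet()) {
      lines.add("[" + entry.getKey() + "," + initialValue + ",[" + entry.getValue() + "]]");
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    // Reads Sqoop's text output from stdin and writes one JSON vertex line per source.
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    List<String> rows = new ArrayList<>();
    String line;
    while ((line = in.readLine()) != null) {
      rows.add(line);
    }
    for (String out : convert(rows, 0.0)) {
      System.out.println(out);
    }
  }
}
```

For example, the rows `0,1,1` and `0,3,3` become `[0,0.0,[[1,1.0],[3,3.0]]]`. Vertices that never appear as a source get no line of their own. The converter also holds every edge in memory. For a graph the size of the one in the log above (about 50 million edges), the same grouping would be better done as a MapReduce job.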
