Hi Lukas,

Thank you, but when I tried to apply the patch, I got:

2014.04.07|09:25:47~/giraph/giraph-core/src> git apply --check NettyClient_Timeout.patch
error: patch failed: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java:153
error: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java: patch does not apply
Could you send me the patched NettyClient.java file directly? Thanks!

Best Regards,
Suijian


2014-04-04 17:12 GMT-05:00 Lukas Nalezenec:

> Hi,
>
> I had a similar issue; it was caused by long GC pauses. I patched
> NettyClient so that when a reconnect fails, it sleeps for some time before
> the next try. The patch is enclosed. Let me know if it works for you.
> I would try tuning GC. You can also try to use
> giraph.waitForRequestsConfirmation and giraph.maxNumberOfOpenRequests.
> I hope I am right.
>
> Regards
> Lukas
>
>
> On 4.4.2014 22:49, Suijian Zhou wrote:
>
> Hi,
> I have a ZooKeeper problem when running a Giraph program: the program
> is aborted in superstep 2 as follows:
> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Socket connection established to compute-0-18.local/10.1.255.236:22181, initiating session
> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Session establishment complete on server compute-0-18.local/10.1.255.236:22181, sessionid = 0x1452e7c79910009, negotiated timeout = 600000
> ......
> 14/04/04 15:46:08 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 3 - 270.37MB, average 451.21MB
> 14/04/04 15:46:13 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 6 - 249.25MB, average 404.02MB
> 14/04/04 15:46:16 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1452e7c79910009, likely server has closed socket, closing socket connection and attempting reconnect
> 14/04/04 15:46:17 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
> 14/04/04 15:46:17 WARN zookeeper.ClientCnxn: Session 0x1452e7c79910009 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>
> Each rerun of the program leads to a different compute node reporting
> the same error ("Unable to read additional data from server sessionid...").
>
> What I do in superstep 2 is:
> if (getSuperstep() == 2) {
>   for (IntWritable message : messages) {
>     for (Edge<IntWritable, IntWritable> edge : vertex.getEdges()) {
>       sendMessage(edge.getTargetVertexId(), message);
>       //int abc=0;
>     }
>   }
> }
>
> I checked that if I replace the line "sendMessage(edge.getTargetVertexId(),
> message);" with a meaningless line like "int abc=0;", the program
> finishes successfully. It seems to be a ZooKeeper problem, but ZooKeeper
> seems to come with Giraph, as I did not install it separately. I tried to
> modify parameters in GiraphConstants.java and recompile Giraph, but it does
> not seem to take any effect: the screen output shows the parameters were
> not changed at all. Any hints?
>
> Best Regards,
> Suijian
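The superstep-2 loop in Suijian's message forwards every received message along every out-edge, so each vertex issues (messages received) x (out-degree) sendMessage calls. The sketch below is a plain-Java model of that loop, not Giraph code: the map-based graph and the names FanOutSketch, fanOut and total are hypothetical stand-ins for Giraph's vertex and edge API. It only counts the messages the loop would produce, which may help judge whether message volume could plausibly explain the memory pressure and GC pauses Lukas mentions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java model of the quoted superstep-2 loop (no Giraph dependency).
 * Graph: vertex id -> list of out-edge targets. Inbox: vertex id -> number of messages received.
 */
public final class FanOutSketch {

  private FanOutSketch() {}

  /** Returns, per target vertex, how many messages the fan-out loop would send to it. */
  public static Map<Integer, Long> fanOut(Map<Integer, List<Integer>> outEdges,
                                          Map<Integer, Long> inbox) {
    Map<Integer, Long> next = new HashMap<>();
    for (Map.Entry<Integer, List<Integer>> vertex : outEdges.entrySet()) {
      long received = inbox.getOrDefault(vertex.getKey(), 0L);
      for (int target : vertex.getValue()) {
        // One sendMessage(target, message) call per (message, edge) pair
        next.merge(target, received, Long::sum);
      }
    }
    return next;
  }

  /** Total number of messages sent in the superstep. */
  public static long total(Map<Integer, Long> sent) {
    return sent.values().stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    Map<Integer, List<Integer>> outEdges = new HashMap<>();
    outEdges.put(1, List.of(2, 3, 4));
    outEdges.put(2, List.of(1, 3));
    outEdges.put(3, List.of(1));

    Map<Integer, Long> inbox = new HashMap<>();
    inbox.put(1, 2L); // vertex 1 received 2 messages in the previous superstep
    inbox.put(2, 3L);
    inbox.put(3, 1L);

    // 2*3 + 3*2 + 1*1 = 13 messages for this tiny graph
    System.out.println(total(fanOut(outEdges, inbox)));
  }
}
```

With 4847571 vertices, the same product is summed over every vertex, so the total grows with both the number of incoming messages and the out-degrees rather than with the vertex count alone.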

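Lukas describes his patch only as making NettyClient "sleep for some time" after a failed reconnect before trying again. The sketch below illustrates that general idea under stated assumptions. It is NOT his patch and uses no Netty or Giraph API. The connection attempt is modeled as a hypothetical BooleanSupplier that returns true on success, and the names RetryWithPause, connectWithPause and pauseMillis are invented for illustration.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical illustration of "pause before the next reconnect attempt".
 * Not the patch from this thread; no Netty or Giraph classes are used.
 */
public final class RetryWithPause {

  private RetryWithPause() {}

  /**
   * Calls attempt up to maxAttempts times, sleeping pauseMillis after each failure
   * except the last. Returns the 1-based number of the successful attempt, or -1
   * if every attempt failed or the thread was interrupted while sleeping.
   */
  public static int connectWithPause(BooleanSupplier attempt, int maxAttempts, long pauseMillis) {
    for (int i = 1; i <= maxAttempts; i++) {
      if (attempt.getAsBoolean()) {
        return i;
      }
      if (i < maxAttempts) {
        try {
          // Give the remote side (e.g. a worker stuck in a long GC pause) time to recover
          Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve interrupt status for the caller
          return -1;
        }
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated peer that only accepts the connection on the third try
    BooleanSupplier flaky = () -> ++calls[0] >= 3;
    System.out.println(connectWithPause(flaky, 5, 10L));
  }
}
```

A fixed pause keeps the example close to Lukas's description; a growing (exponential) pause is a common alternative when the peer's recovery time is unknown.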