I would try setting -Dgiraph.useNetty=true to use Netty instead of the Hadoop RPC-based communication layer.
My guess, however, is that you probably hit an error (most likely running
out of memory) that caused a task to fail, which in turn caused the
connection reset. We try to assign port numbers based on the task id so
that you can work backwards when debugging. This task failed because it
couldn't connect to the worker listening on port 30069. I would look at
the map task that owns port 30069 and see why it failed.
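Here is a rough sketch of working backwards from the error. It assumes each worker listens on a base port plus its task id, as described above, and that the base port is 30000. That value is our recollection of the default at the time and is not stated in this thread, so check your job's configuration. PortToTask, taskIdForPort and taskName are hypothetical names for illustration, not Giraph APIs. If a worker could not bind its intended port and fell back to another one, this simple subtraction would not hold.

```java
// Hypothetical helper (not part of Giraph): maps the port in a
// "Connection reset by peer" error back to the map task that owned it.
public final class PortToTask {
    // ASSUMPTION: 30000 as the initial RPC port; verify against your job's
    // configuration before trusting the result.
    static final int ASSUMED_BASE_PORT = 30000;

    // Port = base port + task id, so the task id is the offset from the base.
    static int taskIdForPort(int port, int basePort) {
        int taskId = port - basePort;
        if (taskId < 0) {
            throw new IllegalArgumentException(
                "port " + port + " is below the base port " + basePort);
        }
        return taskId;
    }

    // Zero-padded map-task suffix, as it usually appears in Hadoop task ids
    // such as task_..._m_000069.
    static String taskName(int taskId) {
        return String.format("m_%06d", taskId);
    }

    public static void main(String[] args) {
        int failedPort = 30069; // port from the stack trace
        int taskId = taskIdForPort(failedPort, ASSUMED_BASE_PORT);
        System.out.println("Look at map task " + taskId + " (" + taskName(taskId) + ")");
    }
}
```

Under these assumptions it prints "Look at map task 69 (m_000069)". The logs for that task attempt are the place to look for an OutOfMemoryError or a similar failure.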
Avery
On 7/20/12 7:56 AM, Nicolas DUGUE wrote:
Hi,
We ran a PageRank benchmark with 120 million vertices and
one edge per vertex.
We distributed it across 128 workers.
The graph loads successfully.
However, several workers fail at superstep 0. Any idea what
the problem might be? Thanks
The error trace:
java.lang.IllegalStateException: flush: Got ExecutionException
	at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1102)
	at org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:968)
	at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:613)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:657)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
	at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
	at java.util.concurrent.FutureTask.get(Unknown Source)
	at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1097)
	... 10 more
Caused by: java.lang.RuntimeException: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
	at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:368)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
	at org.apache.hadoop.ipc.Client.call(Client.java:1075)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at $Proxy3.putVertexIdMessagesList(Unknown Source)
	at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:328)
	... 6 more
Caused by: java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcher.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(Unknown Source)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
	at sun.nio.ch.IOUtil.write(Unknown Source)
	at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
	at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
	at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
	at java.io.BufferedOutputStream.write(Unknown Source)
	at java.io.DataOutputStream.write(Unknown Source)
	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:782)
	at org.apache.hadoop.ipc.Client.call(Client.java:1051)
	... 9 more
Best regards,
Nicolas