Hi,
We ran a PageRank benchmark on a graph with 120 million vertices and
one edge per vertex.
We distributed it across 128 workers.
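For a sense of scale, here is a rough estimate of the per-worker load as a small Java sketch. It assumes the vertices are spread evenly over the workers and that each vertex sends one message per outgoing edge in superstep 0. The class and method names are made up for illustration and are not part of Giraph.

```java
public class WorkerLoadEstimate {
    // Vertices held by the most loaded worker, assuming an even split
    // (assumption: real partitioning may be less balanced).
    static long verticesPerWorker(long vertices, int workers) {
        return (vertices + workers - 1) / workers; // ceiling division
    }

    // Messages sent by one worker in a superstep, assuming one message
    // per outgoing edge of every vertex it holds.
    static long messagesPerWorker(long vertices, int edgesPerVertex, int workers) {
        return verticesPerWorker(vertices, workers) * edgesPerVertex;
    }

    public static void main(String[] args) {
        long vertices = 120_000_000L; // from the benchmark description
        int workers = 128;            // from the benchmark description
        int edgesPerVertex = 1;       // from the benchmark description

        System.out.println("vertices per worker: "
                + verticesPerWorker(vertices, workers));
        System.out.println("messages sent per worker in superstep 0: "
                + messagesPerWorker(vertices, edgesPerVertex, workers));
    }
}
```

With our numbers this gives about 937,500 vertices, and as many outgoing messages, per worker.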
The graph loads successfully,
but several workers fail at superstep 0. Any idea what the
problem might be? Thanks
The error trace:
java.lang.IllegalStateException: flush: Got ExecutionException
at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1102)
at org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:968)
at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:613)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:657)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1097)
... 10 more
Caused by: java.lang.RuntimeException: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:368)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy3.putVertexIdMessagesList(Unknown Source)
at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:328)
... 6 more
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(Unknown Source)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.write(Unknown Source)
at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.write(Unknown Source)
at java.io.DataOutputStream.write(Unknown Source)
at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:782)
at org.apache.hadoop.ipc.Client.call(Client.java:1051)
... 9 more
Best regards,
Nicolas