Hi,

We ran a PageRank benchmark with 120 million vertices and one edge per vertex.
We distributed it across 128 workers.

The graph loads correctly, but several workers fail at superstep 0.
Any ideas what the problem might be? Thanks.

The error trace:

java.lang.IllegalStateException: flush: Got ExecutionException
        at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1102)
        at org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:968)
        at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:613)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:657)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1097)
        ... 10 more
Caused by: java.lang.RuntimeException: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
        at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:368)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
        at org.apache.hadoop.ipc.Client.call(Client.java:1075)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy3.putVertexIdMessagesList(Unknown Source)
        at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:328)
        ... 6 more
Caused by: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(Unknown Source)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.write(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(Unknown Source)
        at java.io.DataOutputStream.write(Unknown Source)
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:782)
        at org.apache.hadoop.ipc.Client.call(Client.java:1051)
        ... 9 more

Best regards,
Nicolas
