I would try -Dgiraph.useNetty=true to use Netty rather than Hadoop RPC. My guess, however, is that an error (likely memory-related) caused a task to fail, which in turn caused the connection reset. We try to assign port numbers based on the task ID so that you can work backwards when debugging. This task failed because it couldn't connect to a worker on port 30069. I would look at the map task that corresponds to port 30069 and see why it failed.
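Working backwards from the port can be sketched roughly as below. This is only an illustration: the base port of 30000 and the simple "port minus base" mapping are assumptions not stated in this thread, and the class and method names (PortToTask, portFromError, taskIdFromPort) are hypothetical helpers, not part of Giraph. Check the initial RPC port configured for your job before relying on the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PortToTask {
    // ASSUMPTION: base RPC port used by the job; not stated in the thread.
    static final int ASSUMED_BASE_PORT = 30000;

    // Matches the "Call to <host>/<ip>:<port> failed" part of the Hadoop IPC error.
    private static final Pattern HOST_PORT =
        Pattern.compile("Call to \\S+:(\\d+) failed");

    // Extracts the remote port from an IPC error message.
    static int portFromError(String message) {
        Matcher m = HOST_PORT.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("No host:port found in: " + message);
        }
        return Integer.parseInt(m.group(1));
    }

    // ASSUMPTION: the port is the base port plus the task's partition number.
    static int taskIdFromPort(int port, int basePort) {
        if (port < basePort) {
            throw new IllegalArgumentException("Port " + port + " is below base port " + basePort);
        }
        return port - basePort;
    }

    public static void main(String[] args) {
        String error = "java.io.IOException: Call to "
            + "hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: "
            + "java.io.IOException: Connection reset by peer";
        int port = portFromError(error);
        int task = taskIdFromPort(port, ASSUMED_BASE_PORT);
        System.out.println("port " + port + " -> task partition " + task
            + " (assuming base port " + ASSUMED_BASE_PORT + ")");
    }
}
```

Under those assumptions, port 30069 would point at task partition 69, so the logs of that map task attempt are the place to look for the underlying failure (an OutOfMemoryError, for example).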

Avery

On 7/20/12 7:56 AM, Nicolas DUGUE wrote:
Hi,

We ran a PageRank benchmark on a graph with 120 million vertices and one edge per vertex, distributed across 128 workers.

The graph loads successfully, but several workers fail at superstep 0. Any idea what the problem might be? Thanks.

The error trace:

java.lang.IllegalStateException: flush: Got ExecutionException
    at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1102)
    at org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:968)
    at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:613)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:657)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
    at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
    at java.util.concurrent.FutureTask.get(Unknown Source)
    at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1097)
    ... 10 more
Caused by: java.lang.RuntimeException: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
    at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:368)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Call to hadoop-0.univ-orleans.fr/172.18.1.200:30069 failed on local exception: java.io.IOException: Connection reset by peer
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
    at org.apache.hadoop.ipc.Client.call(Client.java:1075)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy3.putVertexIdMessagesList(Unknown Source)
    at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:328)
    ... 6 more
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(Unknown Source)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.write(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
    at java.io.BufferedOutputStream.write(Unknown Source)
    at java.io.DataOutputStream.write(Unknown Source)
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:782)
    at org.apache.hadoop.ipc.Client.call(Client.java:1051)
    ... 9 more

Best regards,
Nicolas
