Hi Lukas
Thanks for the quick response. It seems I found the problem.
On workers 2, 6, and 14, the logs show these errors:
raph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,607 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,607 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,607 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,607 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,607 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
2015-05-22 05:20:57,607 WARN [netty-client-worker-1]
org.apache.giraph.comm.netty.handler.ResponseClientHandler:
exceptionCaught: Channel failed with remote address
bespin03c.umiacs.umd.edu/192.168.74.113:30005
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:446)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:871)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:208)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:118)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Thread.java:745)
I checked the log on bespin03c.umiacs.umd.edu/192.168.74.113:30005, and it shows:
2015-05-22 05:20:50,028 ERROR [main]
org.apache.giraph.graph.GraphMapper: Caught an unrecoverable exception
waitFor: ExecutionException occurred while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@7328027c
java.lang.IllegalStateException: waitFor: ExecutionException occurred
while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@7328027c
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:756)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:335)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.util.concurrent.ExecutionException:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.getResult(ProgressableUtils.java:327)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:187)
... 14 more
So if I keep using the default hash partitioning, can the problem only be
solved by adding more memory to the cluster?
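
For context, "GC overhead limit exceeded" means the JVM on that worker was spending nearly all of its time in garbage collection while reclaiming very little heap. To see how much heap a task JVM actually gets, something like the sketch below could be run with the same JVM options as the mapper. The class name HeapCheck and its helper methods are my own for illustration and are not part of Giraph or Hadoop; only the standard java.lang.Runtime calls are used.

```java
// Hypothetical helper (not part of Giraph or Hadoop): prints the heap
// limits of the JVM it runs in, using only java.lang.Runtime.
public class HeapCheck {

    // Convert a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Format a one-line summary from raw byte counts.
    // used = heap currently allocated by the JVM minus the free part of it.
    static String report(long maxBytes, long totalBytes, long freeBytes) {
        long usedBytes = totalBytes - freeBytes;
        return String.format("max=%d MiB, used=%d MiB",
                toMiB(maxBytes), toMiB(usedBytes));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the upper bound the JVM will try to use (e.g. -Xmx).
        System.out.println(report(rt.maxMemory(), rt.totalMemory(), rt.freeMemory()));
    }
}
```

If the reported max is much smaller than the memory on the node, the limit is probably coming from the task JVM settings rather than from the hardware.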
Thanks
Hai
Hai Lan, PhD student
[email protected] <[email protected]>
Department of Geographical Science
University of Maryland, College Park
1104 LeFrak Hall
College Park, MD 20742, USA
On Fri, May 22, 2015 at 6:32 AM, Lukas Nalezenec <
[email protected]> wrote:
> On 22.5.2015 12:25, Hai Lan wrote:
>
> Missing chosen workers [Worker(hostname=bespin05.umiacs.umd.edu, MRtaskID=2,
> port=30002), Worker(hostname=bespin04d.umiacs.umd.edu, MRtaskID=6,
> port=30006), Worker(hostname=bespin03a.umiacs.umd.edu, MRtaskID=14,
> port=30014)] on superstep 0
>
>
> Hi,
> See in logs what happened on the missing workers.
> Lukas
>