[ https://issues.apache.org/jira/browse/HDDS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816277#comment-16816277 ]
Shashikant Banerjee edited comment on HDDS-1282 at 4/12/19 1:48 PM:
--------------------------------------------------------------------

Thanks [~elek]. In the latest code, the test fails because a datanode crashes during MiniOzoneCluster startup:

{code:java}
2019-04-12 19:13:26,593 INFO ratis.XceiverServerRatis (XceiverServerRatis.java:start(416)) - Starting XceiverServerRatis 3ab53731-d087-494c-9378-ee35abffb271 at port 53578
2019-04-12 19:13:26,593 INFO ozone.HddsDatanodeService (HddsDatanodeService.java:start(174)) - HddsDatanodeService host:192.168.0.64 ip:192.168.0.64
2019-04-12 19:13:26,600 INFO impl.RaftServerProxy (RaftServerProxy.java:lambda$start$3(299)) - 3ab53731-d087-494c-9378-ee35abffb271: start RPC server
2019-04-12 19:13:26,605 ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server
java.io.IOException: Failed to bind
	at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
	at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
	at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
	at org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
	at org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
	at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
	at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
	at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
	at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
	at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
	at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
	at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
	at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
	at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
	at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
{code}

The issue happens because we use random ports for datanodes in
MiniOzoneCluster: we find a free port for each datanode during setup, but the Ratis server only binds to it at a later time. In the meantime, if some other datanode picks up the same port, the datanode crashes. The patch does not address this issue and is outdated.

> TestFailureHandlingByClient causes a jvm exit
> ---------------------------------------------
>
>                 Key: HDDS-1282
>                 URL: https://issues.apache.org/jira/browse/HDDS-1282
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: test
>            Reporter: Mukul Kumar Singh
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: HDDS-1282.001.patch,
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient-output.txt
>
> The test causes a jvm exit because the test exits prematurely.
> {code}
> [ERROR] org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException:
> ExecutionException The forked VM terminated without properly saying goodbye.
> VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test &&
> /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/jre/bin/java
> -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -jar
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/surefire/surefirebooter5405606309417840457.jar
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/surefire
> 2019-03-13T23-31-09_018-jvmRun1 surefire5934599060460829594tmp
> surefire_1202723709650989744795tmp
> [ERROR] Error occurred in starting fork, check output in log
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
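The check-then-bind race described in the comment above can be reproduced in a few lines of plain Java. This is a hypothetical illustration (the class `PortRaceDemo` and helper `findFreePort` are not MiniOzoneCluster code): a port is discovered by binding to port 0 and immediately released, and if another socket grabs it before the "real" server binds, the late bind fails with the same `BindException` seen in the log.

```java
import java.net.BindException;
import java.net.ServerSocket;

public class PortRaceDemo {
    // Hypothetical stand-in for the cluster's port discovery: bind to port 0
    // so the OS assigns a free ephemeral port, then release it immediately.
    // Nothing holds the port between discovery and the later real bind.
    static int findFreePort() throws Exception {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    public static void main(String[] args) throws Exception {
        int port = findFreePort(); // datanode A records this port but does not hold it
        // Before A's Ratis server starts, another process binds the same port.
        try (ServerSocket intruder = new ServerSocket(port)) {
            try (ServerSocket ratisServer = new ServerSocket(port)) {
                System.out.println("bind succeeded (race did not occur)");
            } catch (BindException e) {
                // Same failure mode as the log: "Address already in use"
                System.out.println("bind failed: " + e.getMessage());
            }
        }
    }
}
```

A fix along these lines would either keep the discovered port bound until handoff, or retry with a fresh port when the bind fails.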