[ https://issues.apache.org/jira/browse/HDDS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816277#comment-16816277 ]
Shashikant Banerjee edited comment on HDDS-1282 at 4/12/19 1:48 PM:
--------------------------------------------------------------------

Thanks [~elek]. In the latest code, the test fails because a datanode crashes during MiniOzoneCluster startup:

{code:java}
2019-04-12 19:13:26,593 INFO ratis.XceiverServerRatis (XceiverServerRatis.java:start(416)) - Starting XceiverServerRatis 3ab53731-d087-494c-9378-ee35abffb271 at port 53578
2019-04-12 19:13:26,593 INFO ozone.HddsDatanodeService (HddsDatanodeService.java:start(174)) - HddsDatanodeService host:192.168.0.64 ip:192.168.0.64
2019-04-12 19:13:26,600 INFO impl.RaftServerProxy (RaftServerProxy.java:lambda$start$3(299)) - 3ab53731-d087-494c-9378-ee35abffb271: start RPC server
2019-04-12 19:13:26,605 ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server
java.io.IOException: Failed to bind
	at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
	at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
	at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
	at org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
	at org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
	at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
	at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
	at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
	at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
	at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
	at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
	at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
	at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
	at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
	at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
	at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
{code}

The issue happens because we use random ports for datanodes in
MiniOzoneCluster: we find a free port for each datanode during setup, but the Ratis server only binds to it at a later time. In the meantime, if some other datanode picks up the same port, the datanode crashes. The patch does not address this issue and is outdated.

> TestFailureHandlingByClient causes a jvm exit
> ---------------------------------------------
>
>                 Key: HDDS-1282
>                 URL: https://issues.apache.org/jira/browse/HDDS-1282
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: test
>            Reporter: Mukul Kumar Singh
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: HDDS-1282.001.patch,
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient-output.txt
>
> The test causes a jvm exit because the test exits prematurely.
> {code}
> [ERROR] org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException:
> ExecutionException The forked VM terminated without properly saying goodbye.
> VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test &&
> /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/jre/bin/java
> -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -jar
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/surefire/surefirebooter5405606309417840457.jar
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/surefire
> 2019-03-13T23-31-09_018-jvmRun1 surefire5934599060460829594tmp
> surefire_1202723709650989744795tmp
> [ERROR] Error occurred in starting fork, check output in log
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
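The check-then-bind race described in the comment above can be reproduced in a few lines of plain Java. This is a hypothetical illustration (the class `PortRaceDemo` and helper `findFreePort` are not MiniOzoneCluster code): a port is discovered by binding to port 0 and immediately released, and if another socket grabs it before the "real" server binds, the late bind fails with the same `BindException` seen in the log.

```java
import java.net.BindException;
import java.net.ServerSocket;

public class PortRaceDemo {
    // Hypothetical stand-in for the cluster's port discovery: bind to port 0
    // so the OS assigns a free ephemeral port, then release it immediately.
    // Nothing holds the port between discovery and the later real bind.
    static int findFreePort() throws Exception {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    public static void main(String[] args) throws Exception {
        int port = findFreePort(); // datanode A records this port but does not hold it
        // Before A's Ratis server starts, another process binds the same port.
        try (ServerSocket intruder = new ServerSocket(port)) {
            try (ServerSocket ratisServer = new ServerSocket(port)) {
                System.out.println("bind succeeded (race did not occur)");
            } catch (BindException e) {
                // Same failure mode as the log: "Address already in use"
                System.out.println("bind failed: " + e.getMessage());
            }
        }
    }
}
```

A fix along these lines would either keep the discovered port bound until handoff, or retry with a fresh port when the bind fails.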