[jira] [Commented] (HDDS-2047) Datanodes fail to come up after 10 retries in a secure environment
[ https://issues.apache.org/jira/browse/HDDS-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919730#comment-16919730 ]

Hudson commented on HDDS-2047:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17206 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17206/])
HDDS-2047. Datanodes fail to come up after 10 retries in a secure env… (github: rev ec34cee5e37ca48bf61403655eba8b89dba0ed57)
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
* (edit) hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
* (edit) hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsUtils.java

> Datanodes fail to come up after 10 retries in a secure environment
> ------------------------------------------------------------------
>
>                 Key: HDDS-2047
>                 URL: https://issues.apache.org/jira/browse/HDDS-2047
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode, Security
>    Affects Versions: 0.4.1
>            Reporter: Vivek Ratnavel Subramanian
>            Assignee: Xiaoyu Yao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.4.1
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> 10:06:36.585 PM ERROR HddsDatanodeService
> Error while storing SCM signed certificate.
> java.net.ConnectException: Call From jmccarthy-ozone-secure-2.vpc.cloudera.com/10.65.50.127 to jmccarthy-ozone-secure-1.vpc.cloudera.com:9961 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy15.getDataNodeCertificate(Unknown Source)
>     at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.getDataNodeCertificateChain(SCMSecurityProtocolClientSideTranslatorPB.java:156)
>     at org.apache.hadoop.ozone.HddsDatanodeService.getSCMSignedCert(HddsDatanodeService.java:278)
>     at org.apache.hadoop.ozone.HddsDatanodeService.initializeCertificateClient(HddsDatanodeService.java:248)
>     at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:211)
>     at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:168)
>     at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:143)
>     at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:70)
>     at picocli.CommandLine.execute(CommandLine.java:1173)
>     at picocli.CommandLine.access$800(CommandLine.java:141)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
>     at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>     at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
>     at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
>     at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
>     at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
>     at org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:126)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>     at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>     at
[jira] [Commented] (HDDS-2047) Datanodes fail to come up after 10 retries in a secure environment
[ https://issues.apache.org/jira/browse/HDDS-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918151#comment-16918151 ]

Siyao Meng commented on HDDS-2047:
----------------------------------

Seemingly this could be fixed by letting HDDS DataNodes retry the connection indefinitely, like HDFS DataNodes do.

> Datanodes fail to come up after 10 retries in a secure environment
> ------------------------------------------------------------------
>
>                 Key: HDDS-2047
>                 URL: https://issues.apache.org/jira/browse/HDDS-2047
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode, Security
>    Affects Versions: 0.4.1
>            Reporter: Vivek Ratnavel Subramanian
>            Priority: Major
>
> {code:java}
> 10:06:36.585 PM ERROR HddsDatanodeService
> Error while storing SCM signed certificate.
> java.net.ConnectException: Call From jmccarthy-ozone-secure-2.vpc.cloudera.com/10.65.50.127 to jmccarthy-ozone-secure-1.vpc.cloudera.com:9961 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy15.getDataNodeCertificate(Unknown Source)
>     at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.getDataNodeCertificateChain(SCMSecurityProtocolClientSideTranslatorPB.java:156)
>     at org.apache.hadoop.ozone.HddsDatanodeService.getSCMSignedCert(HddsDatanodeService.java:278)
>     at org.apache.hadoop.ozone.HddsDatanodeService.initializeCertificateClient(HddsDatanodeService.java:248)
>     at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:211)
>     at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:168)
>     at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:143)
>     at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:70)
>     at picocli.CommandLine.execute(CommandLine.java:1173)
>     at picocli.CommandLine.access$800(CommandLine.java:141)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
>     at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>     at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
>     at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
>     at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
>     at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
>     at org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:126)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>     at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>     ... 21 more
> {code}
> Datanodes try to get the SCM-signed certificate only 10 times, at 1-second
> intervals. When SCM takes a little longer to come up, the datanodes throw an
> exception and fail.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
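
Editor's note: the retry-indefinitely idea suggested in the comment above can be illustrated with a minimal sketch. This is not the code from the HDDS-2047 commit, which is not shown in this thread. The class `RetrySketch`, the helper `retryForever`, and the simulated certificate call are all hypothetical names chosen for illustration, and the sketch uses only the JDK rather than any Hadoop retry API.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class RetrySketch {

  /**
   * Hypothetical helper: invokes op until it returns normally, sleeping a
   * fixed interval after every failure. There is no upper bound on attempts,
   * in contrast to the 10-attempt limit described in the issue.
   */
  static <T> T retryForever(Callable<T> op, long sleep, TimeUnit unit)
      throws InterruptedException {
    int attempt = 0;
    while (true) {
      attempt++;
      try {
        return op.call();
      } catch (InterruptedException e) {
        // Let interruption end the loop so the process can still shut down.
        throw e;
      } catch (Exception e) {
        System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
        unit.sleep(sleep);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    // Simulated SCM that refuses the first 12 connections, i.e. longer than
    // a 10-attempt policy would tolerate.
    String cert = retryForever(() -> {
      if (++calls[0] <= 12) {
        throw new ConnectException("Connection refused");
      }
      return "signed-cert";
    }, 1, TimeUnit.MILLISECONDS);
    System.out.println(cert + " after " + calls[0] + " attempts");
  }
}
```

With a fixed limit of 10 attempts the simulated call above would never succeed; with the unbounded loop it returns on the 13th attempt. A production version would probably want a longer sleep interval and less noisy logging, but those choices are outside what the thread specifies.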