[jira] [Commented] (HDDS-2047) Datanodes fail to come up after 10 retries in a secure environment

2019-08-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919730#comment-16919730
 ] 

Hudson commented on HDDS-2047:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17206 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17206/])
HDDS-2047. Datanodes fail to come up after 10 retries in a secure env… (github: 
rev ec34cee5e37ca48bf61403655eba8b89dba0ed57)
* (edit) 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
* (edit) 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
* (edit) hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsUtils.java


> Datanodes fail to come up after 10 retries in a secure environment
> --
>
> Key: HDDS-2047
> URL: https://issues.apache.org/jira/browse/HDDS-2047
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode, Security
> Affects Versions: 0.4.1
> Reporter: Vivek Ratnavel Subramanian
> Assignee: Xiaoyu Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.1
>
> Time Spent: 50m
> Remaining Estimate: 0h

[jira] [Commented] (HDDS-2047) Datanodes fail to come up after 10 retries in a secure environment

2019-08-28 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918151#comment-16918151
 ] 

Siyao Meng commented on HDDS-2047:
--

This could likely be fixed by letting HDDS DataNodes retry the connection to 
SCM indefinitely, as HDFS DataNodes do.
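
To illustrate the suggestion, here is a minimal sketch of a retry loop that can 
run either with a fixed attempt limit (the 10 attempts at 1 second described in 
the issue) or without one. It is not the committed patch and does not use any 
Hadoop API: the names ScmRetrySketch, callWithRetry and RETRY_FOREVER are 
hypothetical, and the certificate request is stood in for by a plain 
java.util.concurrent.Callable.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical helper, for illustration only; not part of HDDS. */
final class ScmRetrySketch {

  /** Sentinel meaning "keep retrying until the call succeeds". */
  static final int RETRY_FOREVER = -1;

  private ScmRetrySketch() { }

  /**
   * Invokes rpc, retrying whenever it throws ConnectException.
   *
   * @param rpc            the call to make (assumed stand-in for the
   *                       certificate request to SCM)
   * @param maxAttempts    total attempts allowed, or RETRY_FOREVER
   * @param intervalMillis fixed sleep between attempts
   */
  static <T> T callWithRetry(Callable<T> rpc, int maxAttempts,
      long intervalMillis) throws Exception {
    int attempt = 0;
    while (true) {
      attempt++;
      try {
        return rpc.call();
      } catch (ConnectException e) {
        // Bounded mode: give up once the attempt budget is spent.
        if (maxAttempts != RETRY_FOREVER && attempt >= maxAttempts) {
          throw e;
        }
        // Otherwise wait and try again; other exception types propagate.
        Thread.sleep(intervalMillis);
      }
    }
  }
}
```

With maxAttempts = 10 and intervalMillis = 1000, a server that needs more than 
about 10 seconds to start listening makes the caller fail, which matches the 
reported behaviour. Passing RETRY_FOREVER keeps the caller waiting until the 
server is reachable. A real change would probably also want logging and a way 
to interrupt the loop on shutdown.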

> Datanodes fail to come up after 10 retries in a secure environment
> --
>
> Key: HDDS-2047
> URL: https://issues.apache.org/jira/browse/HDDS-2047
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode, Security
> Affects Versions: 0.4.1
> Reporter: Vivek Ratnavel Subramanian
> Priority: Major
>
> {code:java}
> 10:06:36.585 PM ERROR HddsDatanodeService
> Error while storing SCM signed certificate.
> java.net.ConnectException: Call From 
> jmccarthy-ozone-secure-2.vpc.cloudera.com/10.65.50.127 to 
> jmccarthy-ozone-secure-1.vpc.cloudera.com:9961 failed on connection 
> exception: java.net.ConnectException: Connection refused; For more details 
> see:  http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
> at org.apache.hadoop.ipc.Client.call(Client.java:1457)
> at org.apache.hadoop.ipc.Client.call(Client.java:1367)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy15.getDataNodeCertificate(Unknown Source)
> at 
> org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.getDataNodeCertificateChain(SCMSecurityProtocolClientSideTranslatorPB.java:156)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.getSCMSignedCert(HddsDatanodeService.java:278)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.initializeCertificateClient(HddsDatanodeService.java:248)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:211)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:168)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:143)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:70)
> at picocli.CommandLine.execute(CommandLine.java:1173)
> at picocli.CommandLine.access$800(CommandLine.java:141)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
> at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
> at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
> at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
> at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
> at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:126)
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
> at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
> at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> ... 21 more
> {code}
> Datanodes try to get the SCM-signed certificate only 10 times, at 1-second 
> intervals. When SCM takes a little longer to come up, the datanodes throw an 
> exception and fail to start.


