[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-13 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437008#comment-16437008 ]

Hudson commented on HBASE-20338:


Results for branch master
[build #297 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/297/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/297//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/297//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/297//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> WALProcedureStore#recoverLease() should have fixed sleeps for retrying 
> rollWriter()
> ---
>
> Key: HBASE-20338
> URL: https://issues.apache.org/jira/browse/HBASE-20338
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-2
>Reporter: Umesh Agashe
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20338.master.001.patch, HBASE-20338.master.002.patch, HBASE-20338.master.003.patch, HBASE-20338.master.004.patch, HBASE-20338.master.005.patch
>
>
> In our internal testing we observed that logs are getting flooded due to a continuous loop in WALProcedureStore#recoverLease():
> {code}
>   while (isRunning()) {
>     // Get Log-MaxID and recover lease on old logs
>     try {
>       flushLogId = initOldLogs(oldLogs);
>     } catch (FileNotFoundException e) {
>       LOG.warn("Someone else is active and deleted logs. retrying.", e);
>       oldLogs = getLogFiles();
>       continue;
>     }
>     // Create new state-log
>     if (!rollWriter(flushLogId + 1)) {
>       // someone else has already created this log
>       LOG.debug("Someone else has already created log " + flushLogId);
>       continue;
>     }
> {code}
> rollWriter() fails to create a new file. Error messages in the HDFS namenode logs around the same time:
> {code}
> INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 172.31.121.196:38508 Call#3141 Retry#0
> java.io.IOException: Exeption while contacting value generator
> at org.apache.hadoop.crypto.key.kms.ValueQueue.getAtMost(ValueQueue.java:389)
> at org.apache.hadoop.crypto.key.kms.ValueQueue.getNext(ValueQueue.java:291)
> at org.apache.hadoop.crypto.key.kms.KMSClientProvider.generateEncryptedKey(KMSClientProvider.java:724)
> at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:511)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$2.run(FSNamesystem.java:2680)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$2.run(FSNamesystem.java:2676)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:477)
> at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:458)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.generateEncryptedDataEncryptionKey(FSNamesystem.java:2675)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2815)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2712)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:604)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:115)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
> Caused by: java.net.ConnectException: Connection refused (Connection refused)
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
> at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> {code}
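For illustration, here is a minimal sketch of what the summary asks for: a fixed sleep between retries so a transient NameNode/KMS outage does not spin the loop and flood the logs. All names (RecoverLeaseSketch, WAIT_BEFORE_ROLL_MS, sleepBeforeRetry) and the placement of the sleeps are assumptions for this sketch, not the committed HBASE-20338 patch.

{code}
import java.io.FileNotFoundException;
import java.io.IOException;

// Minimal sketch (assumed names; NOT the committed HBASE-20338 patch):
// retry the recover-lease loop with a fixed back-off instead of spinning.
public class RecoverLeaseSketch {
  private static final long WAIT_BEFORE_ROLL_MS = 1000L; // hypothetical fixed sleep

  void recoverLease() throws IOException {
    while (isRunning()) {
      long flushLogId;
      try {
        // Get Log-MaxID and recover lease on old logs
        flushLogId = initOldLogs(getLogFiles());
      } catch (FileNotFoundException e) {
        // Someone else is active and deleted logs: back off, then retry.
        sleepBeforeRetry();
        continue;
      }
      if (!rollWriter(flushLogId + 1)) {
        // Create failed, or someone else created the log: back off, then retry.
        sleepBeforeRetry();
        continue;
      }
      return; // new state-log created successfully
    }
  }

  private static void sleepBeforeRetry() {
    try {
      Thread.sleep(WAIT_BEFORE_ROLL_MS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt(); // preserve interrupt status
    }
  }

  // Stubs standing in for the real WALProcedureStore members.
  boolean isRunning() { return true; }
  long initOldLogs(Object oldLogs) throws IOException { return 0L; }
  Object getLogFiles() { return null; }
  boolean rollWriter(long logId) { return true; }
}
{code}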

[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-12 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436830#comment-16436830 ]

Hudson commented on HBASE-20338:


Results for branch branch-2
[build #606 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/606/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/606//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/606//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/606//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.



[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-12 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436825#comment-16436825 ]

Hudson commented on HBASE-20338:


Results for branch branch-2.0
[build #168 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/168/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/168//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/168//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/168//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.



[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-12 Thread Wei-Chiu Chuang (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436450#comment-16436450 ]

Wei-Chiu Chuang commented on HBASE-20338:
-

Thanks [~mdrob], [~uagashe] and [~chia7712]!


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-12 Thread Umesh Agashe (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436274#comment-16436274 ]

Umesh Agashe commented on HBASE-20338:
--

Thanks [~mdrob]! The call to getLogFiles() will be after the changes here.
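Reading that as "the fixed sleep runs first, then the logs are re-listed", the retry path would look roughly like this (illustrative only; sleepBeforeRetry() is a placeholder name, not a quote of the committed patch):

{code}
} catch (FileNotFoundException e) {
  LOG.warn("Someone else is active and deleted logs. retrying.", e);
  sleepBeforeRetry();       // hypothetical fixed back-off first...
  oldLogs = getLogFiles();  // ...then re-list the logs for the next attempt
  continue;
}
{code}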


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-12 Thread Mike Drob (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436262#comment-16436262 ]

Mike Drob commented on HBASE-20338:
---

Patch looks good but it doesn't apply anymore. I think HBASE-20330 screwed it 
up. Is the sleep supposed to be before or after the call to getLogFiles() now?


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-10 Thread Chia-Ping Tsai (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432204#comment-16432204 ]

Chia-Ping Tsai commented on HBASE-20338:


{quote}Chia-Ping if you want to push it go ahead otherwise I will look later.
{quote}
This issue doesn't provoke the end of the world, so it still has time to wait for your review. :)


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-10 Thread Mike Drob (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432139#comment-16432139 ]

Mike Drob commented on HBASE-20338:
---

I'm out this morning. Chia-Ping, if you want to push it, go ahead; otherwise I will look later.


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-10 Thread Chia-Ping Tsai (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431993#comment-16431993 ]

Chia-Ping Tsai commented on HBASE-20338:


patch 005 LGTM


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-09 Thread Umesh Agashe (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431343#comment-16431343 ]

Umesh Agashe commented on HBASE-20338:
--

[~mdrob], can you review and commit this patch? Thanks!


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-09 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431321#comment-16431321 ]

Hadoop QA commented on HBASE-20338:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 53s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 57s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 14m 52s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 35s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 10s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m 11s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20338 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918259/HBASE-20338.master.005.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 8eb49cf37150 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 17f930c4d6 |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/12362/testReport/ |
| Max. process+thread count | 279 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/12362/console |

[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-09 Thread Umesh Agashe (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431270#comment-16431270 ]

Umesh Agashe commented on HBASE-20338:
--

+1 for 005 patch. Thanks [~jojochuang]!


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-09 Thread Wei-Chiu Chuang (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431210#comment-16431210 ]

Wei-Chiu Chuang commented on HBASE-20338:
-

... apparently I can't write code. Uploaded rev 005.


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-09 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431174#comment-16431174
 ] 

Umesh Agashe commented on HBASE-20338:
--

[~jojochuang], with patch 004, it looks like we sleep only the first time?


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430986#comment-16430986
 ] 

Hadoop QA commented on HBASE-20338:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
11s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  2m  
1s{color} | {color:red} The patch causes 10 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  4m  
5s{color} | {color:red} The patch causes 10 errors with Hadoop v2.7.4. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  6m 
22s{color} | {color:red} The patch causes 10 errors with Hadoop v3.0.0. {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
5s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 8s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20338 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918203/HBASE-20338.master.004.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 30e49733dd17 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 17f930c4d6 |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| 

[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-09 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430867#comment-16430867
 ] 

Wei-Chiu Chuang commented on HBASE-20338:
-

{quote}we may want to consider sleeping immediately after the while() 
statement (except the first time), i.e. the sequence sleep(), getLogFiles(), 
rollWriter() instead of getLogFiles(), sleep(), rollWriter().
{quote}
Ah, got what you're saying. I thought that because there's already a sleep 
after recovering the lease it would be sufficient, but if lease recovery fails 
for some reason it could get us into the same situation. Will upload a new 
patch for that.

Thank you.
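
A minimal sketch of that reordering, assuming the surrounding recoverLease() 
context; waitMillis is an assumed fixed-delay constant, not a name taken from 
the patch:

{code}
// Sketch only: sleep at the top of every iteration except the first, so a
// retry triggered by either initOldLogs() or rollWriter() is rate-limited.
boolean firstAttempt = true;
while (isRunning()) {
  if (!firstAttempt) {
    try {
      Thread.sleep(waitMillis);           // fixed delay between retries
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt(); // preserve interrupt status and bail
      return;
    }
  }
  firstAttempt = false;
  // Get Log-MaxID and recover lease on old logs
  try {
    flushLogId = initOldLogs(oldLogs);
  } catch (FileNotFoundException e) {
    LOG.warn("Someone else is active and deleted logs. retrying.", e);
    oldLogs = getLogFiles();
    continue;                             // loops back through the sleep above
  }
  // Create new state-log
  if (!rollWriter(flushLogId + 1)) {
    LOG.debug("Someone else has already created log " + flushLogId);
    continue;                             // also rate-limited by the sleep above
  }
  break;                                  // new state-log created; done
}
{code}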


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-07 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429382#comment-16429382
 ] 

Umesh Agashe commented on HBASE-20338:
--

Master can't be functional/active if it fails to read the procedure WAL. I 
think it's okay to keep retrying with a reasonable delay, as in this case for 
~300 days, and then defaulting to the current behavior (without this patch) of 
retrying continuously without delay.

But if we upload a new version of the patch, then looking at patch 005 for 
HBASE-20330, we may want to consider sleeping immediately after the while() 
statement (except the first time), i.e. the sequence sleep(), getLogFiles(), 
rollWriter() instead of getLogFiles(), sleep(), rollWriter().


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-06 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429216#comment-16429216
 ] 

Mike Drob commented on HBASE-20338:
---

That's a known issue that pops up due to... unknown reasons. There's a jira for 
it but I can't find it at the moment.

Do we bound the number of retries? If not, then we can get into a pathological 
case where we've retried so much that the counter overflows and we stop 
sleeping because it is negative. Using the default sleep length, that would 
happen after the error running for... ~300 days? Probably not a concern.

[~uagashe] - if you think it's ok then I'll commit this later. I'm not sure 
whether it makes more sense to use a boolean to skip the sleep the first time 
and sleep on subsequent iterations.
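
To illustrate the overflow hazard described above (the patch itself uses a 
fixed sleep, so this is only a sketch; boundedSleepMillis and its parameters 
are hypothetical): a delay derived from a retry counter in int arithmetic 
eventually goes negative, which either silently disables the backoff or makes 
Thread.sleep() throw IllegalArgumentException. Widening to long and clamping 
sidesteps both:

{code}
// Hypothetical helper: derive a bounded retry delay without int overflow.
// (long) retries * baseSleepMs does the multiply in 64 bits, and the clamp
// keeps the result in [0, maxSleepMs] no matter how long the error persists.
private static long boundedSleepMillis(int retries, long baseSleepMs, long maxSleepMs) {
  long delay = (long) retries * baseSleepMs;
  return Math.min(Math.max(delay, 0L), maxSleepMs);
}
{code}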


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-06 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429205#comment-16429205
 ] 

Wei-Chiu Chuang commented on HBASE-20338:
-

Hmm. I didn't change any pom files. Why did it fail?
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-install-plugin:2.5.2:install (default-install) 
on project hbase-thrift: Failed to install metadata 
org.apache.hbase:hbase-thrift:3.0.0-SNAPSHOT/maven-metadata.xml: Could not 
parse metadata 
/home/jenkins/.m2/repository/org/apache/hbase/hbase-thrift/3.0.0-SNAPSHOT/maven-metadata-local.xml:
 in epilog non whitespace content is not allowed but got / (position: END_TAG 
 in epilog non whitespace content is not allowed but got / (position: END_TAG 
seen ...\n/... @25:2) -> [Help 1]{noformat}


[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428881#comment-16428881
 ] 

Hadoop QA commented on HBASE-20338:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
14s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 3s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  6m  
1s{color} | {color:red} The patch causes 10 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  8m  
0s{color} | {color:red} The patch causes 10 errors with Hadoop v2.7.4. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 10m 
17s{color} | {color:red} The patch causes 10 errors with Hadoop v3.0.0. {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
33s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 7s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20338 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917892/HBASE-20338.master.003.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux fafb6004f1fc 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 8014c5c3ac |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| 

[jira] [Commented] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps for retrying rollWriter()

2018-04-06 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428829#comment-16428829
 ] 

Wei-Chiu Chuang commented on HBASE-20338:
-

(Posted in the wrong jira)

Updated the patch to address [~uagashe]'s comments, and updated the Jira 
summary too.
