[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2018-04-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450821#comment-16450821
 ] 

Hudson commented on HDFS-11741:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14057/])
HDFS-11741. Long running balancer may fail due to expired (xyao: rev 
cb622bc619a8897e1f433c388586d83791b1cb23)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestKeyManager.java


> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11741.001.patch, HDFS-11741.002.patch, 
> HDFS-11741.003.patch, HDFS-11741.004.patch, HDFS-11741.005.patch, 
> HDFS-11741.06.patch, HDFS-11741.07.patch, HDFS-11741.08.patch, 
> HDFS-11741.branch-2.01.patch, block keys.png
>
>
> We found that a long-running balancer may fail despite using a keytab, because 
> KeyManager returns an expired DataEncryptionKey, and the following exception is 
> thrown:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While the balancer's KeyManager 
> actively synchronizes itself with the NameNode with respect to block keys, it 
> does not update the DataEncryptionKey accordingly.
> In one specific cluster, with a Kerberos ticket lifetime of 10 hours and the 
> default block token expiration/lifetime of 10 hours, a long-running balancer 
> failed after 20~30 hours.
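
The description above lends itself to a small illustration. The following is a minimal, self-contained Java sketch of the pattern the fix needs, not the actual Hadoop patch (the real change lives in KeyManager.java and BlockTokenSecretManager.java, per the commit listed above): a cached encryption key is checked against its expiry time and regenerated instead of being handed out stale. All names here (ExpiringKeyCacheSketch, CachedKey, getEncryptionKey) are hypothetical, and the clock is injectable so that expiry can be simulated.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Self-contained, hypothetical sketch -- not the real Hadoop KeyManager.
// It illustrates the idea behind the fix: never return a cached encryption
// key whose expiry time has already passed; regenerate it instead.
public class ExpiringKeyCacheSketch {

  /** Minimal stand-in for a DataEncryptionKey: an id plus an expiry time. */
  static final class CachedKey {
    final int keyId;
    final long expiryDate;   // absolute time, in milliseconds
    CachedKey(int keyId, long expiryDate) {
      this.keyId = keyId;
      this.expiryDate = expiryDate;
    }
  }

  private final long keyLifetimeMs;
  private final AtomicLong clock;   // injectable clock, so tests can advance time
  private int nextKeyId = 1;
  private CachedKey current;

  ExpiringKeyCacheSketch(long keyLifetimeMs, AtomicLong clock) {
    this.keyLifetimeMs = keyLifetimeMs;
    this.clock = clock;
  }

  /** Returns a non-expired key, regenerating the cached one when necessary. */
  synchronized CachedKey getEncryptionKey() {
    if (current == null || current.expiryDate <= clock.get()) {
      // Without this check a long-running caller (such as the balancer)
      // would keep receiving a key the DataNodes have already rolled over.
      current = new CachedKey(nextKeyId++, clock.get() + keyLifetimeMs);
    }
    return current;
  }

  public static void main(String[] args) {
    AtomicLong clock = new AtomicLong(0);
    ExpiringKeyCacheSketch cache = new ExpiringKeyCacheSketch(10_000, clock);

    CachedKey first = cache.getEncryptionKey();
    clock.addAndGet(15_000);   // advance past the key lifetime
    CachedKey second = cache.getEncryptionKey();

    // Prints: first keyId=1, after expiry keyId=2
    System.out.println("first keyId=" + first.keyId
        + ", after expiry keyId=" + second.keyId);
  }
}
{code}

Running main() shows that a caller arriving after the key lifetime receives a freshly generated key rather than the stale one, which is what a balancer that outlives the block key rollover interval needs.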






[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-07-11 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083223#comment-16083223
 ] 

Wei-Chiu Chuang commented on HDFS-11741:


Good point. Thanks for the reminder.

Pushed the commit to branch-2.7.
There was a very trivial conflict due to the HDFS-8103 refactoring.




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-07-11 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081949#comment-16081949
 ] 

Brahma Reddy Battula commented on HDFS-11741:
-

[~jojochuang] nice finding. As HDFS-9804 was committed to branch-2.7, this jira 
should also go to branch-2.7.




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-06-01 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033803#comment-16033803
 ] 

Wei-Chiu Chuang commented on HDFS-11741:


Thanks [~xiaochen] and [~yzhangal] for pushing the patch to the finish line.




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033737#comment-16033737
 ] 

Hudson commented on HDFS-11741:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11813 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11813/])
HDFS-11741. Long running balancer may fail due to expired (xiao: rev 
6a3fc685a98718742c351ed6625dc7a4dee55e77)
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestKeyManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java





[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-06-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033709#comment-16033709
 ] 

Xiao Chen commented on HDFS-11741:
--

Compiled and ran {{TestBlockToken}} and {{TestKeyManager}} locally on branch-2; 
both passed.
Ran the tests that pre-commit reported as failed on trunk; they passed as well.

Committed this to trunk, branch-2, and branch-2.8. Thanks [~jojochuang] for 
reporting and fixing the issue, and [~andrew.wang] and [~yzhangal] for the reviews!
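
As a companion to the sketch earlier in this thread, here is a hedged illustration of how that expiry behaviour could be exercised from a JUnit 4 test by advancing a fake clock past the key lifetime. It only demonstrates the testing pattern against the hypothetical ExpiringKeyCacheSketch; the real TestKeyManager added by this patch tests Hadoop's actual KeyManager and may be structured differently.

{code:java}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.concurrent.atomic.AtomicLong;
import org.junit.Test;

// Tests for the hypothetical ExpiringKeyCacheSketch shown earlier in this
// thread; an illustration of the pattern only, not the real TestKeyManager.
public class ExpiringKeyCacheSketchTest {

  @Test
  public void keyIsReusedWhileStillValid() {
    AtomicLong clock = new AtomicLong(0);
    ExpiringKeyCacheSketch cache = new ExpiringKeyCacheSketch(10_000, clock);

    int first = cache.getEncryptionKey().keyId;
    clock.addAndGet(5_000);   // still inside the 10-second lifetime
    assertEquals(first, cache.getEncryptionKey().keyId);
  }

  @Test
  public void keyIsRegeneratedAfterExpiry() {
    AtomicLong clock = new AtomicLong(0);
    ExpiringKeyCacheSketch cache = new ExpiringKeyCacheSketch(10_000, clock);

    int first = cache.getEncryptionKey().keyId;
    clock.addAndGet(15_000);  // advance past the 10-second lifetime
    assertTrue("a fresh key should be generated after expiry",
        first != cache.getEncryptionKey().keyId);
  }
}
{code}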




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-06-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033506#comment-16033506
 ] 

Xiao Chen commented on HDFS-11741:
--

Turns out YETUS-515 is the same issue as HADOOP-14474. Commented there to see if 
we can unblock branch-2 soon.
If it's not resolved by the end of today, I will manually compile and run the 
related tests, if there are no objections.




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-06-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033444#comment-16033444
 ] 

Xiao Chen commented on HDFS-11741:
--

INFRA-14261 is fixed, but YETUS-515 has surfaced.




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033424#comment-16033424
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
18s{color} | {color:red} Docker failed to build yetus/hadoop:8515d35. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870697/HDFS-11741.branch-2.01.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19730/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033304#comment-16033304
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  8m 
10s{color} | {color:red} Docker failed to build yetus/hadoop:8515d35. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870697/HDFS-11741.branch-2.01.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19727/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032432#comment-16032432
 ] 

Xiao Chen commented on HDFS-11741:
--

... and pre-commits are having problems; filed INFRA-14261.




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032410#comment-16032410
 ] 

Xiao Chen commented on HDFS-11741:
--

Was waiting on the branch-2 pre-commit. Just checked Jenkins and it seems there 
was none, so I kicked off https://builds.apache.org/job/PreCommit-HDFS-Build/19714




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032316#comment-16032316
 ] 

Yongjun Zhang commented on HDFS-11741:
--

Thanks [~xiaochen].

I'm +1 on both the trunk and branch-2 versions. Please do manually run the tests 
that failed in trunk to confirm they succeed; they do look unrelated to me though. 
Then please go ahead and commit the patch.








[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032244#comment-16032244
 ] 

Xiao Chen commented on HDFS-11741:
--

Test failures look unrelated. Attaching a branch-2 patch due to conflicts.




[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031805#comment-16031805
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}101m 11s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870598/HDFS-11741.08.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 5aa3dff26374 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1543d0f |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19703/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19703/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19703/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> 

[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031692#comment-16031692
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}102m 13s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 32s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870593/HDFS-11741.08.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d70ef5e36f96 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1543d0f |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19700/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19700/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19700/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: 

[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031619#comment-16031619
 ] 

Yongjun Zhang commented on HDFS-11741:
--

Thanks [~xiaochen], +1 pending Jenkins.


> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: block keys.png, HDFS-11741.001.patch, 
> HDFS-11741.002.patch, HDFS-11741.003.patch, HDFS-11741.004.patch, 
> HDFS-11741.005.patch, HDFS-11741.06.patch, HDFS-11741.07.patch, 
> HDFS-11741.08.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030974#comment-16030974
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 14s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}119m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870527/HDFS-11741.07.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4f5408dd9642 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 
16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 4b4a652 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19689/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19689/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19689/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS

[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-31 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030741#comment-16030741
 ] 

Yongjun Zhang commented on HDFS-11741:
--

Thanks [~xiaochen] for the updated patch.

I suggest making the following comment change; +1 after that.

{code}
if (encryptionKey == null ||
    encryptionKey.expiryDate < timer.now()) {
  // Encryption Key (EK) is generated from Block Key (BK), but its
  // expiryDate is solely based on tokenLifetime.
  // Once EK is expired, we need to generate a new one using the current
  // BK. Retired BK is kept for (keyUpdateInterval + tokenLifetime)
  // before removal.
  // See BlockTokenSecretManager for details.
  LOG.debug("Generating new data encryption key because current key"
      + " expired on {}.", encryptionKey.expiryDate);
  encryptionKey = blockTokenSecretManager.generateDataEncryptionKey();
}
return encryptionKey;
{code}

to

{code}
if (encryptionKey == null ||
    encryptionKey.expiryDate < timer.now()) {
  // Encryption Key (EK) is generated from Block Key (BK).
  // Check if the EK is expired here, and generate a new one using the
  // current BK if so; otherwise continue to use the previously generated EK.
  //
  // It's important to make sure that when the EK is not expired, the BK used
  // to generate the EK is not expired and removed, because the same BK will
  // be used to re-generate the EK by BlockTokenSecretManager.
  //
  // The current implementation ensures that when an EK is not expired (even
  // if it's close to expiration), the BK that was used to generate it still
  // has at least "key update interval" of lifetime before the BK gets
  // expired and removed.
  // See BlockTokenSecretManager for details.
  LOG.debug("Generating new data encryption key because current key"
      + " expired on {}.", encryptionKey.expiryDate);
  encryptionKey = blockTokenSecretManager.generateDataEncryptionKey();
}
return encryptionKey;
{code}


> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: block keys.png, HDFS-11741.001.patch, 
> HDFS-11741.002.patch, HDFS-11741.003.patch, HDFS-11741.004.patch, 
> HDFS-11741.005.patch, HDFS-11741.06.patch, HDFS-11741.07.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> 

[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030649#comment-16030649
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
40s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 58s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
19s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Possible null pointer dereference of KeyManager.encryptionKey in 
org.apache.hadoop.hdfs.server.balancer.KeyManager.newDataEncryptionKey()  
Dereferenced at KeyManager.java:KeyManager.encryptionKey in 
org.apache.hadoop.hdfs.server.balancer.KeyManager.newDataEncryptionKey()  
Dereferenced at KeyManager.java:[line 139] |
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.server.balancer.TestKeyManager |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870500/HDFS-11741.06.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 564b5eebee40 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 4b4a652 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-30 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030148#comment-16030148
 ] 

Xiao Chen commented on HDFS-11741:
--

I looked again at this one, and I think {{encryptionKey.expiryDate < timer.now()}} should be OK.

I found the graph isn't exactly accurate, though; I think the timeline of the block key goes like this:
generate --- (Tk) ---> becomes current --- (Tk (not Tl)) ---> retiring --- (Tk + Tl) ---> expire (removed)

And the Encryption Key expires at Tl.

So with the buffer of (Tk) before block key removal, I think we're safe to only 
compare Encryption Key's expiry as you said. :)
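
For illustration only, here is a minimal plain-Java sketch of the timeline above (not Hadoop code): {{Tk}} and {{Tl}} stand for the key update interval and token lifetime, and the 10-hour values are just example settings.

{code}
// Minimal sketch of the block key timeline described above; the names and
// the 10-hour values are illustrative, not taken from the actual code.
public class BlockKeyTimelineSketch {
  public static void main(String[] args) {
    long tk = 10 * 60 * 60 * 1000L; // Tk: key update interval (example: 10h)
    long tl = 10 * 60 * 60 * 1000L; // Tl: token lifetime (example: 10h)

    long bkGenerated = 0;                     // block key generated
    long bkCurrentAt = bkGenerated + tk;      // becomes the current key
    long bkRetiresAt = bkCurrentAt + tk;      // replaced after another Tk (not Tl)
    long bkRemovedAt = bkRetiresAt + tk + tl; // kept for Tk + Tl after retiring

    // The latest EK derived from this block key is generated just before the
    // key retires and expires one token lifetime (Tl) later.
    long latestEkExpiry = bkRetiresAt + tl;

    // The block key therefore outlives any EK derived from it by at least Tk,
    // which is why comparing only the EK expiry against now() is safe.
    System.out.println("buffer (ms) = " + (bkRemovedAt - latestEkExpiry)); // == tk
  }
}
{code}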

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: block keys.png, HDFS-11741.001.patch, 
> HDFS-11741.002.patch, HDFS-11741.003.patch, HDFS-11741.004.patch, 
> HDFS-11741.005.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-26 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026536#comment-16026536
 ] 

Xiao Chen commented on HDFS-11741:
--

Thanks for revving, Wei-Chiu, good analysis!

As we discussed offline, I think generating a new DEK would be sufficient.

I prefer the {{encryptionKey.expiryDate - keyUpdateInterval * 3 / 4 < timer.now()}} route to prevent TOCTOU, as Andrew pointed out earlier.
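
For clarity, a minimal plain-Java sketch of that check (not the actual patch; the parameter names simply mirror the snippet above):

{code}
// Sketch only: renew while less than 3/4 of the key update interval of
// lifetime remains, so the key cannot expire between the check and its use
// (the TOCTOU window mentioned above).
public class EarlyRenewalCheckSketch {
  static boolean shouldRenew(long expiryDate, long keyUpdateInterval, long now) {
    return expiryDate - keyUpdateInterval * 3 / 4 < now;
  }

  public static void main(String[] args) {
    long hour = 60 * 60 * 1000L;
    long now = 0;
    // With a 10-hour key update interval, renewal kicks in once less than
    // 7.5 hours of lifetime remain.
    System.out.println(shouldRenew(now + 2 * hour, 10 * hour, now)); // true
    System.out.println(shouldRenew(now + 9 * hour, 10 * hour, now)); // false
  }
}
{code}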

Nits:
- {{LOG.debug("Getting new encryption token from NN");}} IIUC this is local.
- Please remove the space change in {{TestBalancerWithEncryptedTransfer}}.
- I think the test case in {{TestKeyManager}} needs updating after the 3/4 interval change: the new DEK is no longer generated based on expiry, but on the BK's update interval. Maybe we can choose a different updateInterval and tokenLifetime to differentiate them in the test.
- Let's use a safer test timeout to reduce false positives due to infra.

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: block keys.png, HDFS-11741.001.patch, 
> HDFS-11741.002.patch, HDFS-11741.003.patch, HDFS-11741.004.patch, 
> HDFS-11741.005.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025781#comment-16025781
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 41s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.TestEncryptionZones |
| Timed out junit tests | 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869967/HDFS-11741.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 26aa1cb2844b 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 2b5ad48 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19622/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19622/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19622/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19622/console |
| Powered by | 

[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-25 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025577#comment-16025577
 ] 

Wei-Chiu Chuang commented on HDFS-11741:


!block keys.png!
Assume a block key is generated at t=0. After one key update interval (t=Tk), it becomes the current key and is used for generating block tokens and the data encryption key. After the token lifetime (t=Tk+Tl), the key retires and a new key (generated at t=Tk) becomes current. However, the retired key is still kept in BlockTokenSecretManager and can be used for block token verification and data decryption. The key finally expires at t=2*Tk+2*Tl.

After the fix in my patch, the only way a block key can expire before the DEK expires is if the balancer's local time drifts by more than one key update interval (that is, 10 hours). With such a long drift, a lot of other things would already be broken.
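
As a rough plain-Java illustration of that drift bound (the numbers are hypothetical, not from the patch):

{code}
// Sketch only: with the fix, the DEK is regenerated from the balancer's
// current block key, and the remote side keeps that key for at least one more
// key update interval, so only a clock skew larger than the interval could
// make the remote side reject it.
public class DriftBoundSketch {
  public static void main(String[] args) {
    long hour = 60 * 60 * 1000L;
    long keyUpdateInterval = 10 * hour;          // as in the cluster described above
    long balancerClockDrift = 30 * 60 * 1000L;   // hypothetical 30 minutes of skew
    boolean remoteStillHasKey = balancerClockDrift < keyUpdateInterval;
    System.out.println("remote side still has the block key: " + remoteStillHasKey);
  }
}
{code}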

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: block keys.png, HDFS-11741.001.patch, 
> HDFS-11741.002.patch, HDFS-11741.003.patch, HDFS-11741.004.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-24 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023750#comment-16023750
 ] 

Wei-Chiu Chuang commented on HDFS-11741:


Thanks [~xiaochen].
I like your suggestion. I initially just wanted to maintain parity with DFSClient#newDataEncryptionKey, but that's actually not needed: DFSClient does not have access to the block keys, so it has to ask the NameNode for a DEK. The balancer's KeyManager does have access to the block keys, so it can generate the DEK on its own, with no extra overhead for the NN.
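
A minimal sketch of the two code paths described above; these are simplified stand-in interfaces, not the real Hadoop classes (the only call taken from this thread is {{generateDataEncryptionKey()}}):

{code}
// Simplified stand-ins, not the actual Hadoop types.
class DataEncryptionKeySketch { long expiryDate; }

interface NameNodeRpcSketch {              // all a DFSClient has: an RPC handle
  DataEncryptionKeySketch getDataEncryptionKey();
}

interface BlockTokenKeysSketch {           // what the balancer KeyManager holds locally
  DataEncryptionKeySketch generateDataEncryptionKey();
}

class DekSourceSketch {
  // DFSClient path: no local block keys, so every refresh is an RPC to the NN.
  static DataEncryptionKeySketch clientPath(NameNodeRpcSketch nn) {
    return nn.getDataEncryptionKey();
  }

  // Balancer KeyManager path: block keys are synced locally, so the DEK can be
  // regenerated without a NameNode round trip.
  static DataEncryptionKeySketch balancerPath(BlockTokenKeysSketch keys) {
    return keys.generateDataEncryptionKey();
  }
}
{code}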

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11741.001.patch, HDFS-11741.002.patch, 
> HDFS-11741.003.patch, HDFS-11741.004.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-24 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023691#comment-16023691
 ] 

Xiao Chen commented on HDFS-11741:
--

Thanks [~jojochuang] for reporting the issue and working on the fix, and others 
for reviewing.

Just want to make sure I understand correctly: the problem is that the {{KeyManager}} instance in the {{Dispatcher}} uses a version of {{encryptionKey}} which is associated with a {{BlockKey}} that is more than {{2 * keyUpdateInterval + tokenLifetime}} old. So the balancer-side {{BlockTokenSecretManager}} cannot find that {{BlockKey}}, and this is because the {{encryptionKey}} object isn't updated.

If the above is correct, can we go the route of having the KM's {{BlockKeyUpdater}} (or a new EKUpdater) update the {{encryptionKey}} periodically (say, every tokenLifetime / 2 or / 4) as well? I think this is more future-proof because {{KeyManager}} is associated with {{NameNodeConnector}}: it seems the Dispatcher is the only place that retrieves this KM, but I feel the problem also exists with the NNC.
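
To make the alternative concrete, here is a minimal plain-Java sketch of such a periodic updater (not the actual KeyManager/BlockKeyUpdater code; the {{Supplier}} stands in for whatever regenerates the DEK from the current block key):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Sketch only: keeps a fresh key by refreshing it on a schedule instead of
// checking expiry at use time.
class EncryptionKeyUpdaterSketch<K> implements AutoCloseable {
  private final AtomicReference<K> current = new AtomicReference<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  EncryptionKeyUpdaterSketch(Supplier<K> regenerate, long tokenLifetimeMs) {
    current.set(regenerate.get());
    // Refresh at a fraction of the token lifetime (tokenLifetime / 4 here) so
    // callers never observe an expired key.
    long period = tokenLifetimeMs / 4;
    scheduler.scheduleAtFixedRate(
        () -> current.set(regenerate.get()), period, period, TimeUnit.MILLISECONDS);
  }

  K get() {
    return current.get();
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
{code}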

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11741.001.patch, HDFS-11741.002.patch, 
> HDFS-11741.003.patch, HDFS-11741.004.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020326#comment-16020326
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 51s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHDFS |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869326/HDFS-11741.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d658dd0b9b9e 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9cab42c |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19545/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19545/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19545/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19545/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.

[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013246#comment-16013246
 ] 

Yongjun Zhang commented on HDFS-11741:
--

Hi [~jojochuang],

Thanks for your work here. It seems the patch doesn't apply anymore. Would you 
please update it?

Thanks.


> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11741.001.patch, HDFS-11741.002.patch, 
> HDFS-11741.003.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-09 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003619#comment-16003619
 ] 

Wei-Chiu Chuang commented on HDFS-11741:


[~zhz] [~shahrs87], would you mind chiming in on this observation?
{quote}
I just realized a client side BlockTokenSecretManager generates 
DataEncryptionKey expiration time using now + token life time. I am not sure if 
that's intended, as I would have assumed the key expiration time equals the 
current BlockKey expiration time (which is determined by NameNode).

So it is entirely possible that balancer has an unexpired DataEncryptionKey, 
corresponding to an expired BlockKey. When it talks to the other side, the 
expired BlockKey would fail the connection. Therefore my rev 01 patch would not 
fix all the problems because of this mismatch.
{quote}

Thanks!

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11741.001.patch, HDFS-11741.002.patch, 
> HDFS-11741.003.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995913#comment-15995913
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
36s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 70 unchanged - 1 fixed = 73 total (was 71) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 44s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12866274/HDFS-11741.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a0846454202a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / fd5cb2c |
| Default Java | 1.8.0_121 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19307/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19307/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19307/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19307/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19307/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995578#comment-15995578
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
39s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 33s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 22 unchanged - 1 fixed = 25 total (was 23) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 36s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}116m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
|   | hadoop.hdfs.server.namenode.TestStartup |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12866234/HDFS-11741.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 928953f4eb07 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d4631e4 |
| Default Java | 1.8.0_121 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19301/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19301/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19301/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19301/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-03 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995341#comment-15995341
 ] 

Wei-Chiu Chuang commented on HDFS-11741:


Hi Andrew, thanks for the review!

I just realized a client side BlockTokenSecretManager generates 
DataEncryptionKey expiration time using now + token life time. I am not sure if 
that's intended, as I would have assumed the key expiration time equals the 
current BlockKey expiration time (which is determined by NameNode).

So it is entirely possible that balancer has an unexpired DataEncryptionKey, 
corresponding to an expired BlockKey. When it talks to the other side, the 
expired BlockKey would fail the connection. Therefore my rev 01 patch would not 
fix all the problems because of this mismatch.
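
To make the mismatch concrete, here is a simplified, self-contained sketch (the 
numbers, names, and expiry values below are illustrative only, not the actual 
Hadoop classes or defaults):

{code:java}
// Simplified sketch of the expiry mismatch; all names and numbers are
// illustrative, not the real BlockTokenSecretManager/KeyManager code.
public class DekExpiryMismatchSketch {
  public static void main(String[] args) {
    final long HOUR = 3600_000L;

    // Expiry of the current BlockKey, decided by the NameNode when the key
    // was rolled (assume it expires at t = 20h in this example).
    long blockKeyExpiry = 20 * HOUR;

    // The balancer asks for a DEK late in the key's life (t = 15h); the
    // client-side expiry is computed from "now", not from the BlockKey.
    long now = 15 * HOUR;
    long tokenLifetime = 10 * HOUR;
    long dekExpiry = now + tokenLifetime;   // t = 25h

    // The DEK still looks valid locally for 5 hours after the BlockKey it is
    // based on has expired, which is when InvalidEncryptionKeyException hits.
    System.out.println("BlockKey expires at t = " + blockKeyExpiry / HOUR + "h");
    System.out.println("DEK      expires at t = " + dekExpiry / HOUR + "h");
  }
}
{code}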

There are two potential fixes:
* Change BlockTokenSecretManager so that DEK expiration is based on the current 
BlockKey expiration.
* Change the Balancer to catch InvalidEncryptionKeyException, generate a new DEK, 
and retry the connection.

I feel the first fix is the right one, but it changes every participant in 
HDFS, so I want to double-check here.
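
For discussion, a minimal sketch of what the first option could look like (all 
types and names below are made-up stand-ins for illustration, not the actual 
BlockTokenSecretManager code or a proposed patch):

{code:java}
// Minimal, self-contained sketch of option 1: the DEK inherits the current
// BlockKey's expiry instead of "now + tokenLifetime", so it can never outlive
// the key it is based on. Illustrative types only.
public class DekFromBlockKeySketch {

  // Stand-in for the real BlockKey: only the fields this sketch needs.
  static final class FakeBlockKey {
    final int keyId;
    final long expiryDate;   // decided by the NameNode when the key is rolled
    FakeBlockKey(int keyId, long expiryDate) {
      this.keyId = keyId;
      this.expiryDate = expiryDate;
    }
  }

  // Stand-in for the real DataEncryptionKey: keyId plus expiry only.
  static final class FakeDataEncryptionKey {
    final int keyId;
    final long expiryDate;
    FakeDataEncryptionKey(int keyId, long expiryDate) {
      this.keyId = keyId;
      this.expiryDate = expiryDate;
    }
  }

  // Tie the DEK expiry to the BlockKey expiry rather than the local clock.
  static FakeDataEncryptionKey generateDek(FakeBlockKey currentKey) {
    return new FakeDataEncryptionKey(currentKey.keyId, currentKey.expiryDate);
  }

  public static void main(String[] args) {
    FakeBlockKey key = new FakeBlockKey(1005215027, 20L * 3600_000L);
    FakeDataEncryptionKey dek = generateDek(key);
    System.out.println("DEK keyId=" + dek.keyId + " expires with its BlockKey at t="
        + dek.expiryDate / 3600_000L + "h");
  }
}
{code}

The second option would instead catch InvalidEncryptionKeyException around the 
dispatch, drop the cached DEK, and retry the connection once.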

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11741.001.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-02 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993858#comment-15993858
 ] 

Andrew Wang commented on HDFS-11741:


Hi Wei-Chiu, the patch looks good overall; a few quick questions:

* For other tokens, we renew before the token is expired, e.g. after half the 
token lifetime has elapsed. This handles clock skew and TOCTOU issues. Should 
we do this here too?
* Is it possible to write a unit test using a FakeTimer rather than using 
Thread.sleep?
* The test uses JUnit 3 asserts; please use JUnit 4's asserts instead (see the 
sketch after this list).
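
Roughly the pattern I have in mind, as a sketch only (the injectable clock and 
the class under test below are stand-ins for illustration, not the real 
FakeTimer or KeyManager):

{code:java}
import static org.junit.Assert.assertTrue;
import org.junit.Test;

// JUnit 4 test sketch: inject a fake clock and advance virtual time instead of
// sleeping. FakeClock and DekCache are illustrative stand-ins only.
public class TestDekRefreshSketch {

  /** Minimal injectable clock, standing in for a FakeTimer-style utility. */
  static final class FakeClock {
    private long nowMs;
    long now() { return nowMs; }
    void advance(long ms) { nowMs += ms; }
  }

  /** Stand-in for the behaviour under test: re-issue the DEK once expired. */
  static final class DekCache {
    private final FakeClock clock;
    private final long lifetimeMs;
    private long expiresAt = Long.MIN_VALUE;  // force a fresh DEK on first use
    private int generation;

    DekCache(FakeClock clock, long lifetimeMs) {
      this.clock = clock;
      this.lifetimeMs = lifetimeMs;
    }

    /** Returns a generation number that changes whenever a new DEK is issued. */
    int getDekGeneration() {
      if (clock.now() >= expiresAt) {          // expired (or never issued)
        expiresAt = clock.now() + lifetimeMs;
        generation++;
      }
      return generation;
    }
  }

  @Test
  public void testDekIsRegeneratedAfterExpiry() {
    FakeClock clock = new FakeClock();
    DekCache cache = new DekCache(clock, 10_000L);

    int first = cache.getDekGeneration();
    clock.advance(10_001L);                    // no Thread.sleep needed
    int second = cache.getDekGeneration();
    assertTrue("expected a fresh DEK after the lifetime elapsed", first != second);
  }
}
{code}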

> Long running balancer may fail due to expired DataEncryptionKey
> ---
>
> Key: HDFS-11741
> URL: https://issues.apache.org/jira/browse/HDFS-11741
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
> Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-11741.001.patch
>
>
> We found a long running balancer may fail despite using keytab, because 
> KeyManager returns expired DataEncryptionKey, and it throws the following 
> exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN  [pool-1464-thread-10] balancer.Dispatcher 
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with 
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through 
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: 
> Can't re-compute encryption key for nonce, since the required block key 
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While balancer KeyManager 
> actively synchronizes itself with NameNode w.r.t block keys, it does not 
> update DataEncryptionKey accordingly.
> In a specific cluster, with Kerberos ticket life time 10 hours, and default 
> block token expiration/life time 10 hours, a long running balancer failed 
> after 20~30 hours.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey

2017-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993731#comment-15993731
 ] 

Hadoop QA commented on HDFS-11741:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m  
1s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  
3s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}113m  1s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.TestStartup |
|   | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
|   | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| Timed out junit tests | 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12866019/HDFS-11741.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 1fc5f3e01f5c 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / dcc292d |
| Default Java | 1.8.0_121 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19274/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19274/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19274/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs