[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450821#comment-16450821 ]

Hudson commented on HDFS-11741:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14057 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/14057/])
HDFS-11741. Long running balancer may fail due to expired (xyao: rev
cb622bc619a8897e1f433c388586d83791b1cb23)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestKeyManager.java

> Long running balancer may fail due to expired DataEncryptionKey
> ---------------------------------------------------------------
>
>                 Key: HDFS-11741
>                 URL: https://issues.apache.org/jira/browse/HDFS-11741
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>         Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled.
> Balancer login using keytab
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>             Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
>
>         Attachments: HDFS-11741.001.patch, HDFS-11741.002.patch,
> HDFS-11741.003.patch, HDFS-11741.004.patch, HDFS-11741.005.patch,
> HDFS-11741.06.patch, HDFS-11741.07.patch, HDFS-11741.08.patch,
> HDFS-11741.branch-2.01.patch, block keys.png
>
> We found that a long running balancer may fail despite using a keytab,
> because KeyManager returns an expired DataEncryptionKey, and it throws the
> following exception:
> {noformat}
> 2017-04-30 05:03:58,661 WARN [pool-1464-thread-10] balancer.Dispatcher
> (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with
> size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through
> 10.0.0.134:50010
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException:
> Can't re-compute encryption key for nonce, since the required block key
> (keyID=1005215027) doesn't exist. Current key: 1005215030
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182)
> at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This bug is similar in nature to HDFS-10609. While the balancer's KeyManager
> actively synchronizes itself with the NameNode w.r.t. block keys, it does
> not update the DataEncryptionKey accordingly.
> In a specific cluster, with a Kerberos ticket lifetime of 10 hours and a
> default block token expiration/lifetime of 10 hours, a long running
> balancer failed after 20~30 hours.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
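The refresh-on-expiry pattern the issue description calls for (KeyManager regenerating its cached DataEncryptionKey instead of handing out a stale one) can be illustrated with a small stand-alone sketch. This is not the actual HDFS-11741 patch: the class, fields, and the injectable clock below are hypothetical stand-ins (the real fix works against BlockTokenSecretManager and Hadoop's Timer), but the expiry check is the same idea.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the refresh-on-expiry idea; names other than
// getDataEncryptionKey() are invented for illustration.
class KeyManagerSketch {
  // Minimal stand-in for DataEncryptionKey: only the expiry date matters here.
  static class DataEncryptionKey {
    final long expiryDate; // absolute time in ms
    DataEncryptionKey(long expiryDate) { this.expiryDate = expiryDate; }
  }

  private final AtomicLong clock;   // injectable clock, so tests can advance time
  private final long keyLifetimeMs;
  private DataEncryptionKey encryptionKey;
  private int generated = 0;

  KeyManagerSketch(AtomicLong clock, long keyLifetimeMs) {
    this.clock = clock;
    this.keyLifetimeMs = keyLifetimeMs;
  }

  // The buggy behavior was returning the cached key unconditionally; the fix
  // is to check expiry on every call and regenerate when the key is stale.
  synchronized DataEncryptionKey getDataEncryptionKey() {
    if (encryptionKey == null || clock.get() > encryptionKey.expiryDate) {
      encryptionKey = new DataEncryptionKey(clock.get() + keyLifetimeMs);
      generated++;
    }
    return encryptionKey;
  }

  int keysGenerated() { return generated; }

  public static void main(String[] args) {
    AtomicLong clock = new AtomicLong(0);
    KeyManagerSketch km = new KeyManagerSketch(clock, 10_000);
    km.getDataEncryptionKey();              // first key generated
    clock.set(5_000);
    km.getDataEncryptionKey();              // still valid, cached key reused
    clock.set(20_000);
    km.getDataEncryptionKey();              // expired, regenerated
    System.out.println(km.keysGenerated()); // prints 2
  }
}
```

The key point is the check on every call: a balancer that runs past the key lifetime gets a freshly generated key rather than the expired cached one, which is what avoids the InvalidEncryptionKeyException above.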
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083223#comment-16083223 ]

Wei-Chiu Chuang commented on HDFS-11741:
----------------------------------------

Good point. Thanks for the reminder. Pushed the commit to branch-2.7. There
was a very trivial conflict due to the HDFS-8103 refactoring.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081949#comment-16081949 ]

Brahma Reddy Battula commented on HDFS-11741:
---------------------------------------------

[~jojochuang] nice finding. As HDFS-9804 was committed to branch-2.7, this
jira should also go to branch-2.7.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033803#comment-16033803 ]

Wei-Chiu Chuang commented on HDFS-11741:
----------------------------------------

Thanks [~xiaochen] and [~yzhangal] for pushing the patch to the finish line.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033737#comment-16033737 ]

Hudson commented on HDFS-11741:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11813 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/11813/])
HDFS-11741. Long running balancer may fail due to expired (xiao: rev
6a3fc685a98718742c351ed6625dc7a4dee55e77)
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestKeyManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/KeyManager.java

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033709#comment-16033709 ]

Xiao Chen commented on HDFS-11741:
----------------------------------

Compiled and ran {{TestBlockToken}} & {{TestKeyManager}} locally on branch-2;
both passed. Also ran the failed tests reported by pre-commit on trunk;
passed.

Committed this to trunk, branch-2, branch-2.8. Thanks [~jojochuang] for
reporting and fixing the issue, and [~andrew.wang] [~yzhangal] for reviews!

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033506#comment-16033506 ]

Xiao Chen commented on HDFS-11741:
----------------------------------

Turns out YETUS-515 is the same as HADOOP-14474. Commented there to see if we
can unblock branch-2 soon. If it's not done by the end of today, I will
manually compile and run the related tests, if no objections.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033444#comment-16033444 ]

Xiao Chen commented on HDFS-11741:
----------------------------------

INFRA-14261 is fixed, but YETUS-515 surfaced.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033424#comment-16033424 ] Hadoop QA commented on HDFS-11741: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 18s{color} | {color:red} Docker failed to build yetus/hadoop:8515d35. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-11741 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870697/HDFS-11741.branch-2.01.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19730/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Long running balancer may fail due to expired DataEncryptionKey > --- > > Key: HDFS-11741 > URL: https://issues.apache.org/jira/browse/HDFS-11741 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover > Environment: CDH5.8.2, Kerberos, Data transfer encryption enabled. 
> Balancer login using keytab >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: block keys.png, HDFS-11741.001.patch, > HDFS-11741.002.patch, HDFS-11741.003.patch, HDFS-11741.004.patch, > HDFS-11741.005.patch, HDFS-11741.06.patch, HDFS-11741.07.patch, > HDFS-11741.08.patch, HDFS-11741.branch-2.01.patch > > > We found a long running balancer may fail despite using keytab, because > KeyManager returns expired DataEncryptionKey, and it throws the following > exception: > {noformat} > 2017-04-30 05:03:58,661 WARN [pool-1464-thread-10] balancer.Dispatcher > (Dispatcher.java:dispatch(325)) - Failed to move blk_1067352712_3913241 with > size=546650 from 10.0.0.134:50010:DISK to 10.0.0.98:50010:DISK through > 10.0.0.134:50010 > org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: > Can't re-compute encryption key for nonce, since the required block key > (keyID=1005215027) doesn't exist. Current key: 1005215030 > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:311) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2300(Dispatcher.java:182) > at > 
org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:899) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This bug is similar in nature to HDFS-10609. While balancer KeyManager > actively synchronizes itself with NameNode w.r.t block keys, it does not > update DataEncryptionKey accordingly. > In a specific cluster, with Kerberos ticket life time 10 hours, and default > block token expiration/life time 10 hours, a long running balancer failed > after 20~30 hours. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
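The root cause described above — the balancer's KeyManager keeps refreshing block keys from the NameNode but hands out a stale cached DataEncryptionKey — suggests the shape of the fix: check the key's expiry before each use and regenerate it once it has lapsed. The following is a minimal, self-contained sketch of that expiry-checked caching pattern; the class and field names are illustrative stand-ins, not the actual HDFS-11741 patch code:

```java
// Hypothetical sketch (NOT the real HDFS patch): illustrates regenerating a
// cached encryption key once its expiry time passes, instead of caching it
// for the whole lifetime of a long-running balancer process.
public class DataEncryptionKeySketch {
    // Minimal stand-in for the real DataEncryptionKey class.
    static final class EncryptionKey {
        final int keyId;
        final long expiryDate; // absolute time in milliseconds
        EncryptionKey(int keyId, long expiryDate) {
            this.keyId = keyId;
            this.expiryDate = expiryDate;
        }
    }

    private EncryptionKey cached;
    private int nextKeyId = 1005215027; // arbitrary starting id for the demo
    private final long lifetimeMs;

    DataEncryptionKeySketch(long lifetimeMs) {
        this.lifetimeMs = lifetimeMs;
    }

    // Buggy behavior would return `cached` unconditionally; the fix is to
    // recreate the key when the clock has passed its expiry date.
    synchronized EncryptionKey getEncryptionKey(long nowMs) {
        if (cached == null || nowMs >= cached.expiryDate) {
            cached = new EncryptionKey(nextKeyId++, nowMs + lifetimeMs);
        }
        return cached;
    }

    public static void main(String[] args) {
        // 10-hour key lifetime, mirroring the defaults mentioned in the report.
        DataEncryptionKeySketch km = new DataEncryptionKeySketch(10 * 60 * 60 * 1000L);
        EncryptionKey k1 = km.getEncryptionKey(0);
        EncryptionKey k2 = km.getEncryptionKey(1000);          // still valid: same key
        EncryptionKey k3 = km.getEncryptionKey(k1.expiryDate); // expired: new key
        System.out.println(k1.keyId == k2.keyId); // prints true
        System.out.println(k1.keyId != k3.keyId); // prints true
    }
}
```

With this pattern, a balancer that outlives the key lifetime transparently picks up a fresh key rather than failing SASL negotiation with InvalidEncryptionKeyException.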
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033304#comment-16033304 ] Hadoop QA commented on HDFS-11741: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 8m 10s{color} | {color:red} Docker failed to build yetus/hadoop:8515d35. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-11741 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870697/HDFS-11741.branch-2.01.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19727/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032432#comment-16032432 ] Xiao Chen commented on HDFS-11741: -- ... and precommits are having problems, filed INFRA-14261
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032410#comment-16032410 ] Xiao Chen commented on HDFS-11741: -- Was waiting on the branch-2 pre-commit. Just checked jenkins and seems none. Kicked off https://builds.apache.org/job/PreCommit-HDFS-Build/19714
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032316#comment-16032316 ] Yongjun Zhang commented on HDFS-11741: -- Thanks [~xiaochen]. I'm +1 on both the trunk and branch-2 versions. Please run the tests that failed in trunk manually to confirm they succeed; they look unrelated to me though. Then please go ahead and commit the patch.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032244#comment-16032244 ] Xiao Chen commented on HDFS-11741: -- Test failures look unrelated. Attaching a branch-2 patch due to conflicts.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031805#comment-16031805 ] Hadoop QA commented on HDFS-11741: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}101m 11s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}133m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | HDFS-11741 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870598/HDFS-11741.08.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 5aa3dff26374 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1543d0f | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19703/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19703/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19703/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031692#comment-16031692 ] Hadoop QA commented on HDFS-11741: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}102m 13s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy | | | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | HDFS-11741 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870593/HDFS-11741.08.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux d70ef5e36f96 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1543d0f | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19700/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19700/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19700/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031619#comment-16031619 ] Yongjun Zhang commented on HDFS-11741: -- Thanks [~xiaochen], +1 pending jenkins.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030974#comment-16030974 ] Hadoop QA commented on HDFS-11741:
--
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 14s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}119m 29s{color} | {color:black} {color} |
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870527/HDFS-11741.07.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 4f5408dd9642 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4b4a652 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19689/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19689/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19689/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030741#comment-16030741 ] Yongjun Zhang commented on HDFS-11741:
--
Thanks [~xiaochen] for the updated patch. Suggest to make the following comment change, +1 after that.
{code}
    if (encryptionKey == null || encryptionKey.expiryDate < timer.now()) {
      // Encryption Key (EK) is generated from Block Key (BK), but its
      // expiryDate is solely based on tokenLifetime.
      // Once EK is expired, we need to generate a new one using the current
      // BK. Retired BK is kept for (keyUpdateInterval + tokenLifetime)
      // before removal.
      // See BlockTokenSecretManager for details.
      LOG.debug("Generating new data encryption key because current key"
          + " expired on {}.", encryptionKey.expiryDate);
      encryptionKey = blockTokenSecretManager.generateDataEncryptionKey();
    }
    return encryptionKey;
{code}
to
{code}
    if (encryptionKey == null || encryptionKey.expiryDate < timer.now()) {
      // Encryption Key (EK) is generated from Block Key (BK).
      // Check if EK is expired here, and generate a new one using the current
      // BK if so; otherwise continue to use the previously generated EK.
      //
      // It's important to make sure that when EK is not expired, the BK used
      // to generate the EK is not expired and removed, because the same BK
      // will be used to re-generate the EK by BlockTokenSecretManager.
      //
      // The current implementation ensures that when an EK is not expired
      // (even if it's close to expiration), the BK that was used to generate
      // it still has at least "key update interval" of lifetime before the
      // BK gets expired and removed.
      // See BlockTokenSecretManager for details.
      //
      LOG.debug("Generating new data encryption key because current key"
          + " expired on {}.", encryptionKey.expiryDate);
      encryptionKey = blockTokenSecretManager.generateDataEncryptionKey();
    }
    return encryptionKey;
{code}
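The guard discussed above lives in KeyManager#newDataEncryptionKey: regenerate the DEK lazily whenever it is missing or past its expiry, relying on the retired block key being kept long enough to re-compute a not-yet-expired EK. A minimal, self-contained sketch of that pattern; SimpleKeyManager, the manual clock, and the key fields are simplified stand-ins, not the real HDFS classes:

```java
// Hypothetical sketch of the lazy "regenerate on expiry" guard.
public class SimpleKeyManager {
    static class DataEncryptionKey {
        final int keyId;
        final long expiryDate; // ms since epoch
        DataEncryptionKey(int keyId, long expiryDate) {
            this.keyId = keyId;
            this.expiryDate = expiryDate;
        }
    }

    private final long tokenLifetime;
    private long now;            // stand-in for the injectable Timer in the patch
    private int nextKeyId = 1;
    private DataEncryptionKey encryptionKey;

    SimpleKeyManager(long tokenLifetime) { this.tokenLifetime = tokenLifetime; }

    void advanceClock(long ms) { now += ms; }

    /** Regenerate the EK only when it is missing or past its expiry. */
    synchronized DataEncryptionKey newDataEncryptionKey() {
        if (encryptionKey == null || encryptionKey.expiryDate < now) {
            // The EK is derived from the current block key; its expiry is
            // tokenLifetime from now.
            encryptionKey = new DataEncryptionKey(nextKeyId++, now + tokenLifetime);
        }
        return encryptionKey;
    }

    public static void main(String[] args) {
        SimpleKeyManager km = new SimpleKeyManager(10_000);
        DataEncryptionKey k1 = km.newDataEncryptionKey();
        km.advanceClock(5_000);                    // not expired: same key returned
        assert km.newDataEncryptionKey() == k1;
        km.advanceClock(6_000);                    // past expiry: new key generated
        assert km.newDataEncryptionKey() != k1;
        System.out.println("ok");
    }
}
```

The check is cheap because it runs entirely locally; no NameNode round trip happens unless the key is actually regenerated.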
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030649#comment-16030649 ] Hadoop QA commented on HDFS-11741:
--
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 29s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 40s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 58s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 15s{color} | {color:black} {color} |
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Possible null pointer dereference of KeyManager.encryptionKey in org.apache.hadoop.hdfs.server.balancer.KeyManager.newDataEncryptionKey() Dereferenced at KeyManager.java:KeyManager.encryptionKey in org.apache.hadoop.hdfs.server.balancer.KeyManager.newDataEncryptionKey() Dereferenced at KeyManager.java:[line 139] |
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
| | hadoop.hdfs.server.balancer.TestKeyManager |
| | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870500/HDFS-11741.06.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 564b5eebee40 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4b4a652 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030148#comment-16030148 ] Xiao Chen commented on HDFS-11741:
--
Looked again at this one, and I think {{encryptionKey.expiryDate < timer.now()}} should be OK.
Found the graph isn't exactly accurate though, I think the timeline of the block key goes like this:
generate --- (Tk) ---> becomes current --- (Tk (not Tl)) ---> retiring --- (Tk + Tl) ---> expire (removed)
And the Encryption Key expires at Tl. So with the buffer of (Tk) before block key removal, I think we're safe to only compare Encryption Key's expiry as you said. :)
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026536#comment-16026536 ] Xiao Chen commented on HDFS-11741:
--
Thanks for revving Wei-Chiu, good analysis! As talked offline I think generating a new DEK would be sufficient. Prefer the {{encryptionKey.expiryDate - keyUpdateInterval * 3 / 4 < timer.now()}} route to prevent TOCTOU as Andrew pointed out earlier.
Nits:
- {{LOG.debug("Getting new encryption token from NN");}} IIUC this is local
- Please remove the space change in {{TestBalancerWithEncryptedTransfer}}
- I think the test case in {{TestKeyManager}} needs updating after the 3/4 interval change - the new DEK is not generated based on expiry, but actually on BK's update interval. Maybe we can choose different updateInterval and tokenLifetime values to differentiate them in the test.
- Let's use a safer test timeout to reduce false positives due to infra.
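The {{expiryDate - keyUpdateInterval * 3 / 4 < timer.now()}} route suggested above refreshes the EK while it still has slack left, so the key cannot expire between the check and its use (the TOCTOU race). A small sketch of that predicate; shouldRefresh is a hypothetical helper for illustration, not code from the patch:

```java
// Sketch of the "refresh early" check: treat the EK as stale once it is
// within 3/4 of a key update interval of its expiry.
public class EarlyRefreshCheck {
    static boolean shouldRefresh(long expiryDate, long keyUpdateInterval, long now) {
        return expiryDate - keyUpdateInterval * 3 / 4 < now;
    }

    public static void main(String[] args) {
        long interval = 4_000;
        // Plenty of slack left (threshold is 10_000 - 3_000 = 7_000): no refresh.
        assert !shouldRefresh(10_000, interval, 1_000);
        // Past the threshold but before expiry: refresh early, avoiding the race.
        assert shouldRefresh(10_000, interval, 8_000);
        System.out.println("ok");
    }
}
```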
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025781#comment-16025781 ] Hadoop QA commented on HDFS-11741:
--
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 41s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 31s{color} | {color:black} {color} |
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
| | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
| | hadoop.hdfs.TestEncryptionZones |
| Timed out junit tests | org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869967/HDFS-11741.005.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 26aa1cb2844b 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2b5ad48 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/19622/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19622/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19622/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19622/console |
| Powered by |
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025577#comment-16025577 ] Wei-Chiu Chuang commented on HDFS-11741:
--
!block keys.png!
Assuming a block key is generated at t=0. After one key update interval (t=Tk), it becomes the current key and is used for generating block tokens and the data encryption key. After token lifetime (t=Tk+Tl), the key retires, and a new key (generated at t=Tk) becomes current. However, the retired key is still kept in BlockTokenSecretManager and can be used to verify block tokens and decrypt data. The key finally expires at (t=2*Tk+2*Tl).
After the fix in my patch, the only way a block key can expire before the DEK expires is if the balancer's local time drifts by more than one key update interval (that is, 10 hours). If there is such a long drift, a lot of other things would already not work.
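The lifecycle described in this comment can be written down as plain arithmetic. A sketch under the comment's stated assumptions (Tk = key update interval, Tl = token lifetime); the helper names are illustrative, not HDFS API:

```java
// Block key lifecycle instants, per the timeline above (key generated at t=0).
public class BlockKeyTimeline {
    static long becomesCurrentAt(long tk)   { return tk; }              // t = Tk
    static long retiresAt(long tk, long tl) { return tk + tl; }         // t = Tk + Tl
    static long removedAt(long tk, long tl) { return 2 * tk + 2 * tl; } // t = 2*Tk + 2*Tl

    public static void main(String[] args) {
        long tk = 10, tl = 10; // hours, matching the defaults in the bug report
        // A DEK minted when the key becomes current expires Tl later...
        long dekExpiry = becomesCurrentAt(tk) + tl;
        // ...which leaves a wide buffer before the backing block key is removed,
        // so a non-expired DEK can always be re-computed from a retained key.
        assert dekExpiry < removedAt(tk, tl);
        System.out.println("ok");
    }
}
```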
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023750#comment-16023750 ] Wei-Chiu Chuang commented on HDFS-11741: --- Thanks [~xiaochen], I like your suggestion. I initially just wanted to maintain parity with DFSClient#newDataEncryptionKey, but that is actually not needed: DFSClient does not have access to the block key, so it has to ask the NameNode for a DEK. The balancer KeyManager does have access to the block key, so it can generate the DEK on its own, with no extra overhead for the NameNode.
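Wei-Chiu's point above, that DFSClient must round-trip to the NameNode for a DEK while the balancer's KeyManager can mint one from its cached block key, can be sketched as a toy model. All class and method names below are illustrative stand-ins, not the real HDFS APIs:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model: DFSClient has no block key, so minting a DEK costs a
// NameNode RPC; the balancer's KeyManager caches the block key and
// can mint the DEK locally. Names here are illustrative only.
public class DekSource {
  static final AtomicLong nnCalls = new AtomicLong(); // simulated NN round trips

  static final class BlockKey {
    final int keyId;
    final long expiryMs;
    BlockKey(int keyId, long expiryMs) { this.keyId = keyId; this.expiryMs = expiryMs; }
  }

  static final class Dek {
    final int keyId;
    final long expiryMs;
    Dek(int keyId, long expiryMs) { this.keyId = keyId; this.expiryMs = expiryMs; }
  }

  // DFSClient path: every fresh DEK is one simulated RPC to the NameNode.
  static Dek dekViaNameNode(BlockKey nnCurrentKey) {
    nnCalls.incrementAndGet();
    return new Dek(nnCurrentKey.keyId, nnCurrentKey.expiryMs);
  }

  // Balancer path: the block key is already cached, so no RPC is needed.
  static Dek dekLocally(BlockKey cachedKey) {
    return new Dek(cachedKey.keyId, cachedKey.expiryMs);
  }
}
```

The design point is simply that whichever party already holds the block key can derive the DEK without bothering the NameNode.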
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023691#comment-16023691 ] Xiao Chen commented on HDFS-11741: -- Thanks [~jojochuang] for reporting the issue and working on the fix, and thanks to the others for reviewing. Just to make sure I understand correctly: the problem is that the {{KeyManager}} instance in the {{Dispatcher}} uses a version of {{encryptionKey}} associated with a {{BlockKey}} that is more than {{2 * keyUpdateInterval + tokenLifetime}} old. The balancer-side {{BlockTokenSecretManager}} therefore cannot find that {{BlockKey}}, because the {{encryptionKey}} object is never updated. If the above is correct, could we instead have KM's {{BlockKeyUpdater}} (or a new EKUpdater) update the {{encryptionKey}} periodically as well (say, every tokenLifetime / 2, or / 4)? I think this is more future-proof, because {{KeyManager}} is associated with {{NameNodeConnector}}: the dispatcher seems to be the only place that retrieves this KM, but I suspect the problem also exists for NNC.
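Xiao Chen's proposal, refreshing the cached {{encryptionKey}} on a schedule of tokenLifetime / 4, could look roughly like the sketch below. The class name and the string-valued key are hypothetical placeholders, not the actual {{KeyManager}} internals:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a periodic encryptionKey refresher (an "EKUpdater"):
// re-derive the cached key every tokenLifetime / 4 so callers never
// see a stale key. Placeholder names, not KeyManager internals.
public class EkUpdaterSketch {
  private final AtomicLong generation = new AtomicLong();
  private final AtomicReference<String> encryptionKey = new AtomicReference<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  // Schedule refreshes well before the key can expire.
  public void start(long tokenLifetimeMs) {
    long periodMs = Math.max(1, tokenLifetimeMs / 4);
    scheduler.scheduleAtFixedRate(this::refresh, 0, periodMs, TimeUnit.MILLISECONDS);
  }

  // Stand-in for "generate a fresh DEK from the current block key".
  void refresh() {
    encryptionKey.set("dek-" + generation.incrementAndGet());
  }

  public String currentKey() { return encryptionKey.get(); }

  public void stop() { scheduler.shutdownNow(); }
}
```

Refreshing at a quarter of the lifetime leaves generous slack for clock skew between the balancer and the DataNodes.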
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020326#comment-16020326 ] Hadoop QA commented on HDFS-11741: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the 
patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 51s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 97m 4s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHDFS | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | HDFS-11741 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869326/HDFS-11741.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux d658dd0b9b9e 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9cab42c | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/19545/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19545/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19545/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19545/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013246#comment-16013246 ] Yongjun Zhang commented on HDFS-11741: -- Hi [~jojochuang], Thanks for your work here. It seems the patch no longer applies. Would you please update it? Thanks.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003619#comment-16003619 ] Wei-Chiu Chuang commented on HDFS-11741: [~zhz] [~shahrs87] would you mind chiming in on this observation? {quote} I just realized that the client-side BlockTokenSecretManager generates the DataEncryptionKey expiration time as now + token lifetime. I am not sure that is intended, as I would have assumed the key expiration time equals the current BlockKey expiration time (which is determined by the NameNode). So it is entirely possible that the balancer has an unexpired DataEncryptionKey corresponding to an expired BlockKey; when it talks to the other side, the expired BlockKey fails the connection. Therefore my rev 01 patch would not fix all the problems, because of this mismatch. {quote} Thanks!
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995913#comment-15995913 ] Hadoop QA commented on HDFS-11741: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 36s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 31s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 70 unchanged - 1 fixed = 73 total (was 71) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 44s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 87m 35s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | HDFS-11741 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12866274/HDFS-11741.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux a0846454202a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / fd5cb2c | | Default Java | 1.8.0_121 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/19307/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/19307/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19307/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19307/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19307/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995578#comment-15995578 ] Hadoop QA commented on HDFS-11741: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 39s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 33s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 22 unchanged - 1 fixed = 25 total (was 23) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 36s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}116m 19s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.namenode.TestMetadataVersionOutput | | | hadoop.hdfs.server.namenode.TestStartup | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | HDFS-11741 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12866234/HDFS-11741.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 928953f4eb07 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d4631e4 | | Default Java | 1.8.0_121 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/19301/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/19301/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19301/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19301/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995341#comment-15995341 ] Wei-Chiu Chuang commented on HDFS-11741: Hi Andrew, thanks for the review! I just realized that the client-side BlockTokenSecretManager generates the DataEncryptionKey expiration time as now + token lifetime. I am not sure that is intended, as I would have assumed the key expiration time equals the current BlockKey expiration time (which is determined by the NameNode). So it is entirely possible that the balancer has an unexpired DataEncryptionKey corresponding to an expired BlockKey; when it talks to the other side, the expired BlockKey fails the connection. Therefore my rev 01 patch would not fix all the problems, because of this mismatch. There are two potential fixes: * Change BlockTokenSecretManager so that DEK expiration is based on the current BlockKey expiration. * Change Balancer to catch InvalidEncryptionKeyException, generate a new DEK, and retry the connection. I feel the first fix is the right one, but it changes every participant in HDFS, so I want to double-check here.
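The mismatch Wei-Chiu describes, and the first of his proposed fixes, reduce to simple expiry arithmetic. A minimal sketch with illustrative numbers and hypothetical method names, not the real {{BlockTokenSecretManager}} code:

```java
// Expiry arithmetic behind the mismatch: a DEK stamped "now + tokenLifetime"
// can claim validity past the expiry of the BlockKey it was derived from.
// The proposed fix caps the DEK's expiry at the block key's own expiry.
public class DekExpiry {
  // Current behavior: DEK expiry ignores the underlying block key.
  static long nowPlusLifetime(long nowMs, long tokenLifetimeMs) {
    return nowMs + tokenLifetimeMs;
  }

  // Proposed fix: the DEK never outlives its block key.
  static long cappedAtBlockKey(long nowMs, long tokenLifetimeMs, long blockKeyExpiryMs) {
    return Math.min(nowMs + tokenLifetimeMs, blockKeyExpiryMs);
  }
}
```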
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993858#comment-15993858 ] Andrew Wang commented on HDFS-11741:
Hi Wei-Chiu, the patch looks good overall; a few quick questions:
* For other tokens, we renew before the token expires, e.g. after half the token lifetime has elapsed. This handles clock skew and TOCTOU issues. Should we do this here too?
* Is it possible to write a unit test using a FakeTimer rather than Thread.sleep?
* The test uses JUnit 3 asserts; please use JUnit 4's asserts instead.
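The half-lifetime renewal suggested in the comment above can be sketched as follows. This is illustrative only, not the actual KeyManager code: the injectable `LongSupplier` clock stands in for Hadoop's Timer/FakeTimer pattern, so a test can advance time without Thread.sleep.

```java
import java.util.function.LongSupplier;

// Sketch of renew-at-half-lifetime with an injectable clock.
// Hypothetical names, not the HDFS API.
public class HalfLifetimeRenewer {
    final LongSupplier clock;  // injectable time source, in milliseconds
    final long lifetimeMs;
    long issuedAtMs;
    int keyId;                 // incremented on every renewal

    public HalfLifetimeRenewer(LongSupplier clock, long lifetimeMs) {
        this.clock = clock;
        this.lifetimeMs = lifetimeMs;
        renew();
    }

    void renew() {
        keyId++;
        issuedAtMs = clock.getAsLong();
    }

    // Renew once half the lifetime has elapsed, well before actual expiry.
    // The slack absorbs clock skew and the window between checking the key
    // and using it (the TOCTOU issue mentioned in the review).
    public int currentKey() {
        if (clock.getAsLong() - issuedAtMs >= lifetimeMs / 2) {
            renew();
        }
        return keyId;
    }
}
```

A unit test can then drive the clock directly: hold time before the half-lifetime mark and assert the key id is stable, advance past the mark, and assert a new key id, with no sleeping involved.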
[jira] [Commented] (HDFS-11741) Long running balancer may fail due to expired DataEncryptionKey
[ https://issues.apache.org/jira/browse/HDFS-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993731#comment-15993731 ] Hadoop QA commented on HDFS-11741:
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 1s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 3s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}113m 1s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 45s{color} | {color:black} {color} |
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| | hadoop.hdfs.server.namenode.TestStartup |
| | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
| | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| Timed out junit tests | org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-11741 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12866019/HDFS-11741.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 1fc5f3e01f5c 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dcc292d |
| Default Java | 1.8.0_121 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/19274/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19274/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19274/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs