[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-13441: - Attachment: HDFS-13441.003.patch > DataNode missed BlockKey update from NameNode due to HeartbeatResponse was > dropped > -- > > Key: HDFS-13441 > URL: https://issues.apache.org/jira/browse/HDFS-13441 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.1 >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Attachments: HDFS-13441.002.patch, HDFS-13441.003.patch, > HDFS-13441.patch > > > After NameNode failover, lots of application failed due to some DataNodes > can't re-compute password from block token. > {code:java} > 2018-04-11 20:10:52,448 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error > processing unknown operation src: /10.142.74.116:57404 dst: > /10.142.77.45:50010 > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist.] > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) > ... 7 more > {code} > > In the DataNode log, we didn't see DataNode update block keys around > 2018-04-11 09:55:00 and around 2018-04-11 19:55:00. > {code:java} > 2018-04-10 14:51:36,424 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-10 23:55:38,420 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 00:51:34,792 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 10:51:39,403 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 20:51:44,422 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-12 02:54:47,855 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-12 05:55:44,456 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > {code} > The reason is there is SocketTimeOutException when sending heartbeat
[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-13441: - Attachment: HDFS-13441.002.patch > DataNode missed BlockKey update from NameNode due to HeartbeatResponse was > dropped > -- > > Key: HDFS-13441 > URL: https://issues.apache.org/jira/browse/HDFS-13441 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.1 >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Attachments: HDFS-13441.002.patch, HDFS-13441.patch > > > After NameNode failover, lots of application failed due to some DataNodes > can't re-compute password from block token. > {code:java} > 2018-04-11 20:10:52,448 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error > processing unknown operation src: /10.142.74.116:57404 dst: > /10.142.77.45:50010 > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist.] > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) > ... 7 more > {code} > > In the DataNode log, we didn't see DataNode update block keys around > 2018-04-11 09:55:00 and around 2018-04-11 19:55:00. > {code:java} > 2018-04-10 14:51:36,424 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-10 23:55:38,420 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 00:51:34,792 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 10:51:39,403 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 20:51:44,422 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-12 02:54:47,855 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-12 05:55:44,456 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > {code} > The reason is there is SocketTimeOutException when sending heartbeat to > StandbyNameNode >
[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-13441: - Description: After NameNode failover, lots of application failed due to some DataNodes can't re-compute password from block token. {code:java} 2018-04-11 20:10:52,448 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error processing unknown operation src: /10.142.74.116:57404 dst: /10.142.77.45:50010 javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't re-compute password for block_token_identifier (expiryDate=1523538652448, keyId=1762737944, userId=hadoop, blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, access modes=[WRITE]), since the required block key (keyID=1762737944) doesn't exist.] at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't re-compute password for block_token_identifier (expiryDate=1523538652448, keyId=1762737944, userId=hadoop, blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, access modes=[WRITE]), since the required block key (keyID=1762737944) doesn't exist. at org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) at org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) ... 7 more {code} In the DataNode log, we didn't see DataNode update block keys around 2018-04-11 09:55:00 and around 2018-04-11 19:55:00. {code:java} 2018-04-10 14:51:36,424 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-10 23:55:38,420 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 00:51:34,792 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 10:51:39,403 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 20:51:44,422 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-12 02:54:47,855 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-12 05:55:44,456 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys {code} The reason is there is SocketTimeOutException when sending heartbeat to StandbyNameNode {code:java} 2018-04-11 09:55:34,699 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.net.SocketTimeoutException: Call From hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com/10.142.77.45 to ares-nn.vip.ebay.com:8030 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.142.77.45:48803 remote=ares-nn.vip.ebay.com/10.103.108.200:8030]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source) at
[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-13441: - Description: After NameNode failover, lots of application failed due to some DataNodes can't re-compute password from block token. {code:java} 2018-04-11 20:10:52,448 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error processing unknown operation src: /10.142.74.116:57404 dst: /10.142.77.45:50010 javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't re-compute password for block_token_identifier (expiryDate=1523538652448, keyId=1762737944, userId=hadoop, blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, access modes=[WRITE]), since the required block key (keyID=1762737944) doesn't exist.] at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't re-compute password for block_token_identifier (expiryDate=1523538652448, keyId=1762737944, userId=hadoop, blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, access modes=[WRITE]), since the required block key (keyID=1762737944) doesn't exist. at org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) at org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) ... 7 more {code} In the DataNode log, we didn't see DataNode update block keys around 2018-04-11 09:55:00 and around 2018-04-11 19:55:00. {code:java} 2018-04-10 14:51:36,424 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-10 23:55:38,420 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 00:51:34,792 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 10:51:39,403 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 20:51:44,422 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-12 02:54:47,855 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-12 05:55:44,456 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys {code} The reason is there is SocketTimeOutException when sending heartbeat to StandbyNameNode {code:java} 2018-04-11 09:55:34,699 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.net.SocketTimeoutException: Call From hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com/10.142.77.45 to ares-nn.vip.ebay.com:8030 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.142.77.45:48803 remote=ares-nn.vip.ebay.com/10.103.108.200:8030]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source) at
[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-13441: - Status: Patch Available (was: Open) There are two ways to fix this bug: one is making sure NameNode's send new BlockKey to DataNodes successfully; another one is once DataNode can't find BlockKey, re-register the Datanode to NameNodes to make sure DataNode get the newest BlockKeys. The first way is much more complex and need change more code than second way. The attached patch is re-register DataNode to NameNodes. Not tested, just for ideas. > DataNode missed BlockKey update from NameNode due to HeartbeatResponse was > dropped > -- > > Key: HDFS-13441 > URL: https://issues.apache.org/jira/browse/HDFS-13441 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.1 >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Attachments: HDFS-13441.patch > > > After NameNode failover, lots of application failed due to some DataNodes > can't re-compute password from block token. > {code:java} > 2018-04-11 20:10:52,448 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error > processing unknown operation src: /10.142.74.116:57404 dst: > /10.142.77.45:50010 > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist.] > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) > ... 7 more > {code} > > In the DataNode log, we didn't see DataNode update block keys around > 2018-04-11 09:55. > {code:java} > 2018-04-10 14:51:36,424 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-10 23:55:38,420 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 00:51:34,792 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 10:51:39,403 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 20:51:44,422 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting >
[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-13441: - Attachment: HDFS-13441.patch > DataNode missed BlockKey update from NameNode due to HeartbeatResponse was > dropped > -- > > Key: HDFS-13441 > URL: https://issues.apache.org/jira/browse/HDFS-13441 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.1 >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Attachments: HDFS-13441.patch > > > After NameNode failover, lots of application failed due to some DataNodes > can't re-compute password from block token. > {code:java} > 2018-04-11 20:10:52,448 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error > processing unknown operation src: /10.142.74.116:57404 dst: > /10.142.77.45:50010 > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist.] > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) > ... 7 more > {code} > > In the DataNode log, we didn't see DataNode update block keys around > 2018-04-11 09:55. > {code:java} > 2018-04-10 14:51:36,424 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-10 23:55:38,420 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 00:51:34,792 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 10:51:39,403 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 20:51:44,422 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-12 02:54:47,855 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-12 05:55:44,456 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > {code} > The reason is there is SocketTimeOutException when send heartbeat to > StandbyNameNode > {code:java} > 2018-04-11 09:55:34,699 WARN
[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated HDFS-13441: - Description: After NameNode failover, lots of application failed due to some DataNodes can't re-compute password from block token. {code:java} 2018-04-11 20:10:52,448 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error processing unknown operation src: /10.142.74.116:57404 dst: /10.142.77.45:50010 javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't re-compute password for block_token_identifier (expiryDate=1523538652448, keyId=1762737944, userId=hadoop, blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, access modes=[WRITE]), since the required block key (keyID=1762737944) doesn't exist.] at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't re-compute password for block_token_identifier (expiryDate=1523538652448, keyId=1762737944, userId=hadoop, blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, access modes=[WRITE]), since the required block key (keyID=1762737944) doesn't exist. at org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) at org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) ... 7 more {code} In the DataNode log, we didn't see DataNode update block keys around 2018-04-11 09:55. {code:java} 2018-04-10 14:51:36,424 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-10 23:55:38,420 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 00:51:34,792 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 10:51:39,403 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-11 20:51:44,422 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-12 02:54:47,855 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys 2018-04-12 05:55:44,456 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys {code} The reason is there is SocketTimeOutException when send heartbeat to StandbyNameNode {code:java} 2018-04-11 09:55:34,699 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.net.SocketTimeoutException: Call From hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com/10.142.77.45 to ares-nn.vip.ebay.com:8030 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.142.77.45:48803 remote=ares-nn.vip.ebay.com/10.103.108.200:8030]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at