[jira] [Assigned] (HDFS-13398) Hdfs recursive listing operation is very slow
[ https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey reassigned HDFS-13398: --- Assignee: Ajay Sachdev > Hdfs recursive listing operation is very slow > - > > Key: HDFS-13398 > URL: https://issues.apache.org/jira/browse/HDFS-13398 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.1 > Environment: HCFS file system where HDP 2.6.1 is connected to ECS > (Object Store). >Reporter: Ajay Sachdev >Assignee: Ajay Sachdev >Priority: Major > Fix For: 2.7.1 > > Attachments: parallelfsPatch > > > The hdfs dfs -ls -R command is sequential in nature and is very slow for an > HCFS system. We have seen around 6 mins for a 40K directory/file structure. > The proposal is to use a multithreaded approach to speed up the recursive list, du > and count operations. > We have tried a ForkJoinPool implementation to improve performance of the > recursive listing operation. > [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli] > commit id : > 82387c8cd76c2e2761bd7f651122f83d45ae8876 > Another implementation uses the Java ExecutorService to run the listing > operation in multiple threads in parallel. This has > significantly reduced the time from 6 mins to 40 secs. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
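The divide-and-conquer shape the ForkJoinPool proposal describes can be sketched without a Hadoop cluster. The following is a hypothetical illustration only, not the attached patch: it counts files with a `RecursiveTask` over `java.nio.file` instead of the Hadoop `FileSystem` API, and the class and method names are invented for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch of the ForkJoinPool approach: one task per directory,
// subdirectories are forked and listed in parallel, results joined at the end.
public class ParallelList {
    static class CountTask extends RecursiveTask<Long> {
        private final Path dir;
        CountTask(Path dir) { this.dir = dir; }

        @Override
        protected Long compute() {
            long files = 0;
            List<CountTask> subTasks = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (Files.isDirectory(entry)) {
                        CountTask t = new CountTask(entry);
                        t.fork();              // descend into subdirectory in parallel
                        subTasks.add(t);
                    } else {
                        files++;               // count a regular file
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            for (CountTask t : subTasks) {
                files += t.join();             // gather counts from forked tasks
            }
            return files;
        }
    }

    public static long countFiles(Path root) {
        return ForkJoinPool.commonPool().invoke(new CountTask(root));
    }
}
```

The real patch would apply the same pattern to `FileSystem.listStatus` calls; note that, unlike the sequential `-ls -R`, a parallel traversal no longer produces output in a deterministic order unless results are sorted afterwards.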
[jira] [Commented] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438531#comment-16438531 ] yunjiong zhao commented on HDFS-13441: -- [~hexiaoqiao], this issue is different: it is not about the DN registering with the NN, it is about heartbeat responses lost from the Standby NameNode, which can leave some DataNodes missing the block key after the Standby NameNode becomes active. > DataNode missed BlockKey update from NameNode due to HeartbeatResponse was > dropped > -- > > Key: HDFS-13441 > URL: https://issues.apache.org/jira/browse/HDFS-13441 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.1 >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Attachments: HDFS-13441.patch > > > After a NameNode failover, lots of applications failed because some DataNodes > couldn't re-compute the password from the block token. > {code:java} > 2018-04-11 20:10:52,448 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error > processing unknown operation src: /10.142.74.116:57404 dst: > /10.142.77.45:50010 > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist.] 
> at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist. 
> at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) > ... 7 more > {code} > > In the DataNode log, we didn't see DataNode update block keys around > 2018-04-11 09:55:00 and around 2018-04-11 19:55:00. > {code:java} > 2018-04-10 14:51:36,424 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-10 23:55:38,420 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 00:51:34,792 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 10:51:39,403 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 20:51:44,422 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-12 02:54:47,855 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-12
[jira] [Commented] (HDFS-13424) Ozone: Refactor MiniOzoneClassicCluster
[ https://issues.apache.org/jira/browse/HDFS-13424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438529#comment-16438529 ] genericqa commented on HDFS-13424: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 49 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 6s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 32m 10s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 38m 35s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 50s{color} | {color:green} HDFS-7240 passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 28s{color} | {color:red} container-service in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 34s{color} | {color:red} integration-test in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 26s{color} | {color:red} tools in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 28s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 27s{color} | {color:red} container-service in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 25s{color} | {color:red} tools in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 25s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s{color} | {color:red} container-service in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s{color} | {color:red} integration-test in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s{color} | {color:red} tools in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 11s{color} | {color:red} container-service in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 11s{color} | {color:red} integration-test in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 12s{color} | {color:red} tools in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 11s{color} | {color:red} hadoop-ozone in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 30m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 24s{color} | {color:red} container-service in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s{color} | {color:red} integration-test in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s{color} | {color:red} tools in the patch failed. {color} | |
[jira] [Updated] (HDFS-13394) Ozone: ContainerID has incorrect package name
[ https://issues.apache.org/jira/browse/HDFS-13394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13394: --- Description: The {{ContainerID}} class's package name and the directory structure where the class is present don't match. (was: {{ContainerID}} package name and the directory structure where the class is present doesn't match.) > Ozone: ContainerID has incorrect package name > - > > Key: HDFS-13394 > URL: https://issues.apache.org/jira/browse/HDFS-13394 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Assignee: Lokesh Jain >Priority: Major > Labels: newbie > > The {{ContainerID}} class's package name and the directory structure where the > class is present don't match.
[jira] [Updated] (HDFS-12752) Ozone: SCM: Make container report processing async
[ https://issues.apache.org/jira/browse/HDFS-12752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-12752: --- Summary: Ozone: SCM: Make container report processing async (was: Ozone: SCM: Make container report processing asynchronous) > Ozone: SCM: Make container report processing async > -- > > Key: HDFS-12752 > URL: https://issues.apache.org/jira/browse/HDFS-12752 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Priority: Major > > {{StorageContainerManager#sendContainerReport}} processes the container > reports sent by datanodes; it calls > {{ContainerMapping#processContainerReports}} to do the actual processing. > This jira is to make the {{ContainerMapping#processContainerReports}} call > asynchronous.
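The asynchronous hand-off HDFS-12752 asks for can be sketched with a plain `ExecutorService`. This is an illustrative stand-in only, not the SCM code: the class and method names below merely mirror `StorageContainerManager#sendContainerReport` and `ContainerMapping#processContainerReports`, and the counter exists just to make the sketch observable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: the RPC handler thread only enqueues each report and
// returns immediately; a background worker does the actual processing.
public class AsyncReportProcessor {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicInteger processed = new AtomicInteger();

    // Stands in for ContainerMapping#processContainerReports (the slow part).
    private void process(String report) {
        processed.incrementAndGet();   // placeholder for the real work
    }

    // Stands in for StorageContainerManager#sendContainerReport: returns a
    // Future so callers may still wait for, or ignore, completion.
    public Future<?> sendContainerReport(String report) {
        return executor.submit(() -> process(report));
    }

    public int processedCount() {
        return processed.get();
    }

    // Drain queued reports and stop the worker thread.
    public void shutdown() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

A single-threaded executor keeps reports from one datanode ordered; the real change would also need to decide how back-pressure is applied if reports arrive faster than they are processed.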
[jira] [Updated] (HDFS-13424) Ozone: Refactor MiniOzoneClassicCluster
[ https://issues.apache.org/jira/browse/HDFS-13424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13424: --- Status: Patch Available (was: Open) > Ozone: Refactor MiniOzoneClassicCluster > --- > > Key: HDFS-13424 > URL: https://issues.apache.org/jira/browse/HDFS-13424 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Attachments: HDFS-13424-HDFS-7240.000.patch > > > This jira will track the refactoring work on {{MiniOzoneClassicCluster}}, > which removes the dependency on {{MiniDFSCluster}} and the changes made in it.
[jira] [Commented] (HDFS-13424) Ozone: Refactor MiniOzoneClassicCluster
[ https://issues.apache.org/jira/browse/HDFS-13424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438491#comment-16438491 ] Nanda kumar commented on HDFS-13424: This patch refactors {{MiniOzoneCluster}} and tries to fix {{hadoop-ozone/integration-test}} test-cases. Before the patch: [ERROR] Tests run: 244, Failures: 6, Errors: 22, Skipped: 9 After the patch: [ERROR] Tests run: 244, Failures: 0, Errors: 3, Skipped: 10 Failing/Error test-cases are: # org.apache.hadoop.ozone.TestOzoneConfigurationFields#testCompareXmlAgainstConfigurationClass # org.apache.hadoop.ozone.container.common.TestBlockDeletingService#testBlockDeletionTimeout # org.apache.hadoop.ozone.web.client.TestKeys#testPutAndGetKeyWithDnRestart 1 & 2 are not related to this patch; created HDFS-13449 and HDFS-13450 to track them. 3 seems to be related to this patch; we can create a tracking jira for it once this patch is committed. > Ozone: Refactor MiniOzoneClassicCluster > --- > > Key: HDFS-13424 > URL: https://issues.apache.org/jira/browse/HDFS-13424 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Attachments: HDFS-13424-HDFS-7240.000.patch > > > This jira will track the refactoring work on {{MiniOzoneClassicCluster}}, > which removes the dependency on {{MiniDFSCluster}} and the changes made in it.
[jira] [Updated] (HDFS-13424) Ozone: Refactor MiniOzoneClassicCluster
[ https://issues.apache.org/jira/browse/HDFS-13424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13424: --- Attachment: HDFS-13424-HDFS-7240.000.patch > Ozone: Refactor MiniOzoneClassicCluster > --- > > Key: HDFS-13424 > URL: https://issues.apache.org/jira/browse/HDFS-13424 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Attachments: HDFS-13424-HDFS-7240.000.patch > > > This jira will track the refactoring work on {{MiniOzoneClassicCluster}}, > which removes the dependency on {{MiniDFSCluster}} and the changes made in it.
[jira] [Commented] (HDFS-13450) Ozone: TestBlockDeletingService test-case is failing
[ https://issues.apache.org/jira/browse/HDFS-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438486#comment-16438486 ] Nanda kumar commented on HDFS-13450: Stacktrace
{code}
[INFO] Running org.apache.hadoop.ozone.container.common.TestBlockDeletingService
[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 102.855 s <<< FAILURE! - in org.apache.hadoop.ozone.container.common.TestBlockDeletingService
[ERROR] testBlockDeletionTimeout(org.apache.hadoop.ozone.container.common.TestBlockDeletingService) Time elapsed: 100.25 s <<< ERROR!
java.util.concurrent.TimeoutException: Timed out waiting for condition.
Thread diagnostics:
Timestamp: 2018-04-15 01:05:47,175

"Finalizer" daemon prio=8 tid=3 in Object.wait()
java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"BlockDeletingService#6" daemon prio=5 tid=31 in Object.wait()
java.lang.Thread.State: WAITING (on object monitor)
        at sun.misc.Unsafe.park(Native Method)
{code}
> Ozone: TestBlockDeletingService test-case is failing > > > Key: HDFS-13450 > URL: https://issues.apache.org/jira/browse/HDFS-13450 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Priority: Major > > {{org.apache.hadoop.ozone.container.common.TestBlockDeletingService#testBlockDeletionTimeout}} > test-case is failing consistently.
[jira] [Created] (HDFS-13450) Ozone: TestBlockDeletingService test-case is failing
Nanda kumar created HDFS-13450: -- Summary: Ozone: TestBlockDeletingService test-case is failing Key: HDFS-13450 URL: https://issues.apache.org/jira/browse/HDFS-13450 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Nanda kumar {{org.apache.hadoop.ozone.container.common.TestBlockDeletingService#testBlockDeletionTimeout}} test-case is failing consistently.
[jira] [Commented] (HDFS-13449) Ozone: TestOzoneConfigurationFields is failing
[ https://issues.apache.org/jira/browse/HDFS-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438483#comment-16438483 ] Nanda kumar commented on HDFS-13449: Stacktrace
{code}
[INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields
[ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.259 s <<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields
[ERROR] testCompareXmlAgainstConfigurationClass(org.apache.hadoop.ozone.TestOzoneConfigurationFields) Time elapsed: 0.112 s <<< FAILURE!
java.lang.AssertionError: ozone-default.xml has 2 properties missing in class org.apache.hadoop.ozone.OzoneConfigKeys class org.apache.hadoop.hdds.scm.ScmConfigKeys class org.apache.hadoop.ozone.ksm.KSMConfigKeys Entries: hadoop.custom.tags ozone.system.tags expected:<0> but was:<2>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareXmlAgainstConfigurationClass(TestConfigurationFieldsBase.java:540)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{code}
> Ozone: TestOzoneConfigurationFields is failing > -- > > Key: HDFS-13449 > URL: https://issues.apache.org/jira/browse/HDFS-13449 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Priority: Major > > {{TestOzoneConfigurationFields}} is failing because of two properties > introduced in ozone-default.xml by HDFS-13197: > * hadoop.custom.tags > * ozone.system.tags > > These are not present in any of the configuration classes.
[jira] [Updated] (HDFS-13449) Ozone: TestOzoneConfigurationFields is failing
[ https://issues.apache.org/jira/browse/HDFS-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13449: --- Description: {{TestOzoneConfigurationFields}} is failing because of two properties introduced in ozone-default.xml by HDFS-13197: * hadoop.custom.tags * ozone.system.tags These are not present in any of the configuration classes. was: {{TestOzoneConfigurationFields}} is failing because of two properties introduced in ozone-default.xml by HDFS-13197 * hadoop.custom.tags * ozone.system.tags Which are not present in any ConfigurationClasses. > Ozone: TestOzoneConfigurationFields is failing > -- > > Key: HDFS-13449 > URL: https://issues.apache.org/jira/browse/HDFS-13449 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nanda kumar >Priority: Major > > {{TestOzoneConfigurationFields}} is failing because of two properties > introduced in ozone-default.xml by HDFS-13197: > * hadoop.custom.tags > * ozone.system.tags > > These are not present in any of the configuration classes.
[jira] [Created] (HDFS-13449) Ozone: TestOzoneConfigurationFields is failing
Nanda kumar created HDFS-13449: -- Summary: Ozone: TestOzoneConfigurationFields is failing Key: HDFS-13449 URL: https://issues.apache.org/jira/browse/HDFS-13449 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Nanda kumar {{TestOzoneConfigurationFields}} is failing because of two properties introduced in ozone-default.xml by HDFS-13197: * hadoop.custom.tags * ozone.system.tags These are not present in any of the configuration classes.
[jira] [Updated] (HDFS-13424) Ozone: Refactor MiniOzoneClassicCluster
[ https://issues.apache.org/jira/browse/HDFS-13424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13424: --- Attachment: (was: HDFS-13424-HDFS-7240.000.patch) > Ozone: Refactor MiniOzoneClassicCluster > --- > > Key: HDFS-13424 > URL: https://issues.apache.org/jira/browse/HDFS-13424 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > > This jira will track the refactoring work on {{MiniOzoneClassicCluster}}, > which removes the dependency on {{MiniDFSCluster}} and the changes made in it.
[jira] [Updated] (HDFS-13446) Ozone: Fix OzoneFileSystem contract test failures
[ https://issues.apache.org/jira/browse/HDFS-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13446: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Ozone: Fix OzoneFileSystem contract test failures > - > > Key: HDFS-13446 > URL: https://issues.apache.org/jira/browse/HDFS-13446 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13446-HDFS-7240.001.patch > > > This jira moves the contract tests into the src/test directory and also fixes > the Ozone filesystem contract tests.
[jira] [Commented] (HDFS-13446) Ozone: Fix OzoneFileSystem contract test failures
[ https://issues.apache.org/jira/browse/HDFS-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438461#comment-16438461 ] Nanda kumar commented on HDFS-13446: Thanks [~msingh] for the contribution, I have committed this patch to the feature branch. > Ozone: Fix OzoneFileSystem contract test failures > - > > Key: HDFS-13446 > URL: https://issues.apache.org/jira/browse/HDFS-13446 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13446-HDFS-7240.001.patch > > > This jira moves the contract tests into the src/test directory and also fixes > the Ozone filesystem contract tests.
[jira] [Commented] (HDFS-13446) Ozone: Fix OzoneFileSystem contract test failures
[ https://issues.apache.org/jira/browse/HDFS-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438455#comment-16438455 ] Nanda kumar commented on HDFS-13446: Thanks [~msingh] for working on this. +1, the patch looks good to me. I will commit the patch shortly. {{OzoneContract}} and {{TestOzoneFileInterfaces}} have unused imports; I will fix them while committing. > Ozone: Fix OzoneFileSystem contract test failures > - > > Key: HDFS-13446 > URL: https://issues.apache.org/jira/browse/HDFS-13446 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13446-HDFS-7240.001.patch > > > This jira moves the contract tests into the src/test directory and also fixes > the Ozone filesystem contract tests.
[jira] [Comment Edited] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438448#comment-16438448 ] He Xiaoqiao edited comment on HDFS-13441 at 4/14/18 6:09 PM: - [~zhaoyunjiong], if I understand correctly, this issue was caused by a StandbyNameNode restart. The NameNode's load is very high when it restarts in a large cluster, especially when it starts to process block reports; if some DataNodes need to re-register during this period, they are very likely to time out, since BPServiceActor#register cannot catch an IOException that wraps a SocketTimeoutException. In one word, the NameNode correctly processed the registration, but the DataNode timed out before receiving the response, so #updateBlockKeysWhenStartup could not be invoked.
{code:java}
void register(NamespaceInfo nsInfo) throws IOException {
  // The handshake() phase loaded the block pool storage
  // off disk - so update the bpRegistration object from that info
  DatanodeRegistration newBpRegistration = bpos.createRegistration();
  LOG.info(this + " beginning handshake with NN");
  while (shouldRun()) {
    try {
      // Use returned registration from namenode with updated fields
      newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
      newBpRegistration.setNamespaceInfo(nsInfo);
      bpRegistration = newBpRegistration;
      break;
    } catch(EOFException e) {
      // namenode might have just restarted
      LOG.info("Problem connecting to server: " + nnAddr + " :"
          + e.getLocalizedMessage());
      sleepAndLogInterrupts(1000, "connecting to server");
    } catch(SocketTimeoutException e) {
      // namenode is busy
      LOG.info("Problem connecting to server: " + nnAddr);
      sleepAndLogInterrupts(1000, "connecting to server");
    }
  }
  LOG.info("Block pool " + this + " successfully registered with NN");
  bpos.registrationSucceeded(this, bpRegistration);
  // random short delay - helps scatter the BR from all DNs
  scheduler.scheduleBlockReport(dnConf.initialBlockReportDelay);
  updateBlockKeysWhenStartup();
}
{code}
In this case, subsequent reads/writes from clients to this DataNode would be certain to throw {{SaslException}}. HDFS-12749 is trying to resolve this matter once and for all. FYI.

was (Author: hexiaoqiao): [~zhaoyunjiong], if I understand correctly, this issue was caused by a StandbyNameNode restart. The NameNode's load is very high when it restarts in a large cluster, especially when it starts to process block reports; if some DataNodes need to re-register during this period, they are very likely to time out, since BPServiceActor#register cannot catch the SocketTimeoutException. In one word, the NameNode correctly processed the registration, but the DataNode timed out before receiving the response, so #updateBlockKeysWhenStartup could not be invoked.
{code:java}
void register(NamespaceInfo nsInfo) throws IOException {
  // The handshake() phase loaded the block pool storage
  // off disk - so update the bpRegistration object from that info
  DatanodeRegistration newBpRegistration = bpos.createRegistration();
  LOG.info(this + " beginning handshake with NN");
  while (shouldRun()) {
    try {
      // Use returned registration from namenode with updated fields
      newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
      newBpRegistration.setNamespaceInfo(nsInfo);
      bpRegistration = newBpRegistration;
      break;
    } catch(EOFException e) {
      // namenode might have just restarted
      LOG.info("Problem connecting to server: " + nnAddr + " :"
          + e.getLocalizedMessage());
      sleepAndLogInterrupts(1000, "connecting to server");
    } catch(SocketTimeoutException e) {
      // namenode is busy
      LOG.info("Problem connecting to server: " + nnAddr);
      sleepAndLogInterrupts(1000, "connecting to server");
    }
  }
  LOG.info("Block pool " + this + " successfully registered with NN");
  bpos.registrationSucceeded(this, bpRegistration);
  // random short delay - helps scatter the BR from all DNs
  scheduler.scheduleBlockReport(dnConf.initialBlockReportDelay);
  updateBlockKeysWhenStartup();
}
{code}
In this case, subsequent reads/writes from clients to this DataNode would be certain to throw {{SaslException}}. HDFS-12749 is trying to resolve this matter once and for all. FYI.
> DataNode missed BlockKey update from NameNode due to HeartbeatResponse was > dropped > -- > > Key: HDFS-13441 > URL: https://issues.apache.org/jira/browse/HDFS-13441 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.1 >Reporter: yunjiong zhao >Assignee:
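The root cause described above turns on one Java detail: a catch clause for SocketTimeoutException does not match an IOException that merely wraps a SocketTimeoutException as its cause, so the retry loop in BPServiceActor#register never runs. A standalone demonstration of that mechanism (not Hadoop code; all names here are invented for the example):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Standalone demo: exception matching is by thrown type, not by cause chain.
// An IOException wrapping a SocketTimeoutException skips the timeout handler.
public class WrappedTimeout {
    // Simulates an RPC layer wrapping the low-level timeout in an IOException.
    static void register() throws IOException {
        throw new IOException("registration failed",
            new SocketTimeoutException("NameNode busy"));
    }

    public static String attempt() {
        try {
            register();
            return "registered";
        } catch (SocketTimeoutException e) {
            return "retry";       // never reached: the thrown type is IOException
        } catch (IOException e) {
            return "propagated";  // this is what actually happens
        }
    }
}
```

A fix along the lines the comment suggests would catch the broader IOException (or inspect `e.getCause()`) in the retry loop instead of only SocketTimeoutException.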
[jira] [Updated] (HDFS-13424) Ozone: Refactor MiniOzoneClassicCluster
[ https://issues.apache.org/jira/browse/HDFS-13424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar updated HDFS-13424: --- Attachment: HDFS-13424-HDFS-7240.000.patch > Ozone: Refactor MiniOzoneClassicCluster > --- > > Key: HDFS-13424 > URL: https://issues.apache.org/jira/browse/HDFS-13424 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Attachments: HDFS-13424-HDFS-7240.000.patch > > > This jira will track the refactoring work on {{MiniOzoneClassicCluster}}, > which removes the dependency on {{MiniDFSCluster}} and the changes made in it.
[jira] [Commented] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438448#comment-16438448 ]

He Xiaoqiao commented on HDFS-13441:
------------------------------------

[~zhaoyunjiong], If I understand correctly, this issue was caused by a StandbyNameNode restart. The NameNode's load is very high when it restarts in a large cluster, especially once it starts processing block reports. If some DataNodes need to re-register during this period, the registration is very likely to time out, since BPServiceActor#register cannot catch the SocketTimeoutException. In short, the NameNode correctly processed the registration, but the DataNode timed out before receiving the response, so #updateBlockKeysWhenStartup could not be invoked.
{code:java}
void register(NamespaceInfo nsInfo) throws IOException {
  // The handshake() phase loaded the block pool storage
  // off disk - so update the bpRegistration object from that info
  DatanodeRegistration newBpRegistration = bpos.createRegistration();

  LOG.info(this + " beginning handshake with NN");

  while (shouldRun()) {
    try {
      // Use returned registration from namenode with updated fields
      newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
      newBpRegistration.setNamespaceInfo(nsInfo);
      bpRegistration = newBpRegistration;
      break;
    } catch(EOFException e) {  // namenode might have just restarted
      LOG.info("Problem connecting to server: " + nnAddr + " :"
          + e.getLocalizedMessage());
      sleepAndLogInterrupts(1000, "connecting to server");
    } catch(SocketTimeoutException e) {  // namenode is busy
      LOG.info("Problem connecting to server: " + nnAddr);
      sleepAndLogInterrupts(1000, "connecting to server");
    }
  }

  LOG.info("Block pool " + this + " successfully registered with NN");
  bpos.registrationSucceeded(this, bpRegistration);

  // random short delay - helps scatter the BR from all DNs
  scheduler.scheduleBlockReport(dnConf.initialBlockReportDelay);

  updateBlockKeysWhenStartup();
}
{code}
In this case, any following Read/Write from a client to this DataNode would be certain to throw {{SaslException}}. HDFS-12749 is trying to resolve this matter once and for all. FYI.

> DataNode missed BlockKey update from NameNode due to HeartbeatResponse was
> dropped
> --------------------------------------------------------------------------
>
>                 Key: HDFS-13441
>                 URL: https://issues.apache.org/jira/browse/HDFS-13441
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.7.1
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>            Priority: Major
>         Attachments: HDFS-13441.patch
>
> After NameNode failover, lots of application failed due to some DataNodes
> can't re-compute password from block token.
> {code:java}
> 2018-04-11 20:10:52,448 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error
> processing unknown operation src: /10.142.74.116:57404 dst:
> /10.142.77.45:50010
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't
> re-compute password for block_token_identifier (expiryDate=1523538652448,
> keyId=1762737944, userId=hadoop,
> blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700,
> access modes=[WRITE]), since the required block key (keyID=1762737944)
> doesn't exist.]
> at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
> at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't
> re-compute password for block_token_identifier (expiryDate=1523538652448,
> keyId=1762737944, userId=hadoop,
> blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700,
> access modes=[WRITE]), since the required block key (keyID=1762737944)
> doesn't exist.
> at >
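[Editor's note] For illustration only, and not Hadoop code: the failure mode discussed above comes from a retry-on-timeout loop in which the server may complete a request whose response the client never receives. The minimal, self-contained sketch below models that shape with entirely hypothetical names (Registrar, registerWithRetry): the simulated NameNode processes every registration it gets, but the first two responses are "lost" as timeouts, so the client retries while the server-side work has already happened.

```java
import java.net.SocketTimeoutException;

// Illustration only -- NOT Hadoop code. A minimal sketch of the
// retry-on-timeout pattern in BPServiceActor#register: the client keeps
// retrying on SocketTimeoutException, so the server may have processed a
// registration whose response the client never saw.
public class RetryRegisterSketch {

    // Stand-in for the NameNode-side registration RPC (hypothetical).
    interface Registrar {
        String register(String datanodeId) throws SocketTimeoutException;
    }

    // Retry until success or maxAttempts, swallowing timeouts the same way
    // the quoted register() loop does.
    static String registerWithRetry(Registrar nn, String datanodeId,
                                    int maxAttempts) throws SocketTimeoutException {
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The server may complete this call even if we time out
                // waiting for its response -- the crux of the discussion.
                return nn.register(datanodeId);
            } catch (SocketTimeoutException e) {
                last = e; // "namenode is busy": try again
            }
        }
        throw last; // give up after maxAttempts timeouts
    }

    public static void main(String[] args) throws Exception {
        // Simulated busy NN: it records every registration it processes,
        // but the first two responses are "lost" as client-side timeouts.
        final int[] processedByServer = {0};
        Registrar nn = dn -> {
            processedByServer[0]++;               // server-side work happened
            if (processedByServer[0] < 3) {
                throw new SocketTimeoutException("NN busy"); // response lost
            }
            return dn + ":registered";
        };

        String result = registerWithRetry(nn, "dn1", 5);
        // The client sees one success, but the server processed 3 attempts.
        System.out.println(result);                 // dn1:registered
        System.out.println(processedByServer[0]);   // 3
    }
}
```

The at-least-once behavior this sketch exhibits is why the comment distinguishes "the NameNode correctly processed the registration" from "the DataNode timed out before receiving the response": retrying hides the timeout from the caller but cannot tell it which attempts actually took effect on the server.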
[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438254#comment-16438254 ]

genericqa commented on HDFS-13448:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 25m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 25m 20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 23s{color} | {color:orange} root: The patch generated 3 new + 475 unchanged - 0 fixed = 478 total (was 475) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 36s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 14s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 29s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}112m 50s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}248m 26s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.tools.TestHdfsConfigFields |
| | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HDFS-13448 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919043/HDFS-13448.2.patch |
| Optional Tests | asflicense compile javac javadoc