[jira] [Commented] (HDFS-13114) CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path
[ https://issues.apache.org/jira/browse/HDFS-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381301#comment-16381301 ] Hanisha Koneru commented on HDFS-13114: --- Thank you [~xyao] for committing the patch and [~jojochuang] for the review. > CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path > > > Key: HDFS-13114 > URL: https://issues.apache.org/jira/browse/HDFS-13114 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 3.1.0, 3.0.2, 3.2.0 > > Attachments: HDFS-13114.001.patch > > > The {{crypto -reencryptZone -path }} command takes in a path > argument. But when creating {{HdfsAdmin}} object, it takes the defaultFs > instead of resolving from the path. This causes the following exception if > the authority component in path does not match the authority of default Fs. > {code:java} > $ hdfs crypto -reencryptZone -start -path hdfs://mycluster-node-1:8020/zone1 > IllegalArgumentException: Wrong FS: hdfs://mycluster-node-1:8020/zone1, > expected: hdfs://ns1{code} > {code:java} > $ hdfs crypto -reencryptZone -start -path hdfs://ns2/zone2 > IllegalArgumentException: Wrong FS: hdfs://ns2/zone2, expected: > hdfs://ns1{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
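The fix described above resolves the namespace from the path argument instead of always using the default filesystem. A minimal sketch of that idea (hypothetical helper, not the actual HDFS-13114 patch code): pick the scheme/authority from the path when present, otherwise fall back to the default FS URI.

```java
import java.net.URI;

// Hypothetical illustration of resolving the target namespace from a path
// argument rather than the configured defaultFS (names are assumptions,
// not the real CryptoAdmin internals).
public class ResolveFsFromPath {

  // If the path is fully qualified, use its scheme and authority;
  // otherwise fall back to the default filesystem URI.
  static URI resolveFsUri(String path, URI defaultFs) {
    URI u = URI.create(path);
    if (u.getScheme() != null && u.getAuthority() != null) {
      return URI.create(u.getScheme() + "://" + u.getAuthority());
    }
    return defaultFs;
  }

  public static void main(String[] args) {
    URI defaultFs = URI.create("hdfs://ns1");
    // fully qualified path wins over the default FS
    System.out.println(resolveFsUri("hdfs://ns2/zone2", defaultFs));
    // a relative path falls back to the default FS
    System.out.println(resolveFsUri("/zone1", defaultFs));
  }
}
```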
[jira] [Updated] (HDFS-13109) Support fully qualified hdfs path in EZ commands
[ https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13109: -- Attachment: HDFS-13109.004.patch > Support fully qualified hdfs path in EZ commands > > > Key: HDFS-13109 > URL: https://issues.apache.org/jira/browse/HDFS-13109 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13109.001.patch, HDFS-13109.002.patch, > HDFS-13109.003.patch, HDFS-13109.004.patch > > > When creating an Encryption Zone, if the fully qualified path is specified in > the path argument, it throws the following error. > {code:java} > ~$ hdfs crypto -createZone -keyName mykey1 -path hdfs://ns1/zone1 > IllegalArgumentException: hdfs://ns1/zone1 is not the root of an encryption > zone. Do you mean /zone1? > ~$ hdfs crypto -createZone -keyName mykey1 -path "hdfs://namenode:9000/zone2" > IllegalArgumentException: hdfs://namenode:9000/zone2 is not the root of an > encryption zone. Do you mean /zone2? > {code} > The EZ creation succeeds as the path is resolved in > DFS#createEncryptionZone(). But while creating the Trash directory, the path > is not resolved and it throws the above error. > A fully qualified path should be supported by {{crypto}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13109) Support fully qualified hdfs path in EZ commands
[ https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379287#comment-16379287 ] Hanisha Koneru commented on HDFS-13109: --- Thanks for the review, [~xyao]. bq. You have already resolved the path in the calling function public void provisionEZTrash. You can just pass the resolved path to the private method provisionEZTrash instead of getPathName. [~shahrs87], we would have to call {{getPathName()}} as the {{FileSystemLinkResolver.resolve}} function in the calling {{public void provisionEZTrash}} doesn't verify that the path belongs to the correct filesystem. Please let me know if I am missing something here. I have reverted {{p.toUri().getPath()}} to {{getPathName(p)}} in patch v04. > Support fully qualified hdfs path in EZ commands > > > Key: HDFS-13109 > URL: https://issues.apache.org/jira/browse/HDFS-13109 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13109.001.patch, HDFS-13109.002.patch, > HDFS-13109.003.patch, HDFS-13109.004.patch > > > When creating an Encryption Zone, if the fully qualified path is specified in > the path argument, it throws the following error. > {code:java} > ~$ hdfs crypto -createZone -keyName mykey1 -path hdfs://ns1/zone1 > IllegalArgumentException: hdfs://ns1/zone1 is not the root of an encryption > zone. Do you mean /zone1? > ~$ hdfs crypto -createZone -keyName mykey1 -path "hdfs://namenode:9000/zone2" > IllegalArgumentException: hdfs://namenode:9000/zone2 is not the root of an > encryption zone. Do you mean /zone2? > {code} > The EZ creation succeeds as the path is resolved in > DFS#createEncryptionZone(). But while creating the Trash directory, the path > is not resolved and it throws the above error. > A fully qualified path should be supported by {{crypto}}. 
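The {{getPathName()}} call discussed above both verifies that a (possibly fully qualified) path belongs to the right filesystem and strips it down to the filesystem-relative part. A simplified sketch of that kind of check (assumed shape for illustration, not the actual DistributedFileSystem code):

```java
import java.net.URI;

// Hypothetical sketch of a getPathName-style check: reject paths whose
// scheme/authority do not match this filesystem, otherwise return the
// filesystem-relative path component.
public class PathNameCheck {

  static String getPathName(URI fsUri, String path) {
    URI p = URI.create(path);
    if (p.getScheme() != null) {
      boolean schemeMismatch = !fsUri.getScheme().equals(p.getScheme());
      boolean authorityMismatch = p.getAuthority() != null
          && !p.getAuthority().equals(fsUri.getAuthority());
      if (schemeMismatch || authorityMismatch) {
        throw new IllegalArgumentException(
            "Wrong FS: " + path + ", expected: " + fsUri);
      }
    }
    return p.getPath();  // strip scheme/authority, keep "/zone1"
  }

  public static void main(String[] args) {
    URI fs = URI.create("hdfs://ns1");
    System.out.println(getPathName(fs, "hdfs://ns1/zone1"));
    System.out.println(getPathName(fs, "/zone1"));
  }
}
```

With a check like this, the Trash-directory step can accept {{hdfs://ns1/zone1}} and still operate on the relative path {{/zone1}}.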
[jira] [Commented] (HDFS-13114) CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path
[ https://issues.apache.org/jira/browse/HDFS-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379298#comment-16379298 ] Hanisha Koneru commented on HDFS-13114: --- Thanks for the review, [~xyao]. {{ListZonesCommand#run}} and {{ListReencryptionStatusCommand#run}} do not have path parameters, so we have to fall back to the defaultUri only. For these two commands, we would need to utilize the generic -fs option to specify the nameservice. > CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path > > > Key: HDFS-13114 > URL: https://issues.apache.org/jira/browse/HDFS-13114 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13114.001.patch > > > The {{crypto -reencryptZone -path }} command takes in a path > argument. But when creating {{HdfsAdmin}} object, it takes the defaultFs > instead of resolving from the path. This causes the following exception if > the authority component in path does not match the authority of default Fs. > {code:java} > $ hdfs crypto -reencryptZone -start -path hdfs://mycluster-node-1:8020/zone1 > IllegalArgumentException: Wrong FS: hdfs://mycluster-node-1:8020/zone1, > expected: hdfs://ns1{code} > {code:java} > $ hdfs crypto -reencryptZone -start -path hdfs://ns2/zone2 > IllegalArgumentException: Wrong FS: hdfs://ns2/zone2, expected: > hdfs://ns1{code}
[jira] [Commented] (HDFS-10803) TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available
[ https://issues.apache.org/jira/browse/HDFS-10803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396329#comment-16396329 ] Hanisha Koneru commented on HDFS-10803: --- Thanks for the fix [~linyiqun]. The patch LGTM. Tested with multiple runs with and without the patch. +1. > TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails > intermittently due to no free space available > > > Key: HDFS-10803 > URL: https://issues.apache.org/jira/browse/HDFS-10803 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-10803.001.patch > > > The test {{TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools}} > fails intermittently. The stack > infos(https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithMultipleNameNodes/testBalancing2OutOf3Blockpools/): > {code} > java.io.IOException: Creating block, no free space available > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset$BInfo.(SimulatedFSDataset.java:151) > at > org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.injectBlocks(SimulatedFSDataset.java:580) > at > org.apache.hadoop.hdfs.MiniDFSCluster.injectBlocks(MiniDFSCluster.java:2679) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.unevenDistribution(TestBalancerWithMultipleNameNodes.java:405) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancing2OutOf3Blockpools(TestBalancerWithMultipleNameNodes.java:516) > {code} > The error message means that the datanode's capacity has used up and there is > no other space to create a new file block. > I looked into the code, I found the main reason seemed that the > {{capacities}} for cluster is not correctly constructed in the second > cluster startup before preparing to redistribute blocks in test. 
> The related code: > {code} > // Here we do redistribute blocks nNameNodes times for each node, > // we need to adjust the capacities. Otherwise it will cause the no > // free space errors sometimes. > final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf) > .nnTopology(MiniDFSNNTopology.simpleFederatedTopology(nNameNodes)) > .numDataNodes(nDataNodes) > .racks(racks) > .simulatedCapacities(newCapacities) > .format(false) > .build(); > LOG.info("UNEVEN 11"); > ... > for(int n = 0; n < nNameNodes; n++) { > // redistribute blocks > final Block[][] blocksDN = TestBalancer.distributeBlocks( > blocks[n], s.replication, distributionPerNN); > > for(int d = 0; d < blocksDN.length; d++) > cluster.injectBlocks(n, d, Arrays.asList(blocksDN[d])); > LOG.info("UNEVEN 13: n=" + n); > } > {code} > And that means the totalUsed value has been increased as > {{nNameNodes*usedSpacePerNN}} rather than {{usedSpacePerNN}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
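The capacity arithmetic behind the analysis above can be sketched as follows (illustrative numbers only): blocks are injected once per namenode into every datanode, so each datanode's used space grows to nNameNodes * usedSpacePerNN, and a simulated capacity sized for a single blockpool can run out of space.

```java
// Hypothetical arithmetic behind the flaky-test analysis: if capacities are
// not scaled by the number of namenodes, injected blocks may exceed the
// simulated datanode capacity ("no free space available").
public class CapacityMath {

  // total space the injected blocks occupy on one datanode
  static long totalUsed(int nNameNodes, long usedSpacePerNN) {
    return (long) nNameNodes * usedSpacePerNN;
  }

  // capacity scaled up so that blocks for all blockpools fit
  static long adjustedCapacity(int nNameNodes, long capacityPerNN) {
    return (long) nNameNodes * capacityPerNN;
  }

  public static void main(String[] args) {
    int nNameNodes = 3;
    long usedSpacePerNN = 40;  // illustrative values, not the test's actual sizes
    long capacityPerNN = 100;
    // capacity sized for one blockpool is too small for three:
    System.out.println(totalUsed(nNameNodes, usedSpacePerNN) <= capacityPerNN);
    // scaling the capacities by nNameNodes fixes it:
    System.out.println(
        totalUsed(nNameNodes, usedSpacePerNN)
            <= adjustedCapacity(nNameNodes, capacityPerNN));
  }
}
```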
[jira] [Commented] (HDFS-13239) Fix non-empty dir warning message when setting default EC policy
[ https://issues.apache.org/jira/browse/HDFS-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396320#comment-16396320 ] Hanisha Koneru commented on HDFS-13239: --- +1 pending Jenkins. > Fix non-empty dir warning message when setting default EC policy > > > Key: HDFS-13239 > URL: https://issues.apache.org/jira/browse/HDFS-13239 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Bharat Viswanadham >Priority: Minor > Attachments: HDFS-13239.00.patch, HDFS-13239.01.patch, > HDFS-13239.02.patch, HDFS-13239.03.patch, HDFS-13239.04.patch > > > When EC policy is set on a non-empty directory, the following warning message > is given: > {code} > $hdfs ec -setPolicy -policy RS-6-3-1024k -path /ec1 > Warning: setting erasure coding policy on a non-empty directory will not > automatically convert existing files to RS-6-3-1024k > {code} > When we do not specify the -policy parameter when setting EC policy on a > directory, it takes the default EC policy. Setting default EC policy in this > way on a non-empty directory gives the following warning message: > {code} > $hdfs ec -setPolicy -path /ec2 > Warning: setting erasure coding policy on a non-empty directory will not > automatically convert existing files to null > {code} > Notice that the warning message in the 2nd case has the ecPolicy name shown > as null. We should instead give the default EC policy name in this message. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13239) Fix non-empty dir warning message when setting default EC policy
[ https://issues.apache.org/jira/browse/HDFS-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391730#comment-16391730 ] Hanisha Koneru commented on HDFS-13239: --- Thanks for working on this, [~bharatviswa]. Looks good to me overall, just few minor comments: # We can directly assign the default policy to {{ecPolicyName}} and not need another variable {{ecName}}. # We would not need the below if condition as ecPolicyName cannot be null anymore. {code:java} if (ecPolicyName == null){ System.out.println("Set default erasure coding policy " + ecName + " on " + path); } {code} > Fix non-empty dir warning message when setting default EC policy > > > Key: HDFS-13239 > URL: https://issues.apache.org/jira/browse/HDFS-13239 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Bharat Viswanadham >Priority: Minor > Attachments: HDFS-13239.00.patch > > > When EC policy is set on a non-empty directory, the following warning message > is given: > {code} > $hdfs ec -setPolicy -policy RS-6-3-1024k -path /ec1 > Warning: setting erasure coding policy on a non-empty directory will not > automatically convert existing files to RS-6-3-1024k > {code} > When we do not specify the -policy parameter when setting EC policy on a > directory, it takes the default EC policy. Setting default EC policy in this > way on a non-empty directory gives the following warning message: > {code} > $hdfs ec -setPolicy -path /ec2 > Warning: setting erasure coding policy on a non-empty directory will not > automatically convert existing files to null > {code} > Notice that the warning message in the 2nd case has the ecPolicy name shown > as null. We should instead give the default EC policy name in this message. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
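The review comment above amounts to: substitute the default policy name into {{ecPolicyName}} up front, so the warning never prints "null" and the extra null check becomes unnecessary. A minimal sketch of that shape (the default policy name and method names here are assumptions for illustration):

```java
// Hypothetical sketch of the suggested fix: assign the default EC policy
// name when none was given, before building the non-empty-directory warning.
public class EcWarning {

  // assumed default policy name for illustration
  static final String DEFAULT_EC_POLICY = "RS-6-3-1024k";

  static String nonEmptyDirWarning(String ecPolicyName) {
    if (ecPolicyName == null) {
      // directly assign the default instead of carrying a second variable
      ecPolicyName = DEFAULT_EC_POLICY;
    }
    return "Warning: setting erasure coding policy on a non-empty directory "
        + "will not automatically convert existing files to " + ecPolicyName;
  }

  public static void main(String[] args) {
    // no -policy given: the warning names the default policy, not "null"
    System.out.println(nonEmptyDirWarning(null));
    System.out.println(nonEmptyDirWarning("XOR-2-1-1024k"));
  }
}
```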
[jira] [Commented] (HDFS-13239) Fix non-empty dir warning message when setting default EC policy
[ https://issues.apache.org/jira/browse/HDFS-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391816#comment-16391816 ] Hanisha Koneru commented on HDFS-13239: --- Thanks [~bharatviswa]. Got it now. Can we have some boolean {{isDefault}} or something instead of {{ecName}}. The two variables {{ecName}} and {{ecPolicyName}} are confusing :). > Fix non-empty dir warning message when setting default EC policy > > > Key: HDFS-13239 > URL: https://issues.apache.org/jira/browse/HDFS-13239 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Bharat Viswanadham >Priority: Minor > Attachments: HDFS-13239.00.patch > > > When EC policy is set on a non-empty directory, the following warning message > is given: > {code} > $hdfs ec -setPolicy -policy RS-6-3-1024k -path /ec1 > Warning: setting erasure coding policy on a non-empty directory will not > automatically convert existing files to RS-6-3-1024k > {code} > When we do not specify the -policy parameter when setting EC policy on a > directory, it takes the default EC policy. Setting default EC policy in this > way on a non-empty directory gives the following warning message: > {code} > $hdfs ec -setPolicy -path /ec2 > Warning: setting erasure coding policy on a non-empty directory will not > automatically convert existing files to null > {code} > Notice that the warning message in the 2nd case has the ecPolicy name shown > as null. We should instead give the default EC policy name in this message. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI
[ https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391848#comment-16391848 ] Hanisha Koneru commented on HDFS-13244: --- +1 pending Jenkins. Will trigger a Jenkins run. > Add stack, conf, metrics links to utilities dropdown in NN webUI > > > Key: HDFS-13244 > URL: https://issues.apache.org/jira/browse/HDFS-13244 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 > AM.png > > > Add stack, conf, metrics links to utilities dropdown in NN webUI > cc [~arpitagarwal] for suggesting this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI
[ https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393728#comment-16393728 ] Hanisha Koneru commented on HDFS-13244: --- Looks like Jenkins cannot process html changes. Thanks for pointing it out [~elgoiri]. I will commit this shortly. > Add stack, conf, metrics links to utilities dropdown in NN webUI > > > Key: HDFS-13244 > URL: https://issues.apache.org/jira/browse/HDFS-13244 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 > AM.png > > > Add stack, conf, metrics links to utilities dropdown in NN webUI > cc [~arpitagarwal] for suggesting this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI
[ https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393728#comment-16393728 ] Hanisha Koneru edited comment on HDFS-13244 at 3/9/18 11:01 PM: Looks like Jenkins cannot process html changes. Thanks for pointing it out [~elgoiri]. Tested it on a test cluster. I will commit this shortly. was (Author: hanishakoneru): Looks like Jenkins cannot process html changes. Thanks for pointing it out [~elgoiri]. I will commit this shortly. > Add stack, conf, metrics links to utilities dropdown in NN webUI > > > Key: HDFS-13244 > URL: https://issues.apache.org/jira/browse/HDFS-13244 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 > AM.png > > > Add stack, conf, metrics links to utilities dropdown in NN webUI > cc [~arpitagarwal] for suggesting this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13023) Journal Sync does not work on a secure cluster
[ https://issues.apache.org/jira/browse/HDFS-13023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13023: -- Fix Version/s: 3.1.0 > Journal Sync does not work on a secure cluster > -- > > Key: HDFS-13023 > URL: https://issues.apache.org/jira/browse/HDFS-13023 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 3.1.0 > > Attachments: HDFS-13023.00.patch, HDFS-13023.01.patch, > HDFS-13023.02.patch, HDFS-13023.03.patch > > > Fails with the following exception. > {code} > 2018-01-10 01:15:40,517 INFO server.JournalNodeSyncer > (JournalNodeSyncer.java:syncWithJournalAtIndex(235)) - Syncing Journal > /0.0.0.0:8485 with xxx, journal id: mycluster > 2018-01-10 01:15:40,583 ERROR server.JournalNodeSyncer > (JournalNodeSyncer.java:syncWithJournalAtIndex(259)) - Could not sync with > Journal at xxx/xxx:8485 > com.google.protobuf.ServiceException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): > User nn/xxx (auth:PROXY) via jn/xxx (auth:KERBEROS) is not authorized for > protocol interface org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol: > this service is only accessible by nn/x...@example.com > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:242) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy16.getEditLogManifest(Unknown Source) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:254) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:230) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:190) > at java.lang.Thread.run(Thread.java:748) > Caused by: > 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): > User nn/xxx (auth:PROXY) via jn/xxx (auth:KERBEROS) is not authorized for > protocol interface org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol: > this service is only accessible by nn/xxx > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI
[ https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13244: -- Resolution: Fixed Fix Version/s: 3.2.0 3.0.1 3.1.0 Status: Resolved (was: Patch Available) > Add stack, conf, metrics links to utilities dropdown in NN webUI > > > Key: HDFS-13244 > URL: https://issues.apache.org/jira/browse/HDFS-13244 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 3.1.0, 3.0.1, 3.2.0 > > Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 > AM.png > > > Add stack, conf, metrics links to utilities dropdown in NN webUI > cc [~arpitagarwal] for suggesting this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI
[ https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393830#comment-16393830 ] Hanisha Koneru commented on HDFS-13244: --- Committed to trunk, branch-3.1 and branch-3.0. Thanks for the contribution [~bharatviswa] and thanks for the review [~ajayydv]. > Add stack, conf, metrics links to utilities dropdown in NN webUI > > > Key: HDFS-13244 > URL: https://issues.apache.org/jira/browse/HDFS-13244 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 3.1.0, 3.0.1, 3.2.0 > > Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 > AM.png > > > Add stack, conf, metrics links to utilities dropdown in NN webUI > cc [~arpitagarwal] for suggesting this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11394) Add method for getting erasure coding policy through WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-11394: -- Attachment: HDFS-11394.006.patch > Add method for getting erasure coding policy through WebHDFS > - > > Key: HDFS-11394 > URL: https://issues.apache.org/jira/browse/HDFS-11394 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, namenode >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Major > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-11394.005.patch, HDFS-11394.006.patch, > HDFS-11394.01.patch, HDFS-11394.02.patch, HDFS-11394.03.patch, > HDFS-11394.04.patch > > > We can expose erasure coding policy by erasure coded directory through > WebHDFS method as well as storage policy. This information can be used by > NameNode Web UI and show the detail of erasure coded directories. > see: HDFS-8196 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11394) Add method for getting erasure coding policy through WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392240#comment-16392240 ] Hanisha Koneru commented on HDFS-11394: --- Thanks for the review, [~arpitagarwal]. Addressed javadoc and checkstyle issues in patch v06. > Add method for getting erasure coding policy through WebHDFS > - > > Key: HDFS-11394 > URL: https://issues.apache.org/jira/browse/HDFS-11394 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, namenode >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Major > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-11394.005.patch, HDFS-11394.006.patch, > HDFS-11394.01.patch, HDFS-11394.02.patch, HDFS-11394.03.patch, > HDFS-11394.04.patch > > > We can expose erasure coding policy by erasure coded directory through > WebHDFS method as well as storage policy. This information can be used by > NameNode Web UI and show the detail of erasure coded directories. > see: HDFS-8196 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13239) Fix non-empty dir warning message when setting default EC policy
[ https://issues.apache.org/jira/browse/HDFS-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392096#comment-16392096 ] Hanisha Koneru commented on HDFS-13239: --- Thanks [~bharatviswa]. +1 for patch v01 pending Jenkins. > Fix non-empty dir warning message when setting default EC policy > > > Key: HDFS-13239 > URL: https://issues.apache.org/jira/browse/HDFS-13239 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Hanisha Koneru >Assignee: Bharat Viswanadham >Priority: Minor > Attachments: HDFS-13239.00.patch, HDFS-13239.01.patch > > > When EC policy is set on a non-empty directory, the following warning message > is given: > {code} > $hdfs ec -setPolicy -policy RS-6-3-1024k -path /ec1 > Warning: setting erasure coding policy on a non-empty directory will not > automatically convert existing files to RS-6-3-1024k > {code} > When we do not specify the -policy parameter when setting EC policy on a > directory, it takes the default EC policy. Setting default EC policy in this > way on a non-empty directory gives the following warning message: > {code} > $hdfs ec -setPolicy -path /ec2 > Warning: setting erasure coding policy on a non-empty directory will not > automatically convert existing files to null > {code} > Notice that the warning message in the 2nd case has the ecPolicy name shown > as null. We should instead give the default EC policy name in this message. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI
[ https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392179#comment-16392179 ] Hanisha Koneru commented on HDFS-13244: --- No but just wanted Jenkins to +1 it to follow conventions :) > Add stack, conf, metrics links to utilities dropdown in NN webUI > > > Key: HDFS-13244 > URL: https://issues.apache.org/jira/browse/HDFS-13244 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 > AM.png > > > Add stack, conf, metrics links to utilities dropdown in NN webUI > cc [~arpitagarwal] for suggesting this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13148) Unit test for EZ with KMS and Federation
[ https://issues.apache.org/jira/browse/HDFS-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392091#comment-16392091 ] Hanisha Koneru commented on HDFS-13148: --- Hi [~shahrs87], thanks for the review. If we make {{TestEncryptionZonesWithKMSandFederation}} extend {{TestEncryptionZonesWithKMS}}, then the former would run all the test cases in the latter with its own initial {{setup}}. This would cause all the tests in {{TestEncryptionZonesWithKMS}} to fail when run against the setup of {{TestEncryptionZonesWithKMSandFederation}}. To get over this, we would have to modify all the test cases in {{TestEncryptionZonesWithKMS}} and {{TestEncryptionZones}} to work with the federated configuration setup. For example, instead of using the variable {{dfsAdmin}} in {{TestEncryptionZonesWithKMS}}, we would have to change it to {{dfsAdmin[0]}} to match the federated setup. I think doing this would just complicate all three tests. We could instead have a {{TestEncryptionZonesBaseTest}} class and make all the other tests extend this class. Or we could just keep the non-federated tests and federated tests separate (as is in patch v03). Please let me know your thoughts. > Unit test for EZ with KMS and Federation > > > Key: HDFS-13148 > URL: https://issues.apache.org/jira/browse/HDFS-13148 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13148.001.patch, HDFS-13148.002.patch, > HDFS-13148.003.patch > > > It would be good to have some unit tests for testing KMS and EZ on a > federated cluster. We can start with basic EZ operations. For example, create > EZs on two namespaces with different keys using one KMS.
[jira] [Created] (HDFS-13442) Handle Datanode Registration failure
Hanisha Koneru created HDFS-13442: - Summary: Handle Datanode Registration failure Key: HDFS-13442 URL: https://issues.apache.org/jira/browse/HDFS-13442 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Hanisha Koneru Assignee: Hanisha Koneru If a datanode is not able to register itself, we need to handle that correctly. If the number of unsuccessful attempts to register with the SCM exceeds a configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
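The behavior proposed in the issue description can be sketched as a bounded retry loop: keep attempting registration with the SCM until it succeeds or a configurable maximum number of attempts is exceeded, then give up. All names below are hypothetical; the real configuration key and retry logic live in the Ozone datanode code.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of bounded datanode registration: stop retrying once
// a configurable maximum number of attempts has been exhausted.
public class RegistrationRetry {

  // Returns the attempt number on success, or -1 after maxAttempts failures
  // (meaning the datanode should make no further registration attempts).
  static int register(int maxAttempts, IntPredicate attemptSucceeds) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (attemptSucceeds.test(attempt)) {
        return attempt;  // registered with the SCM on this attempt
      }
    }
    return -1;           // exceeded the configured max; give up
  }

  public static void main(String[] args) {
    // succeeds on the third attempt
    System.out.println(register(5, attempt -> attempt == 3));
    // never succeeds: stops after 5 attempts instead of retrying forever
    System.out.println(register(5, attempt -> false));
  }
}
```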
[jira] [Updated] (HDFS-13442) Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13442: -- Attachment: HDFS-13442-HDFS-7240.001.patch > Handle Datanode Registration failure > > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13442: -- Summary: Ozone: Handle Datanode Registration failure (was: Handle Datanode Registration failure) > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13442: -- Attachment: HDFS-13442-HDFS-7240.002.patch > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch, > HDFS-13442-HDFS-7240.002.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13442: -- Attachment: (was: HDFS-13442-HDFS-7240.002.patch) > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13079) Provide a config to start namenode in safemode state upto a certain transaction id
[ https://issues.apache.org/jira/browse/HDFS-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443191#comment-16443191 ] Hanisha Koneru commented on HDFS-13079: --- Thanks for working on this [~shashikant]. bq. Please note that in case a checkpoint has already happened and the requested transaction id has been subsumed in an FSImage, then the namenode will be started with the next nearest transaction id. Further FSImage files and edits will be ignored. In case the requested txId falls within the latest fsImage, do we want to load that fsImage or fall back to a previous fsImage with lastTxId < requested txId? IMO, we should load the fsImage with endTxId <= requested txId. * In {{FSImage#loadFSImage}}, the check for whether we should load an fsImage is made only after the image has already been loaded. The line {{loader.load(curFile, requireSameLayoutVersion)}} loads the fsImage transactions into the NN. {code}
FSImageFormat.LoaderDelegator loader = FSImageFormat.newLoader(conf, target);
loader.load(curFile, requireSameLayoutVersion);
long lastTxIdToLoad = target.getLastTxidToLoad();
long txId = loader.getLoadedImageTxId();
if (lastTxIdToLoad != HdfsServerConstants.INVALID_TXID &&
    txId > lastTxIdToLoad) {
{code} * When we skip loading the latest fsImage, we should keep falling back and try to load the next latest fsImage. For example, say we have two fsImages, fsimage_00090 and fsimage_00150. Now say we want to start the namenode in safemode up to txId 120. We first check fsimage_00150 and reject it. After this, the NN should attempt to load the next latest fsImage, i.e. fsimage_00090. We can throw an exception when skipping an fsImage and catch that exception in the following code path in {{FSImage#loadFSImage}}. This way the next latest fsImage will be loaded.
{code}
FSImageFile imageFile = null;
for (int i = 0; i < imageFiles.size(); i++) {
  try {
    imageFile = imageFiles.get(i);
    loadFSImageFile(target, recovery, imageFile, startOpt);
    break;
{code} * What do we do when there are no fsImages with endTxId <= requested txId? IMO, we should stop the NN and throw an error. > Provide a config to start namenode in safemode state upto a certain > transaction id > -- > > Key: HDFS-13079 > URL: https://issues.apache.org/jira/browse/HDFS-13079 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13079.001.patch, HDFS-13079.002.patch > > > In some cases it is necessary to roll back the Namenode to a certain > transaction id. This is especially needed when the user issues a {{rm -Rf > -skipTrash}} by mistake. > Rolling back to a transaction id helps in taking a peek at the filesystem at > a particular instant. This jira proposes to provide a configuration variable > using which the namenode can be started upto a certain transaction id. The > filesystem will be in a readonly safemode which cannot be overridden > manually. It will only be overridden by removing the config value from the > config file. Please also note that this will not cause any changes in the > filesystem state, the filesystem will be in safemode state and no changes to > the filesystem state will be allowed. > Please note that in case a checkpoint has already happened and the requested > transaction id has been subsumed in an FSImage, then the namenode will be > started with the next nearest transaction id. Further FSImage files and edits > will be ignored. > If the checkpoint hasn't happened then the namenode will be started with the > exact transaction id. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
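The fallback behavior proposed in the comment above (reject any fsImage whose txId exceeds the requested txId, then try the next older one) can be sketched as follows; the method and variable names are illustrative, not the actual HDFS code, where the rejection would instead be a thrown-and-caught exception inside {{FSImage#loadFSImage}}.

```java
import java.util.List;

// Sketch: candidate fsImages are examined newest-first; an image whose txId
// exceeds the requested upper bound is skipped and the next older one is
// tried. All names here are made up for illustration.
class FsImageFallbackSketch {
  /**
   * @param imageTxIdsNewestFirst txIds of the available fsImages, newest
   *                              first, e.g. [150, 90]
   * @param lastTxIdToLoad        requested upper bound, e.g. 120
   * @return txId of the image to load, or -1 if no image qualifies (in
   *         which case the NN should stop with an error)
   */
  static long pickImage(List<Long> imageTxIdsNewestFirst, long lastTxIdToLoad) {
    for (long txId : imageTxIdsNewestFirst) {
      if (txId <= lastTxIdToLoad) {
        return txId;  // e.g. fsimage_00090 when the requested txId is 120
      }
      // txId > lastTxIdToLoad: skip this image, fall back to the next one
    }
    return -1;  // no usable fsImage with endTxId <= requested txId
  }
}
```

With the example from the comment (fsimage_00090 and fsimage_00150, requested txId 120), fsimage_00150 is rejected and fsimage_00090 is chosen.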
[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13442: -- Attachment: HDFS-13442-HDFS-7240.002.patch > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch, > HDFS-13442-HDFS-7240.002.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13372) New Expunge Replica Trash Client-Namenode-Protocol
[ https://issues.apache.org/jira/browse/HDFS-13372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446270#comment-16446270 ] Hanisha Koneru commented on HDFS-13372: --- Thanks for working on this [~bharatviswa]. The patch LGTM overall. Just two NITs: # In {{DFSAdmin#expungeReplicaTrash}} there is a typo in the System.out message -> "operation is queued and will be sent to -in- datanodes". # In {{RouterRpcServer}}, can you please remove the space in {{TO DO}} (so it reads {{TODO}}) and add a description of what the todo is, just so that it shows up when viewing TODO items in the editor tool window. Will trigger Jenkins manually if it doesn't run this time as well. > New Expunge Replica Trash Client-Namenode-Protocol > -- > > Key: HDFS-13372 > URL: https://issues.apache.org/jira/browse/HDFS-13372 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13372-HDFS-12996.00.patch > > > When client issues an expunge replica-trash RPC call to Namenode, the > Namenode will queue > a new heartbeat command response - DN_EXPUNGE directing the DataNodes to > expunge the > replica-trash. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13373) Handle expunge command on NN and DN
[ https://issues.apache.org/jira/browse/HDFS-13373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446358#comment-16446358 ] Hanisha Koneru commented on HDFS-13373: --- Thanks for the patch [~bharatviswa]. # In {{PBHelper#convert(ReplicaTrashCommandProto)}}, the switch case should have a default case which throws an AssertionError (please refer to {{BlockCommandProto convert(BlockCommand cmd)}}). # In {{BPOfferService#processCommandFromActive}}, can we move the check for whether the command is a ReplicaTrashCommand inside the switch case, so as to avoid making this check for every Datanode command? {code}
final ReplicaTrashCommand rcmd = cmd instanceof ReplicaTrashCommand ?
    (ReplicaTrashCommand)cmd : null;
{code} # We should handle the case where blocks are deleted and moved to Replica Trash after the expunge command is issued. These new blocks should not be removed from the Replica Trash. One option is to note down the timestamp when the expunge command was received. All invalidated blocks moved to replica trash after this timestamp should not be expunged. > Handle expunge command on NN and DN > --- > > Key: HDFS-13373 > URL: https://issues.apache.org/jira/browse/HDFS-13373 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13373-HDFS-12996.00.patch > > > When DataNodes receive the DN_EXPUNGE command from Namenode, they will > purge all the block replicas in replica-trash -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
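The timestamp-based filtering suggested in point 3 of the comment above could look roughly like this; it is a hand-written sketch, not actual HDFS code, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: only blocks moved to the replica trash at or before the time the
// expunge command was received get purged; anything moved afterwards is kept.
class ExpungeFilterSketch {
  static List<String> entriesToPurge(Map<String, Long> moveTimeByBlock,
                                     long expungeReceivedTime) {
    List<String> purge = new ArrayList<>();
    for (Map.Entry<String, Long> e : moveTimeByBlock.entrySet()) {
      if (e.getValue() <= expungeReceivedTime) {
        purge.add(e.getKey());  // moved before the command arrived: expunge
      }
      // moved after the command arrived: leave in the replica trash
    }
    return purge;
  }
}
```

So a block invalidated after the expunge command arrives survives this pass and is only removed by a later expunge.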
[jira] [Comment Edited] (HDFS-13373) Handle expunge command on NN and DN
[ https://issues.apache.org/jira/browse/HDFS-13373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446358#comment-16446358 ] Hanisha Koneru edited comment on HDFS-13373 at 4/20/18 9:00 PM: Thanks for the patch [~bharatviswa]. # In {{PBHelper#convert(ReplicaTrashCommandProto)}}, the switch case should have a default case which throws AssertionError (please refer to {{BlockCommandProto convert(BlockCommand cmd)}}). # In {{BPOfferService#processCommandFromActive}}, can we move the check whether the command is ReplicaTrashCommand inside the switch case so as to avoid making this check for every Datanode command. {code} final ReplicaTrashCommand rcmd = cmd instanceof ReplicaTrashCommand ? (ReplicaTrashCommand)cmd : null; {code} # We should handle the case where blocks are deleted and moved to Replica Trash after the expunge command is issued. These new blocks should not be removed from the Replica Trash. One option is to note down the timestamp when expunge command was received. All invalidated blocks moved to replica trash after this timestamp will not be expunged. was (Author: hanishakoneru): Thanks for the patch [~bharatviswa]. # In {{PBHelper#convert(ReplicaTrashCommandProto)}}, the switch case should have a default case which throws AssertionError (please refer to {{BlockCommandProto convert(BlockCommand cmd)}}). # In {{BPOfferService#processCommandFromActive}}, can we move the check whether the command is ReplicaTrashCommand inside the switch case so as to avoid making this check for every Datanode command. {code} final ReplicaTrashCommand rcmd = cmd instanceof ReplicaTrashCommand ? (ReplicaTrashCommand)cmd : null; {code} # We should handle the case where blocks are deleted and moved to Replica Trash after the expunge command is issued. These new blocks should not be removed from the Replica Trash. One option is to note down the timestamp when expunge command was received. 
All invalidated blocks moved to replica trash after this timestamp should not be expunged. > Handle expunge command on NN and DN > --- > > Key: HDFS-13373 > URL: https://issues.apache.org/jira/browse/HDFS-13373 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13373-HDFS-12996.00.patch > > > When DataNodes receive the DN_EXPUNGE command from Namenode, they will > purge all the block replicas in replica-trash -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444626#comment-16444626 ] Hanisha Koneru commented on HDFS-13442: --- In patch v02, I just changed the config name and updated {{StorageContainerDatanodeProtocol.proto}} to add a new ErrorCode, {{nodeAlreadyRegistered}}. > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch, > HDFS-13442-HDFS-7240.002.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441526#comment-16441526 ] Hanisha Koneru commented on HDFS-13442: --- Thanks for the review [~anu]. This patch only modifies the case when we get _errorNodeNotPermitted_. This happens when the node is able to contact the SCM but SCM does not register the node. {quote}if the data nodes boot up earlier than SCM we would not want the data nodes to do silent after 10 tries {quote} In this case, the datanode keeps retrying as the EndPointTask state remains as {{HEARTBEAT}}. In the code snippet below, if the datanode does not get a response from SCM, it catches the exception and logs it, if needed. {code:java} try { SCMRegisteredCmdResponseProto response = rpcEndPoint.getEndPoint() .register(datanodeDetails.getProtoBufMessage(), conf.getStrings(ScmConfigKeys.OZONE_SCM_NAMES)); ... ... processResponse(response); } catch (IOException ex) { rpcEndPoint.logIfNeeded(ex); } {code} {quote}also in the case, we get the error, errorNodeNotPermitted, should we shut down the data node and create some kind of error record on SCM so we can get that info back from SCM? I am also ok with the current approach where we will let the system slowly go time out. {quote} I think we should let the DN make a few retries before shutting it down. > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
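The capped-retry behavior this issue describes (stop registering with the SCM after a configurable number of failed attempts) can be sketched roughly as below. This is an illustration only, not the actual Ozone endpoint code; the class name and the {{maxRegisterAttempts}} knob are made up.

```java
// Sketch: give up on SCM registration after a configurable number of
// consecutive failures. All names here are illustrative.
class RegistrationRetrySketch {
  private final int maxRegisterAttempts;
  private int failedAttempts = 0;
  private boolean gaveUp = false;

  RegistrationRetrySketch(int maxRegisterAttempts) {
    this.maxRegisterAttempts = maxRegisterAttempts;
  }

  /** Returns true if this attempt registered successfully; false otherwise. */
  boolean attempt(boolean registrationSucceeded) {
    if (gaveUp) {
      return false;          // cap reached earlier: no more attempts
    }
    if (registrationSucceeded) {
      failedAttempts = 0;
      return true;
    }
    failedAttempts++;
    if (failedAttempts >= maxRegisterAttempts) {
      gaveUp = true;         // cap exceeded: stop trying from now on
    }
    return false;
  }

  boolean hasGivenUp() { return gaveUp; }
}
```

A transient failure (SCM not yet up, say) only resets the counter once a later attempt succeeds; the permanent-error case hits the cap and stops.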
[jira] [Comment Edited] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441526#comment-16441526 ] Hanisha Koneru edited comment on HDFS-13442 at 4/17/18 9:34 PM: Thanks for the review [~anu]. This patch only modifies the case when we get _errorNodeNotPermitted_. This happens when the node is able to contact the SCM but SCM does not register the node. {quote}if the data nodes boot up earlier than SCM we would not want the data nodes to do silent after 10 tries {quote} In this case, the datanode keeps retrying as the EndPointTask state remains as {{REGISTER}}. In the code snippet below, if the datanode does not get a response from SCM, it catches the exception and logs it, if needed. {code:java} try { SCMRegisteredCmdResponseProto response = rpcEndPoint.getEndPoint() .register(datanodeDetails.getProtoBufMessage(), conf.getStrings(ScmConfigKeys.OZONE_SCM_NAMES)); ... ... processResponse(response); } catch (IOException ex) { rpcEndPoint.logIfNeeded(ex); } {code} {quote}also in the case, we get the error, errorNodeNotPermitted, should we shut down the data node and create some kind of error record on SCM so we can get that info back from SCM? I am also ok with the current approach where we will let the system slowly go time out. {quote} I think we should let the DN make a few retries before shutting it down. was (Author: hanishakoneru): Thanks for the review [~anu]. This patch only modifies the case when we get _errorNodeNotPermitted_. This happens when the node is able to contact the SCM but SCM does not register the node. {quote}if the data nodes boot up earlier than SCM we would not want the data nodes to do silent after 10 tries {quote} In this case, the datanode keeps retrying as the EndPointTask state remains as {{HEARTBEAT}}. In the code snippet below, if the datanode does not get a response from SCM, it catches the exception and logs it, if needed. 
{code:java} try { SCMRegisteredCmdResponseProto response = rpcEndPoint.getEndPoint() .register(datanodeDetails.getProtoBufMessage(), conf.getStrings(ScmConfigKeys.OZONE_SCM_NAMES)); ... ... processResponse(response); } catch (IOException ex) { rpcEndPoint.logIfNeeded(ex); } {code} {quote}also in the case, we get the error, errorNodeNotPermitted, should we shut down the data node and create some kind of error record on SCM so we can get that info back from SCM? I am also ok with the current approach where we will let the system slowly go time out. {quote} I think we should let the DN make a few retries before shutting it down. > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444977#comment-16444977 ] Hanisha Koneru commented on HDFS-13477: --- Thanks for the patch [~ajayydv]. The patch LGTM overall. In case the httpServer start fails, should we add the httpServer as a Service port to KSM services in {{KeySpaceManager#getServiceList()}}? > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13477-HDFS-7240.00.patch > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13277) Improve move to Replica trash to limit trash sub-dir size
[ https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13277: -- Summary: Improve move to Replica trash to limit trash sub-dir size (was: Improve move to account for usage (number of files) to limit trash dir size) > Improve move to Replica trash to limit trash sub-dir size > - > > Key: HDFS-13277 > URL: https://issues.apache.org/jira/browse/HDFS-13277 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13277-HDFS-12996.00.patch, > HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, > HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch, > HDFS-13277-HDFS-12996.06.patch > > > Limit the maximum number of entries in a trash subdirectory. This puts an > upper limit on the size of subdirectories in replica-trash. Set this default > value to blockinvalidateLimit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13329) Add/ Update disk space counters for trash (trash used, disk remaining etc.)
[ https://issues.apache.org/jira/browse/HDFS-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424760#comment-16424760 ] Hanisha Koneru edited comment on HDFS-13329 at 4/3/18 11:39 PM: Thanks for working on this, [~bharatviswa]. Looks good overall. I have some comments: # Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and {{DUWithExclude}}. # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and the excludedPath and then subtracting the latter from the former. We end up calculating the space used by replica trash twice this way. {code:java} setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 1024));{code} We could instead utilize the {{--exclude}} option of the {{du}} command. Also, can we add the exclude option to {{DU.java}} itself instead of another class? I am not sure how complicated that would get though. I am ok with this approach too. # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be consistent with the naming. # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo. {code:java} assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize + slack)); {code} It should have been {code:java} du <= (writtenSize + slack) {code} # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters after the {{DFSRemaining%}} counter. # In {{DFSConfigKeys}}, {code:java} public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT = "dfs.datanode.replica.trash.keep.alive.interval"; {code} The value for the config parameter is mistyped. # In {{BlockPoolSlice}}, ** In {{loadDfsUsed()}}, variable {{replicaTrashUsed}} is not used. ** In {{loadReplicaTrashUsed}}, if we are using separate {{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we should have separate Cache files too. # {{FsVolumeImpl#replicaTrashLimit}} variable can be final. 
# In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number of blocks count in the BP. # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled or not and send the report accordingly? was (Author: hanishakoneru): Thanks for working on this, [~bharatviswa]. Looks good overall. I have a few comments: # Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and {{DUWithExclude}}. # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and the excludedPath and then subtracting the latter from the former. We end up calculating the space used by replica trash twice this way. {code:java} setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 1024));{code} We could instead utilize the {{--exclude}} option of the {{du}} command. Also, can we add the exclude option to {{DU.java}} itself instead of another class? I am not sure how complicated that would get though. I am ok with this approach too. # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be consistent with the naming. # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo. {code:java} assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize + slack)); {code} It should have been {code:java} du <= (writtenSize + slack) {code} # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters after the {{DFSRemaining%}} counter. # In {{DFSConfigKeys}}, {code:java} public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT = "dfs.datanode.replica.trash.keep.alive.interval"; {code} The value for the config parameter is mistyped. # In {{BlockPoolSlice}}, ** In {{loadDfsUsed()}}, variable {{replicaTrashUsed}} is not used. ** In {{loadReplicaTrashUsed}}, if we are using separate {{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we should have separate Cache files too. 
# {{FsVolumeImpl#replicaTrashLimit}} variable can be final. # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number of blocks count in the BP. # {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled or not and send the report accordingly? > Add/ Update disk space counters for trash (trash used, disk remaining etc.) > > > Key: HDFS-13329 > URL: https://issues.apache.org/jira/browse/HDFS-13329 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13329-HDFS-12996.01.patch, > HDFS-13329-HDFS-12996.02.patch > > > Add 3 more counters required for datanode replica trash. > # diskAvailable > #
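As a quick illustration of the {{du --exclude}} suggestion in the comment above, the following compares the two-pass subtraction approach with a single exclusion-based invocation. It assumes GNU coreutils {{du}} ({{--exclude}} is a GNU extension, not POSIX), and the paths are invented for the demo.

```shell
# Throwaway directory tree standing in for a DN volume (illustrative paths).
demo=$(mktemp -d)
mkdir -p "$demo/finalized" "$demo/replica-trash"
dd if=/dev/zero of="$demo/finalized/blk_1" bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$demo/replica-trash/blk_2" bs=1024 count=64 2>/dev/null

# Two-pass approach (what the patch effectively does): du the volume, du the
# trash, then subtract -- the replica-trash subtree is walked twice in total.
total_kb=$(du -sk "$demo" | cut -f1)
trash_kb=$(du -sk "$demo/replica-trash" | cut -f1)
two_pass_kb=$((total_kb - trash_kb))

# One-pass approach: GNU du's --exclude never descends into replica-trash.
one_pass_kb=$(du -sk --exclude='replica-trash' "$demo" | cut -f1)

echo "two-pass: ${two_pass_kb} kB, one-pass: ${one_pass_kb} kB"
rm -rf "$demo"
```

Both approaches should report roughly the same usage; the difference is that the exclusion variant scans the trash subtree zero times instead of twice.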
[jira] [Comment Edited] (HDFS-13329) Add/ Update disk space counters for trash (trash used, disk remaining etc.)
[ https://issues.apache.org/jira/browse/HDFS-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424760#comment-16424760 ] Hanisha Koneru edited comment on HDFS-13329 at 4/3/18 11:41 PM: Thanks for working on this, [~bharatviswa]. Looks good overall. I have some comments: 1. Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and {{DUWithExclude}}. 2. In {{DUWithExclude}}, we are calculating the {{du}} for both the path and the excludedPath and then subtracting the latter from the former. We end up calculating the space used by replica trash twice this way. {code:java} setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 1024));{code} We could instead utilize the {{--exclude}} option of the {{du}} command. Also, can we add the exclude option to {{DU.java}} itself instead of another class? I am not sure how complicated that would get though. I am ok with this approach too. 3. Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be consistent with the naming. 4. In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo. {code:java} assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize + slack)); {code} It should have been {code:java} du <= (writtenSize + slack) {code} 5. In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters after the {{DFSRemaining%}} counter. 6. In {{DFSConfigKeys}}, {code:java} public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT = "dfs.datanode.replica.trash.keep.alive.interval"; {code} The value for the config parameter is mistyped. 7. In {{BlockPoolSlice}}, ** In {{loadDfsUsed()}}, variable {{replicaTrashUsed}} is not used. ** In {{loadReplicaTrashUsed}}, if we are using separate {{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we should have separate Cache files too. 8. {{FsVolumeImpl#replicaTrashLimit}} variable can be final. 9. 
In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number of blocks count in the BP. 10. In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled or not and send the report accordingly? was (Author: hanishakoneru): Thanks for working on this, [~bharatviswa]. Looks good overall. I have some comments: # Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and {{DUWithExclude}}. # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and the excludedPath and then subtracting the latter from the former. We end up calculating the space used by replica trash twice this way. {code:java} setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 1024));{code} We could instead utilize the {{--exclude}} option of the {{du}} command. Also, can we add the exclude option to {{DU.java}} itself instead of another class? I am not sure how complicated that would get though. I am ok with this approach too. # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be consistent with the naming. # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo. {code:java} assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize + slack)); {code} It should have been {code:java} du <= (writtenSize + slack) {code} # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters after the {{DFSRemaining%}} counter. # In {{DFSConfigKeys}}, {code:java} public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT = "dfs.datanode.replica.trash.keep.alive.interval"; {code} The value for the config parameter is mistyped. # In {{BlockPoolSlice}}, ** In {{loadDfsUsed()}}, variable {{replicaTrashUsed}} is not used. ** In {{loadReplicaTrashUsed}}, if we are using separate {{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we should have separate Cache files too. 
# {{FsVolumeImpl#replicaTrashLimit}} variable can be final. # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number of blocks count in the BP. # {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled or not and send the report accordingly? > Add/ Update disk space counters for trash (trash used, disk remaining etc.) > > > Key: HDFS-13329 > URL: https://issues.apache.org/jira/browse/HDFS-13329 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13329-HDFS-12996.01.patch, > HDFS-13329-HDFS-12996.02.patch > > > Add 3 more counters required for datanode replica trash. > # diskAvailable > #
[jira] [Comment Edited] (HDFS-13329) Add/ Update disk space counters for trash (trash used, disk remaining etc.)
[ https://issues.apache.org/jira/browse/HDFS-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424760#comment-16424760 ] Hanisha Koneru edited comment on HDFS-13329 at 4/3/18 11:39 PM: Thanks for working on this, [~bharatviswa]. Looks good overall. I have a few comments: # Can you add Javadoc and License headers to {{CachingGetSpaceUsedWithExclude}} and {{DUWithExclude}}. # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and the excludedPath and then subtracting the latter from the former. We end up calculating the space used by replica trash twice this way. {code:java} setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 1024));{code} We could instead utilize the {{--exclude}} option of the {{du}} command. Also, can we add the exclude option to {{DU.java}} itself instead of another class? I am not sure how complicated that would get though. I am ok with this approach too. # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be consistent with the naming. # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo. {code:java} assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize + slack)); {code} It should have been {code:java} duSize <= (writtenSize + slack) {code} # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters after the {{DFSRemaining%}} counter. # In {{DFSConfigKeys}}, {code:java} public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT = "dfs.datanode.replica.trash.keep.alive.interval"; {code} The value for the config parameter is mistyped. # In {{BlockPoolSlice}}, ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used. ** In {{loadReplicaTrashUsed}}, if we are using separate {{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we should have separate cache files too. # {{FsVolumeImpl#replicaTrashLimit}} variable can be final. # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the block count of the BP. # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled or not and send the report accordingly? was (Author: hanishakoneru): Thanks for working on this, [~bharatviswa]. Looks good overall. I have a few comments: # Can you add Javadoc and License headers to {{CachingGetSpaceUsedWithExclude}} and {{DUWithExclude}}. # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and the excludedPath and then subtracting the latter from the former. We end up calculating the space used by replica trash twice this way. {code:java} setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 1024));{code} We could instead utilize the {{--exclude}} option of the {{du}} command. Also, can we add the exclude option to {{DU.java}} itself instead of another class? I am not sure how complicated that would get though. I am ok with this approach too. # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be consistent with the naming. # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo. {code:java} assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize + slack)); {code} It should have been {code:java} duSize <= (writtenSize + slack) {code} # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters after the {{DFSRemaining%}} counter. # In {{DFSConfigKeys}}, {code:java} public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT = "dfs.datanode.replica.trash.keep.alive.interval"; {code} The value for the config parameter is mistyped. # In {{BlockPoolSlice}}, ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used. ** In {{loadReplicaTrashUsed}}, if we are using separate {{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we should have separate cache files too. # {{FsVolumeImpl#replicaTrashLimit}} variable can be final. # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the block count of the BP. # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled or not and send the report accordingly? > Add/ Update disk space counters for trash (trash used, disk remaining etc.) > > > Key: HDFS-13329 > URL: https://issues.apache.org/jira/browse/HDFS-13329 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13329-HDFS-12996.01.patch, > HDFS-13329-HDFS-12996.02.patch > > > Add 3 more counters required for datanode replica trash. > # diskAvailable > #
[jira] [Commented] (HDFS-13329) Add/ Update disk space counters for trash (trash used, disk remaining etc.)
[ https://issues.apache.org/jira/browse/HDFS-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424760#comment-16424760 ] Hanisha Koneru commented on HDFS-13329: --- Thanks for working on this, [~bharatviswa]. Looks good overall. I have a few comments: # Can you add Javadoc and License headers to {{CachingGetSpaceUsedWithExclude}} and {{DUWithExclude}}. # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and the excludedPath and then subtracting the latter from the former. We end up calculating the space used by replica trash twice this way. {code:java} setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 1024));{code} We could instead utilize the {{--exclude}} option of the {{du}} command. Also, can we add the exclude option to {{DU.java}} itself instead of another class? I am not sure how complicated that would get though. I am ok with this approach too. # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be consistent with the naming. # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo. {code:java} assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize + slack)); {code} It should have been {code:java} duSize <= (writtenSize + slack) {code} # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters after the {{DFSRemaining%}} counter. # In {{DFSConfigKeys}}, {code:java} public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT = "dfs.datanode.replica.trash.keep.alive.interval"; {code} The value for the config parameter is mistyped. # In {{BlockPoolSlice}}, ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used. ** In {{loadReplicaTrashUsed}}, if we are using separate {{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we should have separate cache files too. # {{FsVolumeImpl#replicaTrashLimit}} variable can be final. # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the block count of the BP. # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled or not and send the report accordingly? > Add/ Update disk space counters for trash (trash used, disk remaining etc.) > > > Key: HDFS-13329 > URL: https://issues.apache.org/jira/browse/HDFS-13329 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13329-HDFS-12996.01.patch, > HDFS-13329-HDFS-12996.02.patch > > > Add 3 more counters required for datanode replica trash. > # diskAvailable > # replicaTrashUsed > # replicaTrashRemaining > For more info on these counters, refer to the design document uploaded in HDFS-12996 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
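The {{--exclude}} approach suggested in comment 2 above could look roughly like the sketch below. This is illustrative only: the class and method names are invented (not the actual {{DUWithExclude}} code), and {{--exclude}} assumes GNU coreutils {{du}}; it is not a POSIX option.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Illustrative sketch: run du once with GNU du's --exclude flag so the
// replica-trash subtree is skipped in a single pass, instead of running du
// twice and subtracting one result from the other.
class DuExcludeSketch {

  // Parse du's "<kbytes><whitespace><path>" summary line into bytes.
  static long parseDuBytes(String duOutputLine) {
    String[] tokens = duOutputLine.trim().split("\\s+");
    return Long.parseLong(tokens[0]) * 1024L;
  }

  // Single invocation: "du -sk --exclude=<excludedPath> <path>".
  static long duWithExclude(String path, String excludedPath) throws Exception {
    Process p = new ProcessBuilder(
        "du", "-sk", "--exclude=" + excludedPath, path).start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream()))) {
      String line = r.readLine();
      p.waitFor();
      return parseDuBytes(line);
    }
  }
}
```

Compared with the subtraction approach quoted in the comment, this walks the replica-trash subtree zero times instead of twice, at the cost of relying on a GNU-only flag.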
[jira] [Updated] (HDFS-13277) Improve move to Replica trash to limit trash sub-dir size
[ https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13277: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Improve move to Replica trash to limit trash sub-dir size > - > > Key: HDFS-13277 > URL: https://issues.apache.org/jira/browse/HDFS-13277 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13277-HDFS-12996.00.patch, > HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, > HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch, > HDFS-13277-HDFS-12996.06.patch > > > The trash subdirectory maximum entries. This puts an upper limit on the size > of subdirectories in replica-trash. Set this default value to > blockinvalidateLimit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13277) Improve move to Replica trash to limit trash sub-dir size
[ https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414265#comment-16414265 ] Hanisha Koneru commented on HDFS-13277: --- Test failures are unrelated. Committed to branch HDFS-12996. Thank you [~bharatviswa] for the contribution and [~ajayydv] for the review. > Improve move to Replica trash to limit trash sub-dir size > - > > Key: HDFS-13277 > URL: https://issues.apache.org/jira/browse/HDFS-13277 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13277-HDFS-12996.00.patch, > HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, > HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch, > HDFS-13277-HDFS-12996.06.patch > > > The trash subdirectory maximum entries. This puts an upper limit on the size > of subdirectories in replica-trash. Set this default value to > blockinvalidateLimit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size
[ https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407224#comment-16407224 ] Hanisha Koneru edited comment on HDFS-13277 at 3/20/18 11:29 PM: - Thanks for the patch, [~bharatviswa]. LGTM overall (still have to review the unit test). Just have a few very minor comments: # In {{ReplicaTrashInfo}}, it would be good to rename {{entries}} to {{numBlocks}} as we are tracking the number of blocks in a subDir. # Can you rename {{curDir}} to indicate that it is the current ReplicaTrash subdir. Maybe {{curSubDir}} or {{curReplicaTrashSubDir}}? was (Author: hanishakoneru): Thanks for the patch, [~bharatviswa]. LGTM overall (still have to review the unit test). Just have a few very minor comments: # In {{ReplicaTrashInfo}}, it would be good to rename {{entries}} to {{numBlocks}} as we are tracking the number of blocks in a subDir. # Can you rename {{curDir}} to indicate that it is the current ReplicaTrash subdir, so that it is not confused with the current directory of the block pool. Maybe {{curSubDir}} or {{curReplicaTrashSubDir}}? > Improve move to account for usage (number of files) to limit trash dir size > --- > > Key: HDFS-13277 > URL: https://issues.apache.org/jira/browse/HDFS-13277 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13277-HDFS-12996.00.patch, > HDFS-13277-HDFS-12996.01.patch > > > The trash subdirectory maximum entries. This puts an upper limit on the size > of subdirectories in replica-trash. Set this default value to > blockinvalidateLimit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size
[ https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407224#comment-16407224 ] Hanisha Koneru commented on HDFS-13277: --- Thanks for the patch, [~bharatviswa]. LGTM overall (still have to review the unit test). Just have a few very minor comments: # In {{ReplicaTrashInfo}}, it would be good to rename {{entries}} to {{numBlocks}} as we are tracking the number of blocks in a subDir. # Can you rename {{curDir}} to indicate that it is the current ReplicaTrash subdir, so that it is not confused with the current directory of the block pool. Maybe {{curSubDir}} or {{curReplicaTrashSubDir}}? > Improve move to account for usage (number of files) to limit trash dir size > --- > > Key: HDFS-13277 > URL: https://issues.apache.org/jira/browse/HDFS-13277 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13277-HDFS-12996.00.patch, > HDFS-13277-HDFS-12996.01.patch > > > The trash subdirectory maximum entries. This puts an upper limit on the size > of subdirectories in replica-trash. Set this default value to > blockinvalidateLimit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size
[ https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410218#comment-16410218 ] Hanisha Koneru commented on HDFS-13277: --- Thanks, [~bharatviswa]. The unit test LGTM overall. * Before iterating over the {{locations}}, can you add an assert that the number of locations is 1, as {{storagesPerDatanode}} is set to 1 (or a comment). > Improve move to account for usage (number of files) to limit trash dir size > --- > > Key: HDFS-13277 > URL: https://issues.apache.org/jira/browse/HDFS-13277 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13277-HDFS-12996.00.patch, > HDFS-13277-HDFS-12996.01.patch > > > The trash subdirectory maximum entries. This puts an upper limit on the size > of subdirectories in replica-trash. Set this default value to > blockinvalidateLimit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size
[ https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412104#comment-16412104 ] Hanisha Koneru commented on HDFS-13277: --- Thanks for updating the patch, [~bharatviswa]. Patch v06 LGTM. +1 pending Jenkins. > Improve move to account for usage (number of files) to limit trash dir size > --- > > Key: HDFS-13277 > URL: https://issues.apache.org/jira/browse/HDFS-13277 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13277-HDFS-12996.00.patch, > HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, > HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch, > HDFS-13277-HDFS-12996.06.patch > > > The trash subdirectory maximum entries. This puts an upper limit on the size > of subdirectories in replica-trash. Set this default value to > blockinvalidateLimit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size
[ https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411761#comment-16411761 ] Hanisha Koneru commented on HDFS-13277: --- Thanks for the update, Bharat. I am sorry I missed this earlier. The default value for {{max.blocks}} is being set to the default value of {{block.invalidate.limit}}. It should be set to the configured value of this limit instead. Also, in {{hdfs-default.xml}}, we need to mention that if the new parameter is not set, it takes the value of the parameter {{dfs.block.invalidate.limit}}. NITs: # {{FsDatasetAsyncDiskService}} L104 has "information" twice in the comment. # It might be good to avoid abbreviations in hdfs-default.xml as it would be reflected in the docs (referring to "no" for "number"). > Improve move to account for usage (number of files) to limit trash dir size > --- > > Key: HDFS-13277 > URL: https://issues.apache.org/jira/browse/HDFS-13277 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HDFS-13277-HDFS-12996.00.patch, > HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, > HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch > > > The trash subdirectory maximum entries. This puts an upper limit on the size > of subdirectories in replica-trash. Set this default value to > blockinvalidateLimit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
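The default-chaining point in the comment above can be sketched as below. Plain nullable Strings stand in for Hadoop's Configuration lookups, and the 1000 default mirrors the usual dfs.block.invalidate.limit default; both the class name and these stand-ins are assumptions for illustration, not the actual patch code.

```java
// Sketch: the replica-trash max-blocks default should follow the *resolved*
// value of dfs.block.invalidate.limit (configured value if present, else its
// default), rather than being pinned to the hard-coded default.
class ReplicaTrashConfSketch {
  // Assumed to mirror the usual dfs.block.invalidate.limit default.
  static final int BLOCK_INVALIDATE_LIMIT_DEFAULT = 1000;

  // Parse a raw config value, or fall back when the key is unset (null).
  static int parseOr(String configuredValue, int fallback) {
    return configuredValue == null ? fallback : Integer.parseInt(configuredValue);
  }

  // An explicit max-blocks setting wins; otherwise use whatever the
  // invalidate limit resolves to (configured or default).
  static int maxBlocksPerSubDir(String configuredInvalidateLimit,
                                String configuredMaxBlocks) {
    int invalidateLimit =
        parseOr(configuredInvalidateLimit, BLOCK_INVALIDATE_LIMIT_DEFAULT);
    return parseOr(configuredMaxBlocks, invalidateLimit);
  }
}
```

For example, with {{dfs.block.invalidate.limit}} configured to 500 and the new parameter unset, the subdir limit should resolve to 500, not to the 1000 default.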
[jira] [Updated] (HDFS-13148) Unit test for EZ with KMS and Federation
[ https://issues.apache.org/jira/browse/HDFS-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13148: -- Attachment: HDFS-13148.002.patch > Unit test for EZ with KMS and Federation > > > Key: HDFS-13148 > URL: https://issues.apache.org/jira/browse/HDFS-13148 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13148.001.patch, HDFS-13148.002.patch > > > It would be good to have some unit tests for testing KMS and EZ on a > federated cluster. We can start with basic EZ operations. For example, create > EZs on two namespaces with different keys using one KMS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13148) Unit test for EZ with KMS and Federation
[ https://issues.apache.org/jira/browse/HDFS-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386677#comment-16386677 ] Hanisha Koneru commented on HDFS-13148: --- Thanks for the review [~xyao]. Addressed all the comments in patch v02. I did not fix all the checkstyle warnings as the link has expired. I will fix it after the next jenkins run. > Unit test for EZ with KMS and Federation > > > Key: HDFS-13148 > URL: https://issues.apache.org/jira/browse/HDFS-13148 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13148.001.patch, HDFS-13148.002.patch > > > It would be good to have some unit tests for testing KMS and EZ on a > federated cluster. We can start with basic EZ operations. For example, create > EZs on two namespaces with different keys using one KMS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11394) Add method for getting erasure coding policy through WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382905#comment-16382905 ] Hanisha Koneru commented on HDFS-11394: --- Hi [~lewuathe], are you planning to continue working on this Jira? If not, I would like to take it up. Please let me know. > Add method for getting erasure coding policy through WebHDFS > - > > Key: HDFS-11394 > URL: https://issues.apache.org/jira/browse/HDFS-11394 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, namenode >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Major > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-11394.01.patch, HDFS-11394.02.patch, > HDFS-11394.03.patch, HDFS-11394.04.patch > > > We can expose erasure coding policy by erasure coded directory through > WebHDFS method as well as storage policy. This information can be used by > NameNode Web UI and show the detail of erasure coded directories. > see: HDFS-8196 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13114) CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path
[ https://issues.apache.org/jira/browse/HDFS-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379573#comment-16379573 ] Hanisha Koneru commented on HDFS-13114: --- [~xyao], yes, the ListZonesCommand and ListReencryptionStatusCommand work as expected with the -fs command (without the fix too). > CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path > > > Key: HDFS-13114 > URL: https://issues.apache.org/jira/browse/HDFS-13114 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13114.001.patch > > > The {{crypto -reencryptZone -path }} command takes in a path > argument. But when creating {{HdfsAdmin}} object, it takes the defaultFs > instead of resolving from the path. This causes the following exception if > the authority component in path does not match the authority of default Fs. > {code:java} > $ hdfs crypto -reencryptZone -start -path hdfs://mycluster-node-1:8020/zone1 > IllegalArgumentException: Wrong FS: hdfs://mycluster-node-1:8020/zone1, > expected: hdfs://ns1{code} > {code:java} > $ hdfs crypto -reencryptZone -start -path hdfs://ns2/zone2 > IllegalArgumentException: Wrong FS: hdfs://ns2/zone2, expected: > hdfs://ns1{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
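The fix direction in HDFS-13114 (resolving the namespace from the path argument instead of always using the default FS) can be illustrated with plain {{java.net.URI}}. The helper below is a hypothetical stand-in for what HdfsAdmin construction should do via the path's URI; it is not the actual patch code.

```java
import java.net.URI;

// Illustrative helper: pick the authority (namespace) from a fully qualified
// path argument, falling back to the default FS authority for scheme-less
// paths such as "/zone1". Using the path's own authority avoids the
// "Wrong FS: ..., expected: hdfs://ns1" errors quoted in the issue.
class ResolveNamespaceSketch {

  static String resolveAuthority(String pathArg, String defaultAuthority) {
    URI uri = URI.create(pathArg);
    return uri.getAuthority() == null ? defaultAuthority : uri.getAuthority();
  }
}
```

With defaultFs {{hdfs://ns1}}, a path of {{hdfs://ns2/zone2}} resolves to namespace {{ns2}}, while a bare {{/zone1}} still resolves to {{ns1}}.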
[jira] [Updated] (HDDS-183) Integrate Volumeset, ContainerSet and HddsDispatcher
[ https://issues.apache.org/jira/browse/HDDS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-183: Status: Patch Available (was: Open) > Integrate Volumeset, ContainerSet and HddsDispatcher > > > Key: HDDS-183 > URL: https://issues.apache.org/jira/browse/HDDS-183 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-183-HDDS-48.00.patch, HDDS-183-HDDS-48.01.patch, > HDDS-183-HDDS-48.02.patch, HDDS-183-HDDS-48.03.patch, > HDDS-183-HDDS-48.04.patch > > > This Jira adds following: > 1. Use new VolumeSet. > 2. build container map from .container files during startup. > 3. Integrate HddsDispatcher. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-173) Refactor Dispatcher and implement Handler for new ContainerIO design
[ https://issues.apache.org/jira/browse/HDDS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526515#comment-16526515 ] Hanisha Koneru commented on HDDS-173: - Hi [~xyao], [~bharatviswa]. The compile failure looks like a Jenkins issue; it compiles successfully for me locally. There are a couple of Findbugs errors, which I will fix in HDDS-182, and I will fix the unit test along with the integration tests. Can we go ahead with committing patch v005? > Refactor Dispatcher and implement Handler for new ContainerIO design > > > Key: HDDS-173 > URL: https://issues.apache.org/jira/browse/HDDS-173 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-173-HDDS-48.001.patch, HDDS-173-HDDS-48.002.patch, > HDDS-173-HDDS-48.003.patch, HDDS-173-HDDS-48.004.patch, > HDDS-173-HDDS-48.005.patch > > > Dispatcher will pass the ContainerCommandRequests to the corresponding > Handler based on the ContainerType. Each ContainerType will have its own > Handler. The Handler class will process the message. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-173) Refactor Dispatcher and implement Handler for new ContainerIO design
[ https://issues.apache.org/jira/browse/HDDS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-173: Description: HddsDispatcher will pass the ContainerCommandRequests to the corresponding Handler based on the ContainerType. Each ContainerType will have its own Handler. The Handler class will process the message. Current Dispatcher will be replaced by HddsDispatcher in HDDS-183. was:Dispatcher will pass the ContainerCommandRequests to the corresponding Handler based on the ContainerType. Each ContainerType will have its own Handler. The Handler class will process the message. > Refactor Dispatcher and implement Handler for new ContainerIO design > > > Key: HDDS-173 > URL: https://issues.apache.org/jira/browse/HDDS-173 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-173-HDDS-48.001.patch, HDDS-173-HDDS-48.002.patch, > HDDS-173-HDDS-48.003.patch, HDDS-173-HDDS-48.004.patch, > HDDS-173-HDDS-48.005.patch > > > HddsDispatcher will pass the ContainerCommandRequests to the corresponding > Handler based on the ContainerType. Each ContainerType will have its own > Handler. The Handler class will process the message. > Current Dispatcher will be replaced by HddsDispatcher in HDDS-183. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-183) Integrate Volumeset, ContainerSet and HddsDispatcher
[ https://issues.apache.org/jira/browse/HDDS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-183: Status: Open (was: Patch Available) > Integrate Volumeset, ContainerSet and HddsDispatcher > > > Key: HDDS-183 > URL: https://issues.apache.org/jira/browse/HDDS-183 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-183-HDDS-48.00.patch, HDDS-183-HDDS-48.01.patch, > HDDS-183-HDDS-48.02.patch, HDDS-183-HDDS-48.03.patch, > HDDS-183-HDDS-48.04.patch > > > This Jira adds following: > 1. Use new VolumeSet. > 2. build container map from .container files during startup. > 3. Integrate HddsDispatcher. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-173) Refactor Dispatcher and implement Handler for new ContainerIO design
[ https://issues.apache.org/jira/browse/HDDS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526587#comment-16526587 ] Hanisha Koneru commented on HDDS-173: - Thank you [~bharatviswa] and [~xyao] for the reviews. Committed this to feature branch. > Refactor Dispatcher and implement Handler for new ContainerIO design > > > Key: HDDS-173 > URL: https://issues.apache.org/jira/browse/HDDS-173 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-173-HDDS-48.001.patch, HDDS-173-HDDS-48.002.patch, > HDDS-173-HDDS-48.003.patch, HDDS-173-HDDS-48.004.patch, > HDDS-173-HDDS-48.005.patch > > > HddsDispatcher will pass the ContainerCommandRequests to the corresponding > Handler based on the ContainerType. Each ContainerType will have its own > Handler. The Handler class will process the message. > Current Dispatcher will be replaced by HddsDispatcher in HDDS-183. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-173) Refactor Dispatcher and implement Handler for new ContainerIO design
[ https://issues.apache.org/jira/browse/HDDS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-173: Resolution: Fixed Status: Resolved (was: Patch Available) > Refactor Dispatcher and implement Handler for new ContainerIO design > > > Key: HDDS-173 > URL: https://issues.apache.org/jira/browse/HDDS-173 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-173-HDDS-48.001.patch, HDDS-173-HDDS-48.002.patch, > HDDS-173-HDDS-48.003.patch, HDDS-173-HDDS-48.004.patch, > HDDS-173-HDDS-48.005.patch > > > HddsDispatcher will pass the ContainerCommandRequests to the corresponding > Handler based on the ContainerType. Each ContainerType will have its own > Handler. The Handler class will process the message. > Current Dispatcher will be replaced by HddsDispatcher in HDDS-183. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-183) Integrate Volumeset, ContainerSet and HddsDispatcher
[ https://issues.apache.org/jira/browse/HDDS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526773#comment-16526773 ] Hanisha Koneru commented on HDDS-183: - +1 for patch v04, contingent upon addressing the other Findbugs errors in the cleanup Jira along with the integration test fixes. > Integrate Volumeset, ContainerSet and HddsDispatcher > > > Key: HDDS-183 > URL: https://issues.apache.org/jira/browse/HDDS-183 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 0.2.1 > > Attachments: HDDS-183-HDDS-48.00.patch, HDDS-183-HDDS-48.01.patch, > HDDS-183-HDDS-48.02.patch, HDDS-183-HDDS-48.03.patch, > HDDS-183-HDDS-48.04.patch > > > This Jira adds following: > 1. Use new VolumeSet. > 2. build container map from .container files during startup. > 3. Integrate HddsDispatcher. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-289) While creating bucket everything after '/' is ignored without any warning
[ https://issues.apache.org/jira/browse/HDDS-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631044#comment-16631044 ] Hanisha Koneru commented on HDDS-289: - Thanks for working on this [~candychencan]. Patch LGTM overall. A few comments: # PutKey should allow "/" in the key name. We can create keys using {{ozone fs}} (and they can have "/" in the keyname). So {{ozone sh key}} should also allow keys with a "/" in them. # The error message "Path ... too long in ..." is ambiguous. Can we expand it to say something like "Invalid bucket name. Delimiters ("/") not allowed in bucket name"? # A minor NIT: Most of the handlers already have a check with respect to path.getNameCount(). We could probably optimize by combining them. Something like below: {code:java} int pathNameCount = path.getNameCount(); if (pathNameCount != 2) { String errorMessage; if (pathNameCount < 2) { errorMessage = "volume and bucket name required in createBucket"; } else { errorMessage = "Invalid bucket name. Delimiters (/) not allowed in " + "bucket name"; } throw new OzoneClientException(errorMessage); } {code} > While creating bucket everything after '/' is ignored without any warning > - > > Key: HDDS-289 > URL: https://issues.apache.org/jira/browse/HDDS-289 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.2.1 >Reporter: Namit Maheshwari >Assignee: chencan >Priority: Major > Labels: newbie > Attachments: HDDS-289.001.patch, HDDS-289.002.patch, > HDDS-289.003.patch > > > Please see the example below. Here the user issues a command to create a bucket, where /namit is the volume. > {code} > hadoop@288c0999be17:~$ ozone oz -createBucket /namit/hjk/fgh > 2018-07-24 00:30:52 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 2018-07-24 00:30:52 INFO RpcClient:337 - Creating Bucket: namit/hjk, with > Versioning false and Storage Type set to DISK > {code} > As seen above, it just ignored '/fgh'. > There should be a Warning / Error message instead of just ignoring everything > after a '/'.
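The combined getNameCount() check suggested in the comment above can be exercised standalone. In this sketch, IllegalArgumentException stands in for OzoneClientException so the example compiles without Ozone on the classpath:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class BucketNameCheck {
    // Validate a "volume/bucket" argument the way the suggested handler
    // check would: exactly two name components, otherwise fail with a
    // descriptive message instead of silently truncating.
    static void checkBucketPath(String uri) {
        Path path = Paths.get(uri);
        int pathNameCount = path.getNameCount();
        if (pathNameCount != 2) {
            String errorMessage;
            if (pathNameCount < 2) {
                errorMessage = "volume and bucket name required in createBucket";
            } else {
                errorMessage = "Invalid bucket name. Delimiters (/) not allowed in "
                    + "bucket name";
            }
            // Stand-in for OzoneClientException.
            throw new IllegalArgumentException(errorMessage);
        }
    }
}
```

With this check, the reported command `ozone oz -createBucket /namit/hjk/fgh` would fail loudly rather than silently creating bucket namit/hjk.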
[jira] [Commented] (HDDS-551) Fix the close container status check in CloseContainerCommandHandler
[ https://issues.apache.org/jira/browse/HDDS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631154#comment-16631154 ] Hanisha Koneru commented on HDDS-551: - Thanks [~shashikant] for working on this. LGTM. +1 (There is one checkstyle issue - a line longer than 80 characters, in CloseContainerCommandHandler#83. I will fix it while committing). > Fix the close container status check in CloseContainerCommandHandler > > > Key: HDDS-551 > URL: https://issues.apache.org/jira/browse/HDDS-551 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-551.000.patch > > > If the container is already closed while retrying to close the container in a > Datanode which is not a leader, we just log the info and still submit the > close request to Ratis. Ideally, this check should be moved to > CloseContainerCommandHandler and we should just return without submitting any > request.
[jira] [Commented] (HDDS-551) Fix the close container status check in CloseContainerCommandHandler
[ https://issues.apache.org/jira/browse/HDDS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631158#comment-16631158 ] Hanisha Koneru commented on HDDS-551: - Committed to trunk. Thanks for the contribution [~shashikant]. > Fix the close container status check in CloseContainerCommandHandler > > > Key: HDDS-551 > URL: https://issues.apache.org/jira/browse/HDDS-551 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-551.000.patch > > > If the container is already closed while retrying to close the container in a > Datanode which is not a leader, we just log the info and still submit the > close request to Ratis. Ideally, this check should be moved to > CloseContainerCommandHandler and we should just return without submitting any > request.
[jira] [Updated] (HDDS-551) Fix the close container status check in CloseContainerCommandHandler
[ https://issues.apache.org/jira/browse/HDDS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-551: Resolution: Fixed Fix Version/s: 0.2.2 Status: Resolved (was: Patch Available) > Fix the close container status check in CloseContainerCommandHandler > > > Key: HDDS-551 > URL: https://issues.apache.org/jira/browse/HDDS-551 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.2.2 > > Attachments: HDDS-551.000.patch > > > If the container is already closed while retrying to close the container in a > Datanode which is not a leader, we just log the info and still submit the > close request to Ratis. Ideally, this check should be moved to > CloseContainerCommandHandler and we should just return without submitting any > request.
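The fix requested in the description (have CloseContainerCommandHandler return without submitting a Ratis request when the container is already closed) amounts to an early-exit guard. A rough sketch with hypothetical, simplified types:

```java
public class CloseContainerSketch {
    enum LifeCycleState { OPEN, CLOSED }

    // Stand-in for the Ratis submission path.
    interface RatisClient {
        void submitCloseRequest(long containerId);
    }

    // Check the container state up front: if it is already CLOSED, log and
    // return without submitting anything to Ratis. Returns true only when a
    // close request was actually submitted.
    static boolean handleCloseCommand(long containerId, LifeCycleState state,
                                      RatisClient ratis) {
        if (state == LifeCycleState.CLOSED) {
            System.out.println("Container " + containerId + " is already closed");
            return false; // nothing submitted
        }
        ratis.submitCloseRequest(containerId);
        return true;
    }
}
```

Doing the check in the command handler avoids a pointless Ratis round trip on retries against an already-closed container.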
[jira] [Comment Edited] (HDDS-361) Use DBStore and TableStore for DN metadata
[ https://issues.apache.org/jira/browse/HDDS-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640421#comment-16640421 ] Hanisha Koneru edited comment on HDDS-361 at 10/5/18 10:37 PM: --- [~ljain], thanks for working on this. The patch looks very good. I just have a few minor comments. # In {{BlockDeletingService#executeDeleteTxn()}}, if we cannot delete a block, we skip that block and continue deleting other blocks. So the actual number of blocks deleted might be less than were scheduled in the transaction. In {{BlockDeletingService#call()}}, we update the {{numBlocksDeleted}} with the count of number of blocks scheduled for deletion. {code:java} if (delTxn != null) { executeDeleteTxn(delTxn, defaultStore); // increment number of blocks deleted for the container numBlocksDeleted += delTxn.getLocalIDCount(); // if successful, txn can be removed from delete table{code} Instead, we should update {{numBlocksDeleted}} with the number of blocks actually deleted in {{executeDeleteTxn}} {code:java} if (delTxn != null) { int deletedBlocksCount = executeDeleteTxn(delTxn, defaultStore); // increment number of blocks deleted for the container numBlocksDeleted += deletedBlocksCount; // if successful, txn can be removed from delete table{code} # Also, before deleting the transaction from the Pending Deletes tables, we should verify that all the blocks in the transaction were successfully deleted. {code:java} // if successful, txn can be removed from delete table if (deletedBlocksCount == delTxn.getLocalIDCount()) { batch.delete(pendingDeletes.getHandle(), Longs.toByteArray(delTxn.getTxID())); } {code} # In {{BlockDeletingService#executeDeleteTxn}}, the null check for delTxn is redundant. We perform this check before calling the function too. # A NIT: In {{BlockManagerImpl}}, most of the functions get the DB and then get the default table. We could have this in a private function to avoid redundancy. 
{code:java} private Table getDefaultTable(ContainerData containerData, Configuration conf) { DBStore db = BlockUtils.getDB(containerData, conf); return db.getTable(DEFAULT_TABLE); } {code} P.S: The patch does not apply to trunk anymore. > Use DBStore and TableStore for DN metadata > -- > > Key: HDDS-361 > URL: https://issues.apache.org/jira/browse/HDDS-361 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-361.001.patch, HDDS-361.002.patch > > > As part of OM performance improvement we used Tables for storing a particular > type of key-value pair in the rocks db. This Jira aims to use Tables for > separating block keys and deletion transactions in the container db.
[jira] [Commented] (HDDS-361) Use DBStore and TableStore for DN metadata
[ https://issues.apache.org/jira/browse/HDDS-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640421#comment-16640421 ] Hanisha Koneru commented on HDDS-361: - [~ljain], thanks for working on this. The patch looks very good. I just have a few minor comments. # In {{BlockDeletingService#executeDeleteTxn()}}, if we cannot delete a block, we skip that block and continue deleting other blocks. So the actual number of blocks deleted might be less than were scheduled in the transaction. In {{BlockDeletingService#call()}}, we update the {{numBlocksDeleted}} with the count of number of blocks scheduled for deletion. {code:java} if (delTxn != null) { executeDeleteTxn(delTxn, defaultStore); // increment number of blocks deleted for the container numBlocksDeleted += delTxn.getLocalIDCount(); // if successful, txn can be removed from delete table{code} Instead, we should update {{numBlocksDeleted}} with the number of blocks actually deleted in {{executeDeleteTxn}} {code:java} if (delTxn != null) { int deletedBlocksCount = executeDeleteTxn(delTxn, defaultStore); // increment number of blocks deleted for the container numBlocksDeleted += deletedBlocksCount; // if successful, txn can be removed from delete table{code} # Also, before deleting the transaction from the Pending Deletes tables, we should verify that all the blocks in the transaction were successfully deleted. {code:java} // if successful, txn can be removed from delete table if (deletedBlocksCount == delTxn.getLocalIDCount()) { batch.delete(pendingDeletes.getHandle(), Longs.toByteArray(delTxn.getTxID())); } {code} # In {{BlockDeletingService#executeDeleteTxn}}, the null check for delTxn is redundant. We perform this check before calling the function too. # A NIT: In {{BlockManagerImpl}}, most of the functions get the DB and then get the default table. We could have this in a private function to avoid redundancy. 
{code:java} private Table getDefaultTable(ContainerData containerData, Configuration conf) { DBStore db = BlockUtils.getDB(containerData, conf); return db.getTable(DEFAULT_TABLE); } {code} P.S: The patch does not apply to trunk anymore. > Use DBStore and TableStore for DN metadata > -- > > Key: HDDS-361 > URL: https://issues.apache.org/jira/browse/HDDS-361 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Lokesh Jain >Priority: Major > Attachments: HDDS-361.001.patch, HDDS-361.002.patch > > > As part of OM performance improvement we used Tables for storing a particular > type of key-value pair in the rocks db. This Jira aims to use Tables for > separating block keys and deletion transactions in the container db.
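The accounting change suggested in review comments 1 and 2 (have executeDeleteTxn return the number of blocks actually deleted, and remove the transaction from the pending-deletes table only when that count matches the scheduled count) can be sketched with plain collections standing in for the Ozone types:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteTxnSketch {
    // Delete each scheduled block that exists, skipping the ones that cannot
    // be found, and return how many were actually deleted.
    static int executeDeleteTxn(List<Long> localIds, Set<Long> blockStore) {
        int deleted = 0;
        for (long id : localIds) {
            if (blockStore.remove(id)) {
                deleted++;
            }
        }
        return deleted;
    }

    // Remove the txn from the pending-deletes set only if every scheduled
    // block was deleted; otherwise keep it so the remaining blocks are
    // retried on a later run.
    static void processTxn(long txId, List<Long> localIds,
                           Set<Long> blockStore, Set<Long> pendingDeletes) {
        int deletedBlocksCount = executeDeleteTxn(localIds, blockStore);
        if (deletedBlocksCount == localIds.size()) {
            pendingDeletes.remove(txId);
        }
    }
}
```

Without this guard, a transaction with one undeletable block would be dropped from the pending table even though some of its blocks survive, which is exactly what the review comment is warning about.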
[jira] [Commented] (HDDS-283) Need an option to list all volumes created in the cluster
[ https://issues.apache.org/jira/browse/HDDS-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642258#comment-16642258 ] Hanisha Koneru commented on HDDS-283: - Hi [~nilotpalnandi], are you working on this Jira or planning to? If not, please let me know and I can take it up. Thanks. > Need an option to list all volumes created in the cluster > - > > Key: HDDS-283 > URL: https://issues.apache.org/jira/browse/HDDS-283 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Blocker > Labels: alpha2 > Fix For: 0.3.0 > > Attachments: HDDS-283.001.patch > > > Currently, the listVolume command either gives: > 1) all the volumes created by a particular user, using the -user argument; > 2) or all the volumes created by the logged-in user, if no -user argument > is provided. > > We need an option to list all the volumes created in the cluster.
[jira] [Updated] (HDDS-609) SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Description: While running a MapReduce job after an SCM restart, we got "Allocate block failed, error:INTERNAL_ERROR". SCM logs: {code:java} 2018-10-09 23:37:28,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9863, call Call#101 Retry#0 org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock from 172.27.56.9:33814 org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed for allocateBlock at org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) at org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) at org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) 2018-10-09 23:37:35,232 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 4 on 9863, call Call#103 Retry#0 org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock from 172.27.56.9:33814 org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed for allocateBlock at org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) at org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) at org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) 2018-10-09 23:37:42,044 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9863, call Call#105 Retry#0 org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock from 172.27.56.9:33814 org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed for allocateBlock at org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) at 
org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) at org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at
[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Summary: On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state (was: SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state) > On restart, SCM does not exit chill mode as it expects DNs to report > containers in ALLOCATED state > -- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Priority: Major > > While running a MapReduce job after an SCM restart, we got "Allocate block failed, > error:INTERNAL_ERROR". SCM logs: > {code:java} > 2018-10-09 23:37:28,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 4 on 9863, call Call#101 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:33814 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2018-10-09 23:37:35,232 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 4 on 9863, call Call#103 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:33814 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > 2018-10-09 23:37:42,044 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 3 on 9863, call Call#105 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:33814 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at >
[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Attachment: HDDS-609.002.patch > On restart, SCM does not exit chill mode as it expects DNs to report > containers in ALLOCATED state > -- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-609.001.patch, HDDS-609.002.patch > > > Note: Updated the description to describe the root cause of the bug and moved > the error logs to comments. > On restart, SCM can exit chill mode only if it receives reports for 99% > (default) of containers from the DNs. > SCM includes containers in ALLOCATED state when calculating the total number of > containers. But since ALLOCATED containers are not reported by DNs, the > calculated percentage of reported containers is misconstrued. > {code:java} > For example, say we have 1 DN in the cluster and we restart SCM. > Total number of containers in SCM ContainerMap = 20 > Containers in OPEN state = 2 > Containers in ALLOCATED state = 18 > Containers reported by DN on SCM restart = 2 > Fraction of reported containers as calculated by SCMChillNodeManager = (2/20) > = 0.10 > {code} > We should not include the ALLOCATED containers while calculating the total > number of containers for the chill mode exit rule. Otherwise, for scenarios such > as above, SCM can never come out of chill mode.
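The worked example in the description can be checked numerically: with ALLOCATED containers in the denominator the reported fraction is capped at 0.10, while excluding them lets the rule pass. A minimal sketch of the calculation; the names are illustrative, not SCM's actual chill mode classes:

```java
public class ChillModeRuleSketch {
    // Default chill mode exit cutoff from the description (99%).
    static final double THRESHOLD = 0.99;

    // Fraction of containers reported by DNs. ALLOCATED containers are never
    // reported, so they should be excluded from the denominator.
    static double reportedFraction(int reported, int total, int allocated,
                                   boolean excludeAllocated) {
        int denominator = excludeAllocated ? total - allocated : total;
        return denominator == 0 ? 1.0 : (double) reported / denominator;
    }

    static boolean canExitChillMode(double fraction) {
        return fraction >= THRESHOLD;
    }
}
```

With the description's numbers (20 total, 18 ALLOCATED, 2 reported), including ALLOCATED gives 0.10 and SCM stays in chill mode forever; excluding them gives 1.0 and the rule passes.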
[jira] [Updated] (HDDS-609) SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Summary: SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state (was: Mapreduce example fails with Allocate block failed, error:INTERNAL_ERROR) > SCM does not exit chill mode as it expects DNs to report containers in > ALLOCATED state > -- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Priority: Major > > {code:java} > -bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar > /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar > wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5 > 18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History > server at ctr-e138-1518143905142-510793-01-04.hwx.site/172.27.79.197:10200 > 18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > 18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure > Coding for path: /user/hdfs/.staging/job_1539125785626_0007 > 18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1 > 18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library > 18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized > native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9] > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1 > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1539125785626_0007 > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: [] > 18/10/09 23:37:10 INFO conf.Configuration: 
Removed undeclared tags: > 18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml > at file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml > 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application > application_1539125785626_0007 > 18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: > http://ctr-e138-1518143905142-510793-01-05.hwx.site:8088/proxy/application_1539125785626_0007/ > 18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007 > 18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in > uber mode : false > 18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0% > 18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0% > 18/10/09 23:37:29 INFO mapreduce.Job: Task Id : > attempt_1539125785626_0007_r_00_0, Status : FAILED > Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93) > at > 
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) > at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at
[jira] [Commented] (HDDS-601) On restart, SCM throws 'No such datanode' exception
[ https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647018#comment-16647018 ] Hanisha Koneru commented on HDDS-601: - Thanks [~ssulav] for reporting the issue and [~anu] for the review. I have committed this to trunk and ozone-0.3 branch. > On restart, SCM throws 'No such datanode' exception > --- > > Key: HDDS-601 > URL: https://issues.apache.org/jira/browse/HDDS-601 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.2.1 >Reporter: Soumitra Sulav >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.3.0, 0.4.0 > > Attachments: HDDS-601.001.patch, HDDS-601.002.patch > > > Encountered below exception after I changed a configuration in ozone-site and > restarted SCM and Datanode : > Ozone Cluster : 1 SCM, 1 OM, 3 DNs > {code:java} > 2018-10-04 09:35:59,716 INFO org.apache.hadoop.hdds.server.BaseHttpServer: > HTTP server of SCM is listening at http://0.0.0.0:9876 > 2018-10-04 09:36:03,618 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: > SCM receive heartbeat from unregistered datanode > 127a8e17-b2df-4663-924c-1a6909adb293{ip: 172.22.119.19, host: > hcatest-2.openstacklocal} > 2018-10-04 09:36:09,063 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: > SCM receive heartbeat from unregistered datanode > 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: > hcatest-3.openstacklocal} > 2018-10-04 09:36:09,083 ERROR > org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Error on > processing container report from datanode > 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: > hcatest-3.openstacklocal} > org.apache.hadoop.hdds.scm.exceptions.SCMException: No such datanode > at > org.apache.hadoop.hdds.scm.node.states.Node2ContainerMap.setContainersForDatanode(Node2ContainerMap.java:82) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97) > at > 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:45) > at > org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
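The failure pattern in the trace above can be illustrated with a minimal sketch. This is not the real Node2ContainerMap API; the class and method shapes here are assumptions for illustration only. The point is that SCM's map of registered datanodes is rebuilt empty on restart, so a container report from a previously known but not yet re-registered DN fails the membership check.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch (simplified, hypothetical API): on SCM restart the
// registration map starts empty, so the first container report from a
// previously known DN hits the "No such datanode" path seen in the logs.
public class Node2ContainerMapSketch {
    private final Map<UUID, Set<Long>> dn2ContainerMap = new ConcurrentHashMap<>();

    void register(UUID datanodeId) {
        // Called when the DN (re-)registers with SCM.
        dn2ContainerMap.putIfAbsent(datanodeId, Set.of());
    }

    void setContainersForDatanode(UUID datanodeId, Set<Long> containers) {
        if (!dn2ContainerMap.containsKey(datanodeId)) {
            // Mirrors "SCMException: No such datanode" from the stack trace.
            throw new IllegalStateException("No such datanode");
        }
        dn2ContainerMap.put(datanodeId, containers);
    }
}
```

Until the heartbeat path triggers re-registration, every container report from that DN will throw; the fix committed here addresses that restart window.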
[jira] [Updated] (HDDS-601) On restart, SCM throws 'No such datanode' exception
[ https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-601: Fix Version/s: 0.4.0 0.3.0
[jira] [Updated] (HDDS-601) On restart, SCM throws 'No such datanode' exception
[ https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-601: Resolution: Fixed Status: Resolved (was: Patch Available)
[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Description: Note: Updated the description to describe the root cause of the bug and moved the error logs to comments. On restart, SCM can exit chill mode only if it receives reports for 99% (default) of the containers from the DNs. SCM includes containers in the ALLOCATED state when calculating the total number of containers. But since ALLOCATED containers are never reported by DNs, the calculated percentage of reported containers is skewed.
{code:java}
For example, say we have 1 DN in the cluster and we restart SCM.
Total number of containers in SCM ContainerMap = 20
Containers in OPEN state = 2
Containers in ALLOCATED state = 18
Containers reported by DN on SCM restart = 2
Fraction of reported containers as calculated by SCMChillNodeManager = (2/20) = 0.10
{code}
We should not include ALLOCATED containers when calculating the total number of containers for the chill mode exit rule. Otherwise, in scenarios such as the above, SCM can never come out of chill mode.
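The chill-mode arithmetic described above can be sketched as follows. The class and method names are illustrative, not the actual SCMChillModeManager code; the sketch only shows why excluding ALLOCATED containers from the denominator changes the exit decision.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the chill-mode exit fraction. DNs can only report
// containers that physically exist on them, so ALLOCATED (never-written)
// containers should not count toward the reportable total.
public class ChillModeFractionSketch {
    enum LifeCycleState { ALLOCATED, OPEN, CLOSED }

    static double reportedFraction(Map<LifeCycleState, Integer> counts,
                                   int reported, boolean excludeAllocated) {
        int total = 0;
        for (Map.Entry<LifeCycleState, Integer> e : counts.entrySet()) {
            if (excludeAllocated && e.getKey() == LifeCycleState.ALLOCATED) {
                continue; // DNs cannot report ALLOCATED containers
            }
            total += e.getValue();
        }
        // With no reportable containers, chill mode should be exitable.
        return total == 0 ? 1.0 : (double) reported / total;
    }

    public static void main(String[] args) {
        Map<LifeCycleState, Integer> counts = new EnumMap<>(LifeCycleState.class);
        counts.put(LifeCycleState.OPEN, 2);
        counts.put(LifeCycleState.ALLOCATED, 18);
        // Buggy behavior: 2 reported / 20 total = 0.10, stuck below the 0.99 threshold.
        System.out.println(reportedFraction(counts, 2, false));
        // Fixed behavior: 2 reported / 2 reportable = 1.0, chill mode can exit.
        System.out.println(reportedFraction(counts, 2, true));
    }
}
```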
[jira] [Commented] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646970#comment-16646970 ] Hanisha Koneru commented on HDDS-609: - Initial error logs reported by [~nmaheshwari] (moved from the description): {code:java} -bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5 18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags: 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags: 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags: 18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History server at ctr-e138-1518143905142-510793-01-04.hwx.site/172.27.79.197:10200 18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hdfs/.staging/job_1539125785626_0007 18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1 18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library 18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9] 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539125785626_0007 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: [] 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags: 18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags: 18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application application_1539125785626_0007 18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: 
http://ctr-e138-1518143905142-510793-01-05.hwx.site:8088/proxy/application_1539125785626_0007/ 18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007 18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in uber mode : false 18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0% 18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0% 18/10/09 23:37:29 INFO mapreduce.Job: Task Id : attempt_1539125785626_0007_r_00_0, Status : FAILED Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250) at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) at java.io.DataOutputStream.write(DataOutputStream.java:107) at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78) at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93) at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559) at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105) at org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64) at org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) 18/10/09 23:37:35 INFO mapreduce.Job: Task Id : attempt_1539125785626_0007_r_00_1, Status : FAILED Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475) at
[jira] [Commented] (HDDS-600) Mapreduce example fails with java.lang.IllegalArgumentException: Bucket or Volume name has an unsupported character
[ https://issues.apache.org/jira/browse/HDDS-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646944#comment-16646944 ] Hanisha Koneru commented on HDDS-600: - [~nmaheshwari], can we close this issue? > Mapreduce example fails with java.lang.IllegalArgumentException: Bucket or > Volume name has an unsupported character > --- > > Key: HDDS-600 > URL: https://issues.apache.org/jira/browse/HDDS-600 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Blocker > > Set up a hadoop cluster where ozone is also installed. Ozone can be > referenced via o3://xx.xx.xx.xx:9889 > {code:java} > [root@ctr-e138-1518143905142-510793-01-02 ~]# ozone sh bucket list > o3://xx.xx.xx.xx:9889/volume1/ > 2018-10-09 07:21:24,624 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > [ { > "volumeName" : "volume1", > "bucketName" : "bucket1", > "createdOn" : "Tue, 09 Oct 2018 06:48:02 GMT", > "acls" : [ { > "type" : "USER", > "name" : "root", > "rights" : "READ_WRITE" > }, { > "type" : "GROUP", > "name" : "root", > "rights" : "READ_WRITE" > } ], > "versioning" : "DISABLED", > "storageType" : "DISK" > } ] > [root@ctr-e138-1518143905142-510793-01-02 ~]# ozone sh key list > o3://xx.xx.xx.xx:9889/volume1/bucket1 > 2018-10-09 07:21:54,500 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... 
using builtin-java classes where > applicable > [ { > "version" : 0, > "md5hash" : null, > "createdOn" : "Tue, 09 Oct 2018 06:58:32 GMT", > "modifiedOn" : "Tue, 09 Oct 2018 06:58:32 GMT", > "size" : 0, > "keyName" : "mr_job_dir" > } ] > [root@ctr-e138-1518143905142-510793-01-02 ~]#{code} > Hdfs is also set fine as below > {code:java} > [root@ctr-e138-1518143905142-510793-01-02 ~]# hdfs dfs -ls > /tmp/mr_jobs/input/ > Found 1 items > -rw-r--r-- 3 root hdfs 215755 2018-10-09 06:37 > /tmp/mr_jobs/input/wordcount_input_1.txt > [root@ctr-e138-1518143905142-510793-01-02 ~]#{code} > Now try to run Mapreduce example job against ozone o3: > {code:java} > [root@ctr-e138-1518143905142-510793-01-02 ~]# > /usr/hdp/current/hadoop-client/bin/hadoop jar > /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar > wordcount /tmp/mr_jobs/input/ > o3://xx.xx.xx.xx:9889/volume1/bucket1/mr_job_dir/output > 18/10/09 07:15:38 INFO conf.Configuration: Removed undeclared tags: > java.lang.IllegalArgumentException: Bucket or Volume name has an unsupported > character : : > at > org.apache.hadoop.hdds.scm.client.HddsClientUtils.verifyResourceName(HddsClientUtils.java:143) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.getVolumeDetails(RpcClient.java:231) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.ozone.client.OzoneClientInvocationHandler.invoke(OzoneClientInvocationHandler.java:54) > at com.sun.proxy.$Proxy16.getVolumeDetails(Unknown Source) > at org.apache.hadoop.ozone.client.ObjectStore.getVolume(ObjectStore.java:92) > at > org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:121) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354) > at 
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:178) > at org.apache.hadoop.examples.WordCount.main(WordCount.java:85) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) > at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) > at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at >
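The trace above shows the name check rejecting ':'. A minimal sketch of that kind of validation follows; the real method is HddsClientUtils.verifyResourceName (per the stack trace), but the character rules below are assumptions for illustration. The likely issue is path parsing: the "host:9889" authority of the o3:// URI ends up being validated as a volume or bucket name, and the ':' fails the check.

```java
// Illustrative sketch of a resource-name character check (hypothetical rules;
// not the actual HddsClientUtils implementation).
public class ResourceNameCheckSketch {
    public static void verifyResourceName(String name) {
        for (char c : name.toCharArray()) {
            boolean ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                    || c == '-' || c == '.';
            if (!ok) {
                // Mirrors the "unsupported character" message in the trace.
                throw new IllegalArgumentException(
                        "Bucket or Volume name has an unsupported character : " + c);
            }
        }
    }
}
```

A name like "bucket1" passes, while an authority such as "host:9889" mistakenly fed into this check throws, matching the reported failure.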
[jira] [Updated] (HDDS-601) On restart, SCM throws 'No such datanode
[ https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-601: Summary: On restart, SCM throws 'No such datanode (was: SCMException: No such datanode)
[jira] [Updated] (HDDS-601) On restart, SCM throws 'No such datanode' exception
[ https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-601: Summary: On restart, SCM throws 'No such datanode' exception (was: On restart, SCM throws 'No such datanode)
[jira] [Commented] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647027#comment-16647027 ] Hanisha Koneru commented on HDDS-609: - Updated patch v02 to fix test failure in TestSCMChillModeManager.
[jira] [Created] (HDDS-635) On Datanode restart, DatanodeStateMachine throws NullPointerException
Hanisha Koneru created HDDS-635: --- Summary: On Datanode restart, DatanodeStateMachine throws NullPointerException Key: HDDS-635 URL: https://issues.apache.org/jira/browse/HDDS-635 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Hanisha Koneru Assignee: Hanisha Koneru
{code:java}
2018-10-11 12:08:33,676 ERROR org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: Unable to start the DatanodeState Machine
java.io.IOException: Premature EOF from inputStream
 at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100)
 at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250)
 at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155)
 at org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128)
 at org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110)
 at org.apache.ratis.server.storage.LogSegment.loadSegment(LogSegment.java:132)
 at org.apache.ratis.server.storage.RaftLogCache.loadSegment(RaftLogCache.java:110)
 at org.apache.ratis.server.storage.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:155)
 at org.apache.ratis.server.storage.SegmentedRaftLog.open(SegmentedRaftLog.java:123)
 at org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:162)
 at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:110)
 at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:106)
 at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$0(RaftServerProxy.java:191)
 at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
 at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2018-10-11 12:08:33,677 ERROR org.apache.hadoop.ozone.HddsDatanodeService: Exception in HddsDatanodeService.
java.lang.NullPointerException
 at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.join(DatanodeStateMachine.java:332)
 at org.apache.hadoop.ozone.HddsDatanodeService.join(HddsDatanodeService.java:191)
 at org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:250)
2018-10-11 12:08:33,678 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.NullPointerException{code}
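The second trace above (NPE in DatanodeStateMachine.join) suggests a join on a thread that was never created because startup failed first. The sketch below is hypothetical; the field and method names are assumptions, not the actual DatanodeStateMachine internals. It only shows the guard pattern that avoids the NPE.

```java
// Illustrative sketch (hypothetical names): if start() throws before the
// state-machine thread is created, an unconditional join() on the null
// thread reference produces the NullPointerException seen in the log.
public class DatanodeStateMachineSketch {
    private Thread stateMachineThread; // stays null if start() fails early

    public void start() {
        // Suppose Ratis log replay throws IOException here, before the
        // thread is constructed; stateMachineThread remains null.
    }

    public void join() {
        if (stateMachineThread != null) { // guard avoids the NPE on failed start
            try {
                stateMachineThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```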
[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Description: {code:java} (MapReduce wordcount job error log; quoted in the HDDS-609 comments above) {code}
[jira] [Assigned] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru reassigned HDDS-609: --- Assignee: Hanisha Koneru > On restart, SCM does not exit chill mode as it expects DNs to report > containers in ALLOCATED state > -- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > > {code:java} > -bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar > /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar > wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5 > 18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History > server at ctr-e138-1518143905142-510793-01-04.hwx.site/172.27.79.197:10200 > 18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > 18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure > Coding for path: /user/hdfs/.staging/job_1539125785626_0007 > 18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1 > 18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library > 18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized > native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9] > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1 > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1539125785626_0007 > 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: [] > 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml > at 
file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml > 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags: > 18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application > application_1539125785626_0007 > 18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: > http://ctr-e138-1518143905142-510793-01-05.hwx.site:8088/proxy/application_1539125785626_0007/ > 18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007 > 18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in > uber mode : false > 18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0% > 18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0% > 18/10/09 23:37:29 INFO mapreduce.Job: Task Id : > attempt_1539125785626_0007_r_00_0, Status : FAILED > Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271) > at > org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78) > at > org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93) > at > org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559) > at > 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64) > at > org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) > at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at >
[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Status: Patch Available (was: Open) > On restart, SCM does not exit chill mode as it expects DNs to report > containers in ALLOCATED state > -- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-609.001.patch > > > Note: Updated the description to describe the root cause of the bug and moved > the error logs to comments. > On restart, SCM can exit chill mode only if it receives report of 99% > (default) of containers from the DNs. > SCM includes containers in ALLOCATED state in calculating the total number of > containers. But since ALLOCATED containers are not reported by DNs, the > calculation of percentage of reported containers is misconstrued. > {code:java} > For example, say we have 1DN in the cluster and we restart SCM. > Total number of containers in SCM ContainerMap = 20 > Containers in OPEN state = 2 > Containers in ALLOCATED state = 18 > Containers reported by DN on SCM restart = 2 > Fraction of reported containers as calculated by SCMChillNodeManager = (2/20) > = 0.10 > {code} > We should not include the ALLOCATED containers while calculating the total > number of containers for chill mode exit rule. Otherwise, for scenarios such > as above, SCM can never come out of chill mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
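The arithmetic in the description above can be sketched as a standalone snippet. Note this is illustrative only: `ChillModeRuleDemo` and `reportedFraction` are hypothetical names, not the actual SCMChillNodeManager code; the numbers are the example values from the issue description.

```java
public class ChillModeRuleDemo {

    // Fraction of containers reported by DNs. In the buggy calculation the
    // denominator includes ALLOCATED containers even though DNs never report
    // them, so the fraction can stay below the 99% exit threshold forever.
    static double reportedFraction(int reportedByDn, int totalContainers) {
        return (double) reportedByDn / totalContainers;
    }

    public static void main(String[] args) {
        int open = 2;        // containers in OPEN state (reportable)
        int allocated = 18;  // containers in ALLOCATED state (never reported)
        int reportedByDn = 2;

        // Buggy: ALLOCATED counted in the total -> 2/20 = 0.10, below 0.99
        double buggy = reportedFraction(reportedByDn, open + allocated);

        // Proposed fix: exclude ALLOCATED from the total -> 2/2 = 1.0
        double fixed = reportedFraction(reportedByDn, open);

        System.out.println("buggy=" + buggy + " fixed=" + fixed);
    }
}
```

With the ALLOCATED containers excluded, the one-DN cluster in the example immediately satisfies the exit rule instead of being stuck at 10%.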
[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Attachment: HDDS-609.001.patch > On restart, SCM does not exit chill mode as it expects DNs to report > containers in ALLOCATED state > -- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-609.001.patch > > > Note: Updated the description to describe the root cause of the bug and moved > the error logs to comments. > On restart, SCM can exit chill mode only if it receives report of 99% > (default) of containers from the DNs. > SCM includes containers in ALLOCATED state in calculating the total number of > containers. But since ALLOCATED containers are not reported by DNs, the > calculation of percentage of reported containers is misconstrued. > {code:java} > For example, say we have 1DN in the cluster and we restart SCM. > Total number of containers in SCM ContainerMap = 20 > Containers in OPEN state = 2 > Containers in ALLOCATED state = 18 > Containers reported by DN on SCM restart = 2 > Fraction of reported containers as calculated by SCMChillNodeManager = (2/20) > = 0.10 > {code} > We should not include the ALLOCATED containers while calculating the total > number of containers for chill mode exit rule. Otherwise, for scenarios such > as above, SCM can never come out of chill mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-609: Attachment: HDDS-609.003.patch > On restart, SCM does not exit chill mode as it expects DNs to report > containers in ALLOCATED state > -- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-609.001.patch, HDDS-609.002.patch, > HDDS-609.003.patch > > > Note: Updated the description to describe the root cause of the bug and moved > the error logs to comments. > On restart, SCM can exit chill mode only if it receives report of 99% > (default) of containers from the DNs. > SCM includes containers in ALLOCATED state in calculating the total number of > containers. But since ALLOCATED containers are not reported by DNs, the > calculation of percentage of reported containers is misconstrued. > {code:java} > For example, say we have 1DN in the cluster and we restart SCM. > Total number of containers in SCM ContainerMap = 20 > Containers in OPEN state = 2 > Containers in ALLOCATED state = 18 > Containers reported by DN on SCM restart = 2 > Fraction of reported containers as calculated by SCMChillNodeManager = (2/20) > = 0.10 > {code} > We should not include the ALLOCATED containers while calculating the total > number of containers for chill mode exit rule. Otherwise, for scenarios such > as above, SCM can never come out of chill mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception
[ https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-612: Attachment: HDDS-612.001.patch > Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock > fails with ChillModePrecheck exception > > > Key: HDDS-612 > URL: https://issues.apache.org/jira/browse/HDDS-612 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-612.001.patch > > > {code:java} > 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 9863, call Call#70 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:53442 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
[ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647172#comment-16647172 ] Hanisha Koneru commented on HDDS-609: - Thanks [~anu] Fixed the failing unit test and added one for the current change. > On restart, SCM does not exit chill mode as it expects DNs to report > containers in ALLOCATED state > -- > > Key: HDDS-609 > URL: https://issues.apache.org/jira/browse/HDDS-609 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-609.001.patch, HDDS-609.002.patch, > HDDS-609.003.patch > > > Note: Updated the description to describe the root cause of the bug and moved > the error logs to comments. > On restart, SCM can exit chill mode only if it receives report of 99% > (default) of containers from the DNs. > SCM includes containers in ALLOCATED state in calculating the total number of > containers. But since ALLOCATED containers are not reported by DNs, the > calculation of percentage of reported containers is misconstrued. > {code:java} > For example, say we have 1DN in the cluster and we restart SCM. > Total number of containers in SCM ContainerMap = 20 > Containers in OPEN state = 2 > Containers in ALLOCATED state = 18 > Containers reported by DN on SCM restart = 2 > Fraction of reported containers as calculated by SCMChillNodeManager = (2/20) > = 0.10 > {code} > We should not include the ALLOCATED containers while calculating the total > number of containers for chill mode exit rule. Otherwise, for scenarios such > as above, SCM can never come out of chill mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception
[ https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-612: Status: Patch Available (was: Open) > Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock > fails with ChillModePrecheck exception > > > Key: HDDS-612 > URL: https://issues.apache.org/jira/browse/HDDS-612 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-612.001.patch > > > {code:java} > 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 9863, call Call#70 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:53442 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
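The failure above — a ChillModePrecheck rejection even though hdds.scm.chillmode.enabled is false — suggests the precheck should short-circuit before consulting any exit rules when the feature is disabled. The sketch below illustrates that idea only; the class and method names are assumptions, not the real ChillModePrecheck/SCMChillModeManager API.

```java
public class ChillModePrecheckDemo {

    private final boolean chillModeEnabled; // hdds.scm.chillmode.enabled
    private boolean inChillMode;

    ChillModePrecheckDemo(boolean chillModeEnabled) {
        this.chillModeEnabled = chillModeEnabled;
        // Start in chill mode only if the feature is enabled at all.
        this.inChillMode = chillModeEnabled;
    }

    // Would an allocateBlock call pass the precheck?
    boolean allowAllocateBlock() {
        // Short-circuit: with the feature disabled, exit rules must never
        // be evaluated and the call is always allowed.
        if (!chillModeEnabled) {
            return true;
        }
        return !inChillMode;
    }

    public static void main(String[] args) {
        System.out.println("disabled -> allowed: "
            + new ChillModePrecheckDemo(false).allowAllocateBlock());
    }
}
```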
[jira] [Assigned] (HDDS-661) When a volume fails in datanode, VersionEndpointTask#call ends up in dead lock.
[ https://issues.apache.org/jira/browse/HDDS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru reassigned HDDS-661: --- Assignee: Hanisha Koneru > When a volume fails in datanode, VersionEndpointTask#call ends up in dead > lock. > --- > > Key: HDDS-661 > URL: https://issues.apache.org/jira/browse/HDDS-661 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Nanda kumar >Assignee: Hanisha Koneru >Priority: Major > > When a volume fails in datanode, the call to {{VersionEndpointTask#call}} > ends up in dead-lock. > {code:java} > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:78) > --> we acquire VolumeSet read lock here. > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:93) > > org.apache.hadoop.ozone.container.common.volume.VolumeSet.failVolume(VolumeSet.java:276) > > org.apache.hadoop.ozone.container.common.volume.VolumeSet.writeLock(VolumeSet.java:210) > ---> we wait for VolumeSet write lock. > {code} > Since this thread already holds the read lock, it cannot get the write lock > and ends up in dead-lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-661) When a volume fails in datanode, VersionEndpointTask#call ends up in dead lock.
[ https://issues.apache.org/jira/browse/HDDS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-661: Attachment: HDDS-661.001.patch > When a volume fails in datanode, VersionEndpointTask#call ends up in dead > lock. > --- > > Key: HDDS-661 > URL: https://issues.apache.org/jira/browse/HDDS-661 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Nanda kumar >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-661.001.patch > > > When a volume fails in datanode, the call to {{VersionEndpointTask#call}} > ends up in dead-lock. > {code:java} > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:78) > --> we acquire VolumeSet read lock here. > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:93) > > org.apache.hadoop.ozone.container.common.volume.VolumeSet.failVolume(VolumeSet.java:276) > > org.apache.hadoop.ozone.container.common.volume.VolumeSet.writeLock(VolumeSet.java:210) > ---> we wait for VolumeSet write lock. > {code} > Since this thread already holds the read lock, it cannot get the write lock > and ends up in dead-lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-661) When a volume fails in datanode, VersionEndpointTask#call ends up in dead lock.
[ https://issues.apache.org/jira/browse/HDDS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-661: Status: Patch Available (was: Open) > When a volume fails in datanode, VersionEndpointTask#call ends up in dead > lock. > --- > > Key: HDDS-661 > URL: https://issues.apache.org/jira/browse/HDDS-661 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Nanda kumar >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-661.001.patch > > > When a volume fails in datanode, the call to {{VersionEndpointTask#call}} > ends up in dead-lock. > {code:java} > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:78) > --> we acquire VolumeSet read lock here. > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:93) > > org.apache.hadoop.ozone.container.common.volume.VolumeSet.failVolume(VolumeSet.java:276) > > org.apache.hadoop.ozone.container.common.volume.VolumeSet.writeLock(VolumeSet.java:210) > ---> we wait for VolumeSet write lock. > {code} > Since this thread already holds the read lock, it cannot get the write lock > and ends up in dead-lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-661) When a volume fails in datanode, VersionEndpointTask#call ends up in dead lock.
[ https://issues.apache.org/jira/browse/HDDS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650638#comment-16650638 ] Hanisha Koneru commented on HDDS-661: - Thanks [~nandakumar131] for catching this bug. In VersionEndpointTask, we now take the writeLock instead of the readLock. Since this is the very first call from the DN, and the DN cannot register or heartbeat before this call completes, we can safely say that getting the writeLock here would not block any other process. I have posted a patch with this change. > When a volume fails in datanode, VersionEndpointTask#call ends up in dead > lock. > --- > > Key: HDDS-661 > URL: https://issues.apache.org/jira/browse/HDDS-661 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Nanda kumar >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-661.001.patch > > > When a volume fails in datanode, the call to {{VersionEndpointTask#call}} > ends up in dead-lock. > {code:java} > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:78) > --> we acquire VolumeSet read lock here. > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:93) > > org.apache.hadoop.ozone.container.common.volume.VolumeSet.failVolume(VolumeSet.java:276) > > org.apache.hadoop.ozone.container.common.volume.VolumeSet.writeLock(VolumeSet.java:210) > ---> we wait for VolumeSet write lock. > {code} > Since this thread already holds the read lock, it cannot get the write lock > and ends up in dead-lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
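The root cause described above follows from read-write lock semantics: assuming the VolumeSet lock is a `java.util.concurrent.locks.ReentrantReadWriteLock` (the readLock/writeLock calls in the stack trace suggest this), lock upgrade is not supported, so a thread holding the read lock can never acquire the write lock. The standalone demonstration below uses `tryLock` so it shows the failed upgrade without actually hanging.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {

    // Returns true if the write lock could be acquired while this thread
    // already holds the read lock. For ReentrantReadWriteLock this is
    // always false: read-to-write upgrade is not supported, and a blocking
    // writeLock().lock() here would deadlock against our own read hold.
    public static boolean canUpgradeReadToWrite() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            if (lock.writeLock().tryLock()) {
                lock.writeLock().unlock();
                return true;
            }
            return false;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        // prints "read->write upgrade possible: false"
        System.out.println("read->write upgrade possible: " + canUpgradeReadToWrite());
    }
}
```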
[jira] [Assigned] (HDDS-681) VolumeSet lock should not be exposed outside of VolumeSet class.
[ https://issues.apache.org/jira/browse/HDDS-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru reassigned HDDS-681: --- Assignee: Hanisha Koneru > VolumeSet lock should not be exposed outside of VolumeSet class. > > > Key: HDDS-681 > URL: https://issues.apache.org/jira/browse/HDDS-681 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Nanda kumar >Assignee: Hanisha Koneru >Priority: Major > > If {{VolumeSet}} lock is exposed outside of {{VolumeSet}} class then someone > who is using it can end up in deadlock situation easily. We should change the > code in such a way that the lock is not exposed and the data structure inside > VolumeSet is also protected by the lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-663) Lot of "Removed undeclared tags" logger while running commands
[ https://issues.apache.org/jira/browse/HDDS-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652195#comment-16652195 ] Hanisha Koneru commented on HDDS-663: - This has been fixed in HADOOP-15295. [~nmaheshwari], which version of Hadoop did you see this in? > Lot of "Removed undeclared tags" logger while running commands > -- > > Key: HDDS-663 > URL: https://issues.apache.org/jira/browse/HDDS-663 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Namit Maheshwari >Priority: Major > Labels: newbie > Fix For: 0.3.0 > > > While running commands against OzoneFs see lot of logger like below: > {code:java} > -bash-4.2$ hdfs dfs -ls o3://bucket2.volume2/mr_jobEE > 18/10/15 20:29:17 INFO conf.Configuration: Removed undeclared tags: > 18/10/15 20:29:18 INFO conf.Configuration: Removed undeclared tags: > Found 2 items > rw-rw-rw 1 hdfs hdfs 0 2018-10-15 20:28 o3://bucket2.volume2/mr_jobEE/_SUCCESS > rw-rw-rw 1 hdfs hdfs 5017 1970-07-23 04:33 > o3://bucket2.volume2/mr_jobEE/part-r-0 > 18/10/15 20:29:19 INFO conf.Configuration: Removed undeclared tags: > -bash-4.2$ {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-664) Creating hive table on Ozone fails
[ https://issues.apache.org/jira/browse/HDDS-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru reassigned HDDS-664: --- Assignee: Hanisha Koneru > Creating hive table on Ozone fails > -- > > Key: HDDS-664 > URL: https://issues.apache.org/jira/browse/HDDS-664 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > > Modified HIVE_AUX_JARS_PATH to include Ozone jars. Tried creating Hive > external table on Ozone. It fails with "Error: Error while compiling > statement: FAILED: HiveAuthzPluginException Error getting permissions for > o3://bucket2.volume2/testo3: User: hive is not allowed to impersonate > anonymous (state=42000,code=4)" > {code:java} > -bash-4.2$ beeline > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.0.3.0-63/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.0.3.0-63/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://ctr-e138-1518143905142-510793-01-11.hwx.site:2181,ctr-e138-1518143905142-510793-01-06.hwx.site:2181,ctr-e138-1518143905142-510793-01-08.hwx.site:2181,ctr-e138-1518143905142-510793-01-10.hwx.site:2181,ctr-e138-1518143905142-510793-01-07.hwx.site:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 > Enter username for > jdbc:hive2://ctr-e138-1518143905142-510793-01-11.hwx.site:2181,ctr-e138-1518143905142-510793-01-06.hwx.site:2181,ctr-e138-1518143905142-510793-01-08.hwx.site:2181,ctr-e138-1518143905142-510793-01-10.hwx.site:2181,ctr-e138-1518143905142-510793-01-07.hwx.site:2181/default: > Enter password for > jdbc:hive2://ctr-e138-1518143905142-510793-01-11.hwx.site:2181,ctr-e138-1518143905142-510793-01-06.hwx.site:2181,ctr-e138-1518143905142-510793-01-08.hwx.site:2181,ctr-e138-1518143905142-510793-01-10.hwx.site:2181,ctr-e138-1518143905142-510793-01-07.hwx.site:2181/default: > 18/10/15 21:36:55 [main]: INFO jdbc.HiveConnection: Connected to > ctr-e138-1518143905142-510793-01-04.hwx.site:1 > Connected to: Apache Hive (version 3.1.0.3.0.3.0-63) > Driver: Hive JDBC (version 3.1.0.3.0.3.0-63) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 3.1.0.3.0.3.0-63 by Apache Hive > 0: jdbc:hive2://ctr-e138-1518143905142-510793> create external table testo3 ( > i int, s string, d float) location "o3://bucket2.volume2/testo3"; > Error: Error while compiling statement: FAILED: HiveAuthzPluginException > Error getting permissions for o3://bucket2.volume2/testo3: User: hive is not > allowed to impersonate anonymous (state=42000,code=4) > 0: jdbc:hive2://ctr-e138-1518143905142-510793> {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception
[ https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655780#comment-16655780 ] Hanisha Koneru commented on HDDS-612: - Thanks for the review [~arpitagarwal]. {quote}why is the following change necessary? {quote} In {{exitChillMode()}}, we call {{emitChillModeStatus()}}, so the emit function was being called twice before. Added a unit test and also updated SCMChillModeManager to include a check for whether chill mode is enabled before evaluating the exit rules. > Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock > fails with ChillModePrecheck exception > > > Key: HDDS-612 > URL: https://issues.apache.org/jira/browse/HDDS-612 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-612.001.patch, HDDS-612.002.patch, > HDDS-612.003.patch > > > {code:java} > 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 9863, call Call#70 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:53442 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at >
org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code}
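The fix described in the comment above can be sketched as follows. This is an illustrative stand-in, not the actual SCMChillModeManager code: the idea is that when hdds.scm.chillmode.enabled is false the precheck passes unconditionally, and the status emit happens exactly once inside the exit path rather than again in the caller.

```java
// Hypothetical sketch of the chill-mode precheck logic (names are
// illustrative; this is not the real SCMChillModeManager implementation).
class ChillModeCheckSketch {
    private final boolean chillModeEnabled;  // hdds.scm.chillmode.enabled
    private boolean inChillMode;

    ChillModeCheckSketch(boolean enabled) {
        this.chillModeEnabled = enabled;
        // Start in chill mode only when the feature is enabled at all.
        this.inChillMode = enabled;
    }

    /** Returns true if the request (e.g. allocateBlock) may proceed. */
    boolean precheck() {
        if (!chillModeEnabled) {
            return true;  // feature disabled: never block the request
        }
        return !inChillMode;
    }

    void exitChillMode() {
        inChillMode = false;
        // emitChillModeStatus() would be invoked once here, not again by
        // the caller -- the double-emit the comment above refers to.
    }
}
```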
[jira] [Commented] (HDDS-686) Incorrect creation time for files created by o3fs.
[ https://issues.apache.org/jira/browse/HDDS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655901#comment-16655901 ] Hanisha Koneru commented on HDDS-686: - [~jnp], I am getting the correct time format for the files created by o3fs. {code:java} $./hadoop fs -mkdir o3fs://bucket1.volume1/key2 $./ozone sh key list /volume1/bucket1 [{ "version" : 0, "md5hash" : null, "createdOn" : "Thu, 18 Oct 2018 21:13:37 GMT", "modifiedOn" : "Thu, 18 Oct 2018 21:13:37 GMT", "size" : 0, "keyName" : "key2/" } ]{code} Could you please give the steps to repro? > Incorrect creation time for files created by o3fs. > -- > > Key: HDDS-686 > URL: https://issues.apache.org/jira/browse/HDDS-686 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Filesystem >Reporter: Jitendra Nath Pandey >Assignee: Hanisha Koneru >Priority: Blocker > Labels: app-compat > > Files created by o3fs show creation timestamp as unix epoch.
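For readers checking whether they hit the bug described in HDDS-686: a zero (unix-epoch) creation time would show up in the key-list output above as a fixed 1970 date rather than the actual creation time. A small illustrative helper (hypothetical, using only java.time) formatting an epoch-millis value in the same GMT style the CLI output prints:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: render an epoch-millis timestamp in the RFC 1123
// GMT style that the `ozone sh key list` output above uses for
// createdOn/modifiedOn. A value of 0 is the "unix epoch" symptom.
class EpochFormatSketch {
    static String format(long epochMillis) {
        return DateTimeFormatter.RFC_1123_DATE_TIME
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(epochMillis));
    }
}
```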
[jira] [Updated] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception
[ https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-612: Attachment: HDDS-612.003.patch > Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock > fails with ChillModePrecheck exception > > > Key: HDDS-612 > URL: https://issues.apache.org/jira/browse/HDDS-612 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-612.001.patch, HDDS-612.002.patch, > HDDS-612.003.patch > > > {code:java} > 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 9863, call Call#70 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:53442 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code}
[jira] [Commented] (HDDS-338) ozoneFS allows to create file key and directory key with same keyname
[ https://issues.apache.org/jira/browse/HDDS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654565#comment-16654565 ] Hanisha Koneru commented on HDDS-338: - {code:java} $ ./ozone sh key put /volume2/bucket2/dir1/dir2/dir3/key1234 /etc/hadoop/workers $ ./ozone sh key list /volume2/bucket2 [ { "version" : 0, "md5hash" : null, "createdOn" : "Wed, 17 Oct 2018 23:13:15 GMT", "modifiedOn" : "Wed, 17 Oct 2018 23:13:17 GMT", "size" : 10, "keyName" : "dir1" }]{code} > ozoneFS allows to create file key and directory key with same keyname > - > > Key: HDDS-338 > URL: https://issues.apache.org/jira/browse/HDDS-338 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Nilotpal Nandi >Assignee: Hanisha Koneru >Priority: Critical > Attachments: HDDS-338.001.patch > > > steps taken : > -- > 1. created a directory through ozoneFS interface. > {noformat} > hadoop@1a1fa8a11332:~/bin$ ./ozone fs -mkdir /temp1/ > 2018-08-08 13:50:26 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > hadoop@1a1fa8a11332:~/bin$ ./ozone fs -ls / > 2018-08-08 14:09:59 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > Found 1 items > drwxrwxrwx - 0 2018-08-08 13:51 /temp1{noformat} > 2. create a new key with name 'temp1' at same bucket. > {noformat} > hadoop@1a1fa8a11332:~/bin$ ./ozone oz -putKey root-volume/root-bucket/temp1 > -file /etc/passwd > 2018-08-08 14:10:34 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.rpc.type = GRPC (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 > (custom) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 > ms (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - > raft.client.async.outstanding-requests.max = 100 (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.async.scheduler-threads = > 3 (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB > (=1048576) (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 > (custom) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.rpc.request.timeout = > 3000 ms (default) > Aug 08, 2018 2:10:36 PM > org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl detectProxy > WARNING: Failed to construct URI for proxy lookup, proceeding without proxy > java.net.URISyntaxException: Illegal character in hostname at index 13: > https://ozone_datanode_3.ozone_default:9858 > at java.net.URI$Parser.fail(URI.java:2848) > at java.net.URI$Parser.parseHostname(URI.java:3387) > at java.net.URI$Parser.parseServer(URI.java:3236) > at java.net.URI$Parser.parseAuthority(URI.java:3155) > at java.net.URI$Parser.parseHierarchical(URI.java:3097) > at java.net.URI$Parser.parse(URI.java:3053) > at java.net.URI.(URI.java:673) > at > org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl.detectProxy(ProxyDetectorImpl.java:128) > at > org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl.proxyFor(ProxyDetectorImpl.java:118) > at > org.apache.ratis.shaded.io.grpc.internal.InternalSubchannel.startNewTransport(InternalSubchannel.java:207) > at > org.apache.ratis.shaded.io.grpc.internal.InternalSubchannel.obtainActiveTransport(InternalSubchannel.java:188) > at > org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$SubchannelImpl.requestConnection(ManagedChannelImpl.java:1130) > 
at > org.apache.ratis.shaded.io.grpc.PickFirstBalancerFactory$PickFirstBalancer.handleResolvedAddressGroups(PickFirstBalancerFactory.java:79) > at > org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl$1NamesResolved.run(ManagedChannelImpl.java:1032) > at > org.apache.ratis.shaded.io.grpc.internal.ChannelExecutor.drain(ChannelExecutor.java:73) > at > org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$LbHelperImpl.runSerialized(ManagedChannelImpl.java:1000) > at > org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl.onAddresses(ManagedChannelImpl.java:1044) > at > org.apache.ratis.shaded.io.grpc.internal.DnsNameResolver$1.run(DnsNameResolver.java:201) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){noformat} > Observed that there are multiple entries of 'temp1' when ozone fs -ls
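The steps above show why the collision is possible: in Ozone's flat key namespace a "directory" is simply a key whose name ends in "/", so "temp1" (file key) and "temp1/" (directory key) are distinct keys unless creation rejects one form when the other exists. A hypothetical create-time guard (illustrative only, not the actual OzoneFS code) that would reject the second form:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: reject a key creation if the same name already
// exists in either its file form ("temp1") or directory form ("temp1/").
class KeyConflictSketch {
    private final Set<String> keys = new TreeSet<>();

    /** Returns true and records the key, or false if either form exists. */
    boolean tryCreate(String keyName) {
        String dirForm = keyName.endsWith("/") ? keyName : keyName + "/";
        String fileForm = keyName.endsWith("/")
                ? keyName.substring(0, keyName.length() - 1)
                : keyName;
        if (keys.contains(dirForm) || keys.contains(fileForm)) {
            return false;  // name already taken in the other (or same) form
        }
        keys.add(keyName);
        return true;
    }
}
```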
[jira] [Issue Comment Deleted] (HDDS-338) ozoneFS allows to create file key and directory key with same keyname
[ https://issues.apache.org/jira/browse/HDDS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-338: Comment: was deleted (was: {code:java} $ ./ozone sh key put /volume2/bucket2/dir1/dir2/dir3/key1234 /etc/hadoop/workers $ ./ozone sh key list /volume2/bucket2 [ { "version" : 0, "md5hash" : null, "createdOn" : "Wed, 17 Oct 2018 23:13:15 GMT", "modifiedOn" : "Wed, 17 Oct 2018 23:13:17 GMT", "size" : 10, "keyName" : "dir1" }]{code}) > ozoneFS allows to create file key and directory key with same keyname > - > > Key: HDDS-338 > URL: https://issues.apache.org/jira/browse/HDDS-338 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Nilotpal Nandi >Assignee: Hanisha Koneru >Priority: Critical > Attachments: HDDS-338.001.patch > > > steps taken : > -- > 1. created a directory through ozoneFS interface. > {noformat} > hadoop@1a1fa8a11332:~/bin$ ./ozone fs -mkdir /temp1/ > 2018-08-08 13:50:26 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > hadoop@1a1fa8a11332:~/bin$ ./ozone fs -ls / > 2018-08-08 14:09:59 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > Found 1 items > drwxrwxrwx - 0 2018-08-08 13:51 /temp1{noformat} > 2. create a new key with name 'temp1' at same bucket. > {noformat} > hadoop@1a1fa8a11332:~/bin$ ./ozone oz -putKey root-volume/root-bucket/temp1 > -file /etc/passwd > 2018-08-08 14:10:34 WARN NativeCodeLoader:60 - Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.rpc.type = GRPC (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 > (custom) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 > ms (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - > raft.client.async.outstanding-requests.max = 100 (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.async.scheduler-threads = > 3 (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB > (=1048576) (default) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 > (custom) > 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.rpc.request.timeout = > 3000 ms (default) > Aug 08, 2018 2:10:36 PM > org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl detectProxy > WARNING: Failed to construct URI for proxy lookup, proceeding without proxy > java.net.URISyntaxException: Illegal character in hostname at index 13: > https://ozone_datanode_3.ozone_default:9858 > at java.net.URI$Parser.fail(URI.java:2848) > at java.net.URI$Parser.parseHostname(URI.java:3387) > at java.net.URI$Parser.parseServer(URI.java:3236) > at java.net.URI$Parser.parseAuthority(URI.java:3155) > at java.net.URI$Parser.parseHierarchical(URI.java:3097) > at java.net.URI$Parser.parse(URI.java:3053) > at java.net.URI.(URI.java:673) > at > org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl.detectProxy(ProxyDetectorImpl.java:128) > at > org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl.proxyFor(ProxyDetectorImpl.java:118) > at > org.apache.ratis.shaded.io.grpc.internal.InternalSubchannel.startNewTransport(InternalSubchannel.java:207) > at > org.apache.ratis.shaded.io.grpc.internal.InternalSubchannel.obtainActiveTransport(InternalSubchannel.java:188) > at > org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$SubchannelImpl.requestConnection(ManagedChannelImpl.java:1130) > 
at > org.apache.ratis.shaded.io.grpc.PickFirstBalancerFactory$PickFirstBalancer.handleResolvedAddressGroups(PickFirstBalancerFactory.java:79) > at > org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl$1NamesResolved.run(ManagedChannelImpl.java:1032) > at > org.apache.ratis.shaded.io.grpc.internal.ChannelExecutor.drain(ChannelExecutor.java:73) > at > org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$LbHelperImpl.runSerialized(ManagedChannelImpl.java:1000) > at > org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl.onAddresses(ManagedChannelImpl.java:1044) > at > org.apache.ratis.shaded.io.grpc.internal.DnsNameResolver$1.run(DnsNameResolver.java:201) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){noformat} > Observed that there are multiple entries of 'temp1' when ozone fs -ls command >
[jira] [Updated] (HDDS-670) Fix OzoneFS directory rename
[ https://issues.apache.org/jira/browse/HDDS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-670: Description: Renaming a directory within the same parent directory fails with the exception: {code:java} Unable to move: o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1 to: o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved {code} Detailed exception in comment below. was: It fails with {code:java} ERROR : Job Commit failed with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Unable to move: o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1 to: o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved)' org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move: o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1 to: o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved {code} Detailed exception in comment below. > Fix OzoneFS directory rename > > > Key: HDDS-670 > URL: https://issues.apache.org/jira/browse/HDDS-670 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Blocker > Labels: app-compat > Attachments: HDDS-670.001.patch > > > > Renaming a directory within the same parent directory fails with the > exception: > {code:java} > Unable to move: > o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1 > to: > o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved > {code} > Detailed exception in comment below. 
[jira] [Updated] (HDDS-670) Fix OzoneFS directory rename
[ https://issues.apache.org/jira/browse/HDDS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-670: Summary: Fix OzoneFS directory rename (was: Hive insert fails against Ozone external table) > Fix OzoneFS directory rename > > > Key: HDDS-670 > URL: https://issues.apache.org/jira/browse/HDDS-670 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Blocker > Labels: app-compat > Attachments: HDDS-670.001.patch > > > It fails with > {code:java} > ERROR : Job Commit failed with exception > 'org.apache.hadoop.hive.ql.metadata.HiveException(Unable to move: > o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1 > to: > o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved)' > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move: > o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1 > to: > o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved > {code} > > Detailed exception in comment below.
[jira] [Updated] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception
[ https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-612: Attachment: HDDS-612.002.patch > Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock > fails with ChillModePrecheck exception > > > Key: HDDS-612 > URL: https://issues.apache.org/jira/browse/HDDS-612 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Namit Maheshwari >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDDS-612.001.patch, HDDS-612.002.patch > > > {code:java} > 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 0 on 9863, call Call#70 Retry#0 > org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock > from 172.27.56.9:53442 > org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed > for allocateBlock > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38) > at > org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30) > at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143) > at > org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code}