[jira] [Commented] (HDFS-13114) CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path

2018-02-28 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381301#comment-16381301
 ] 

Hanisha Koneru commented on HDFS-13114:
---

Thank you [~xyao] for committing the patch and [~jojochuang] for the review.

> CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path
> 
>
> Key: HDFS-13114
> URL: https://issues.apache.org/jira/browse/HDFS-13114
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 3.1.0, 3.0.2, 3.2.0
>
> Attachments: HDFS-13114.001.patch
>
>
> The {{crypto -reencryptZone <action> -path <zone>}} command takes in a path 
> argument. But when creating the {{HdfsAdmin}} object, it uses the defaultFs 
> instead of resolving the filesystem from the path. This causes the following 
> exception if the authority component in the path does not match the authority 
> of the defaultFs.
> {code:java}
> $ hdfs crypto -reencryptZone -start -path hdfs://mycluster-node-1:8020/zone1
> IllegalArgumentException: Wrong FS: hdfs://mycluster-node-1:8020/zone1, 
> expected: hdfs://ns1{code}
> {code:java}
> $ hdfs crypto -reencryptZone -start -path hdfs://ns2/zone2
> IllegalArgumentException: Wrong FS: hdfs://ns2/zone2, expected: 
> hdfs://ns1{code}
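One way to resolve the filesystem from the path, sketched against the public Hadoop API (an illustration of the idea, not necessarily the committed fix):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class ResolveFromPath {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path zone = new Path("hdfs://mycluster-node-1:8020/zone1");
    // getFileSystem() honours the authority in the path, unlike
    // FileSystem.get(conf), which always returns the defaultFs.
    FileSystem fs = zone.getFileSystem(conf);
    HdfsAdmin admin = new HdfsAdmin(fs.getUri(), conf);
    // admin.reencryptEncryptionZone(zone, ReencryptAction.START); ...
  }
}
{code}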






[jira] [Updated] (HDFS-13109) Support fully qualified hdfs path in EZ commands

2018-02-27 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13109:
--
Attachment: HDFS-13109.004.patch

> Support fully qualified hdfs path in EZ commands
> 
>
> Key: HDFS-13109
> URL: https://issues.apache.org/jira/browse/HDFS-13109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13109.001.patch, HDFS-13109.002.patch, 
> HDFS-13109.003.patch, HDFS-13109.004.patch
>
>
> When creating an Encryption Zone, if the fully qualified path is specified in 
> the path argument, it throws the following error.
> {code:java}
> ~$ hdfs crypto -createZone -keyName mykey1 -path hdfs://ns1/zone1
> IllegalArgumentException: hdfs://ns1/zone1 is not the root of an encryption 
> zone. Do you mean /zone1?
> ~$ hdfs crypto -createZone -keyName mykey1 -path "hdfs://namenode:9000/zone2" 
> IllegalArgumentException: hdfs://namenode:9000/zone2 is not the root of an 
> encryption zone. Do you mean /zone2?
> {code}
> The EZ creation succeeds as the path is resolved in 
> DFS#createEncryptionZone(). But while creating the Trash directory, the path 
> is not resolved and it throws the above error.
>  A fully qualified path should be supported by {{crypto}}.
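A minimal sketch of the missing step, assuming the trash path just needs to be qualified against the filesystem derived from the user-supplied path (as {{DFS#createEncryptionZone}} already does for the zone itself):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifyTrashPath {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path zone = new Path("hdfs://ns1/zone1");
    FileSystem fs = zone.getFileSystem(conf);   // resolved from the path
    // Qualify the zone against its own filesystem before appending the
    // trash prefix; the result here is hdfs://ns1/zone1/.Trash.
    Path trashRoot = new Path(fs.makeQualified(zone), FileSystem.TRASH_PREFIX);
    System.out.println(trashRoot);
  }
}
{code}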






[jira] [Commented] (HDFS-13109) Support fully qualified hdfs path in EZ commands

2018-02-27 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379287#comment-16379287
 ] 

Hanisha Koneru commented on HDFS-13109:
---

Thanks for the review, [~xyao].

bq. You have already resolved the path in the calling function public void 
provisionEZTrash. You can just pass the resolved path to the private method 
provisionEZTrash instead of getPathName.
[~shahrs87], we would have to call {{getPathName()}} as the 
{{FileSystemLinkResolver.resolve}} function in the calling {{public void 
provisionEZTrash}} doesn't verify that the path belongs to the correct 
filesystem. Please let me know if I am missing something here.

I have reverted {{p.toUri().getPath()}} to {{getPathName(p)}} in patch v04.
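For context, a paraphrased sketch of {{DistributedFileSystem#getPathName}}; the point is that it calls {{checkPath()}}, which {{p.toUri().getPath()}} alone would skip:
{code:java}
// Paraphrased sketch, not the exact source:
private String getPathName(Path file) {
  checkPath(file);  // throws IllegalArgumentException ("Wrong FS: ...")
                    // if the path's authority doesn't match this filesystem
  String result = file.toUri().getPath();
  if (!DFSUtilClient.isValidName(result)) {
    throw new IllegalArgumentException("Pathname " + result + " from " +
        file + " is not a valid DFS filename.");
  }
  return result;
}
{code}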



> Support fully qualified hdfs path in EZ commands
> 
>
> Key: HDFS-13109
> URL: https://issues.apache.org/jira/browse/HDFS-13109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13109.001.patch, HDFS-13109.002.patch, 
> HDFS-13109.003.patch, HDFS-13109.004.patch
>
>
> When creating an Encryption Zone, if the fully qualified path is specified in 
> the path argument, it throws the following error.
> {code:java}
> ~$ hdfs crypto -createZone -keyName mykey1 -path hdfs://ns1/zone1
> IllegalArgumentException: hdfs://ns1/zone1 is not the root of an encryption 
> zone. Do you mean /zone1?
> ~$ hdfs crypto -createZone -keyName mykey1 -path "hdfs://namenode:9000/zone2" 
> IllegalArgumentException: hdfs://namenode:9000/zone2 is not the root of an 
> encryption zone. Do you mean /zone2?
> {code}
> The EZ creation succeeds as the path is resolved in 
> DFS#createEncryptionZone(). But while creating the Trash directory, the path 
> is not resolved and it throws the above error.
>  A fully qualified path should be supported by {{crypto}}.






[jira] [Commented] (HDFS-13114) CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path

2018-02-27 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379298#comment-16379298
 ] 

Hanisha Koneru commented on HDFS-13114:
---

Thanks for the review, [~xyao].

{{ListZonesCommand#run}} and {{ListReencryptionStatusCommand#run}} do not have 
path parameters, so we have to fall back to the defaultUri. For these two 
commands, we would need to use the generic -fs option to specify the 
nameservice, as shown below.
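A usage sketch of that generic option (the nameservice {{ns2}} is illustrative):
{code}
$ hdfs crypto -fs hdfs://ns2 -listZones
$ hdfs crypto -fs hdfs://ns2 -listReencryptionStatus
{code}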

> CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path
> 
>
> Key: HDFS-13114
> URL: https://issues.apache.org/jira/browse/HDFS-13114
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13114.001.patch
>
>
> The {{crypto -reencryptZone <action> -path <zone>}} command takes in a path 
> argument. But when creating the {{HdfsAdmin}} object, it uses the defaultFs 
> instead of resolving the filesystem from the path. This causes the following 
> exception if the authority component in the path does not match the authority 
> of the defaultFs.
> {code:java}
> $ hdfs crypto -reencryptZone -start -path hdfs://mycluster-node-1:8020/zone1
> IllegalArgumentException: Wrong FS: hdfs://mycluster-node-1:8020/zone1, 
> expected: hdfs://ns1{code}
> {code:java}
> $ hdfs crypto -reencryptZone -start -path hdfs://ns2/zone2
> IllegalArgumentException: Wrong FS: hdfs://ns2/zone2, expected: 
> hdfs://ns1{code}






[jira] [Commented] (HDFS-10803) TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available

2018-03-12 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396329#comment-16396329
 ] 

Hanisha Koneru commented on HDFS-10803:
---

Thanks for the fix [~linyiqun].

The patch LGTM. Tested with multiple runs with and without the patch. +1.

> TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails 
> intermittently due to no free space available
> 
>
> Key: HDFS-10803
> URL: https://issues.apache.org/jira/browse/HDFS-10803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-10803.001.patch
>
>
> The test {{TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools}} 
> fails intermittently. The stack 
> infos(https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithMultipleNameNodes/testBalancing2OutOf3Blockpools/):
> {code}
> java.io.IOException: Creating block, no free space available
>   at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset$BInfo.<init>(SimulatedFSDataset.java:151)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.injectBlocks(SimulatedFSDataset.java:580)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.injectBlocks(MiniDFSCluster.java:2679)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.unevenDistribution(TestBalancerWithMultipleNameNodes.java:405)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancing2OutOf3Blockpools(TestBalancerWithMultipleNameNodes.java:516)
> {code}
> The error message means that the datanode's capacity has been used up and 
> there is no space left to create a new file block. 
> Looking into the code, the main reason seems to be that the {{capacities}} 
> for the cluster are not correctly constructed on the second cluster startup, 
> before the test redistributes the blocks.
> The related code:
> {code}
>   // Here we do redistribute blocks nNameNodes times for each node,
>   // we need to adjust the capacities. Otherwise it will cause the no 
>   // free space errors sometimes.
>   final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
>   .nnTopology(MiniDFSNNTopology.simpleFederatedTopology(nNameNodes))
>   .numDataNodes(nDataNodes)
>   .racks(racks)
>   .simulatedCapacities(newCapacities)
>   .format(false)
>   .build();
>   LOG.info("UNEVEN 11");
> ...
> for(int n = 0; n < nNameNodes; n++) {
>   // redistribute blocks
>   final Block[][] blocksDN = TestBalancer.distributeBlocks(
>   blocks[n], s.replication, distributionPerNN);
> 
>   for(int d = 0; d < blocksDN.length; d++)
> cluster.injectBlocks(n, d, Arrays.asList(blocksDN[d]));
>   LOG.info("UNEVEN 13: n=" + n);
> }
> {code}
> This means the totalUsed value is increased to {{nNameNodes*usedSpacePerNN}} 
> rather than {{usedSpacePerNN}}.
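A hedged illustration of the adjustment hinted at in the code comment above (variable names follow the test; the exact scaling used by the patch may differ):
{code:java}
// Each of the nNameNodes namespaces injects ~usedSpacePerNN of blocks into
// every datanode, so the simulated capacities for the second startup must
// cover nNameNodes * usedSpacePerNN, not just usedSpacePerNN.
long[] newCapacities = new long[nDataNodes];
for (int d = 0; d < nDataNodes; d++) {
  newCapacities[d] = capacities[d] * nNameNodes;
}
{code}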






[jira] [Commented] (HDFS-13239) Fix non-empty dir warning message when setting default EC policy

2018-03-12 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396320#comment-16396320
 ] 

Hanisha Koneru commented on HDFS-13239:
---

+1 pending Jenkins.

> Fix non-empty dir warning message when setting default EC policy
> 
>
> Key: HDFS-13239
> URL: https://issues.apache.org/jira/browse/HDFS-13239
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Bharat Viswanadham
>Priority: Minor
> Attachments: HDFS-13239.00.patch, HDFS-13239.01.patch, 
> HDFS-13239.02.patch, HDFS-13239.03.patch, HDFS-13239.04.patch
>
>
> When EC policy is set on a non-empty directory, the following warning message 
> is given:
> {code}
> $hdfs ec -setPolicy -policy RS-6-3-1024k -path /ec1
> Warning: setting erasure coding policy on a non-empty directory will not 
> automatically convert existing files to RS-6-3-1024k
> {code}
> When we do not specify the -policy parameter when setting EC policy on a 
> directory, it takes the default EC policy. Setting default EC policy in this 
> way on a non-empty directory gives the following warning message:
> {code}
> $hdfs ec -setPolicy -path /ec2
> Warning: setting erasure coding policy on a non-empty directory will not 
> automatically convert existing files to null
> {code}
> Notice that the warning message in the 2nd case has the ecPolicy name shown 
> as null. We should instead give the default EC policy name in this message.






[jira] [Commented] (HDFS-13239) Fix non-empty dir warning message when setting default EC policy

2018-03-08 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391730#comment-16391730
 ] 

Hanisha Koneru commented on HDFS-13239:
---

Thanks for working on this, [~bharatviswa].

Looks good to me overall, just a few minor comments:
 # We can directly assign the default policy to {{ecPolicyName}} and avoid the 
extra variable {{ecName}} (see the sketch after the snippet below).
 # We would not need the if condition below, as {{ecPolicyName}} can no longer 
be null.

{code:java}
if (ecPolicyName == null){
  System.out.println("Set default erasure coding policy " + ecName
  + " on " + path);
} {code}
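A sketch of comment 1; the lookup shown is an assumption about where the default policy name could come from, not the actual patch:
{code:java}
// Resolve the default policy name up front so neither the extra ecName
// variable nor the null check is needed when printing the message.
if (ecPolicyName == null) {
  // hypothetical lookup: the real default comes from the NameNode config
  ecPolicyName = SystemErasureCodingPolicies.getPolicies().get(0).getName();
}
System.out.println("Set default erasure coding policy " + ecPolicyName
    + " on " + path);
{code}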

> Fix non-empty dir warning message when setting default EC policy
> 
>
> Key: HDFS-13239
> URL: https://issues.apache.org/jira/browse/HDFS-13239
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Bharat Viswanadham
>Priority: Minor
> Attachments: HDFS-13239.00.patch
>
>
> When EC policy is set on a non-empty directory, the following warning message 
> is given:
> {code}
> $hdfs ec -setPolicy -policy RS-6-3-1024k -path /ec1
> Warning: setting erasure coding policy on a non-empty directory will not 
> automatically convert existing files to RS-6-3-1024k
> {code}
> When we do not specify the -policy parameter when setting EC policy on a 
> directory, it takes the default EC policy. Setting default EC policy in this 
> way on a non-empty directory gives the following warning message:
> {code}
> $hdfs ec -setPolicy -path /ec2
> Warning: setting erasure coding policy on a non-empty directory will not 
> automatically convert existing files to null
> {code}
> Notice that the warning message in the 2nd case has the ecPolicy name shown 
> as null. We should instead give the default EC policy name in this message.






[jira] [Commented] (HDFS-13239) Fix non-empty dir warning message when setting default EC policy

2018-03-08 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391816#comment-16391816
 ] 

Hanisha Koneru commented on HDFS-13239:
---

Thanks [~bharatviswa]. Got it now.

Can we have a boolean {{isDefault}} or something similar instead of {{ecName}}? 
The two variables {{ecName}} and {{ecPolicyName}} are confusing :).

> Fix non-empty dir warning message when setting default EC policy
> 
>
> Key: HDFS-13239
> URL: https://issues.apache.org/jira/browse/HDFS-13239
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Bharat Viswanadham
>Priority: Minor
> Attachments: HDFS-13239.00.patch
>
>
> When EC policy is set on a non-empty directory, the following warning message 
> is given:
> {code}
> $hdfs ec -setPolicy -policy RS-6-3-1024k -path /ec1
> Warning: setting erasure coding policy on a non-empty directory will not 
> automatically convert existing files to RS-6-3-1024k
> {code}
> When we do not specify the -policy parameter when setting EC policy on a 
> directory, it takes the default EC policy. Setting default EC policy in this 
> way on a non-empty directory gives the following warning message:
> {code}
> $hdfs ec -setPolicy -path /ec2
> Warning: setting erasure coding policy on a non-empty directory will not 
> automatically convert existing files to null
> {code}
> Notice that the warning message in the 2nd case has the ecPolicy name shown 
> as null. We should instead give the default EC policy name in this message.






[jira] [Commented] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI

2018-03-08 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391848#comment-16391848
 ] 

Hanisha Koneru commented on HDFS-13244:
---

+1 pending Jenkins. Will trigger a Jenkins run.

> Add stack, conf, metrics links to utilities dropdown in NN webUI
> 
>
> Key: HDFS-13244
> URL: https://issues.apache.org/jira/browse/HDFS-13244
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 
> AM.png
>
>
> Add stack, conf, metrics links to utilities dropdown in NN webUI 
> cc [~arpitagarwal] for suggesting this.






[jira] [Commented] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI

2018-03-09 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393728#comment-16393728
 ] 

Hanisha Koneru commented on HDFS-13244:
---

Looks like Jenkins cannot process HTML changes. Thanks for pointing it out 
[~elgoiri].

I will commit this shortly.

> Add stack, conf, metrics links to utilities dropdown in NN webUI
> 
>
> Key: HDFS-13244
> URL: https://issues.apache.org/jira/browse/HDFS-13244
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 
> AM.png
>
>
> Add stack, conf, metrics links to utilities dropdown in NN webUI 
> cc [~arpitagarwal] for suggesting this.






[jira] [Comment Edited] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI

2018-03-09 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393728#comment-16393728
 ] 

Hanisha Koneru edited comment on HDFS-13244 at 3/9/18 11:01 PM:


Looks like Jenkins cannot process HTML changes. Thanks for pointing it out 
[~elgoiri].

Tested it on a test cluster. I will commit this shortly.


was (Author: hanishakoneru):
Looks like Jenkins cannot process HTML changes. Thanks for pointing it out 
[~elgoiri].

I will commit this shortly.

> Add stack, conf, metrics links to utilities dropdown in NN webUI
> 
>
> Key: HDFS-13244
> URL: https://issues.apache.org/jira/browse/HDFS-13244
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 
> AM.png
>
>
> Add stack, conf, metrics links to utilities dropdown in NN webUI 
> cc [~arpitagarwal] for suggesting this.






[jira] [Updated] (HDFS-13023) Journal Sync does not work on a secure cluster

2018-03-09 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13023:
--
Fix Version/s: 3.1.0

> Journal Sync does not work on a secure cluster
> --
>
> Key: HDFS-13023
> URL: https://issues.apache.org/jira/browse/HDFS-13023
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HDFS-13023.00.patch, HDFS-13023.01.patch, 
> HDFS-13023.02.patch, HDFS-13023.03.patch
>
>
> Fails with the following exception.
> {code}
> 2018-01-10 01:15:40,517 INFO server.JournalNodeSyncer 
> (JournalNodeSyncer.java:syncWithJournalAtIndex(235)) - Syncing Journal 
> /0.0.0.0:8485 with xxx, journal id: mycluster
>  2018-01-10 01:15:40,583 ERROR server.JournalNodeSyncer 
> (JournalNodeSyncer.java:syncWithJournalAtIndex(259)) - Could not sync with 
> Journal at xxx/xxx:8485
>  com.google.protobuf.ServiceException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User nn/xxx (auth:PROXY) via jn/xxx (auth:KERBEROS) is not authorized for 
> protocol interface org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol: 
> this service is only accessible by nn/x...@example.com
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:242)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>  at com.sun.proxy.$Proxy16.getEditLogManifest(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:254)
>  at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:230)
>  at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:190)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User nn/xxx (auth:PROXY) via jn/xxx (auth:KERBEROS) is not authorized for 
> protocol interface org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol: 
> this service is only accessible by nn/xxx
>  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>  ... 6 more
> {code}






[jira] [Updated] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI

2018-03-09 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13244:
--
   Resolution: Fixed
Fix Version/s: 3.2.0
   3.0.1
   3.1.0
   Status: Resolved  (was: Patch Available)

> Add stack, conf, metrics links to utilities dropdown in NN webUI
> 
>
> Key: HDFS-13244
> URL: https://issues.apache.org/jira/browse/HDFS-13244
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 3.2.0
>
> Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 
> AM.png
>
>
> Add stack, conf, metrics links to utilities dropdown in NN webUI 
> cc [~arpitagarwal] for suggesting this.






[jira] [Commented] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI

2018-03-09 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393830#comment-16393830
 ] 

Hanisha Koneru commented on HDFS-13244:
---

Committed to trunk, branch-3.1 and branch-3.0.

Thanks for the contribution [~bharatviswa] and thanks for the review [~ajayydv].

> Add stack, conf, metrics links to utilities dropdown in NN webUI
> 
>
> Key: HDFS-13244
> URL: https://issues.apache.org/jira/browse/HDFS-13244
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 3.2.0
>
> Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 
> AM.png
>
>
> Add stack, conf, metrics links to utilities dropdown in NN webUI 
> cc [~arpitagarwal] for suggesting this.






[jira] [Updated] (HDFS-11394) Add method for getting erasure coding policy through WebHDFS

2018-03-08 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-11394:
--
Attachment: HDFS-11394.006.patch

> Add method for getting erasure coding policy through WebHDFS 
> -
>
> Key: HDFS-11394
> URL: https://issues.apache.org/jira/browse/HDFS-11394
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, namenode
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Major
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-11394.005.patch, HDFS-11394.006.patch, 
> HDFS-11394.01.patch, HDFS-11394.02.patch, HDFS-11394.03.patch, 
> HDFS-11394.04.patch
>
>
> We can expose the erasure coding policy of an erasure-coded directory 
> through a WebHDFS method, as is already done for the storage policy. This 
> information can be used by the NameNode Web UI to show the details of 
> erasure-coded directories.
> see: HDFS-8196
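For reference, the request shape this would add (the op name follows the patch under review; the host is a placeholder and 9870 is the default NameNode HTTP port in Hadoop 3):
{code}
$ curl -i "http://<namenode-host>:9870/webhdfs/v1/ecdir?op=GETECPOLICY"
{code}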






[jira] [Commented] (HDFS-11394) Add method for getting erasure coding policy through WebHDFS

2018-03-08 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392240#comment-16392240
 ] 

Hanisha Koneru commented on HDFS-11394:
---

Thanks for the review, [~arpitagarwal].

Addressed javadoc and checkstyle issues in patch v06.

> Add method for getting erasure coding policy through WebHDFS 
> -
>
> Key: HDFS-11394
> URL: https://issues.apache.org/jira/browse/HDFS-11394
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, namenode
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Major
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-11394.005.patch, HDFS-11394.006.patch, 
> HDFS-11394.01.patch, HDFS-11394.02.patch, HDFS-11394.03.patch, 
> HDFS-11394.04.patch
>
>
> We can expose the erasure coding policy of an erasure-coded directory 
> through a WebHDFS method, as is already done for the storage policy. This 
> information can be used by the NameNode Web UI to show the details of 
> erasure-coded directories.
> see: HDFS-8196






[jira] [Commented] (HDFS-13239) Fix non-empty dir warning message when setting default EC policy

2018-03-08 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392096#comment-16392096
 ] 

Hanisha Koneru commented on HDFS-13239:
---

Thanks [~bharatviswa].

+1 for patch v01 pending Jenkins.

> Fix non-empty dir warning message when setting default EC policy
> 
>
> Key: HDFS-13239
> URL: https://issues.apache.org/jira/browse/HDFS-13239
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Bharat Viswanadham
>Priority: Minor
> Attachments: HDFS-13239.00.patch, HDFS-13239.01.patch
>
>
> When EC policy is set on a non-empty directory, the following warning message 
> is given:
> {code}
> $hdfs ec -setPolicy -policy RS-6-3-1024k -path /ec1
> Warning: setting erasure coding policy on a non-empty directory will not 
> automatically convert existing files to RS-6-3-1024k
> {code}
> When we do not specify the -policy parameter when setting EC policy on a 
> directory, it takes the default EC policy. Setting default EC policy in this 
> way on a non-empty directory gives the following warning message:
> {code}
> $hdfs ec -setPolicy -path /ec2
> Warning: setting erasure coding policy on a non-empty directory will not 
> automatically convert existing files to null
> {code}
> Notice that the warning message in the 2nd case has the ecPolicy name shown 
> as null. We should instead give the default EC policy name in this message.






[jira] [Commented] (HDFS-13244) Add stack, conf, metrics links to utilities dropdown in NN webUI

2018-03-08 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392179#comment-16392179
 ] 

Hanisha Koneru commented on HDFS-13244:
---

No, but I just wanted Jenkins to +1 it to follow conventions :)

> Add stack, conf, metrics links to utilities dropdown in NN webUI
> 
>
> Key: HDFS-13244
> URL: https://issues.apache.org/jira/browse/HDFS-13244
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13244.00.patch, Screen Shot 2018-03-07 at 11.28.27 
> AM.png
>
>
> Add stack, conf, metrics links to utilities dropdown in NN webUI 
> cc [~arpitagarwal] for suggesting this.






[jira] [Commented] (HDFS-13148) Unit test for EZ with KMS and Federation

2018-03-08 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392091#comment-16392091
 ] 

Hanisha Koneru commented on HDFS-13148:
---

Hi [~shahrs87], thanks for the review.

If we make {{TestEncryptionZonesWithKMSandFederation}} extend 
{{TestEncryptionZonesWithKMS}}, then the former would run all the test cases in 
the latter with its own initial {{setup}}. This would fail all the tests in 
{{TestEncryptionZonesWithKMS}} when run against the setup of 
{{TestEncryptionZonesWithKMSandFederation}}. To get over this, we would have to 
modify all the test cases in {{TestEncryptionZonesWithKMS}} and 
{{TestEncryptionZones}} to work with the federated configuration setup. For 
example, instead of using the variable {{dfsAdmin}} in 
{{TestEncryptionZonesWithKMS}}, we would have to change it to {{dfsAdmin[0]}} 
to match the federated setup.

I think doing this would just complicate all three test classes. We could 
instead have a {{TestEncryptionZonesBaseTest}} class and make all the other 
test classes extend it (sketched below). Or we could just keep the 
non-federated tests and federated tests separate (as in patch v03). Please let 
me know your thoughts.
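A rough shape of the base-class option (the hook and field names are hypothetical, not a committed design):
{code:java}
public abstract class TestEncryptionZonesBaseTest {
  protected HdfsAdmin[] dfsAdmin;          // one admin per namespace

  protected abstract int numNamespaces();  // 1 for the non-federated setup

  @Before
  public void setup() throws Exception {
    // Start a MiniDFSCluster with numNamespaces() nameservices plus one KMS,
    // then populate dfsAdmin[0..numNamespaces()-1].
  }
}

class TestEncryptionZonesWithKMS extends TestEncryptionZonesBaseTest {
  @Override
  protected int numNamespaces() { return 1; }
}

class TestEncryptionZonesWithKMSandFederation
    extends TestEncryptionZonesBaseTest {
  @Override
  protected int numNamespaces() { return 2; }
}
{code}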

> Unit test for EZ with KMS and Federation
> 
>
> Key: HDFS-13148
> URL: https://issues.apache.org/jira/browse/HDFS-13148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13148.001.patch, HDFS-13148.002.patch, 
> HDFS-13148.003.patch
>
>
> It would be good to have some unit tests for testing KMS and EZ on a 
> federated cluster. We can start with basic EZ operations. For example, create 
> EZs on two namespaces with different keys using one KMS.






[jira] [Created] (HDFS-13442) Handle Datanode Registration failure

2018-04-12 Thread Hanisha Koneru (JIRA)
Hanisha Koneru created HDFS-13442:
-

 Summary: Handle Datanode Registration failure
 Key: HDFS-13442
 URL: https://issues.apache.org/jira/browse/HDFS-13442
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


If a datanode is not able to register itself, we need to handle that correctly. 

If the number of unsuccessful attempts to register with the SCM exceeds a 
configurable max number, the datanode should not make any more attempts.
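A minimal sketch of that cap (the class name and the config source are hypothetical):
{code:java}
/** Caps the number of SCM registration attempts a datanode will make. */
public class RegistrationRetryPolicy {
  private final int maxAttempts;  // e.g. from a new ozone.scm config key
  private int attempts = 0;

  public RegistrationRetryPolicy(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  /** @return true if the datanode may attempt another registration. */
  public boolean shouldRetry() {
    return attempts++ < maxAttempts;
  }
}
{code}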






[jira] [Updated] (HDFS-13442) Handle Datanode Registration failure

2018-04-12 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13442:
--
Attachment: HDFS-13442-HDFS-7240.001.patch

> Handle Datanode Registration failure
> 
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.






[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-12 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13442:
--
Summary: Ozone: Handle Datanode Registration failure  (was: Handle Datanode 
Registration failure)

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.






[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-18 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13442:
--
Attachment: HDFS-13442-HDFS-7240.002.patch

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch, 
> HDFS-13442-HDFS-7240.002.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.






[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-18 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13442:
--
Attachment: (was: HDFS-13442-HDFS-7240.002.patch)

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.






[jira] [Commented] (HDFS-13079) Provide a config to start namenode in safemode state upto a certain transaction id

2018-04-18 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443191#comment-16443191
 ] 

Hanisha Koneru commented on HDFS-13079:
---

Thanks for working on this [~shashikant]. 
bq. Please note that in case a checkpoint has already happened and the 
requested transaction id has been subsumed in an FSImage, then the namenode 
will be started with the next nearest transaction id. Further FSImage files and 
edits will be ignored.
In case the requested txId falls within the latest fsImage, do we want to 
load that fsImage or fall back to a previous fsImage with lastTxId < 
requested txId? IMO, we should load the fsImage with endTxId <= requested 
txId.

* In {{FSImage#loadFSImage}}, the check for whether we should load an fsImage 
is made after the image has already been loaded. The line 
{{loader.load(curFile, requireSameLayoutVersion)}} loads the fsImage 
transactions into the NN.
{code}
FSImageFormat.LoaderDelegator loader = FSImageFormat.newLoader(conf, target);
loader.load(curFile, requireSameLayoutVersion);

long lastTxIdToLoad = target.getLastTxidToLoad();
long txId = loader.getLoadedImageTxId();
if (lastTxIdToLoad != HdfsServerConstants.INVALID_TXID && txId > 
lastTxIdToLoad) {
{code}

* When we skip loading the latest fsImage, we should keep falling back to try 
and load the next latest fsImage. For example, say we have two fsImages, 
fsimage_00090 and fsimage_00150, and we want to start the namenode in safemode 
up to txId 120. We first check fsimage_00150 and reject it. After this, the NN 
should attempt to load the next latest fsImage, i.e. fsimage_00090. 
We can throw an exception when skipping an fsImage and catch that exception in 
the following code path in {{FSImage#loadFSImage}}. This way the next latest 
fsImage will be loaded.
{code}
FSImageFile imageFile = null;
for (int i = 0; i < imageFiles.size(); i++) {
  try {
    imageFile = imageFiles.get(i);
    loadFSImageFile(target, recovery, imageFile, startOpt);
    break;
{code}

* What do we do when there are no fsImages with endTxId <= requested txId? 
IMO, we should stop the NN and throw an error.
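A sketch tying these points together ({{getLastTxidToLoad}} is from the patch under review; the placement of the check is the suggestion, not the current code):
{code:java}
// Reject an image *before* loading it and fall back to the next newest one;
// fail hard when nothing satisfies endTxId <= requested txId.
long lastTxIdToLoad = target.getLastTxidToLoad();
for (FSImageFile image : imageFiles) {        // sorted newest first
  if (lastTxIdToLoad != HdfsServerConstants.INVALID_TXID
      && image.getCheckpointTxId() > lastTxIdToLoad) {
    continue;  // subsumes transactions beyond the requested txId
  }
  loadFSImageFile(target, recovery, image, startOpt);
  return;
}
throw new IOException("No FSImage found with endTxId <= " + lastTxIdToLoad);
{code}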


> Provide a config to start namenode in safemode state upto a certain 
> transaction id
> --
>
> Key: HDFS-13079
> URL: https://issues.apache.org/jira/browse/HDFS-13079
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13079.001.patch, HDFS-13079.002.patch
>
>
> In some cases it necessary to rollback the Namenode back to a certain 
> transaction id. This is especially needed when the user issues a {{rm -Rf 
> -skipTrash}} by mistake.
> Rolling back to a transaction id helps in taking a peek at the filesystem at 
> a particular instant. This jira proposes to provide a configuration variable 
> using which the namenode can be started upto a certain transaction id. The 
> filesystem will be in a readonly safemode which cannot be overridden 
> manually. It will only be overridden by removing the config value from the 
> config file. Please also note that this will not cause any changes in the 
> filesystem state, the filesystem will be in safemode state and no changes to 
> the filesystem state will be allowed.
> Please note that in case a checkpoint has already happened and the requested 
> transaction id has been subsumed in an FSImage, then the namenode will be 
> started with the next nearest transaction id. Further FSImage files and edits 
> will be ignored.
> If the checkpoint hasn't happen then the namenode will be started with the 
> exact transaction id.






[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-18 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13442:
--
Attachment: HDFS-13442-HDFS-7240.002.patch

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch, 
> HDFS-13442-HDFS-7240.002.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.






[jira] [Commented] (HDFS-13372) New Expunge Replica Trash Client-Namenode-Protocol

2018-04-20 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446270#comment-16446270
 ] 

Hanisha Koneru commented on HDFS-13372:
---

Thanks for working on this [~bharatviswa]. The patch LGTM overall. 

Just two nits: 
# In {{DFSAdmin#expungeReplicaTrash}} there is a typo in the System.out message 
-> "operation is queued and will be sent to -in- datanodes".
# In {{RouterRpcServer}}, can you please remove the space in {{TO DO}} and 
add a description of what the TODO is, so that it shows up when viewing 
TODO items in the editor tool window.

I will trigger Jenkins manually if it doesn't run this time as well.

> New Expunge Replica Trash Client-Namenode-Protocol
> --
>
> Key: HDFS-13372
> URL: https://issues.apache.org/jira/browse/HDFS-13372
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13372-HDFS-12996.00.patch
>
>
> When client issues an expunge replica-trash RPC call to Namenode, the 
> Namenode will queue
> a new heartbeat command response - DN_EXPUNGE directing the DataNodes to 
> expunge the
> replica-trash.






[jira] [Commented] (HDFS-13373) Handle expunge command on NN and DN

2018-04-20 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446358#comment-16446358
 ] 

Hanisha Koneru commented on HDFS-13373:
---

Thanks for the patch [~bharatviswa].
# In {{PBHelper#convert(ReplicaTrashCommandProto)}}, the switch case should 
have a default case which throws AssertionError (please refer to 
{{BlockCommandProto convert(BlockCommand cmd)}}).
# In {{BPOfferService#processCommandFromActive}}, can we move the check whether 
the command is ReplicaTrashCommand inside the switch case so as to avoid making 
this check for every Datanode command.
{code}
final ReplicaTrashCommand rcmd = cmd instanceof ReplicaTrashCommand ? 
(ReplicaTrashCommand)cmd : null;
{code}
# We should handle the case where blocks are deleted and moved to Replica Trash 
after the expunge command is issued; these new blocks should not be removed 
from the Replica Trash. One option is to note down the timestamp when the 
expunge command was received: all invalidated blocks moved to replica trash 
after this timestamp should not be expunged (see the sketch below).
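A hedged sketch of that guard; all replica-trash names here are hypothetical, since the HDFS-12996 API is still being designed:
{code:java}
// Record when DN_EXPUNGE arrives; only purge entries trashed before then.
long expungeIssuedAt = Time.monotonicNow();
for (ReplicaTrashEntry entry : replicaTrash.listEntries()) {
  if (entry.getTrashedTime() < expungeIssuedAt) {
    entry.purge();
  }
  // entries trashed at or after expungeIssuedAt survive this expunge
}
{code}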

> Handle expunge command on NN and DN
> ---
>
> Key: HDFS-13373
> URL: https://issues.apache.org/jira/browse/HDFS-13373
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13373-HDFS-12996.00.patch
>
>
> When DataNodes receive the DN_EXPUNGE command from Namenode, they will
> purge all the block replicas in replica-trash






[jira] [Comment Edited] (HDFS-13373) Handle expunge command on NN and DN

2018-04-20 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446358#comment-16446358
 ] 

Hanisha Koneru edited comment on HDFS-13373 at 4/20/18 9:00 PM:


Thanks for the patch [~bharatviswa].
# In {{PBHelper#convert(ReplicaTrashCommandProto)}}, the switch case should 
have a default case which throws AssertionError (please refer to 
{{BlockCommandProto convert(BlockCommand cmd)}}).
# In {{BPOfferService#processCommandFromActive}}, can we move the check whether 
the command is ReplicaTrashCommand inside the switch case so as to avoid making 
this check for every Datanode command.
{code}
final ReplicaTrashCommand rcmd = cmd instanceof ReplicaTrashCommand ? 
(ReplicaTrashCommand)cmd : null;
{code}
# We should handle the case where blocks are deleted and moved to Replica Trash 
after the expunge command is issued. These new blocks should not be removed 
from the Replica Trash. One option is to note down the timestamp when expunge 
command was received. All invalidated blocks moved to replica trash after this 
timestamp will not be expunged.


was (Author: hanishakoneru):
Thanks for the patch [~bharatviswa].
# In {{PBHelper#convert(ReplicaTrashCommandProto)}}, the switch case should 
have a default case which throws AssertionError (please refer to 
{{BlockCommandProto convert(BlockCommand cmd)}}).
# In {{BPOfferService#processCommandFromActive}}, can we move the check whether 
the command is ReplicaTrashCommand inside the switch case so as to avoid making 
this check for every Datanode command.
{code}
final ReplicaTrashCommand rcmd = cmd instanceof ReplicaTrashCommand ? 
(ReplicaTrashCommand)cmd : null;
{code}
# We should handle the case where blocks are deleted and moved to Replica Trash 
after the expunge command is issued. These new blocks should not be removed 
from the Replica Trash. One option is to note down the timestamp when expunge 
command was received. All invalidated blocks moved to replica trash after this 
timestamp should not be expunged.

> Handle expunge command on NN and DN
> ---
>
> Key: HDFS-13373
> URL: https://issues.apache.org/jira/browse/HDFS-13373
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13373-HDFS-12996.00.patch
>
>
> When DataNodes receive the DN_EXPUNGE command from Namenode, they will
> purge all the block replicas in replica-trash






[jira] [Commented] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-19 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444626#comment-16444626
 ] 

Hanisha Koneru commented on HDFS-13442:
---

In patch v02, I just changed the config name and updated 
{{StorageContainerDatanodeProtocol.proto}} to add a new ErrorCode, 
{{nodeAlreadyRegistered}}.

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch, 
> HDFS-13442-HDFS-7240.002.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.






[jira] [Commented] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-17 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441526#comment-16441526
 ] 

Hanisha Koneru commented on HDFS-13442:
---

Thanks for the review [~anu].

This patch only modifies the case when we get _errorNodeNotPermitted_. This 
happens when the node is able to contact the SCM but SCM does not register the 
node. 
{quote}if the data nodes boot up earlier than SCM we would not want the data 
nodes to do silent after 10 tries
{quote}
In this case, the datanode keeps retrying as the EndPointTask state remains as 
{{HEARTBEAT}}. In the code snippet below, if the datanode does not get a 
response from SCM, it catches the exception and logs it, if needed.
{code:java}
try {
  SCMRegisteredCmdResponseProto response = rpcEndPoint.getEndPoint()
  .register(datanodeDetails.getProtoBufMessage(),
  conf.getStrings(ScmConfigKeys.OZONE_SCM_NAMES));
  ...
  ...
  processResponse(response);
} catch (IOException ex) {
  rpcEndPoint.logIfNeeded(ex);
}
{code}
{quote}also in the case, we get the error, errorNodeNotPermitted, should we 
shut down the data node and create some kind of error record on SCM so we can 
get that info back from SCM? I am also ok with the current approach where we 
will let the system slowly go time out.
{quote}
I think we should let the DN make a few retries before shutting it down.

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.






[jira] [Comment Edited] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-17 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441526#comment-16441526
 ] 

Hanisha Koneru edited comment on HDFS-13442 at 4/17/18 9:34 PM:


Thanks for the review [~anu].

This patch only modifies the case when we get _errorNodeNotPermitted_. This 
happens when the node is able to contact the SCM but SCM does not register the 
node. 
{quote}if the data nodes boot up earlier than SCM we would not want the data 
nodes to do silent after 10 tries
{quote}
In this case, the datanode keeps retrying as the EndPointTask state remains as 
{{REGISTER}}. In the code snippet below, if the datanode does not get a 
response from SCM, it catches the exception and logs it, if needed.
{code:java}
try {
  SCMRegisteredCmdResponseProto response = rpcEndPoint.getEndPoint()
  .register(datanodeDetails.getProtoBufMessage(),
  conf.getStrings(ScmConfigKeys.OZONE_SCM_NAMES));
  ...
  ...
  processResponse(response);
} catch (IOException ex) {
  rpcEndPoint.logIfNeeded(ex);
}
{code}
{quote}also in the case, we get the error, errorNodeNotPermitted, should we 
shut down the data node and create some kind of error record on SCM so we can 
get that info back from SCM? I am also ok with the current approach where we 
will let the system slowly go time out.
{quote}
I think we should let the DN make a few retries before shutting it down.


was (Author: hanishakoneru):
Thanks for the review [~anu].

This patch only modifies the case when we get _errorNodeNotPermitted_. This 
happens when the node is able to contact the SCM but SCM does not register the 
node. 
{quote}if the data nodes boot up earlier than SCM we would not want the data 
nodes to do silent after 10 tries
{quote}
In this case, the datanode keeps retrying as the EndPointTask state remains as 
{{HEARTBEAT}}. In the code snippet below, if the datanode does not get a 
response from SCM, it catches the exception and logs it, if needed.
{code:java}
try {
  SCMRegisteredCmdResponseProto response = rpcEndPoint.getEndPoint()
  .register(datanodeDetails.getProtoBufMessage(),
  conf.getStrings(ScmConfigKeys.OZONE_SCM_NAMES));
  ...
  ...
  processResponse(response);
} catch (IOException ex) {
  rpcEndPoint.logIfNeeded(ex);
}
{code}
{quote}also in the case, we get the error, errorNodeNotPermitted, should we 
shut down the data node and create some kind of error record on SCM so we can 
get that info back from SCM? I am also ok with the current approach where we 
will let the system slowly go time out.
{quote}
I think we should let the DN make a few retries before shutting it down.

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.






[jira] [Commented] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-19 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444977#comment-16444977
 ] 

Hanisha Koneru commented on HDFS-13477:
---

Thanks for the patch [~ajayydv]. 

The patch LGTM overall. 

In case the httpServer start fails, should we add the httpServer as a Service 
port to KSM services in {{KeySpaceManager#getServiceList()}}?

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13477-HDFS-7240.00.patch
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer fails 
> with some Exception. HttpServer is not essential for the operation of KSM and 
> SCM, so we should allow them to start even if the HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13277) Improve move to Replica trash to limit trash sub-dir size

2018-03-26 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13277:
--
Summary: Improve move to Replica trash to limit trash sub-dir size  (was: 
Improve move to account for usage (number of files) to limit trash dir size)

> Improve move to Replica trash to limit trash sub-dir size
> -
>
> Key: HDFS-13277
> URL: https://issues.apache.org/jira/browse/HDFS-13277
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13277-HDFS-12996.00.patch, 
> HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, 
> HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch, 
> HDFS-13277-HDFS-12996.06.patch
>
>
> Add a configurable maximum number of entries per trash subdirectory. This 
> puts an upper limit on the size of subdirectories in replica-trash. Set the 
> default value to blockInvalidateLimit.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13329) Add/ Update disk space counters for trash (trash used, disk remaining etc.)

2018-04-03 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424760#comment-16424760
 ] 

Hanisha Koneru edited comment on HDFS-13329 at 4/3/18 11:39 PM:


Thanks for working on this, [~bharatviswa]. 

Looks good overall. I have some comments:
 # Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and 
{{DUWithExclude}}.
 # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and 
the excludedPath and then subtracting the latter from the former. We end up 
calculating the space used by replica trash twice this way.
{code:java}
setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 
1024));{code}
We could instead utilize the {{--exclude}} option of the {{du}} command.
 Also, can we add the exclude option to {{DU.java}} itself instead of another 
class? I am not sure how complicated that would get though. I am ok with this 
approach too.

 # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be 
consistent with the naming.
 # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo.
{code:java}
assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize 
+ slack));
{code}
Should have been
{code:java}
 du <= (writtenSize + slack) {code}

 # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters 
after the {{DFSRemaining%}} counter.
 # In {{DFSConfigKeys}},
{code:java}
  public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT =  
"dfs.datanode.replica.trash.keep.alive.interval";
{code}
The value for the config parameter is mistyped.

 # In {{BlockPoolSlice}},
 ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used.
 ** In {{loadReplicaTrashUsed}}, if we are using separate 
{{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we 
should have separate Cache files too.
 # {{FsVolumeImpl#replicaTrashLimit}} variable can be final.
 # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number 
of blocks in the BP.
 # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled 
or not and send the report accordingly?


was (Author: hanishakoneru):
Thanks for working on this, [~bharatviswa]. 

Looks good overall. I have a few comments:
 # Can you add Javadoc and License to the 
{{CachingGetSpaceUsedWithExclude}} and {{DUWithExclude}}.
 # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and 
the excludedPath and then subtracting the latter from the former. We end up 
calculating the space used by replica trash twice this way.
{code:java}
setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 
1024));{code}
We could instead utilize the {{--exclude}} option of the {{du}} command.
 Also, can we add the exclude option to {{DU.java}} itself instead of another 
class? I am not sure how complicated that would get though. I am ok with this 
approach too.

 # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be 
consistent with the naming.
 # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo.
{code:java}
assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize 
+ slack));
{code}
Should have been
{code:java}
 du <= (writtenSize + slack) {code}

 # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters 
after the {{DFSRemaining%}} counter.
 # In {{DFSConfigKeys}},
{code:java}
  public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT =  
"dfs.datanode.replica.trash.keep.alive.interval";
{code}
The value for the config parameter is mistyped.

 # In {{BlockPoolSlice}},
 ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used.
 ** In {{loadReplicaTrashUsed}}, if we are using separate 
{{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we 
should have separate Cache files too.
 # {{FsVolumeImpl#replicaTrashLimit}} variable can be final.
 # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number 
of blocks in the BP.
 # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled 
or not and send the report accordingly?

> Add/ Update disk space counters for trash (trash used, disk remaining etc.) 
> 
>
> Key: HDFS-13329
> URL: https://issues.apache.org/jira/browse/HDFS-13329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13329-HDFS-12996.01.patch, 
> HDFS-13329-HDFS-12996.02.patch
>
>
> Add 3 more counters required for datanode replica trash.
>  # diskAvailable
> # replicaTrashUsed
> # replicaTrashRemaining
> For more info on these counters, refer design document uploaded in HDFS-12996

[jira] [Comment Edited] (HDFS-13329) Add/ Update disk space counters for trash (trash used, disk remaining etc.)

2018-04-03 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424760#comment-16424760
 ] 

Hanisha Koneru edited comment on HDFS-13329 at 4/3/18 11:41 PM:


Thanks for working on this, [~bharatviswa]. 

Looks good overall. I have some comments:
1. Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and 
{{DUWithExclude}}.

2. In {{DUWithExclude}}, we are calculating the {{du}} for both the path and 
the excludedPath and then subtracting the latter from the former. We end up 
calculating the space used by replica trash twice this way.
{code:java}
setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 
1024));{code}
We could instead utilize the {{--exclude}} option of the {{du}} command (see 
the sketch after this list).
 Also, can we add the exclude option to {{DU.java}} itself instead of another 
class? I am not sure how complicated that would get though. I am ok with this 
approach too.

3. Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be 
consistent with the naming.

4. In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo.
{code:java}
assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize 
+ slack));
{code}
Should have been
{code:java}
 du <= (writtenSize + slack) {code}

5. In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters 
after the {{DFSRemaining%}} counter.
 
6. In {{DFSConfigKeys}},
{code:java}
  public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT =  
"dfs.datanode.replica.trash.keep.alive.interval";
{code}
The value for the config parameter is mistyped.

7. In {{BlockPoolSlice}},
 ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used.
 ** In {{loadReplicaTrashUsed}}, if we are using separate 
{{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we 
should have separate Cache files too.
 
8. {{FsVolumeImpl#replicaTrashLimit}} variable can be final.

9. In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number 
of blocks in the BP.

10. In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled 
or not and send the report accordingly?
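
As a sketch of the {{--exclude}} idea from item 2 (GNU {{du}} is assumed; the method shape mirrors Hadoop's {{DU}} exec string, and the {{excludePath}} field is hypothetical):
{code:java}
// Hypothetical: let du skip the replica-trash directory itself, so no
// second du pass and no subtraction are needed.
protected String[] getExecString() {
  if (excludePath == null) {
    return new String[]{"du", "-sk", dirPath};
  }
  return new String[]{"du", "-sk", "--exclude=" + excludePath, dirPath};
}
{code}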


was (Author: hanishakoneru):
Thanks for working on this, [~bharatviswa]. 

Looks good overall. I have some comments:
 # Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and 
{{DUWithExclude}}.
 # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and 
the excludedPath and then subtracting the latter from the former. We end up 
calculating the space used by replica trash twice this way.
{code:java}
setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 
1024));{code}
We could instead utilize the {{--exclude}} option of the {{du}} command.
 Also, can we add the exclude option to {{DU.java}} itself instead of another 
class? I am not sure how complicated that would get though. I am ok with this 
approach too.

 # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be 
consistent with the naming.
 # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo.
{code:java}
assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize 
+ slack));
{code}
Should have been
{code:java}
 du <= (writtenSize + slack) {code}

 # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters 
after the {{DFSRemaining%}} counter.
 # In {{DFSConfigKeys}},
{code:java}
  public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT =  
"dfs.datanode.replica.trash.keep.alive.interval";
{code}
The value for the config parameter is mistyped.

 # In {{BlockPoolSlice}},
 ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used.
 ** In {{loadReplicaTrashUsed}}, if we are using separate 
{{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we 
should have separate Cache files too.
 # {{FsVolumeImpl#replicaTrashLimit}} variable can be final.
 # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number 
of blocks in the BP.
 # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled 
or not and send the report accordingly?

> Add/ Update disk space counters for trash (trash used, disk remaining etc.) 
> 
>
> Key: HDFS-13329
> URL: https://issues.apache.org/jira/browse/HDFS-13329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13329-HDFS-12996.01.patch, 
> HDFS-13329-HDFS-12996.02.patch
>
>
> Add 3 more counters required for datanode replica trash.
>  # diskAvailable
> # replicaTrashUsed
> # replicaTrashRemaining
> For more info on these counters, refer design document uploaded in HDFS-12996

[jira] [Comment Edited] (HDFS-13329) Add/ Update disk space counters for trash (trash used, disk remaining etc.)

2018-04-03 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424760#comment-16424760
 ] 

Hanisha Koneru edited comment on HDFS-13329 at 4/3/18 11:39 PM:


Thanks for working on this, [~bharatviswa]. 

Looks good overall. I have some comments:
 # Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and 
{{DUWithExclude}}.
 # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and 
the excludedPath and then subtracting the latter from the former. We end up 
calculating the space used by replica trash twice this way.
{code:java}
setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 
1024));{code}
We could instead utilize the {{--exclude}} option of the {{du}} command.
 Also, can we add the exclude option to {{DU.java}} itself instead of another 
class? I am not sure how complicated that would get though. I am ok with this 
approach too.

 # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be 
consistent with the naming.
 # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo.
{code:java}
assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize 
+ slack));
{code}
Should have been
{code:java}
 du <= (writtenSize + slack) {code}

 # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters 
after the {{DFSRemaining%}} counter.
 # In {{DFSConfigKeys}},
{code:java}
  public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT =  
"dfs.datanode.replica.trash.keep.alive.interval";
{code}
The value for the config parameter is mistyped.

 # In {{BlockPoolSlice}},
 ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used.
 ** In {{loadReplicaTrashUsed}}, if we are using separate 
{{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we 
should have separate Cache files too.
 # {{FsVolumeImpl#replicaTrashLimit}} variable can be final.
 # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number 
of blocks in the BP.
 # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled 
or not and send the report accordingly?


was (Author: hanishakoneru):
Thanks for working on this, [~bharatviswa]. 

Looks good overall. I have some comments:
 # Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and 
{{DUWithExclude}}.
 # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and 
the excludedPath and then subtracting the latter from the former. We end up 
calculating the space used by replica trash twice this way.
{code:java}
setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 
1024));{code}
We could instead utilize the {{--exclude}} option of the {{du}} command.
 Also, can we add the exclude option to {{DU.java}} itself instead of another 
class? I am not sure how complicated that would get though. I am ok with this 
approach too.

 # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be 
consistent with the naming.
 # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo.
{code:java}
assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize 
+ slack));
{code}
Should have been
{code:java}
 du <= (writtenSize + slack) {code}

 # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters 
after the {{DFSRemaining%}} counter.
 # In {{DFSConfigKeys}},
{code:java}
  public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT =  
"dfs.datanode.replica.trash.keep.alive.interval";
{code}
The value for the config parameter is mistyped.

 # In {{BlockPoolSlice}},
 ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used.
 ** In {{loadReplicaTrashUsed}}, if we are using separate 
{{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we 
should have separate Cache files too.
 # {{FsVolumeImpl#replicaTrashLimit}} variable can be final.
 # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number 
of blocks in the BP.
 # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled 
or not and send the report accordingly?

> Add/ Update disk space counters for trash (trash used, disk remaining etc.) 
> 
>
> Key: HDFS-13329
> URL: https://issues.apache.org/jira/browse/HDFS-13329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13329-HDFS-12996.01.patch, 
> HDFS-13329-HDFS-12996.02.patch
>
>
> Add 3 more counters required for datanode replica trash.
>  # diskAvailable
> # replicaTrashUsed
> # replicaTrashRemaining
> For more info on these counters, refer design document uploaded in HDFS-12996

[jira] [Commented] (HDFS-13329) Add/ Update disk space counters for trash (trash used, disk remaining etc.)

2018-04-03 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424760#comment-16424760
 ] 

Hanisha Koneru commented on HDFS-13329:
---

Thanks for working on this, [~bharatviswa]. 

Looks good overall. I have a few comments:
 # Can you add Javadoc and License to {{CachingGetSpaceUsedWithExclude}} and 
{{DUWithExclude}}.
 # In {{DUWithExclude}}, we are calculating the {{du}} for both the path and 
the excludedPath and then subtracting the latter from the former. We end up 
calculating the space used by replica trash twice this way.
{code:java}
setUsed((Long.parseLong(tokens[0]) * 1024) - (Long.parseLong(tokens1[0]) * 
1024));{code}
We could instead utilize the {{--exclude}} option of the {{du}} command.
 Also, can we add the exclude option to {{DU.java}} itself instead of another 
class? I am not sure how complicated that would get though. I am ok with this 
approach too.

 # Can we rename {{TestDU#testDUWithSubtract}} to {{testDUWithExclude}} to be 
consistent with the naming.
 # In {{TestDU#testDUWithSubtract}}, the last assert statement has a typo.
{code:java}
assertTrue("invalid-disk-size", duSize >= writtenSize && writtenSize <= (duSize 
+ slack));
{code}
Should have been
{code:java}
 du <= (writtenSize + slack) {code}

 # In {{DatanodeInfo#getDatanodeReport()}}, can we report the new disk counters 
after the {{DFSRemaining%}} counter.
 # In {{DFSConfigKeys}},
{code:java}
  public static final String DFS_DATANODE_REPLICA_TRASH_PERCENT =  
"dfs.datanode.replica.trash.keep.alive.interval";
{code}
The value for the config parameter is mistyped.

 # In {{BlockPoolSlice}},
 ** In {{loadDfsUsed()}}, the variable {{replicaTrashUsed}} is not used.
 ** In {{loadReplicaTrashUsed}}, if we are using separate 
{{CachingGetSpaceUsed}} objects for {{dfsUsage}} and {{replicaTrashUsage}}, we 
should have separate Cache files too.
 # {{FsVolumeImpl#replicaTrashLimit}} variable can be final.
 # In {{FsVolumeImpl#onMetaFileDeletion()}}, we should not decrement the number 
of blocks in the BP.
 # In {{DFSAdmin}}, can we let the DN figure out whether replicaTrash is enabled 
or not and send the report accordingly?

> Add/ Update disk space counters for trash (trash used, disk remaining etc.) 
> 
>
> Key: HDFS-13329
> URL: https://issues.apache.org/jira/browse/HDFS-13329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13329-HDFS-12996.01.patch, 
> HDFS-13329-HDFS-12996.02.patch
>
>
> Add 3 more counters required for datanode replica trash.
>  # diskAvailable
>  # replicaTrashUsed
>  # replicaTrashRemaining
> For more info on these counters, refer design document uploaded in HDFS-12996



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13277) Improve move to Replica trash to limit trash sub-dir size

2018-03-26 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13277:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Improve move to Replica trash to limit trash sub-dir size
> -
>
> Key: HDFS-13277
> URL: https://issues.apache.org/jira/browse/HDFS-13277
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13277-HDFS-12996.00.patch, 
> HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, 
> HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch, 
> HDFS-13277-HDFS-12996.06.patch
>
>
> Add a configurable maximum number of entries per trash subdirectory. This 
> puts an upper limit on the size of subdirectories in replica-trash. Set the 
> default value to blockInvalidateLimit.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13277) Improve move to Replica trash to limit trash sub-dir size

2018-03-26 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414265#comment-16414265
 ] 

Hanisha Koneru commented on HDFS-13277:
---

Test failures are unrelated.

Committed to branch HDFS-12996. Thank you [~bharatviswa] for the contribution 
and [~ajayydv] for the review.

> Improve move to Replica trash to limit trash sub-dir size
> -
>
> Key: HDFS-13277
> URL: https://issues.apache.org/jira/browse/HDFS-13277
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13277-HDFS-12996.00.patch, 
> HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, 
> HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch, 
> HDFS-13277-HDFS-12996.06.patch
>
>
> Add a configurable maximum number of entries per trash subdirectory. This 
> puts an upper limit on the size of subdirectories in replica-trash. Set the 
> default value to blockInvalidateLimit.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size

2018-03-20 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407224#comment-16407224
 ] 

Hanisha Koneru edited comment on HDFS-13277 at 3/20/18 11:29 PM:
-

Thanks for the patch [~bharatviswa].

LGTM overall (still have to review unit test). Just have a few very minor 
comments:
 # In {{ReplicaTrashInfo}}, it would be good to rename {{entries}} to 
{{numBlocks}} as we are tracking the number of blocks in a subDir.
 # Can you rename {{curDir}} to indicate that it is the current ReplicaTrash 
subdir. Maybe {{curSubDir}} or {{curReplicaTrashSubDir}}?


was (Author: hanishakoneru):
Thanks for the patch [~bharatviswa].

LGTM overall (still have to review unit test). Just have a few very minor 
comments:
 # In {{ReplicaTrashInfo}}, it would be good to rename {{entries}} to 
{{numBlocks}} as we are tracking the number of blocks in a subDir.
 # Can you rename {{curDir}} to indicate that it is the current ReplicaTrash 
subdir. So that it is not confused with the current directory of the block 
pool. Maybe {{curSubDir}} or {{curReplicaTrashSubDir}}?


> Improve move to account for usage (number of files) to limit trash dir size
> ---
>
> Key: HDFS-13277
> URL: https://issues.apache.org/jira/browse/HDFS-13277
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13277-HDFS-12996.00.patch, 
> HDFS-13277-HDFS-12996.01.patch
>
>
> Add a configurable maximum number of entries per trash subdirectory. This 
> puts an upper limit on the size of subdirectories in replica-trash. Set the 
> default value to blockInvalidateLimit.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size

2018-03-20 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407224#comment-16407224
 ] 

Hanisha Koneru commented on HDFS-13277:
---

Thanks for the patch [~bharatviswa].

LGTM overall (still have to review unit test). Just have a few very minor 
comments:
 # In {{ReplicaTrashInfo}}, it would be good to rename {{entries}} to 
{{numBlocks}} as we are tracking the number of blocks in a subDir.
 # Can you rename {{curDir}} to indicate that it is the current ReplicaTrash 
subdir. So that it is not confused with the current directory of the block 
pool. Maybe {{curSubDir}} or {{curReplicaTrashSubDir}}?


> Improve move to account for usage (number of files) to limit trash dir size
> ---
>
> Key: HDFS-13277
> URL: https://issues.apache.org/jira/browse/HDFS-13277
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13277-HDFS-12996.00.patch, 
> HDFS-13277-HDFS-12996.01.patch
>
>
> Add a configurable maximum number of entries per trash subdirectory. This 
> puts an upper limit on the size of subdirectories in replica-trash. Set the 
> default value to blockInvalidateLimit.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size

2018-03-22 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410218#comment-16410218
 ] 

Hanisha Koneru commented on HDFS-13277:
---

Thanks [~bharatviswa].

The unit test LGTM overall.
 * Before iterating over the {{locations}}, can you add an assert that the 
number of locations is 1 as {{storagesPerDatanode}} is set to 1 (or a comment).

> Improve move to account for usage (number of files) to limit trash dir size
> ---
>
> Key: HDFS-13277
> URL: https://issues.apache.org/jira/browse/HDFS-13277
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13277-HDFS-12996.00.patch, 
> HDFS-13277-HDFS-12996.01.patch
>
>
> Add a configurable maximum number of entries per trash subdirectory. This 
> puts an upper limit on the size of subdirectories in replica-trash. Set the 
> default value to blockInvalidateLimit.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size

2018-03-23 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412104#comment-16412104
 ] 

Hanisha Koneru commented on HDFS-13277:
---

Thanks for updating the patch, [~bharatviswa].

Patch v06 LGTM. +1 pending Jenkins.

> Improve move to account for usage (number of files) to limit trash dir size
> ---
>
> Key: HDFS-13277
> URL: https://issues.apache.org/jira/browse/HDFS-13277
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13277-HDFS-12996.00.patch, 
> HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, 
> HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch, 
> HDFS-13277-HDFS-12996.06.patch
>
>
> Add a configurable maximum number of entries per trash subdirectory. This 
> puts an upper limit on the size of subdirectories in replica-trash. Set the 
> default value to blockInvalidateLimit.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13277) Improve move to account for usage (number of files) to limit trash dir size

2018-03-23 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411761#comment-16411761
 ] 

Hanisha Koneru commented on HDFS-13277:
---

Thanks for the update Bharat.

I am sorry I missed this earlier. The default value for {{max.blocks}} is being 
set to the default value of {{block.invalidate.limit}}. It should be set to the 
configured value of this limit instead. Also, in {{hdfs-default.xml}}, we need 
to mention that if the new parameter is not set, it would take the value of the 
parameter {{dfs.block.invalidate.limit}}.
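
A sketch of the fallback being asked for (the replica-trash key name is assumed; the invalidate-limit constants are the existing ones in {{DFSConfigKeys}}):
{code:java}
// Hypothetical: default max.blocks to the *configured* invalidate limit,
// not its compile-time default.
int blockInvalidateLimit = conf.getInt(
    DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_KEY,
    DFSConfigKeys.DFS_BLOCK_INVALIDATE_LIMIT_DEFAULT);
int maxBlocksPerSubDir = conf.getInt(
    "dfs.datanode.replica.trash.max.blocks", // assumed key name
    blockInvalidateLimit);
{code}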

NITs:
 # {{FsDatasetAsyncDiskService}} L104 has "information" twice in the comment.
 # It might be good to avoid abbreviations in hdfs-default.xml as it would be 
reflected in the docs (referring to "no" as an abbreviation for "number").

> Improve move to account for usage (number of files) to limit trash dir size
> ---
>
> Key: HDFS-13277
> URL: https://issues.apache.org/jira/browse/HDFS-13277
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HDFS-13277-HDFS-12996.00.patch, 
> HDFS-13277-HDFS-12996.01.patch, HDFS-13277-HDFS-12996.02.patch, 
> HDFS-13277-HDFS-12996.03.patch, HDFS-13277-HDFS-12996.04.patch
>
>
> Add a configurable maximum number of entries per trash subdirectory. This 
> puts an upper limit on the size of subdirectories in replica-trash. Set the 
> default value to blockInvalidateLimit.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13148) Unit test for EZ with KMS and Federation

2018-03-05 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13148:
--
Attachment: HDFS-13148.002.patch

> Unit test for EZ with KMS and Federation
> 
>
> Key: HDFS-13148
> URL: https://issues.apache.org/jira/browse/HDFS-13148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13148.001.patch, HDFS-13148.002.patch
>
>
> It would be good to have some unit tests for testing KMS and EZ on a 
> federated cluster. We can start with basic EZ operations. For example, create 
> EZs on two namespaces with different keys using one KMS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13148) Unit test for EZ with KMS and Federation

2018-03-05 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386677#comment-16386677
 ] 

Hanisha Koneru commented on HDFS-13148:
---

Thanks for the review [~xyao]. Addressed all the comments in patch v02.

I did not fix all the checkstyle warnings as the link has expired. I will fix 
them after the next Jenkins run.

> Unit test for EZ with KMS and Federation
> 
>
> Key: HDFS-13148
> URL: https://issues.apache.org/jira/browse/HDFS-13148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13148.001.patch, HDFS-13148.002.patch
>
>
> It would be good to have some unit tests for testing KMS and EZ on a 
> federated cluster. We can start with basic EZ operations. For example, create 
> EZs on two namespaces with different keys using one KMS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11394) Add method for getting erasure coding policy through WebHDFS

2018-03-01 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382905#comment-16382905
 ] 

Hanisha Koneru commented on HDFS-11394:
---

Hi [~lewuathe], are you planning to continue working on this Jira? If not, I 
would like to take it up. Please let me know.

> Add method for getting erasure coding policy through WebHDFS 
> -
>
> Key: HDFS-11394
> URL: https://issues.apache.org/jira/browse/HDFS-11394
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, namenode
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Major
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-11394.01.patch, HDFS-11394.02.patch, 
> HDFS-11394.03.patch, HDFS-11394.04.patch
>
>
> We can expose the erasure coding policy of an erasure coded directory through 
> a WebHDFS method, as is done for the storage policy. This information can be 
> used by the NameNode Web UI to show the details of erasure coded directories.
> see: HDFS-8196



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13114) CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path

2018-02-27 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379573#comment-16379573
 ] 

Hanisha Koneru commented on HDFS-13114:
---

[~xyao], yes, the ListZonesCommand and ListReencryptionStatusCommand work as 
expected with the generic {{-fs}} option (even without the fix).
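
For illustration, the kind of invocation being verified (assuming the standard generic {{-fs}} option, which must precede the subcommand; output omitted):
{code}
$ hdfs crypto -fs hdfs://ns2 -listZones
$ hdfs crypto -fs hdfs://ns2 -listReencryptionStatus
{code}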

> CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path
> 
>
> Key: HDFS-13114
> URL: https://issues.apache.org/jira/browse/HDFS-13114
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13114.001.patch
>
>
> The {{crypto -reencryptZone  -path }} command takes in a path 
> argument. But when creating {{HdfsAdmin}} object, it takes the defaultFs 
> instead of resolving from the path. This causes the following exception if 
> the authority component in path does not match the authority of default Fs.
> {code:java}
> $ hdfs crypto -reencryptZone -start -path hdfs://mycluster-node-1:8020/zone1
> IllegalArgumentException: Wrong FS: hdfs://mycluster-node-1:8020/zone1, 
> expected: hdfs://ns1{code}
> {code:java}
> $ hdfs crypto -reencryptZone -start -path hdfs://ns2/zone2
> IllegalArgumentException: Wrong FS: hdfs://ns2/zone2, expected: 
> hdfs://ns1{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-183) Integrate Volumeset, ContainerSet and HddsDispatcher

2018-06-28 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-183:

Status: Patch Available  (was: Open)

> Integrate Volumeset, ContainerSet and HddsDispatcher
> 
>
> Key: HDDS-183
> URL: https://issues.apache.org/jira/browse/HDDS-183
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-183-HDDS-48.00.patch, HDDS-183-HDDS-48.01.patch, 
> HDDS-183-HDDS-48.02.patch, HDDS-183-HDDS-48.03.patch, 
> HDDS-183-HDDS-48.04.patch
>
>
> This Jira adds the following:
> 1. Use the new VolumeSet.
> 2. Build the container map from .container files during startup.
> 3. Integrate the HddsDispatcher.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-173) Refactor Dispatcher and implement Handler for new ContainerIO design

2018-06-28 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526515#comment-16526515
 ] 

Hanisha Koneru commented on HDDS-173:
-

Hi [~xyao], [~bharatviswa]

The compile failure looks like a Jenkins issue. It compiles successfully for me 
locally.
There are a couple of Findbugs errors, which I will fix in HDDS-182, along 
with the unit test and integration tests.

Can we go ahead with committing patch v005?

> Refactor Dispatcher and implement Handler for new ContainerIO design
> 
>
> Key: HDDS-173
> URL: https://issues.apache.org/jira/browse/HDDS-173
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-173-HDDS-48.001.patch, HDDS-173-HDDS-48.002.patch, 
> HDDS-173-HDDS-48.003.patch, HDDS-173-HDDS-48.004.patch, 
> HDDS-173-HDDS-48.005.patch
>
>
> Dispatcher will pass the ContainerCommandRequests to the corresponding 
> Handler based on the ContainerType. Each ContainerType will have its own 
> Handler. The Handler class will process the message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-173) Refactor Dispatcher and implement Handler for new ContainerIO design

2018-06-28 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-173:

Description: 
HddsDispatcher will pass the ContainerCommandRequests to the corresponding 
Handler based on the ContainerType. Each ContainerType will have its own 
Handler. The Handler class will process the message.

Current Dispatcher will be replaced by HddsDispatcher in HDDS-183.

  was:Dispatcher will pass the ContainerCommandRequests to the corresponding 
Handler based on the ContainerType. Each ContainerType will have its own 
Handler. The Handler class will process the message.


> Refactor Dispatcher and implement Handler for new ContainerIO design
> 
>
> Key: HDDS-173
> URL: https://issues.apache.org/jira/browse/HDDS-173
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-173-HDDS-48.001.patch, HDDS-173-HDDS-48.002.patch, 
> HDDS-173-HDDS-48.003.patch, HDDS-173-HDDS-48.004.patch, 
> HDDS-173-HDDS-48.005.patch
>
>
> HddsDispatcher will pass the ContainerCommandRequests to the corresponding 
> Handler based on the ContainerType. Each ContainerType will have its own 
> Handler. The Handler class will process the message.
> Current Dispatcher will be replaced by HddsDispatcher in HDDS-183.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-183) Integrate Volumeset, ContainerSet and HddsDispatcher

2018-06-28 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-183:

Status: Open  (was: Patch Available)

> Integrate Volumeset, ContainerSet and HddsDispatcher
> 
>
> Key: HDDS-183
> URL: https://issues.apache.org/jira/browse/HDDS-183
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-183-HDDS-48.00.patch, HDDS-183-HDDS-48.01.patch, 
> HDDS-183-HDDS-48.02.patch, HDDS-183-HDDS-48.03.patch, 
> HDDS-183-HDDS-48.04.patch
>
>
> This Jira adds the following:
> 1. Use the new VolumeSet.
> 2. Build the container map from .container files during startup.
> 3. Integrate the HddsDispatcher.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-173) Refactor Dispatcher and implement Handler for new ContainerIO design

2018-06-28 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526587#comment-16526587
 ] 

Hanisha Koneru commented on HDDS-173:
-

Thank you [~bharatviswa] and [~xyao] for the reviews. Committed this to feature 
branch.

> Refactor Dispatcher and implement Handler for new ContainerIO design
> 
>
> Key: HDDS-173
> URL: https://issues.apache.org/jira/browse/HDDS-173
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-173-HDDS-48.001.patch, HDDS-173-HDDS-48.002.patch, 
> HDDS-173-HDDS-48.003.patch, HDDS-173-HDDS-48.004.patch, 
> HDDS-173-HDDS-48.005.patch
>
>
> HddsDispatcher will pass the ContainerCommandRequests to the corresponding 
> Handler based on the ContainerType. Each ContainerType will have its own 
> Handler. The Handler class will process the message.
> Current Dispatcher will be replaced by HddsDispatcher in HDDS-183.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-173) Refactor Dispatcher and implement Handler for new ContainerIO design

2018-06-28 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-173:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Refactor Dispatcher and implement Handler for new ContainerIO design
> 
>
> Key: HDDS-173
> URL: https://issues.apache.org/jira/browse/HDDS-173
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-173-HDDS-48.001.patch, HDDS-173-HDDS-48.002.patch, 
> HDDS-173-HDDS-48.003.patch, HDDS-173-HDDS-48.004.patch, 
> HDDS-173-HDDS-48.005.patch
>
>
> HddsDispatcher will pass the ContainerCommandRequests to the corresponding 
> Handler based on the ContainerType. Each ContainerType will have its own 
> Handler. The Handler class will process the message.
> Current Dispatcher will be replaced by HddsDispatcher in HDDS-183.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-183) Integrate Volumeset, ContainerSet and HddsDispatcher

2018-06-28 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526773#comment-16526773
 ] 

Hanisha Koneru commented on HDDS-183:
-

+1 for patch v04, contingent upon addressing the other Findbugs errors in the 
cleanup Jira along with the integration test fixes.

> Integrate Volumeset, ContainerSet and HddsDispatcher
> 
>
> Key: HDDS-183
> URL: https://issues.apache.org/jira/browse/HDDS-183
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-183-HDDS-48.00.patch, HDDS-183-HDDS-48.01.patch, 
> HDDS-183-HDDS-48.02.patch, HDDS-183-HDDS-48.03.patch, 
> HDDS-183-HDDS-48.04.patch
>
>
> This Jira adds the following:
> 1. Use the new VolumeSet.
> 2. Build the container map from .container files during startup.
> 3. Integrate the HddsDispatcher.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-289) While creating bucket everything after '/' is ignored without any warning

2018-09-27 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631044#comment-16631044
 ] 

Hanisha Koneru commented on HDDS-289:
-

Thanks for working on this [~candychencan].
 Patch LGTM overall. A few comments:
 # PutKey should allow "/" in the key name. We can create keys using {{ozone 
fs}} (and they can have "/" in the keyname). So {{ozone sh key}} should also 
allow keys with a "/" in them.
 # The error message "Path ... too long in ..." is ambiguous. Can we expand it 
to say something like "Invalid bucket name. Delimiters ("/") not allowed in 
bucket name"
 # A minor NIT: Most of the handlers already have a check with respect to 
path.getNameCount(). We could probably optimize by combining them. Something 
like below:
{code:java}
int pathNameCount = path.getNameCount();
if (pathNameCount != 2) {
  String errorMessage;
  if (pathNameCount < 2) {
errorMessage = "volume and bucket name required in createBucket";
  } else {
errorMessage = "invalid bucket name. Delimiters (/) not allowed in " +
"bucket name";
  }
  throw new OzoneClientException(errorMessage); 
}
{code}

> While creating bucket everything after '/' is ignored without any warning
> -
>
> Key: HDDS-289
> URL: https://issues.apache.org/jira/browse/HDDS-289
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.2.1
>Reporter: Namit Maheshwari
>Assignee: chencan
>Priority: Major
>  Labels: newbie
> Attachments: HDDS-289.001.patch, HDDS-289.002.patch, 
> HDDS-289.003.patch
>
>
> Please see below example. Here the user issues command to create bucket like 
> below. Where /namit is the volume. 
> {code}
> hadoop@288c0999be17:~$ ozone oz -createBucket /namit/hjk/fgh
> 2018-07-24 00:30:52 WARN  NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-07-24 00:30:52 INFO  RpcClient:337 - Creating Bucket: namit/hjk, with 
> Versioning false and Storage Type set to DISK
> {code}
> As seen above, it just ignored '/fgh'.
> There should be a Warning / Error message instead of just ignoring everything 
> after a '/' 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-551) Fix the close container status check in CloseContainerCommandHandler

2018-09-27 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631154#comment-16631154
 ] 

Hanisha Koneru commented on HDDS-551:
-

Thanks [~shashikant] for working on this. 

LGTM. +1

(There is one checkstyle issue, a line longer than 80 characters, at line 83 of 
{{CloseContainerCommandHandler}}. I will fix it while committing.)
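
For reference, the shape of the early return suggested in the description (a sketch; the state accessor and logging are assumed, not the committed code):
{code:java}
// Hypothetical: in CloseContainerCommandHandler, skip containers that are
// already closed instead of submitting another close request to Ratis.
if (container.getContainerData().getState() == ContainerLifeCycleState.CLOSED) {
  LOG.debug("Container {} is already closed, ignoring close command.",
      containerID);
  return;
}
{code}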

> Fix the close container status check in CloseContainerCommandHandler
> 
>
> Key: HDDS-551
> URL: https://issues.apache.org/jira/browse/HDDS-551
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-551.000.patch
>
>
> If the container is already closed while retrying to close the container in a 
> Datanode which is not a leader, we just log the info and still submit the 
> close request to Ratis. Ideally, this check should be moved to 
> CloseContainerCommandHandler and we should just return without submitting any 
> request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-551) Fix the close container status check in CloseContainerCommandHandler

2018-09-27 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631158#comment-16631158
 ] 

Hanisha Koneru commented on HDDS-551:
-

Committed to trunk. Thanks for the contribution [~shashikant].

> Fix the close container status check in CloseContainerCommandHandler
> 
>
> Key: HDDS-551
> URL: https://issues.apache.org/jira/browse/HDDS-551
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-551.000.patch
>
>
> If the container is already closed while retrying to close the container in a 
> Datanode which is not a leader, we just log the info and still submit the 
> close request to Ratis. Ideally, this check should be moved to 
> CloseContainerCommandHandler and we should just return without submitting any 
> request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-551) Fix the close container status check in CloseContainerCommandHandler

2018-09-27 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-551:

   Resolution: Fixed
Fix Version/s: 0.2.2
   Status: Resolved  (was: Patch Available)

> Fix the close container status check in CloseContainerCommandHandler
> 
>
> Key: HDDS-551
> URL: https://issues.apache.org/jira/browse/HDDS-551
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.2
>
> Attachments: HDDS-551.000.patch
>
>
> If the container is already closed while retrying to close the container in a 
> Datanode which is not a leader, we just log the info and still submit the 
> close request to Ratis. Ideally, this check should be moved to 
> CloseContainerCommandHandler and we should just return without submitting any 
> request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-361) Use DBStore and TableStore for DN metadata

2018-10-05 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640421#comment-16640421
 ] 

Hanisha Koneru edited comment on HDDS-361 at 10/5/18 10:37 PM:
---

[~ljain], thanks for working on this.
 The patch looks very good. I just have a few minor comments.
 # In {{BlockDeletingService#executeDeleteTxn()}}, if we cannot delete a block, 
we skip that block and continue deleting other blocks. So the actual number of 
blocks deleted might be less than the number scheduled in the transaction. In 
{{BlockDeletingService#call()}}, we update {{numBlocksDeleted}} with the 
count of blocks scheduled for deletion.
{code:java}
if (delTxn != null) {
  executeDeleteTxn(delTxn, defaultStore);
  // increment number of blocks deleted for the container
  numBlocksDeleted += delTxn.getLocalIDCount();
  // if successful, txn can be removed from delete table{code}
Instead, we should update {{numBlocksDeleted}} with the number of blocks 
actually deleted in {{executeDeleteTxn}} (see the sketch after this list)
{code:java}
if (delTxn != null) {
  int deletedBlocksCount = executeDeleteTxn(delTxn, defaultStore);
  // increment number of blocks deleted for the container
  numBlocksDeleted += deletedBlocksCount;
  // if successful, txn can be removed from delete table{code}
 # Also, before deleting the transaction from the Pending Deletes tables, we 
should verify that all the blocks in the transaction were successfully deleted.
{code:java}
// if successful, txn can be removed from delete table
if (deletedBlocksCount == delTxn.getLocalIDCount()) {
  batch.delete(pendingDeletes.getHandle(),
   Longs.toByteArray(delTxn.getTxID()));
}
{code}
 # In {{BlockDeletingService#executeDeleteTxn}}, the null check for delTxn is 
redundant. We perform this check before calling the function too.
 # A NIT: In {{BlockManagerImpl}}, most of the functions get the DB and then 
get the default table. We could have this in a private function to avoid 
redundancy.
{code:java}
private Table getDefaultTable(KeyValueContainerData containerData,
    Configuration config) throws IOException {
  DBStore db = BlockUtils.getDB(containerData, config);
  return db.getTable(DEFAULT_TABLE);
}
{code}
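
A sketch of the return-value change suggested in the first two comments above (the {{deleteBlock}} helper and the table types are assumed, not from the patch):
{code:java}
// Hypothetical: report how many blocks were actually removed, so that
// call() can update numBlocksDeleted and only drop fully-applied txns.
private int executeDeleteTxn(DeletedBlocksTransaction delTxn,
    Table<Long, BlockData> defaultStore) throws IOException {
  int deletedBlocksCount = 0;
  for (long localId : delTxn.getLocalIDList()) {
    if (deleteBlock(localId, defaultStore)) { // assumed helper
      deletedBlocksCount++;
    }
  }
  return deletedBlocksCount;
}
{code}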

P.S: The patch does not apply to trunk anymore.


was (Author: hanishakoneru):
[~ljain], thanks for working on this.
 The patch looks very good. I just have a few minor comments.
 # In {{BlockDeletingService#executeDeleteTxn()}}, if we cannot delete a block, 
we skip that block and continue deleting other blocks. So the actual number of 
blocks deleted might be less than the number scheduled in the transaction. In 
{{BlockDeletingService#call()}}, we update {{numBlocksDeleted}} with the 
count of blocks scheduled for deletion.
{code:java}
if (delTxn != null) {
  executeDeleteTxn(delTxn, defaultStore);
  // increment number of blocks deleted for the container
  numBlocksDeleted += delTxn.getLocalIDCount();
  // if successful, txn can be removed from delete table{code}
Instead, we should update {{numBlocksDeleted}} with the number of blocks 
actually deleted in {{executeDeleteTxn}}
{code:java}
if (delTxn != null) {
  int deletedBlocksCount = executeDeleteTxn(delTxn, defaultStore);
  // increment number of blocks deleted for the container
  numBlocksDeleted += deletedBlocksCount;
  // if successful, txn can be removed from delete table{code}

 # Also, before deleting the transaction from the Pending Deletes tables, we 
should verify that all the blocks in the transaction were successfully deleted.
{code:java}
// if successful, txn can be removed from delete table
if (deletedBlocksCount == delTxn.getLocalIDCount()) {
  batch.delete(pendingDeletes.getHandle(),
   Longs.toByteArray(delTxn.getTxID()));
}
{code}

 # In {{BlockDeletingService#executeDeleteTxn}}, the null check for delTxn is 
redundant. We perform this check before calling the function too.
 # A NIT: In {{BlockManagerImpl}}, most of the functions get the DB and then 
get the default table. We could have this in a private function to avoid 
redundancy.
{code:java}
private Table getDefaultTable(KeyValueContainerData containerData,
    Configuration config) throws IOException {
  DBStore db = BlockUtils.getDB(containerData, config);
  return db.getTable(DEFAULT_TABLE);
}
{code}

P.S: The patch does not apply to trunk anymore.

> Use DBStore and TableStore for DN metadata
> --
>
> Key: HDDS-361
> URL: https://issues.apache.org/jira/browse/HDDS-361
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-361.001.patch, HDDS-361.002.patch
>
>
> As part of OM performance improvement we used Tables for storing a particular 
> type of key value pair in the rocks db. This Jira aims to use Tables for 
> separating block keys and deletion transactions in the container db.



--
This message was sent by Atlassian 

[jira] [Commented] (HDDS-361) Use DBStore and TableStore for DN metadata

2018-10-05 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640421#comment-16640421
 ] 

Hanisha Koneru commented on HDDS-361:
-

[~ljain], thanks for working on this.
 The patch looks very good. I just have a few minor comments.
 # In {{BlockDeletingService#executeDeleteTxn()}}, if we cannot delete a block, 
we skip that block and continue deleting the other blocks. So the actual number 
of blocks deleted might be less than the number scheduled in the transaction. 
In {{BlockDeletingService#call()}}, however, we update {{numBlocksDeleted}} 
with the count of blocks scheduled for deletion.
{code:java}
if (delTxn != null) {
  executeDeleteTxn(delTxn, defaultStore);
  // increment number of blocks deleted for the container
  numBlocksDeleted += delTxn.getLocalIDCount();
  // if successful, txn can be removed from delete table{code}
Instead, we should update {{numBlocksDeleted}} with the number of blocks 
actually deleted in {{executeDeleteTxn}} (see the consolidated sketch at the 
end of this comment):
{code:java}
if (delTxn != null) {
  int deletedBlocksCount = executeDeleteTxn(delTxn, defaultStore);
  // increment number of blocks deleted for the container
  numBlocksDeleted += deletedBlocksCount;
  // if successful, txn can be removed from delete table{code}

 # Also, before deleting the transaction from the Pending Deletes table, we 
should verify that all the blocks in the transaction were successfully deleted.
{code:java}
// if successful, txn can be removed from delete table
if (deletedBlocksCount == delTxn.getLocalIDCount()) {
  batch.delete(pendingDeletes.getHandle(),
   Longs.toByteArray(delTxn.getTxID()));
}
{code}

 # In {{BlockDeletingService#executeDeleteTxn}}, the null check for delTxn is 
redundant. We perform this check before calling the function too.
 # A NIT: In {{BlockManagerImpl}}, most of the functions get the DB and then 
get the default table. We could have this in a private function to avoid 
redundancy.
{code:java}
// Parameter types below are assumed for illustration.
private Table getDefaultTable(KeyValueContainerData cData, Configuration config)
    throws IOException {
  DBStore db = BlockUtils.getDB(cData, config);
  return db.getTable(DEFAULT_TABLE);
}
{code}

P.S: The patch does not apply to trunk anymore.
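Putting comments 1 and 2 together, here is a minimal self-contained sketch of 
the suggested flow. The types and names are hypothetical stand-ins, not the 
actual patch; the real code uses the HDDS DeletedBlocksTransaction, Table and 
BatchOperation APIs.
{code:java}
import java.util.List;

// Hedged sketch with made-up types, illustrating the suggested contract only.
class DeleteTxnFlowSketch {
  interface DeleteTxn {
    List<Long> getLocalIDs(); // local IDs of the blocks in this transaction
    long getTxID();
  }

  interface BlockStore {
    boolean deleteBlock(long localId); // false if the block could not be deleted
    void removePendingDeleteTxn(long txId);
  }

  // Returns the number of blocks actually deleted. The transaction is removed
  // from the pending-deletes table only when every block in it was deleted.
  static int executeDeleteTxn(DeleteTxn txn, BlockStore store) {
    int deleted = 0;
    for (long id : txn.getLocalIDs()) {
      if (store.deleteBlock(id)) { // skip failures and keep going
        deleted++;
      }
    }
    if (deleted == txn.getLocalIDs().size()) {
      store.removePendingDeleteTxn(txn.getTxID());
    }
    return deleted; // the caller adds this to numBlocksDeleted
  }
}
{code}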

> Use DBStore and TableStore for DN metadata
> --
>
> Key: HDDS-361
> URL: https://issues.apache.org/jira/browse/HDDS-361
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-361.001.patch, HDDS-361.002.patch
>
>
> As part of OM performance improvement we used Tables for storing a particular 
> type of key value pair in the rocks db. This Jira aims to use Tables for 
> separating block keys and deletion transactions in the container db.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-283) Need an option to list all volumes created in the cluster

2018-10-08 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642258#comment-16642258
 ] 

Hanisha Koneru commented on HDDS-283:
-

Hi [~nilotpalnandi],

Are you working on this Jira or planning to? If not, please let me know and I 
can take it up. Thanks.

> Need an option to list all volumes created in the cluster
> -
>
> Key: HDDS-283
> URL: https://issues.apache.org/jira/browse/HDDS-283
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Blocker
>  Labels: alpha2
> Fix For: 0.3.0
>
> Attachments: HDDS-283.001.patch
>
>
> Currently , listVolume command either gives :
> 1) all the volumes created by a particular user , using -user argument.
> 2) or , all the volumes created by the logged in user , if no -user argument 
> is provided.
>  
> We need an option to list all the volumes created in the cluster



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-609) SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Description: 
When SCM is restarted:

While running a MapReduce job, we got "Allocate block failed, 
error:INTERNAL_ERROR".

SCM logs:
{code:java}
2018-10-09 23:37:28,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 
on 9863, call Call#101 Retry#0 
org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock from 
172.27.56.9:33814
org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
for allocateBlock
at 
org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
at 
org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
at 
org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
at 
org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
at 
org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
at 
org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
2018-10-09 23:37:35,232 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 
on 9863, call Call#103 Retry#0 
org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock from 
172.27.56.9:33814
org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
for allocateBlock
at 
org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
at 
org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
at 
org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
at 
org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
at 
org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
at 
org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
2018-10-09 23:37:42,044 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 
on 9863, call Call#105 Retry#0 
org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock from 
172.27.56.9:33814
org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
for allocateBlock
at 
org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
at 
org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
at 
org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
at 
org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
at 
org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
at 
org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at 

[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Summary: On restart, SCM does not exit chill mode as it expects DNs to 
report containers in ALLOCATED state  (was: SCM does not exit chill mode as it 
expects DNs to report containers in ALLOCATED state)

> On restart, SCM does not exit chill mode as it expects DNs to report 
> containers in ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Priority: Major
>
> When SCM is restarted:
> While running a MapReduce job, we got "Allocate block failed, 
> error:INTERNAL_ERROR".
> SCM logs:
> {code:java}
> 2018-10-09 23:37:28,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 4 on 9863, call Call#101 Retry#0 
> org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock 
> from 172.27.56.9:33814
> org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
> for allocateBlock
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
> at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
> at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
> at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> 2018-10-09 23:37:35,232 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 4 on 9863, call Call#103 Retry#0 
> org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock 
> from 172.27.56.9:33814
> org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
> for allocateBlock
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
> at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
> at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
> at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> 2018-10-09 23:37:42,044 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 3 on 9863, call Call#105 Retry#0 
> org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock 
> from 172.27.56.9:33814
> org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
> for allocateBlock
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
> at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
> at 
> 

[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Attachment: HDDS-609.002.patch

> On restart, SCM does not exit chill mode as it expects DNs to report 
> containers in ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-609.001.patch, HDDS-609.002.patch
>
>
> Note: Updated the description to describe the root cause of the bug and moved 
> the error logs to comments.
> On restart, SCM can exit chill mode only if it receives reports for 99% 
> (default) of the containers from the DNs.
> SCM includes containers in ALLOCATED state when calculating the total number 
> of containers. But since ALLOCATED containers are never reported by DNs, the 
> calculated percentage of reported containers is skewed.
> {code:java}
> For example, say we have 1 DN in the cluster and we restart SCM.
> Total number of containers in SCM ContainerMap = 20
> Containers in OPEN state = 2
> Containers in ALLOCATED state = 18
> Containers reported by DN on SCM restart = 2 
> Fraction of reported containers as calculated by SCMChillModeManager = (2/20) 
> = 0.10
>  {code}
> We should not include the ALLOCATED containers when calculating the total 
> number of containers for the chill mode exit rule. Otherwise, for scenarios 
> such as above, SCM can never come out of chill mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-609) SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Summary: SCM does not exit chill mode as it expects DNs to report 
containers in ALLOCATED state  (was: Mapreduce example fails with Allocate 
block failed, error:INTERNAL_ERROR)

> SCM does not exit chill mode as it expects DNs to report containers in 
> ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Priority: Major
>
> {code:java}
> -bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar 
> /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar 
> wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5
> 18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History 
> server at ctr-e138-1518143905142-510793-01-04.hwx.site/172.27.79.197:10200
> 18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> 18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure 
> Coding for path: /user/hdfs/.staging/job_1539125785626_0007
> 18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1
> 18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized 
> native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9]
> 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1
> 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_1539125785626_0007
> 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: []
> 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml 
> at file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml
> 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application 
> application_1539125785626_0007
> 18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: 
> http://ctr-e138-1518143905142-510793-01-05.hwx.site:8088/proxy/application_1539125785626_0007/
> 18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007
> 18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in 
> uber mode : false
> 18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0%
> 18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0%
> 18/10/09 23:37:29 INFO mapreduce.Job: Task Id : 
> attempt_1539125785626_0007_r_00_0, Status : FAILED
> Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR
> at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250)
> at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78)
> at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93)
> at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559)
> at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
> at 
> org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64)
> at 
> org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> at java.security.AccessController.doPrivileged(Native Method)
> at 

[jira] [Commented] (HDDS-601) On restart, SCM throws 'No such datanode' exception

2018-10-11 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647018#comment-16647018
 ] 

Hanisha Koneru commented on HDDS-601:
-

Thanks [~ssulav] for reporting the issue and [~anu] for the review. 

I have committed this to trunk and ozone-0.3 branch.
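For readers hitting the same trace: the failure mode is a container report 
arriving before SCM has re-admitted the datanode after a restart. A hedged 
sketch of the general shape of such a guard follows; the names are hypothetical 
and this is not necessarily what the committed patch does.
{code:java}
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Tolerate reports from datanodes that are not yet (re-)registered after an
// SCM restart, instead of throwing "No such datanode".
class Node2ContainerMapSketch {
  private final Map<UUID, Set<Long>> dn2Containers = new ConcurrentHashMap<>();

  void setContainersForDatanode(UUID dnId, Set<Long> containers) {
    // put(), not a throw-on-missing update: a report that races ahead of
    // re-registration simply (re-)creates the mapping.
    dn2Containers.put(dnId, new HashSet<>(containers));
  }
}
{code}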

> On restart, SCM throws 'No such datanode' exception
> ---
>
> Key: HDDS-601
> URL: https://issues.apache.org/jira/browse/HDDS-601
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Soumitra Sulav
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-601.001.patch, HDDS-601.002.patch
>
>
> Encountered below exception after I changed a configuration in ozone-site and 
> restarted SCM and Datanode :
> Ozone Cluster : 1 SCM, 1 OM, 3 DNs
> {code:java}
> 2018-10-04 09:35:59,716 INFO org.apache.hadoop.hdds.server.BaseHttpServer: 
> HTTP server of SCM is listening at http://0.0.0.0:9876
> 2018-10-04 09:36:03,618 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 127a8e17-b2df-4663-924c-1a6909adb293{ip: 172.22.119.19, host: 
> hcatest-2.openstacklocal}
> 2018-10-04 09:36:09,063 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> 2018-10-04 09:36:09,083 ERROR 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Error on 
> processing container report from datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No such datanode
>  at 
> org.apache.hadoop.hdds.scm.node.states.Node2ContainerMap.setContainersForDatanode(Node2ContainerMap.java:82)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:45)
>  at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-601) On restart, SCM throws 'No such datanode' exception

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-601:

Fix Version/s: 0.4.0
   0.3.0

> On restart, SCM throws 'No such datanode' exception
> ---
>
> Key: HDDS-601
> URL: https://issues.apache.org/jira/browse/HDDS-601
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Soumitra Sulav
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-601.001.patch, HDDS-601.002.patch
>
>
> Encountered below exception after I changed a configuration in ozone-site and 
> restarted SCM and Datanode :
> Ozone Cluster : 1 SCM, 1 OM, 3 DNs
> {code:java}
> 2018-10-04 09:35:59,716 INFO org.apache.hadoop.hdds.server.BaseHttpServer: 
> HTTP server of SCM is listening at http://0.0.0.0:9876
> 2018-10-04 09:36:03,618 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 127a8e17-b2df-4663-924c-1a6909adb293{ip: 172.22.119.19, host: 
> hcatest-2.openstacklocal}
> 2018-10-04 09:36:09,063 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> 2018-10-04 09:36:09,083 ERROR 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Error on 
> processing container report from datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No such datanode
>  at 
> org.apache.hadoop.hdds.scm.node.states.Node2ContainerMap.setContainersForDatanode(Node2ContainerMap.java:82)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:45)
>  at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-601) On restart, SCM throws 'No such datanode' exception

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-601:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> On restart, SCM throws 'No such datanode' exception
> ---
>
> Key: HDDS-601
> URL: https://issues.apache.org/jira/browse/HDDS-601
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Soumitra Sulav
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-601.001.patch, HDDS-601.002.patch
>
>
> Encountered below exception after I changed a configuration in ozone-site and 
> restarted SCM and Datanode :
> Ozone Cluster : 1 SCM, 1 OM, 3 DNs
> {code:java}
> 2018-10-04 09:35:59,716 INFO org.apache.hadoop.hdds.server.BaseHttpServer: 
> HTTP server of SCM is listening at http://0.0.0.0:9876
> 2018-10-04 09:36:03,618 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 127a8e17-b2df-4663-924c-1a6909adb293{ip: 172.22.119.19, host: 
> hcatest-2.openstacklocal}
> 2018-10-04 09:36:09,063 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> 2018-10-04 09:36:09,083 ERROR 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Error on 
> processing container report from datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No such datanode
>  at 
> org.apache.hadoop.hdds.scm.node.states.Node2ContainerMap.setContainersForDatanode(Node2ContainerMap.java:82)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:45)
>  at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Description: 
Note: Updated the description to describe the root cause of the bug and moved 
the error logs to comments.

On restart, SCM can exit chill mode only if it receives reports for 99% 
(default) of the containers from the DNs.

SCM includes containers in ALLOCATED state when calculating the total number of 
containers. But since ALLOCATED containers are never reported by DNs, the 
calculated percentage of reported containers is skewed.
{code:java}
For example, say we have 1 DN in the cluster and we restart SCM.

Total number of containers in SCM ContainerMap = 20

Containers in OPEN state = 2

Containers in ALLOCATED state = 18

Containers reported by DN on SCM restart = 2 

Fraction of reported containers as calculated by SCMChillModeManager = (2/20) = 
0.10
 {code}
We should not include the ALLOCATED containers when calculating the total 
number of containers for the chill mode exit rule. Otherwise, for scenarios 
such as above, SCM can never come out of chill mode.
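To make the proposed rule concrete, here is a minimal self-contained sketch of 
the exit check with ALLOCATED containers excluded from the denominator. All 
names are hypothetical; the real logic lives in SCM's chill mode manager and 
uses SCM's own container types.
{code:java}
import java.util.*;

class ChillModeExitSketch {
  enum State { ALLOCATED, OPEN, CLOSED }

  static final double EXIT_THRESHOLD = 0.99; // default chill mode exit threshold

  // containers: containerId -> state known to SCM; reported: ids reported by DNs.
  static boolean canExitChillMode(Map<Long, State> containers, Set<Long> reported) {
    // DNs never report ALLOCATED containers, so leave them out of the total.
    long reportable = containers.values().stream()
        .filter(s -> s != State.ALLOCATED)
        .count();
    return reportable == 0
        || (double) reported.size() / reportable >= EXIT_THRESHOLD;
  }

  public static void main(String[] args) {
    // The example from the description: 18 ALLOCATED + 2 OPEN, DN reports 2.
    Map<Long, State> containers = new HashMap<>();
    for (long i = 0; i < 18; i++) {
      containers.put(i, State.ALLOCATED);
    }
    containers.put(18L, State.OPEN);
    containers.put(19L, State.OPEN);
    Set<Long> reported = Set.of(18L, 19L);
    System.out.println(canExitChillMode(containers, reported)); // true: 2/2 = 1.0
  }
}
{code}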

  was:
{code:java}
-bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar 
/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar 
wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5
18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History 
server at ctr-e138-1518143905142-510793-01-04.hwx.site/172.27.79.197:10200
18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding 
for path: /user/hdfs/.staging/job_1539125785626_0007
18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1
18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized 
native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9]
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1539125785626_0007
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: []
18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml at 
file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml
18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application 
application_1539125785626_0007
18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: 
http://ctr-e138-1518143905142-510793-01-05.hwx.site:8088/proxy/application_1539125785626_0007/
18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007
18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in 
uber mode : false
18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0%
18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0%
18/10/09 23:37:29 INFO mapreduce.Job: Task Id : 
attempt_1539125785626_0007_r_00_0, Status : FAILED
Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250)
at 
org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at 
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78)
at 
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93)
at 
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559)
at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at 
org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64)
at org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at 

[jira] [Commented] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646970#comment-16646970
 ] 

Hanisha Koneru commented on HDDS-609:
-

Initial error logs reported by [~nmaheshwari] (moved from the description):
{code:java}
-bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar 
/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar 
wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5
18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History 
server at ctr-e138-1518143905142-510793-01-04.hwx.site/172.27.79.197:10200
18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding 
for path: /user/hdfs/.staging/job_1539125785626_0007
18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1
18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized 
native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9]
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1539125785626_0007
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: []
18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml at 
file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml
18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application 
application_1539125785626_0007
18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: 
http://ctr-e138-1518143905142-510793-01-05.hwx.site:8088/proxy/application_1539125785626_0007/
18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007
18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in 
uber mode : false
18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0%
18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0%
18/10/09 23:37:29 INFO mapreduce.Job: Task Id : 
attempt_1539125785626_0007_r_00_0, Status : FAILED
Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250)
at 
org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at 
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78)
at 
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93)
at 
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559)
at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at 
org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64)
at org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

18/10/09 23:37:35 INFO mapreduce.Job: Task Id : 
attempt_1539125785626_0007_r_00_1, Status : FAILED
Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475)
at 

[jira] [Commented] (HDDS-600) Mapreduce example fails with java.lang.IllegalArgumentException: Bucket or Volume name has an unsupported character

2018-10-11 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646944#comment-16646944
 ] 

Hanisha Koneru commented on HDDS-600:
-

[~nmaheshwari], can we close this issue?
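For context, the ':' in the o3 URI authority (host:port) ends up being 
validated as part of the volume name. A hedged sketch of the kind of character 
check that rejects it; this is hypothetical, loosely modeled on the 
HddsClientUtils#verifyResourceName behavior shown in the trace below.
{code:java}
// Resource (volume/bucket) names allow only a restricted character set, so a
// URI authority like "xx.xx.xx.xx:9889" mistaken for a volume name fails on ':'.
class ResourceNameSketch {
  static void verifyResourceName(String name) {
    for (char c : name.toCharArray()) {
      boolean ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
          || c == '-' || c == '.';
      if (!ok) {
        throw new IllegalArgumentException(
            "Bucket or Volume name has an unsupported character : " + c);
      }
    }
  }
}
{code}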

> Mapreduce example fails with java.lang.IllegalArgumentException: Bucket or 
> Volume name has an unsupported character
> ---
>
> Key: HDDS-600
> URL: https://issues.apache.org/jira/browse/HDDS-600
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> Set up a hadoop cluster where ozone is also installed. Ozone can be 
> referenced via o3://xx.xx.xx.xx:9889
> {code:java}
> [root@ctr-e138-1518143905142-510793-01-02 ~]# ozone sh bucket list 
> o3://xx.xx.xx.xx:9889/volume1/
> 2018-10-09 07:21:24,624 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> [ {
> "volumeName" : "volume1",
> "bucketName" : "bucket1",
> "createdOn" : "Tue, 09 Oct 2018 06:48:02 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "root",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "root",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> } ]
> [root@ctr-e138-1518143905142-510793-01-02 ~]# ozone sh key list 
> o3://xx.xx.xx.xx:9889/volume1/bucket1
> 2018-10-09 07:21:54,500 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> [ {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Tue, 09 Oct 2018 06:58:32 GMT",
> "modifiedOn" : "Tue, 09 Oct 2018 06:58:32 GMT",
> "size" : 0,
> "keyName" : "mr_job_dir"
> } ]
> [root@ctr-e138-1518143905142-510793-01-02 ~]#{code}
> Hdfs is also set fine as below
> {code:java}
> [root@ctr-e138-1518143905142-510793-01-02 ~]# hdfs dfs -ls 
> /tmp/mr_jobs/input/
> Found 1 items
> -rw-r--r-- 3 root hdfs 215755 2018-10-09 06:37 
> /tmp/mr_jobs/input/wordcount_input_1.txt
> [root@ctr-e138-1518143905142-510793-01-02 ~]#{code}
> Now try to run Mapreduce example job against ozone o3:
> {code:java}
> [root@ctr-e138-1518143905142-510793-01-02 ~]# 
> /usr/hdp/current/hadoop-client/bin/hadoop jar 
> /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar 
> wordcount /tmp/mr_jobs/input/ 
> o3://xx.xx.xx.xx:9889/volume1/bucket1/mr_job_dir/output
> 18/10/09 07:15:38 INFO conf.Configuration: Removed undeclared tags:
> java.lang.IllegalArgumentException: Bucket or Volume name has an unsupported 
> character : :
> at 
> org.apache.hadoop.hdds.scm.client.HddsClientUtils.verifyResourceName(HddsClientUtils.java:143)
> at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.getVolumeDetails(RpcClient.java:231)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.ozone.client.OzoneClientInvocationHandler.invoke(OzoneClientInvocationHandler.java:54)
> at com.sun.proxy.$Proxy16.getVolumeDetails(Unknown Source)
> at org.apache.hadoop.ozone.client.ObjectStore.getVolume(ObjectStore.java:92)
> at 
> org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:121)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:178)
> at org.apache.hadoop.examples.WordCount.main(WordCount.java:85)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> 

[jira] [Updated] (HDDS-601) On restart, SCM throws 'No such datanode

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-601:

Summary: On restart, SCM throws 'No such datanode  (was: SCMException: No 
such datanode)

> On restart, SCM throws 'No such datanode
> 
>
> Key: HDDS-601
> URL: https://issues.apache.org/jira/browse/HDDS-601
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Soumitra Sulav
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-601.001.patch, HDDS-601.002.patch
>
>
> Encountered below exception after I changed a configuration in ozone-site and 
> restarted SCM and Datanode :
> Ozone Cluster : 1 SCM, 1 OM, 3 DNs
> {code:java}
> 2018-10-04 09:35:59,716 INFO org.apache.hadoop.hdds.server.BaseHttpServer: 
> HTTP server of SCM is listening at http://0.0.0.0:9876
> 2018-10-04 09:36:03,618 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 127a8e17-b2df-4663-924c-1a6909adb293{ip: 172.22.119.19, host: 
> hcatest-2.openstacklocal}
> 2018-10-04 09:36:09,063 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> 2018-10-04 09:36:09,083 ERROR 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Error on 
> processing container report from datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No such datanode
>  at 
> org.apache.hadoop.hdds.scm.node.states.Node2ContainerMap.setContainersForDatanode(Node2ContainerMap.java:82)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:45)
>  at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-601) On restart, SCM throws 'No such datanode' exception

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-601:

Summary: On restart, SCM throws 'No such datanode' exception  (was: On 
restart, SCM throws 'No such datanode)

> On restart, SCM throws 'No such datanode' exception
> ---
>
> Key: HDDS-601
> URL: https://issues.apache.org/jira/browse/HDDS-601
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Soumitra Sulav
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-601.001.patch, HDDS-601.002.patch
>
>
> Encountered below exception after I changed a configuration in ozone-site and 
> restarted SCM and Datanode :
> Ozone Cluster : 1 SCM, 1 OM, 3 DNs
> {code:java}
> 2018-10-04 09:35:59,716 INFO org.apache.hadoop.hdds.server.BaseHttpServer: 
> HTTP server of SCM is listening at http://0.0.0.0:9876
> 2018-10-04 09:36:03,618 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 127a8e17-b2df-4663-924c-1a6909adb293{ip: 172.22.119.19, host: 
> hcatest-2.openstacklocal}
> 2018-10-04 09:36:09,063 WARN org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> SCM receive heartbeat from unregistered datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> 2018-10-04 09:36:09,083 ERROR 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Error on 
> processing container report from datanode 
> 82555af0-a1f9-447a-ad40-c524ba6e1317{ip: 172.22.119.190, host: 
> hcatest-3.openstacklocal}
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No such datanode
>  at 
> org.apache.hadoop.hdds.scm.node.states.Node2ContainerMap.setContainersForDatanode(Node2ContainerMap.java:82)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>  at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:45)
>  at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647027#comment-16647027
 ] 

Hanisha Koneru commented on HDDS-609:
-

Updated patch v02 to fix test failure in TestSCMChillModeManager.

> On restart, SCM does not exit chill mode as it expects DNs to report 
> containers in ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-609.001.patch, HDDS-609.002.patch
>
>
> Note: Updated the description to describe the root cause of the bug and moved 
> the error logs to comments.
> On restart, SCM can exit chill mode only if it receives reports for 99% 
> (default) of the containers from the DNs.
> SCM includes containers in ALLOCATED state when calculating the total number 
> of containers. But since ALLOCATED containers are never reported by DNs, the 
> calculated percentage of reported containers is skewed.
> {code:java}
> For example, say we have 1 DN in the cluster and we restart SCM.
> Total number of containers in SCM ContainerMap = 20
> Containers in OPEN state = 2
> Containers in ALLOCATED state = 18
> Containers reported by DN on SCM restart = 2 
> Fraction of reported containers as calculated by SCMChillModeManager = (2/20) 
> = 0.10
>  {code}
> We should not include the ALLOCATED containers when calculating the total 
> number of containers for the chill mode exit rule. Otherwise, for scenarios 
> such as above, SCM can never come out of chill mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-635) On Datanode restart, DatanodeStateMachine throws NullPointerException

2018-10-11 Thread Hanisha Koneru (JIRA)
Hanisha Koneru created HDDS-635:
---

 Summary: On Datanode restart, DatanodeStateMachine throws 
NullPointerException
 Key: HDDS-635
 URL: https://issues.apache.org/jira/browse/HDDS-635
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


{code:java}
2018-10-11 12:08:33,676 ERROR 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
Unable to start the DatanodeState Machine
java.io.IOException: Premature EOF from inputStream
at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100)
at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250)
at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155)
at 
org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128)
at 
org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110)
at org.apache.ratis.server.storage.LogSegment.loadSegment(LogSegment.java:132)
at 
org.apache.ratis.server.storage.RaftLogCache.loadSegment(RaftLogCache.java:110)
at 
org.apache.ratis.server.storage.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:155)
at 
org.apache.ratis.server.storage.SegmentedRaftLog.open(SegmentedRaftLog.java:123)
at org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:162)
at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:110)
at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:106)
at 
org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$0(RaftServerProxy.java:191)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2018-10-11 12:08:33,677 ERROR org.apache.hadoop.ozone.HddsDatanodeService: 
Exception in HddsDatanodeService.
java.lang.NullPointerException
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.join(DatanodeStateMachine.java:332)
at 
org.apache.hadoop.ozone.HddsDatanodeService.join(HddsDatanodeService.java:191)
at 
org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:250)
2018-10-11 12:08:33,678 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1: java.lang.NullPointerException{code}
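The secondary NullPointerException points at join() dereferencing a handle that 
was never initialized because start-up failed earlier. A minimal sketch of the 
kind of null guard that avoids it; the field name is hypothetical and this is 
not necessarily the eventual fix.
{code:java}
// DatanodeStateMachine#join should tolerate a start-up failure that left the
// state machine thread unset.
class StateMachineJoinSketch {
  private volatile Thread stateMachineThread; // stays null if start() failed

  void join() throws InterruptedException {
    Thread t = stateMachineThread;
    if (t != null) {
      t.join();
    }
  }
}
{code}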



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Description: 
{code:java}
-bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar 
/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar 
wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5
18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History 
server at ctr-e138-1518143905142-510793-01-04.hwx.site/172.27.79.197:10200
18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding 
for path: /user/hdfs/.staging/job_1539125785626_0007
18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1
18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized 
native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9]
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1539125785626_0007
18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: []
18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml at 
file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml
18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application 
application_1539125785626_0007
18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: 
http://ctr-e138-1518143905142-510793-01-05.hwx.site:8088/proxy/application_1539125785626_0007/
18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007
18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in 
uber mode : false
18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0%
18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0%
18/10/09 23:37:29 INFO mapreduce.Job: Task Id : 
attempt_1539125785626_0007_r_00_0, Status : FAILED
Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250)
at 
org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at 
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78)
at 
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93)
at 
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559)
at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at 
org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64)
at org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

18/10/09 23:37:35 INFO mapreduce.Job: Task Id : 
attempt_1539125785626_0007_r_00_1, Status : FAILED
Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271)
at 

[jira] [Assigned] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDDS-609:
---

Assignee: Hanisha Koneru

> On restart, SCM does not exit chill mode as it expects DNs to report 
> containers in ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
>
> {code:java}
> -bash-4.2$ /usr/hdp/current/hadoop-client/bin/hadoop jar 
> /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar 
> wordcount /tmp/mr_jobs/input/ o3://bucket2.volume2/mr_job5
> 18/10/09 23:37:07 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:08 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:08 INFO client.AHSProxy: Connecting to Application History 
> server at ctr-e138-1518143905142-510793-01-04.hwx.site/172.27.79.197:10200
> 18/10/09 23:37:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> 18/10/09 23:37:09 INFO mapreduce.JobResourceUploader: Disabling Erasure 
> Coding for path: /user/hdfs/.staging/job_1539125785626_0007
> 18/10/09 23:37:09 INFO input.FileInputFormat: Total input files to process : 1
> 18/10/09 23:37:09 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 18/10/09 23:37:09 INFO lzo.LzoCodec: Successfully loaded & initialized 
> native-lzo library [hadoop-lzo rev 5d6248d8d690f8456469979213ab2e9993bfa2e9]
> 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: number of splits:1
> 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_1539125785626_0007
> 18/10/09 23:37:09 INFO mapreduce.JobSubmitter: Executing with tokens: []
> 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:10 INFO conf.Configuration: found resource resource-types.xml 
> at file:/etc/hadoop/3.0.3.0-63/0/resource-types.xml
> 18/10/09 23:37:10 INFO conf.Configuration: Removed undeclared tags:
> 18/10/09 23:37:10 INFO impl.YarnClientImpl: Submitted application 
> application_1539125785626_0007
> 18/10/09 23:37:10 INFO mapreduce.Job: The url to track the job: 
> http://ctr-e138-1518143905142-510793-01-05.hwx.site:8088/proxy/application_1539125785626_0007/
> 18/10/09 23:37:10 INFO mapreduce.Job: Running job: job_1539125785626_0007
> 18/10/09 23:37:17 INFO mapreduce.Job: Job job_1539125785626_0007 running in 
> uber mode : false
> 18/10/09 23:37:17 INFO mapreduce.Job: map 0% reduce 0%
> 18/10/09 23:37:24 INFO mapreduce.Job: map 100% reduce 0%
> 18/10/09 23:37:29 INFO mapreduce.Job: Task Id : 
> attempt_1539125785626_0007_r_00_0, Status : FAILED
> Error: java.io.IOException: Allocate block failed, error:INTERNAL_ERROR
> at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:576)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.allocateNewBlock(ChunkGroupOutputStream.java:475)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.handleWrite(ChunkGroupOutputStream.java:271)
> at 
> org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.write(ChunkGroupOutputStream.java:250)
> at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:47)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:78)
> at 
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:93)
> at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:559)
> at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
> at 
> org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:64)
> at 
> org.apache.hadoop.examples.WordCount$IntSumReducer.reduce(WordCount.java:52)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:628)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> 

[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Status: Patch Available  (was: Open)

> On restart, SCM does not exit chill mode as it expects DNs to report 
> containers in ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-609.001.patch
>
>
> Note: Updated the description to describe the root cause of the bug and moved 
> the error logs to comments.
> On restart, SCM can exit chill mode only if it receives reports for 99% 
> (default) of the containers from the DNs. 
> SCM includes containers in ALLOCATED state when calculating the total number 
> of containers. But since ALLOCATED containers are not reported by DNs, the 
> calculated percentage of reported containers is skewed.
> {code:java}
> For example, say we have 1 DN in the cluster and we restart SCM.
> Total number of containers in SCM ContainerMap = 20
> Containers in OPEN state = 2
> Containers in ALLOCATED state = 18
> Containers reported by DN on SCM restart = 2
> Fraction of reported containers as calculated by SCMChillModeManager = (2/20)
> = 0.10
> {code}
> We should not include the ALLOCATED containers while calculating the total 
> number of containers for the chill mode exit rule. Otherwise, for scenarios 
> such as above, SCM can never come out of chill mode.
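
A minimal sketch of the proposed calculation (class and method names here are 
hypothetical, not the actual SCMChillModeManager code): ALLOCATED containers 
are dropped from the denominator, so only containers a DN can actually report 
count toward the cutoff.
{code:java}
import java.util.List;

class ChillModeExitCheck {
  // Default chill mode exit threshold.
  static final double CUTOFF = 0.99;

  enum ContainerState { ALLOCATED, OPEN, CLOSED }

  // Exclude ALLOCATED containers: a DN can never report them,
  // so they must not count toward the total.
  static boolean canExitChillMode(List<ContainerState> allContainers,
                                  long reportedContainers) {
    long reportable = allContainers.stream()
        .filter(state -> state != ContainerState.ALLOCATED)
        .count();
    if (reportable == 0) {
      return true; // nothing to wait for
    }
    return ((double) reportedContainers / reportable) >= CUTOFF;
  }
}
{code}
With the numbers from the example above, the denominator becomes 2 instead of 
20, the reported fraction is 2/2 = 1.0, and SCM can exit chill mode.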



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Attachment: HDDS-609.001.patch

> On restart, SCM does not exit chill mode as it expects DNs to report 
> containers in ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-609.001.patch
>
>
> Note: Updated the description to describe the root cause of the bug and moved 
> the error logs to comments.
> On restart, SCM can exit chill mode only if it receives reports for 99% 
> (default) of the containers from the DNs. 
> SCM includes containers in ALLOCATED state when calculating the total number 
> of containers. But since ALLOCATED containers are not reported by DNs, the 
> calculated percentage of reported containers is skewed.
> {code:java}
> For example, say we have 1 DN in the cluster and we restart SCM.
> Total number of containers in SCM ContainerMap = 20
> Containers in OPEN state = 2
> Containers in ALLOCATED state = 18
> Containers reported by DN on SCM restart = 2
> Fraction of reported containers as calculated by SCMChillModeManager = (2/20)
> = 0.10
> {code}
> We should not include the ALLOCATED containers while calculating the total 
> number of containers for the chill mode exit rule. Otherwise, for scenarios 
> such as above, SCM can never come out of chill mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-609:

Attachment: HDDS-609.003.patch

> On restart, SCM does not exit chill mode as it expects DNs to report 
> containers in ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-609.001.patch, HDDS-609.002.patch, 
> HDDS-609.003.patch
>
>
> Note: Updated the description to describe the root cause of the bug and moved 
> the error logs to comments.
> On restart, SCM can exit chill mode only if it receives reports for 99% 
> (default) of the containers from the DNs. 
> SCM includes containers in ALLOCATED state when calculating the total number 
> of containers. But since ALLOCATED containers are not reported by DNs, the 
> calculated percentage of reported containers is skewed.
> {code:java}
> For example, say we have 1 DN in the cluster and we restart SCM.
> Total number of containers in SCM ContainerMap = 20
> Containers in OPEN state = 2
> Containers in ALLOCATED state = 18
> Containers reported by DN on SCM restart = 2
> Fraction of reported containers as calculated by SCMChillModeManager = (2/20)
> = 0.10
> {code}
> We should not include the ALLOCATED containers while calculating the total 
> number of containers for the chill mode exit rule. Otherwise, for scenarios 
> such as above, SCM can never come out of chill mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-612:

Attachment: HDDS-612.001.patch

> Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock 
> fails with ChillModePrecheck exception
> 
>
> Key: HDDS-612
> URL: https://issues.apache.org/jira/browse/HDDS-612
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-612.001.patch
>
>
> {code:java}
> 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 9863, call Call#70 Retry#0 
> org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock 
> from 172.27.56.9:53442
> org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
> for allocateBlock
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
> at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
> at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
> at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state

2018-10-11 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647172#comment-16647172
 ] 

Hanisha Koneru commented on HDDS-609:
-

Thanks [~anu]

Fixed the failing unit test and added one for the current change.

> On restart, SCM does not exit chill mode as it expects DNs to report 
> containers in ALLOCATED state
> --
>
> Key: HDDS-609
> URL: https://issues.apache.org/jira/browse/HDDS-609
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-609.001.patch, HDDS-609.002.patch, 
> HDDS-609.003.patch
>
>
> Note: Updated the description to describe the root cause of the bug and moved 
> the error logs to comments.
> On restart, SCM can exit chill mode only if it receives reports for 99% 
> (default) of the containers from the DNs. 
> SCM includes containers in ALLOCATED state when calculating the total number 
> of containers. But since ALLOCATED containers are not reported by DNs, the 
> calculated percentage of reported containers is skewed.
> {code:java}
> For example, say we have 1 DN in the cluster and we restart SCM.
> Total number of containers in SCM ContainerMap = 20
> Containers in OPEN state = 2
> Containers in ALLOCATED state = 18
> Containers reported by DN on SCM restart = 2
> Fraction of reported containers as calculated by SCMChillModeManager = (2/20)
> = 0.10
> {code}
> We should not include the ALLOCATED containers while calculating the total 
> number of containers for the chill mode exit rule. Otherwise, for scenarios 
> such as above, SCM can never come out of chill mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception

2018-10-11 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-612:

Status: Patch Available  (was: Open)

> Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock 
> fails with ChillModePrecheck exception
> 
>
> Key: HDDS-612
> URL: https://issues.apache.org/jira/browse/HDDS-612
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-612.001.patch
>
>
> {code:java}
> 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 9863, call Call#70 Retry#0 
> org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock 
> from 172.27.56.9:53442
> org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
> for allocateBlock
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
> at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
> at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
> at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-661) When a volume fails in datanode, VersionEndpointTask#call ends up in dead lock.

2018-10-15 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDDS-661:
---

Assignee: Hanisha Koneru

> When a volume fails in datanode, VersionEndpointTask#call ends up in dead 
> lock.
> ---
>
> Key: HDDS-661
> URL: https://issues.apache.org/jira/browse/HDDS-661
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Hanisha Koneru
>Priority: Major
>
> When a volume fails in datanode, the call to {{VersionEndpointTask#call}} 
> ends up in dead-lock.
> {code:java}
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:78)
> --> we acquire VolumeSet read lock here.
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:93)
>   
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.failVolume(VolumeSet.java:276)
>   
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.writeLock(VolumeSet.java:210)
> ---> we wait for VolumeSet write lock.
> {code}
> Since this thread already holds the read lock, it cannot get the write lock 
> and ends up in dead-lock.
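
The underlying issue is that {{ReentrantReadWriteLock}} does not support 
upgrading a read lock to a write lock. A self-contained demo of the same 
hang, independent of any Ozone code:
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDeadlock {
  public static void main(String[] args) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.readLock().lock();   // like VersionEndpointTask#call taking the read lock
    System.out.println("read lock held, requesting write lock...");
    lock.writeLock().lock();  // like failVolume() -> blocks here forever
    System.out.println("never reached");
  }
}
{code}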



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-661) When a volume fails in datanode, VersionEndpointTask#call ends up in dead lock.

2018-10-15 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-661:

Attachment: HDDS-661.001.patch

> When a volume fails in datanode, VersionEndpointTask#call ends up in dead 
> lock.
> ---
>
> Key: HDDS-661
> URL: https://issues.apache.org/jira/browse/HDDS-661
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-661.001.patch
>
>
> When a volume fails in datanode, the call to {{VersionEndpointTask#call}} 
> ends up in dead-lock.
> {code:java}
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:78)
> --> we acquire VolumeSet read lock here.
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:93)
>   
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.failVolume(VolumeSet.java:276)
>   
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.writeLock(VolumeSet.java:210)
> ---> we wait for VolumeSet write lock.
> {code}
> Since this thread already holds the read lock, it cannot get the write lock 
> and ends up in dead-lock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-661) When a volume fails in datanode, VersionEndpointTask#call ends up in dead lock.

2018-10-15 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-661:

Status: Patch Available  (was: Open)

> When a volume fails in datanode, VersionEndpointTask#call ends up in dead 
> lock.
> ---
>
> Key: HDDS-661
> URL: https://issues.apache.org/jira/browse/HDDS-661
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-661.001.patch
>
>
> When a volume fails in datanode, the call to {{VersionEndpointTask#call}} 
> ends up in dead-lock.
> {code:java}
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:78)
> --> we acquire VolumeSet read lock here.
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:93)
>   
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.failVolume(VolumeSet.java:276)
>   
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.writeLock(VolumeSet.java:210)
> ---> we wait for VolumeSet write lock.
> {code}
> Since this thread already holds the read lock, it cannot get the write lock 
> and ends up in dead-lock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-661) When a volume fails in datanode, VersionEndpointTask#call ends up in dead lock.

2018-10-15 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650638#comment-16650638
 ] 

Hanisha Koneru commented on HDDS-661:
-

Thanks [~nandakumar131] for catching this bug.

In VersionEndpointTask, we now take the write lock instead of the read lock. 
Since this is the very first call from the DN, and the DN cannot register or 
heartbeat before this call completes, we can safely say that taking the write 
lock here will not block any other process.

I have posted a patch with this change.
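
A simplified sketch of that change, assuming the method names from the stack 
trace above ({{verifyVolume}} and the volume iteration are illustrative, not 
the exact patch):
{code:java}
// In VersionEndpointTask#call: take the write lock up front. The write lock
// is reentrant for the owning thread, so failVolume() can safely reacquire it.
public Void call() throws Exception {
  volumeSet.writeLock();                       // was: volumeSet.readLock()
  try {
    for (HddsVolume volume : volumeSet.getVolumesList()) {
      if (!verifyVolume(volume)) {             // illustrative health check
        volumeSet.failVolume(volume.getHddsRootDir().getPath());
      }
    }
  } finally {
    volumeSet.writeUnlock();                   // was: volumeSet.readUnlock()
  }
  return null;
}
{code}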

> When a volume fails in datanode, VersionEndpointTask#call ends up in dead 
> lock.
> ---
>
> Key: HDDS-661
> URL: https://issues.apache.org/jira/browse/HDDS-661
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-661.001.patch
>
>
> When a volume fails in datanode, the call to {{VersionEndpointTask#call}} 
> ends up in dead-lock.
> {code:java}
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:78)
> --> we acquire VolumeSet read lock here.
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:93)
>   
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.failVolume(VolumeSet.java:276)
>   
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.writeLock(VolumeSet.java:210)
> ---> we wait for VolumeSet write lock.
> {code}
> Since this thread already holds the read lock, it cannot get the write lock 
> and ends up in dead-lock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-681) VolumeSet lock should not be exposed outside of VolumeSet class.

2018-10-17 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDDS-681:
---

Assignee: Hanisha Koneru

> VolumeSet lock should not be exposed outside of VolumeSet class.
> 
>
> Key: HDDS-681
> URL: https://issues.apache.org/jira/browse/HDDS-681
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Hanisha Koneru
>Priority: Major
>
> If the {{VolumeSet}} lock is exposed outside of the {{VolumeSet}} class, then 
> anyone using it can easily end up in a deadlock. We should change the code so 
> that the lock is not exposed, while the data structures inside VolumeSet 
> remain protected by it.
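
A minimal sketch of the encapsulation idea (hypothetical names, not the actual 
VolumeSet API): keep the lock private and expose only operations that acquire 
and release it internally, so a caller can never hold it across another call.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class VolumeRegistry {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> volumes = new ArrayList<>();

  // Readers get a defensive copy instead of a reference guarded by an
  // externally visible lock.
  List<String> snapshot() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(volumes);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Writers mutate only under the internally held write lock.
  void failVolume(String volume) {
    lock.writeLock().lock();
    try {
      volumes.remove(volume);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}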



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-663) Lot of "Removed undeclared tags" logger while running commands

2018-10-16 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652195#comment-16652195
 ] 

Hanisha Koneru commented on HDDS-663:
-

This has been fixed in HADOOP-15295. 

[~nmaheshwari], which version of Hadoop did you see this in?

> Lot of "Removed undeclared tags" logger while running commands
> --
>
> Key: HDDS-663
> URL: https://issues.apache.org/jira/browse/HDDS-663
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Namit Maheshwari
>Priority: Major
>  Labels: newbie
> Fix For: 0.3.0
>
>
> While running commands against OzoneFs see lot of logger like below:
> {code:java}
> -bash-4.2$ hdfs dfs -ls o3://bucket2.volume2/mr_jobEE
> 18/10/15 20:29:17 INFO conf.Configuration: Removed undeclared tags:
> 18/10/15 20:29:18 INFO conf.Configuration: Removed undeclared tags:
> Found 2 items
> rw-rw-rw 1 hdfs hdfs 0 2018-10-15 20:28 o3://bucket2.volume2/mr_jobEE/_SUCCESS
> rw-rw-rw 1 hdfs hdfs 5017 1970-07-23 04:33 
> o3://bucket2.volume2/mr_jobEE/part-r-0
> 18/10/15 20:29:19 INFO conf.Configuration: Removed undeclared tags:
> -bash-4.2$ {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-664) Creating hive table on Ozone fails

2018-10-16 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru reassigned HDDS-664:
---

Assignee: Hanisha Koneru

> Creating hive table on Ozone fails
> --
>
> Key: HDDS-664
> URL: https://issues.apache.org/jira/browse/HDDS-664
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
>
> Modified HIVE_AUX_JARS_PATH to include Ozone jars. Tried creating Hive 
> external table on Ozone. It fails with "Error: Error while compiling 
> statement: FAILED: HiveAuthzPluginException Error getting permissions for 
> o3://bucket2.volume2/testo3: User: hive is not allowed to impersonate 
> anonymous (state=42000,code=4)"
> {code:java}
> -bash-4.2$ beeline
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.0.3.0-63/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.0.3.0-63/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://ctr-e138-1518143905142-510793-01-11.hwx.site:2181,ctr-e138-1518143905142-510793-01-06.hwx.site:2181,ctr-e138-1518143905142-510793-01-08.hwx.site:2181,ctr-e138-1518143905142-510793-01-10.hwx.site:2181,ctr-e138-1518143905142-510793-01-07.hwx.site:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
> Enter username for 
> jdbc:hive2://ctr-e138-1518143905142-510793-01-11.hwx.site:2181,ctr-e138-1518143905142-510793-01-06.hwx.site:2181,ctr-e138-1518143905142-510793-01-08.hwx.site:2181,ctr-e138-1518143905142-510793-01-10.hwx.site:2181,ctr-e138-1518143905142-510793-01-07.hwx.site:2181/default:
> Enter password for 
> jdbc:hive2://ctr-e138-1518143905142-510793-01-11.hwx.site:2181,ctr-e138-1518143905142-510793-01-06.hwx.site:2181,ctr-e138-1518143905142-510793-01-08.hwx.site:2181,ctr-e138-1518143905142-510793-01-10.hwx.site:2181,ctr-e138-1518143905142-510793-01-07.hwx.site:2181/default:
> 18/10/15 21:36:55 [main]: INFO jdbc.HiveConnection: Connected to 
> ctr-e138-1518143905142-510793-01-04.hwx.site:1
> Connected to: Apache Hive (version 3.1.0.3.0.3.0-63)
> Driver: Hive JDBC (version 3.1.0.3.0.3.0-63)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.0.3.0.3.0-63 by Apache Hive
> 0: jdbc:hive2://ctr-e138-1518143905142-510793> create external table testo3 ( 
> i int, s string, d float) location "o3://bucket2.volume2/testo3";
> Error: Error while compiling statement: FAILED: HiveAuthzPluginException 
> Error getting permissions for o3://bucket2.volume2/testo3: User: hive is not 
> allowed to impersonate anonymous (state=42000,code=4)
> 0: jdbc:hive2://ctr-e138-1518143905142-510793> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception

2018-10-18 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655780#comment-16655780
 ] 

Hanisha Koneru commented on HDDS-612:
-

Thanks for the review [~arpitagarwal].
{quote}why is the following change necessary?
{quote}
In {{exitChillMode()}}, we already call {{emitChillModeStatus()}}, so the emit 
function was being called twice before.

Added a unit test and also updated SCMChillModeManager to check whether chill 
mode is enabled before evaluating the exit rules.
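
A sketch of the added check, with field names assumed rather than taken from 
the patch:
{code:java}
// In SCMChillModeManager (simplified): if chill mode is disabled via
// hdds.scm.chillmode.enabled, report "out of chill mode" immediately
// instead of evaluating any exit rules.
public boolean getInChillMode() {
  if (!isChillModeEnabled) {     // assumed boolean read from configuration
    return false;                // prechecks such as allocateBlock now pass
  }
  return inChillMode.get();      // otherwise defer to the exit rules
}
{code}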

> Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock 
> fails with ChillModePrecheck exception
> 
>
> Key: HDDS-612
> URL: https://issues.apache.org/jira/browse/HDDS-612
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-612.001.patch, HDDS-612.002.patch, 
> HDDS-612.003.patch
>
>
> {code:java}
> 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 9863, call Call#70 Retry#0 
> org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock 
> from 172.27.56.9:53442
> org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
> for allocateBlock
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
> at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
> at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
> at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-686) Incorrect creation time for files created by o3fs.

2018-10-18 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655901#comment-16655901
 ] 

Hanisha Koneru commented on HDDS-686:
-

[~jnp], I am getting the correct creation time for files created by o3fs.
{code:java}
$./hadoop fs -mkdir o3fs://bucket1.volume1/key2
$./ozone sh key list /volume1/bucket1
[{
"version" : 0,
"md5hash" : null,
"createdOn" : "Thu, 18 Oct 2018 21:13:37 GMT",
"modifiedOn" : "Thu, 18 Oct 2018 21:13:37 GMT",
"size" : 0,
"keyName" : "key2/"
} ]{code}
Could you please give the steps to repro?

> Incorrect creation time for files created by o3fs.
> --
>
> Key: HDDS-686
> URL: https://issues.apache.org/jira/browse/HDDS-686
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Jitendra Nath Pandey
>Assignee: Hanisha Koneru
>Priority: Blocker
>  Labels: app-compat
>
> Files created by o3fs show creation timestamp as unix epoch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception

2018-10-18 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-612:

Attachment: HDDS-612.003.patch

> Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock 
> fails with ChillModePrecheck exception
> 
>
> Key: HDDS-612
> URL: https://issues.apache.org/jira/browse/HDDS-612
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-612.001.patch, HDDS-612.002.patch, 
> HDDS-612.003.patch
>
>
> {code:java}
> 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 9863, call Call#70 Retry#0 
> org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock 
> from 172.27.56.9:53442
> org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
> for allocateBlock
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
> at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
> at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
> at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-338) ozoneFS allows to create file key and directory key with same keyname

2018-10-17 Thread Hanisha Koneru (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654565#comment-16654565
 ] 

Hanisha Koneru commented on HDDS-338:
-

{code:java}
$ ./ozone sh key put /volume2/bucket2/dir1/dir2/dir3/key1234 /etc/hadoop/workers
$ ./ozone sh key list /volume2/bucket2
[ {
"version" : 0,
"md5hash" : null,
"createdOn" : "Wed, 17 Oct 2018 23:13:15 GMT",
"modifiedOn" : "Wed, 17 Oct 2018 23:13:17 GMT",
"size" : 10,
"keyName" : "dir1"
}]{code}

> ozoneFS allows to create file key and directory key with same keyname
> -
>
> Key: HDDS-338
> URL: https://issues.apache.org/jira/browse/HDDS-338
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Nilotpal Nandi
>Assignee: Hanisha Koneru
>Priority: Critical
> Attachments: HDDS-338.001.patch
>
>
> steps taken :
> --
> 1. created a directory through ozoneFS interface.
> {noformat}
> hadoop@1a1fa8a11332:~/bin$ ./ozone fs -mkdir /temp1/
> 2018-08-08 13:50:26 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> hadoop@1a1fa8a11332:~/bin$ ./ozone fs -ls /
> 2018-08-08 14:09:59 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> drwxrwxrwx - 0 2018-08-08 13:51 /temp1{noformat}
> 2. create a new key with name 'temp1'  at same bucket.
> {noformat}
> hadoop@1a1fa8a11332:~/bin$ ./ozone oz -putKey root-volume/root-bucket/temp1 
> -file /etc/passwd
> 2018-08-08 14:10:34 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.rpc.type = GRPC (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 
> ms (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - 
> raft.client.async.outstanding-requests.max = 100 (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.async.scheduler-threads = 
> 3 (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB 
> (=1048576) (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.rpc.request.timeout = 
> 3000 ms (default)
> Aug 08, 2018 2:10:36 PM 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl detectProxy
> WARNING: Failed to construct URI for proxy lookup, proceeding without proxy
> java.net.URISyntaxException: Illegal character in hostname at index 13: 
> https://ozone_datanode_3.ozone_default:9858
>  at java.net.URI$Parser.fail(URI.java:2848)
>  at java.net.URI$Parser.parseHostname(URI.java:3387)
>  at java.net.URI$Parser.parseServer(URI.java:3236)
>  at java.net.URI$Parser.parseAuthority(URI.java:3155)
>  at java.net.URI$Parser.parseHierarchical(URI.java:3097)
>  at java.net.URI$Parser.parse(URI.java:3053)
>  at java.net.URI.(URI.java:673)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl.detectProxy(ProxyDetectorImpl.java:128)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl.proxyFor(ProxyDetectorImpl.java:118)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.InternalSubchannel.startNewTransport(InternalSubchannel.java:207)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.InternalSubchannel.obtainActiveTransport(InternalSubchannel.java:188)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$SubchannelImpl.requestConnection(ManagedChannelImpl.java:1130)
>  at 
> org.apache.ratis.shaded.io.grpc.PickFirstBalancerFactory$PickFirstBalancer.handleResolvedAddressGroups(PickFirstBalancerFactory.java:79)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl$1NamesResolved.run(ManagedChannelImpl.java:1032)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ChannelExecutor.drain(ChannelExecutor.java:73)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$LbHelperImpl.runSerialized(ManagedChannelImpl.java:1000)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl.onAddresses(ManagedChannelImpl.java:1044)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.DnsNameResolver$1.run(DnsNameResolver.java:201)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){noformat}
> Observed that there are multiple entries of 'temp1' when ozone fs -ls 

[jira] [Issue Comment Deleted] (HDDS-338) ozoneFS allows to create file key and directory key with same keyname

2018-10-17 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-338:

Comment: was deleted

(was: {code:java}
$ ./ozone sh key put /volume2/bucket2/dir1/dir2/dir3/key1234 /etc/hadoop/workers
$ ./ozone sh key list /volume2/bucket2
[ {
"version" : 0,
"md5hash" : null,
"createdOn" : "Wed, 17 Oct 2018 23:13:15 GMT",
"modifiedOn" : "Wed, 17 Oct 2018 23:13:17 GMT",
"size" : 10,
"keyName" : "dir1"
}]{code})

> ozoneFS allows to create file key and directory key with same keyname
> -
>
> Key: HDDS-338
> URL: https://issues.apache.org/jira/browse/HDDS-338
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Nilotpal Nandi
>Assignee: Hanisha Koneru
>Priority: Critical
> Attachments: HDDS-338.001.patch
>
>
> steps taken :
> --
> 1. created a directory through ozoneFS interface.
> {noformat}
> hadoop@1a1fa8a11332:~/bin$ ./ozone fs -mkdir /temp1/
> 2018-08-08 13:50:26 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> hadoop@1a1fa8a11332:~/bin$ ./ozone fs -ls /
> 2018-08-08 14:09:59 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Found 1 items
> drwxrwxrwx - 0 2018-08-08 13:51 /temp1{noformat}
> 2. create a new key with name 'temp1'  at same bucket.
> {noformat}
> hadoop@1a1fa8a11332:~/bin$ ./ozone oz -putKey root-volume/root-bucket/temp1 
> -file /etc/passwd
> 2018-08-08 14:10:34 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.rpc.type = GRPC (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.rpc.retryInterval = 300 
> ms (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - 
> raft.client.async.outstanding-requests.max = 100 (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.async.scheduler-threads = 
> 3 (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.flow.control.window = 1MB 
> (=1048576) (default)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.grpc.message.size.max = 33554432 
> (custom)
> 2018-08-08 14:10:35 INFO ConfUtils:41 - raft.client.rpc.request.timeout = 
> 3000 ms (default)
> Aug 08, 2018 2:10:36 PM 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl detectProxy
> WARNING: Failed to construct URI for proxy lookup, proceeding without proxy
> java.net.URISyntaxException: Illegal character in hostname at index 13: 
> https://ozone_datanode_3.ozone_default:9858
>  at java.net.URI$Parser.fail(URI.java:2848)
>  at java.net.URI$Parser.parseHostname(URI.java:3387)
>  at java.net.URI$Parser.parseServer(URI.java:3236)
>  at java.net.URI$Parser.parseAuthority(URI.java:3155)
>  at java.net.URI$Parser.parseHierarchical(URI.java:3097)
>  at java.net.URI$Parser.parse(URI.java:3053)
>  at java.net.URI.(URI.java:673)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl.detectProxy(ProxyDetectorImpl.java:128)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ProxyDetectorImpl.proxyFor(ProxyDetectorImpl.java:118)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.InternalSubchannel.startNewTransport(InternalSubchannel.java:207)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.InternalSubchannel.obtainActiveTransport(InternalSubchannel.java:188)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$SubchannelImpl.requestConnection(ManagedChannelImpl.java:1130)
>  at 
> org.apache.ratis.shaded.io.grpc.PickFirstBalancerFactory$PickFirstBalancer.handleResolvedAddressGroups(PickFirstBalancerFactory.java:79)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl$1NamesResolved.run(ManagedChannelImpl.java:1032)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ChannelExecutor.drain(ChannelExecutor.java:73)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$LbHelperImpl.runSerialized(ManagedChannelImpl.java:1000)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.ManagedChannelImpl$NameResolverListenerImpl.onAddresses(ManagedChannelImpl.java:1044)
>  at 
> org.apache.ratis.shaded.io.grpc.internal.DnsNameResolver$1.run(DnsNameResolver.java:201)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){noformat}
> Observed that there are multiple entries of 'temp1' when ozone fs -ls command 
> 

[jira] [Updated] (HDDS-670) Fix OzoneFS directory rename

2018-10-17 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-670:

Description: 
 

Renaming a directory within the same parent directory fails with the exception:
{code:java}
Unable to move: 
o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1
 to: 
o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved
{code}
Detailed exception in comment below.

  was:
It fails with 
{code:java}
ERROR : Job Commit failed with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(Unable to move: 
o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1
 to: 
o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved)'
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move: 
o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1
 to: 
o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved
{code}
 

Detailed exception in comment below.


> Fix OzoneFS directory rename
> 
>
> Key: HDDS-670
> URL: https://issues.apache.org/jira/browse/HDDS-670
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Blocker
>  Labels: app-compat
> Attachments: HDDS-670.001.patch
>
>
>  
> Renaming a directory within the same parent directory fails with the 
> exception:
> {code:java}
> Unable to move: 
> o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1
>  to: 
> o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved
> {code}
> Detailed exception in comment below.
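
A hedged reproduction sketch using the generic Hadoop FileSystem API (the 
paths are illustrative, and the o3 scheme assumes the Ozone filesystem jar is 
on the classpath with fs.o3.impl configured):
{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameRepro {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(
        new URI("o3://bucket2.volume2/"), new Configuration());
    // Rename a directory to a sibling name under the same parent.
    Path src = new Path("/testo3/_tmp.-ext-1");
    Path dst = new Path("/testo3/_tmp.-ext-1.moved");
    fs.mkdirs(src);
    System.out.println("rename returned: " + fs.rename(src, dst));
  }
}
{code}
On a working filesystem this prints {{rename returned: true}}; with the bug 
above, the rename fails and Hive's job commit aborts.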



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-670) Fix OzoneFS directory rename

2018-10-17 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-670:

Summary: Fix OzoneFS directory rename  (was: Hive insert fails against 
Ozone external table)

> Fix OzoneFS directory rename
> 
>
> Key: HDDS-670
> URL: https://issues.apache.org/jira/browse/HDDS-670
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Blocker
>  Labels: app-compat
> Attachments: HDDS-670.001.patch
>
>
> It fails with 
> {code:java}
> ERROR : Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(Unable to move: 
> o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1
>  to: 
> o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved)'
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move: 
> o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1
>  to: 
> o3://bucket2.volume2/testo3/.hive-staging_hive_2018-10-16_21-09-35_130_1001829123585250245-1/_tmp.-ext-1.moved
> {code}
>  
> Detailed exception in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-612) Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock fails with ChillModePrecheck exception

2018-10-17 Thread Hanisha Koneru (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-612:

Attachment: HDDS-612.002.patch

> Even after setting hdds.scm.chillmode.enabled to false, SCM allocateblock 
> fails with ChillModePrecheck exception
> 
>
> Key: HDDS-612
> URL: https://issues.apache.org/jira/browse/HDDS-612
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-612.001.patch, HDDS-612.002.patch
>
>
> {code:java}
> 2018-10-09 23:11:58,047 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 9863, call Call#70 Retry#0 
> org.apache.hadoop.ozone.protocol.ScmBlockLocationProtocol.allocateScmBlock 
> from 172.27.56.9:53442
> org.apache.hadoop.hdds.scm.exceptions.SCMException: ChillModePrecheck failed 
> for allocateBlock
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:38)
> at 
> org.apache.hadoop.hdds.scm.server.ChillModePrecheck.check(ChillModePrecheck.java:30)
> at org.apache.hadoop.hdds.scm.ScmUtils.preCheck(ScmUtils.java:42)
> at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:191)
> at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:143)
> at 
> org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:74)
> at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:6255)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


