[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2018-06-20 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518901#comment-16518901
 ] 

Anu Engineer commented on HDFS-10285:
-

+1, I also completely agree with [~umamaheswararao]'s proposal. The internal 
SPS code can come in the next phase of SPS. This way HBase will be able to 
start using the feature, and whether SPS is internal or external is not 
visible to clients. I suggest that we start a DISCUSS thread for the SPS merge.

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory/file to specify the user's 
> preference for where the physical blocks should be stored. When the user 
> sets the storage policy before writing data, the blocks can take advantage 
> of the storage policy preferences and the physical blocks are stored 
> accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> then the blocks will have been written with the default storage policy 
> (namely DISK). The user has to run the ‘Mover tool’ explicitly, specifying 
> all such file names as a list. In some distributed system scenarios (e.g. 
> HBase) it would be difficult to collect all the files and run the tool, as 
> different nodes can write files separately and files can have different 
> paths.
> Another scenario is when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) to a 
> directory with another effective storage policy: the inherited storage 
> policy is not copied from the source, so the policy of the destination 
> file/dir's parent takes effect. This rename operation is just a metadata 
> change in the Namenode; the physical blocks still remain under the source 
> storage policy.
> So, tracking all such business-logic-based file names from distributed 
> nodes (e.g. region servers) and running the Mover tool could be difficult 
> for admins. The proposal here is to provide an API in the Namenode itself 
> to trigger storage policy satisfaction. A daemon thread inside the Namenode 
> would track such calls and send movement commands to the DNs. 
> Will post a detailed design document soon. 
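As a rough illustration of the proposed flow, here is a minimal client-side 
sketch; the HdfsAdmin#satisfyStoragePolicy call mirrors the API on the SPS 
branch, and its exact name and signature are assumptions until the merge lands:

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class SpsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI nn = URI.create("hdfs://namenode:8020");
    FileSystem fs = FileSystem.get(nn, conf);
    Path dir = new Path("/hbase/data");

    // Policy set after the files were written: existing blocks stay on DISK.
    fs.setStoragePolicy(dir, "COLD");

    // Proposed SPS call: ask the Namenode to schedule block movements so the
    // physical blocks match the policy, instead of running the Mover tool by
    // hand. (API name per the SPS branch; an assumption here.)
    new HdfsAdmin(nn, conf).satisfyStoragePolicy(dir);

    fs.close();
  }
}
{code}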



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-94) Change ozone datanode command to start the standalone datanode plugin

2018-06-20 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518723#comment-16518723
 ] 

Elek, Marton commented on HDDS-94:
--

Thank you very much for doing this, [~Sandeep Nemuri].

I did a quick survey offline and asked multiple people. Most of them voted to 
use "datanode" as the command line name of the standalone ozone datanode 
(ozone datanode instead of ozone hddsdn). I propose to replace the original 
datanode subcommand instead of creating hddsdn (sorry for not including this 
info in the original description).

I also propose to update the docker-compose files:

./hadoop-ozone/acceptance-test/src/test/acceptance/basic/docker-compose.yaml
./hadoop-dist/src/main/compose/ozoneperf/docker-compose.yaml
./hadoop-dist/src/main/compose/ozone/docker-compose.yaml

The namenode service could be removed from them (they use "ozone datanode", 
which should be OK after the change). It could be done in another jira, but I 
prefer to do it in one step to make it easier to test.

(BTW, I tested it by modifying the docker-compose files locally and it worked 
very well.)

Thanks again. 

> Change ozone datanode command to start the standalone datanode plugin
> -
>
> Key: HDDS-94
> URL: https://issues.apache.org/jira/browse/HDDS-94
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Elek, Marton
>Assignee: Sandeep Nemuri
>Priority: Major
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-94.001.patch
>
>
> The current ozone datanode command starts the regular HDFS datanode with an 
> enabled HddsDatanodeService as a datanode plugin.
> The goal is to start only HddsDatanodeService.java (the main function is 
> already there, but GenericOptionsParser should be adopted). 






[jira] [Commented] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518718#comment-16518718
 ] 

Hudson commented on HDFS-13682:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14457 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14457/])
HDFS-13682. Cannot create encryption zone after KMS auth token expires. (xiao: 
rev 32f867a6a907c05a312657139d295a92756d98ef)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestSecureEncryptionZoneWithKMS.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java


> Cannot create encryption zone after KMS auth token expires
> --
>
> Key: HDFS-13682
> URL: https://issues.apache.org/jira/browse/HDFS-13682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms, namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13682.01.patch, HDFS-13682.02.patch, 
> HDFS-13682.03.patch, HDFS-13682.dirty.repro.branch-2.patch, 
> HDFS-13682.dirty.repro.patch
>
>
> Our internal testing reported this behavior recently.
> {noformat}
> [root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt 
> /cdep/keytabs/hdfs.keytab hdfs -l 30d -r 30d
> [root@nightly6x-1 ~]# sudo -u hdfs klist
> Ticket cache: FILE:/tmp/krb5cc_994
> Default principal: h...@gce.cloudera.com
> Valid starting   Expires  Service principal
> 06/12/2018 03:24:09  07/12/2018 03:24:09  
> krbtgt/gce.cloudera@gce.cloudera.com
> [root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 
> -path /user/systest/ez
> RemoteException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> {noformat}
> Upon further investigation, it's because the KMS client (cached in the HDFS 
> NN) cannot authenticate with the server after the authentication token 
> (which is cached by KMSCP) expires, even if the HDFS client RPC has valid 
> Kerberos credentials.






[jira] [Updated] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13682:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.4
   3.1.1
   3.2.0
   Status: Resolved  (was: Patch Available)

> Cannot create encryption zone after KMS auth token expires
> --
>
> Key: HDFS-13682
> URL: https://issues.apache.org/jira/browse/HDFS-13682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms, namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13682.01.patch, HDFS-13682.02.patch, 
> HDFS-13682.03.patch, HDFS-13682.dirty.repro.branch-2.patch, 
> HDFS-13682.dirty.repro.patch
>
>
> Our internal testing reported this behavior recently.
> {noformat}
> [root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt 
> /cdep/keytabs/hdfs.keytab hdfs -l 30d -r 30d
> [root@nightly6x-1 ~]# sudo -u hdfs klist
> Ticket cache: FILE:/tmp/krb5cc_994
> Default principal: h...@gce.cloudera.com
> Valid starting   Expires  Service principal
> 06/12/2018 03:24:09  07/12/2018 03:24:09  
> krbtgt/gce.cloudera@gce.cloudera.com
> [root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 
> -path /user/systest/ez
> RemoteException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> {noformat}
> Upon further investigation, it's because the KMS client (cached in the HDFS 
> NN) cannot authenticate with the server after the authentication token 
> (which is cached by KMSCP) expires, even if the HDFS client RPC has valid 
> Kerberos credentials.






[jira] [Updated] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13682:
-
Component/s: kms

> Cannot create encryption zone after KMS auth token expires
> --
>
> Key: HDFS-13682
> URL: https://issues.apache.org/jira/browse/HDFS-13682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms, namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-13682.01.patch, HDFS-13682.02.patch, 
> HDFS-13682.03.patch, HDFS-13682.dirty.repro.branch-2.patch, 
> HDFS-13682.dirty.repro.patch
>
>
> Our internal testing reported this behavior recently.
> {noformat}
> [root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt 
> /cdep/keytabs/hdfs.keytab hdfs -l 30d -r 30d
> [root@nightly6x-1 ~]# sudo -u hdfs klist
> Ticket cache: FILE:/tmp/krb5cc_994
> Default principal: h...@gce.cloudera.com
> Valid starting   Expires  Service principal
> 06/12/2018 03:24:09  07/12/2018 03:24:09  
> krbtgt/gce.cloudera@gce.cloudera.com
> [root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 
> -path /user/systest/ez
> RemoteException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> {noformat}
> Upon further investigation, it's because the KMS client (cached in the HDFS 
> NN) cannot authenticate with the server after the authentication token 
> (which is cached by KMSCP) expires, even if the HDFS client RPC has valid 
> Kerberos credentials.






[jira] [Commented] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518699#comment-16518699
 ] 

Xiao Chen commented on HDFS-13682:
--

Thanks a lot Wei-Chiu!

The test failure is HDFS-13662 and is unrelated to this patch. Committing.

> Cannot create encryption zone after KMS auth token expires
> --
>
> Key: HDFS-13682
> URL: https://issues.apache.org/jira/browse/HDFS-13682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-13682.01.patch, HDFS-13682.02.patch, 
> HDFS-13682.03.patch, HDFS-13682.dirty.repro.branch-2.patch, 
> HDFS-13682.dirty.repro.patch
>
>
> Our internal testing reported this behavior recently.
> {noformat}
> [root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt 
> /cdep/keytabs/hdfs.keytab hdfs -l 30d -r 30d
> [root@nightly6x-1 ~]# sudo -u hdfs klist
> Ticket cache: FILE:/tmp/krb5cc_994
> Default principal: h...@gce.cloudera.com
> Valid starting   Expires  Service principal
> 06/12/2018 03:24:09  07/12/2018 03:24:09  
> krbtgt/gce.cloudera@gce.cloudera.com
> [root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 
> -path /user/systest/ez
> RemoteException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> {noformat}
> Upon further investigation, it's because the KMS client (cached in the HDFS 
> NN) cannot authenticate with the server after the authentication token 
> (which is cached by KMSCP) expires, even if the HDFS client RPC has valid 
> Kerberos credentials.






[jira] [Commented] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518670#comment-16518670
 ] 

genericqa commented on HDFS-13682:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
17s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 46s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}239m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.client.impl.TestBlockReaderLocal |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13682 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928520/HDFS-13682.03.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 38b60da8f0c5 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9a9e969 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Commented] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518520#comment-16518520
 ] 

Wei-Chiu Chuang commented on HDFS-13682:


+1 pending Jenkins

> Cannot create encryption zone after KMS auth token expires
> --
>
> Key: HDFS-13682
> URL: https://issues.apache.org/jira/browse/HDFS-13682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-13682.01.patch, HDFS-13682.02.patch, 
> HDFS-13682.03.patch, HDFS-13682.dirty.repro.branch-2.patch, 
> HDFS-13682.dirty.repro.patch
>
>
> Our internal testing reported this behavior recently.
> {noformat}
> [root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt 
> /cdep/keytabs/hdfs.keytab hdfs -l 30d -r 30d
> [root@nightly6x-1 ~]# sudo -u hdfs klist
> Ticket cache: FILE:/tmp/krb5cc_994
> Default principal: h...@gce.cloudera.com
> Valid starting   Expires  Service principal
> 06/12/2018 03:24:09  07/12/2018 03:24:09  
> krbtgt/gce.cloudera@gce.cloudera.com
> [root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 
> -path /user/systest/ez
> RemoteException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> {noformat}
> Upon further investigation, it's because the KMS client (cached in the HDFS 
> NN) cannot authenticate with the server after the authentication token 
> (which is cached by KMSCP) expires, even if the HDFS client RPC has valid 
> Kerberos credentials.






[jira] [Commented] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518444#comment-16518444
 ] 

Xiao Chen commented on HDFS-13682:
--

Thanks for the review and offline discussion, [~jojochuang]! Actually my 
memory overflowed; we can use {{UGI#shouldRelogin}}.

[^HDFS-13682.03.patch] uploaded.
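For readers following along, here is a rough sketch of the idea being 
discussed. It is illustrative only, not the committed patch; KMSClientProvider 
internals are simplified, and the exact UGI calls chosen here are an assumption:

{code}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

// Illustrative only: pick a UGI that can re-authenticate to the KMS after
// the cached auth token expires, instead of failing the GSS handshake.
final class KmsUgiSelectionSketch {
  static UserGroupInformation chooseUgi() throws IOException {
    UserGroupInformation current = UserGroupInformation.getCurrentUser();
    UserGroupInformation login = UserGroupInformation.getLoginUser();
    // If the current user has no Kerberos credentials of its own but the
    // login user came from a keytab, prefer the login user: it can re-login
    // and therefore re-authenticate after the auth token expires.
    if (!current.hasKerberosCredentials() && login.isFromKeytab()) {
      login.checkTGTAndReloginFromKeytab(); // refreshes the TGT if needed
      return login;
    }
    return current;
  }
}
{code}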

> Cannot create encryption zone after KMS auth token expires
> --
>
> Key: HDFS-13682
> URL: https://issues.apache.org/jira/browse/HDFS-13682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-13682.01.patch, HDFS-13682.02.patch, 
> HDFS-13682.03.patch, HDFS-13682.dirty.repro.branch-2.patch, 
> HDFS-13682.dirty.repro.patch
>
>
> Our internal testing reported this behavior recently.
> {noformat}
> [root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt 
> /cdep/keytabs/hdfs.keytab hdfs -l 30d -r 30d
> [root@nightly6x-1 ~]# sudo -u hdfs klist
> Ticket cache: FILE:/tmp/krb5cc_994
> Default principal: h...@gce.cloudera.com
> Valid starting   Expires  Service principal
> 06/12/2018 03:24:09  07/12/2018 03:24:09  
> krbtgt/gce.cloudera@gce.cloudera.com
> [root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 
> -path /user/systest/ez
> RemoteException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> {noformat}
> Upon further investigation, it's because the KMS client (cached in the HDFS 
> NN) cannot authenticate with the server after the authentication token 
> (which is cached by KMSCP) expires, even if the HDFS client RPC has valid 
> Kerberos credentials.






[jira] [Updated] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13682:
-
Attachment: HDFS-13682.03.patch

> Cannot create encryption zone after KMS auth token expires
> --
>
> Key: HDFS-13682
> URL: https://issues.apache.org/jira/browse/HDFS-13682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-13682.01.patch, HDFS-13682.02.patch, 
> HDFS-13682.03.patch, HDFS-13682.dirty.repro.branch-2.patch, 
> HDFS-13682.dirty.repro.patch
>
>
> Our internal testing reported this behavior recently.
> {noformat}
> [root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt 
> /cdep/keytabs/hdfs.keytab hdfs -l 30d -r 30d
> [root@nightly6x-1 ~]# sudo -u hdfs klist
> Ticket cache: FILE:/tmp/krb5cc_994
> Default principal: h...@gce.cloudera.com
> Valid starting   Expires  Service principal
> 06/12/2018 03:24:09  07/12/2018 03:24:09  
> krbtgt/gce.cloudera@gce.cloudera.com
> [root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 
> -path /user/systest/ez
> RemoteException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> {noformat}
> Upon further investigation, it's because the KMS client (cached in the HDFS 
> NN) cannot authenticate with the server after the authentication token 
> (which is cached by KMSCP) expires, even if the HDFS client RPC has valid 
> Kerberos credentials.






[jira] [Commented] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518345#comment-16518345
 ] 

Wei-Chiu Chuang commented on HDFS-13682:


Thanks [~xiaochen]!

Ok, I think the bug report & fix make sense. So it looks like the externally 
managed subjects are handled by HADOOP-9747. The shadedclient error doesn't 
seem related.

Could you consider renaming ugiCanRelogin to something like 
shouldUseLoginUser()? Somehow the name ugiCanRelogin confused me. +1 after 
that.

> Cannot create encryption zone after KMS auth token expires
> --
>
> Key: HDFS-13682
> URL: https://issues.apache.org/jira/browse/HDFS-13682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-13682.01.patch, HDFS-13682.02.patch, 
> HDFS-13682.dirty.repro.branch-2.patch, HDFS-13682.dirty.repro.patch
>
>
> Our internal testing reported this behavior recently.
> {noformat}
> [root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt 
> /cdep/keytabs/hdfs.keytab hdfs -l 30d -r 30d
> [root@nightly6x-1 ~]# sudo -u hdfs klist
> Ticket cache: FILE:/tmp/krb5cc_994
> Default principal: h...@gce.cloudera.com
> Valid starting   Expires  Service principal
> 06/12/2018 03:24:09  07/12/2018 03:24:09  
> krbtgt/gce.cloudera@gce.cloudera.com
> [root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 
> -path /user/systest/ez
> RemoteException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> {noformat}
> Upon further investigation, it's because the KMS client (cached in the HDFS 
> NN) cannot authenticate with the server after the authentication token 
> (which is cached by KMSCP) expires, even if the HDFS client RPC has valid 
> Kerberos credentials.






[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-06-20 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518314#comment-16518314
 ] 

Daniel Templeton commented on HDFS-13448:
-

The build failures are unrelated:

{quote}[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-antrun-plugin:1.7:run (common-test-bats-driver) 
on project hadoop-common: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part .. @ 4:69 in 
/testptch/hadoop/hadoop-common-project/hadoop-common/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-common{quote}

It's also really hard to find that output.  It took some reverse engineering.  
Someone should look into that...

The unit test failures are worth looking at.  I recognize at least one of them 
as flaky, but I can't assert they all are flaky.  (Maybe someone else could.)

Aside from the unit tests, the patch looks good to me.  I haven't done a formal 
final pass, but I think you got it.  [~daryn], wanna take another look?

Thanks for sticking with it, [~belugabehr].

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.6.patch, 
> HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Place Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  Where this comes into play is when you have, for example, a Flume 
> agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or if {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far as the first block replica will now 
> always go to a DataNode on the local rack rather than the local node.
> This new flag would allow a single Flume agent to distribute blocks 
> randomly and evenly over the entire cluster, instead of hot-spotting the 
> local node or the local rack.
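As a concrete illustration of the behavior described above, here is a minimal 
sketch using the existing {{CreateFlag.NO_LOCAL_WRITE}} hint; the flag named 
in the final comment for the proposed behavior is hypothetical:

{code}
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWriteSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/flume/events/part-0001");

    // Today: CREATE + NO_LOCAL_WRITE avoids the local DataNode for the first
    // replica, but default placement still prefers the local *rack*.
    EnumSet<CreateFlag> flags =
        EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCAL_WRITE);
    try (FSDataOutputStream out = fs.create(p,
        FsPermission.getFileDefault(), flags,
        4096 /* buffer */, (short) 3 /* replication */,
        128 * 1024 * 1024 /* block size */, null /* progress */)) {
      out.writeBytes("event data\n");
    }

    // The proposal adds a flag (name below is hypothetical) that would let
    // the first replica land on a random node anywhere in the cluster:
    // EnumSet.of(CreateFlag.CREATE, CreateFlag.IGNORE_CLIENT_LOCALITY);
  }
}
{code}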






[jira] [Commented] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-20 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518270#comment-16518270
 ] 

genericqa commented on HDDS-178:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} container-service in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 23m 29s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline
 |
|   | hadoop.ozone.TestStorageContainerManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-178 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928476/HDDS-178.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  

[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-20 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518158#comment-16518158
 ] 

genericqa commented on HDFS-13672:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 33s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}158m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.hdfs.TestDFSClientRetries |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13672 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928461/HDFS-13672.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1a5051bc4e08 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2d87592 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24480/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-20 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518100#comment-16518100
 ] 

Lokesh Jain commented on HDDS-178:
--

The v4 patch fixes the failure in TestKeys.

> DeleteBlocks should not be handled by open containers
> -
>
> Key: HDDS-178
> URL: https://issues.apache.org/jira/browse/HDDS-178
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-178.001.patch, HDDS-178.002.patch, 
> HDDS-178.003.patch, HDDS-178.004.patch
>
>
> In the case of open containers, the deleteBlocks command just adds an entry 
> in the log but does not delete the blocks. These blocks are deleted only 
> when the container is closed.
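To state the intended rule concretely, a tiny sketch follows; the type and 
method names are hypothetical and do not mirror the actual HDDS handler 
classes:

{code}
// Hypothetical names, for illustration only; not the actual HDDS classes.
final class DeleteBlocksDispatchSketch {
  enum ContainerState { OPEN, CLOSING, CLOSED }

  // The rule this jira aims for: a delete-blocks transaction should only be
  // processed by containers that are already closed; an open container must
  // not just log the transaction and defer the deletes to close time.
  static boolean shouldProcessDelete(ContainerState state) {
    return state == ContainerState.CLOSED;
  }
}
{code}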






[jira] [Updated] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-20 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-178:
-
Attachment: HDDS-178.004.patch

> DeleteBlocks should not be handled by open containers
> -
>
> Key: HDDS-178
> URL: https://issues.apache.org/jira/browse/HDDS-178
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-178.001.patch, HDDS-178.002.patch, 
> HDDS-178.003.patch, HDDS-178.004.patch
>
>
> In the case of open containers, the deleteBlocks command just adds an entry 
> in the log but does not delete the blocks. These blocks are deleted only 
> when the container is closed.






[jira] [Created] (HDFS-13691) Usage of ErasureCoder/Codec in Hadoop

2018-06-20 Thread Chaitanya Mukka (JIRA)
Chaitanya Mukka created HDFS-13691:
--

 Summary: Usage of ErasureCoder/Codec in Hadoop
 Key: HDFS-13691
 URL: https://issues.apache.org/jira/browse/HDFS-13691
 Project: Hadoop HDFS
  Issue Type: Wish
  Components: erasure-coding, hdfs
Reporter: Chaitanya Mukka


While looking through the source code of Hadoop, we found that the 
ErasureCoder and ErasureCodec APIs are not being used in HDFS; HDFS still 
uses the RawErasureCoder API.

We have been working on creating an erasure codec plugin for 
[Clay Codes|https://www.usenix.org/conference/fast18/presentation/vajha], 
which uses these APIs. We would like to know the progress on integrating 
ErasureCodec into HDFS. We could not find any active Jira on this topic; 
please correct us if we are wrong. (HDFS-7337 seems to be resolved.)
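For context, the raw coder path that HDFS uses today looks roughly like the 
following; this is a minimal sketch assuming Hadoop 3's CodecUtil and 
ErasureCoderOptions APIs:

{code}
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.erasurecode.CodecUtil;
import org.apache.hadoop.io.erasurecode.ErasureCoderOptions;
import org.apache.hadoop.io.erasurecode.rawcoder.RawErasureEncoder;

public class RawCoderSketch {
  public static void main(String[] args) throws Exception {
    // RS(6,3): 6 data units, 3 parity units.
    ErasureCoderOptions opts = new ErasureCoderOptions(6, 3);
    RawErasureEncoder encoder =
        CodecUtil.createRawEncoder(new Configuration(), "rs", opts);

    ByteBuffer[] data = new ByteBuffer[6];
    ByteBuffer[] parity = new ByteBuffer[3];
    for (int i = 0; i < 6; i++) data[i] = ByteBuffer.allocate(1024);
    for (int i = 0; i < 3; i++) parity[i] = ByteBuffer.allocate(1024);

    encoder.encode(data, parity); // fills the parity buffers from the data
  }
}
{code}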

 






[jira] [Commented] (HDDS-170) Fix TestBlockDeletingService#testBlockDeletionTimeout

2018-06-20 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518069#comment-16518069
 ] 

genericqa commented on HDDS-170:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
7s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
41s{color} | {color:green} container-service in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 23s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.TestStorageContainerManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-170 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928382/HDDS-170.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  

[jira] [Commented] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-20 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518044#comment-16518044
 ] 

genericqa commented on HDDS-178:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 43s{color} 
| {color:red} container-service in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 25m 47s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}148m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.container.common.report.TestReportPublisher 
|
|   | hadoop.ozone.TestStorageContainerManager |
|   | hadoop.ozone.web.client.TestKeys |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-178 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928455/HDDS-178.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  

[jira] [Updated] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-20 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-13672:
--
Status: Patch Available  (was: In Progress)

> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HDFS-13672.001.patch
>
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed that FSNamesystem#clearCorruptLazyPersistFiles() held 
> the write lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
> final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over corrupt replica list should be broken down 
> into smaller iterations to avoid a single long wait.
> Since this operation holds NameNode write lock for more than 45 seconds, the 
> default ZKFC connection timeout, it implies an extreme case like this (100 
> million corrupt blocks) could lead to NameNode failover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2018-06-20 Thread Surendra Singh Lilhore (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518035#comment-16518035
 ] 

Surendra Singh Lilhore commented on HDFS-10285:
---

Agree with [~umamaheswararao]'s proposal. Let's merge the external SPS code and 
continue working on internal SPS in the next phase...

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to express the user's 
> preference for where the physical blocks should be stored. When the user sets 
> the storage policy before writing data, the blocks can take advantage of the 
> policy preferences and are placed accordingly. 
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written under the default 
> storage policy (namely DISK). The user then has to run the 'Mover tool' 
> explicitly, specifying all such file names as a list. In some distributed 
> scenarios (ex: HBase) it would be difficult to collect all the files and run 
> the tool, since different nodes can write files independently and the files 
> can have different paths.
> Another scenario: when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) into a 
> directory with a different effective storage policy, the inherited policy is 
> not copied from the source; the file takes its effective policy from the 
> destination parent. This rename is only a metadata change in the Namenode, so 
> the physical blocks still remain placed according to the source storage 
> policy.
> Tracking all such business-logic-driven file names from distributed nodes 
> (ex: region servers) and running the Mover tool is therefore difficult for 
> admins. The proposal here is to provide an API in the Namenode itself to 
> trigger storage policy satisfaction. A daemon thread inside the Namenode 
> should track such calls and send movement commands to the DNs. 
> Will post the detailed design thoughts document soon. 
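
For illustration, a minimal sketch of how a client could trigger satisfaction 
through such an API, assuming the HdfsAdmin-style call discussed on this 
branch; the URI, path, and policy name below are placeholders, not the merged 
implementation:

{code}
// Hedged sketch: trigger storage policy satisfaction for a file whose policy
// was changed after it was written. URI, path, and policy are placeholders.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class SatisfyStoragePolicyExample {
  public static void main(String[] args) throws Exception {
    HdfsAdmin admin =
        new HdfsAdmin(URI.create("hdfs://namenode:8020"), new Configuration());
    Path file = new Path("/hbase/data/table1");
    admin.setStoragePolicy(file, "COLD");  // metadata-only change
    admin.satisfyStoragePolicy(file);      // ask SPS to schedule block moves
  }
}
{code}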



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-175) Refactor ContainerInfo to remove Pipeline object from it

2018-06-20 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518019#comment-16518019
 ] 

Shashikant Banerjee commented on HDDS-175:
--

Thanks [~ajayydv] for updating the patch. I just had a quick look.

There are some unwanted changes in "WritingYarnApplications.md", and there 
are test failures as well as some findbugs issues.

Can you please check these? 

I will take a closer look at it.

> Refactor ContainerInfo to remove Pipeline object from it 
> -
>
> Key: HDDS-175
> URL: https://issues.apache.org/jira/browse/HDDS-175
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-175.00.patch, HDDS-175.01.patch, HDDS-175.02.patch, 
> HDDS-175.03.patch
>
>
> Refactor ContainerInfo to remove the Pipeline object from it. We can add the 
> four fields below to ContainerInfo so the pipeline can be recreated if 
> required (see the sketch after this list):
> # pipelineId
> # replication type
> # expected replication count
> # DataNodes where its replicas exist
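
For illustration, a rough sketch of what a ContainerInfo carrying the four 
fields above (instead of an embedded Pipeline) could look like; all names and 
types here are hypothetical, not the actual patch:

{code}
// Hypothetical sketch for HDDS-175: carry just enough state to recreate the
// pipeline on demand instead of embedding a full Pipeline object.
import java.util.List;

public class ContainerInfo {
  enum ReplicationType { RATIS, STAND_ALONE }    // illustrative values

  private final long containerId;
  private final String pipelineId;               // look the pipeline up by id
  private final ReplicationType replicationType;
  private final int expectedReplicationCount;
  private final List<String> datanodes;          // where replicas exist

  public ContainerInfo(long containerId, String pipelineId,
      ReplicationType replicationType, int expectedReplicationCount,
      List<String> datanodes) {
    this.containerId = containerId;
    this.pipelineId = pipelineId;
    this.replicationType = replicationType;
    this.expectedReplicationCount = expectedReplicationCount;
    this.datanodes = datanodes;
  }
}
{code}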



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-20 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518008#comment-16518008
 ] 

Gabor Bota commented on HDFS-13672:
---

In my v001 patch, I'm solving the issue by limiting the number of blocks 
processed per iteration.

[~jojochuang], by +break out from the loop after 1 second+ do you mean 
breaking out of the whole clearCorruptLazyPersistFiles function, or just 
releasing the lock and continuing the iteration? If the 
{{LazyPersistFileScrubber}} runs periodically, we don't even have to remove all 
the corrupt replica blocks in one run.
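
For illustration, a hedged sketch of the count-batched variant, built on the 
snippet quoted in the issue description below (the same symbols are assumed in 
scope); BATCH_SIZE and the overall shape are assumptions, not the v001 patch:

{code}
// Hedged sketch: hold the write lock for at most one bounded batch per
// scrubber run, and let the periodic LazyPersistFileScrubber pick up the
// remainder. BATCH_SIZE is an assumed, likely-configurable limit.
final int BATCH_SIZE = 1000;
boolean changed = false;
List<BlockCollection> filesToDelete = new ArrayList<>();
writeLock();
try {
  Iterator<Block> it = blockManager.getCorruptReplicaBlockIterator();
  int scanned = 0;
  while (it.hasNext() && scanned++ < BATCH_SIZE) {
    Block b = it.next();
    BlockInfo blockInfo = blockManager.getStoredBlock(b);
    if (blockInfo.getBlockCollection().getStoragePolicyID() == lpPolicy.getId()) {
      filesToDelete.add(blockInfo.getBlockCollection());
    }
  }
} finally {
  writeUnlock();  // lock released after one bounded batch, not the full scan
}
// Deletions then reacquire the lock per file inside deleteInternal, instead
// of extending a single long critical section.
for (BlockCollection bc : filesToDelete) {
  LOG.warn("Removing lazyPersist file " + bc.getName() + " with no replicas.");
  changed |= deleteInternal(bc.getName(), false, false, false);
}
{code}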

> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HDFS-13672.001.patch
>
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed FSNamesystem#clearCorruptLazyPersistFiles() held write 
> lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
> final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over corrupt replica list should be broken down 
> into smaller iterations to avoid a single long wait.
> Since this operation holds NameNode write lock for more than 45 seconds, the 
> default ZKFC connection timeout, it implies an extreme case like this (100 
> million corrupt blocks) could lead to NameNode failover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-20 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518008#comment-16518008
 ] 

Gabor Bota edited comment on HDFS-13672 at 6/20/18 10:26 AM:
-

In my v001 patch, I'm solving the issue by limiting the number of blocks 
processed per iteration.

[~jojochuang], by _break out from the loop after 1 second_ do you mean 
breaking out of the whole clearCorruptLazyPersistFiles function, or just 
releasing the lock and continuing the iteration? If the 
{{LazyPersistFileScrubber}} runs periodically, we don't even have to remove all 
the corrupt replica blocks in one run.


was (Author: gabor.bota):
In my v001 patch, I'm solving the issue by limiting the number of blocks 
processed per iteration.

[~jojochuang], by +break out from the loop after 1 second+ do you mean 
breaking out of the whole clearCorruptLazyPersistFiles function, or just 
releasing the lock and continuing the iteration? If the 
{{LazyPersistFileScrubber}} runs periodically, we don't even have to remove all 
the corrupt replica blocks in one run.

> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HDFS-13672.001.patch
>
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed FSNamesystem#clearCorruptLazyPersistFiles() held write 
> lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
> final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over corrupt replica list should be broken down 
> into smaller iterations to avoid a single long wait.
> Since this operation holds NameNode write lock for more than 45 seconds, the 
> default ZKFC connection timeout, it implies an extreme case like this (100 
> million corrupt blocks) could lead to NameNode failover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-20 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-13672:
--
Attachment: HDFS-13672.001.patch

> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HDFS-13672.001.patch
>
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed FSNamesystem#clearCorruptLazyPersistFiles() held write 
> lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
> final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over corrupt replica list should be broken down 
> into smaller iterations to avoid a single long wait.
> Since this operation holds NameNode write lock for more than 45 seconds, the 
> default ZKFC connection timeout, it implies an extreme case like this (100 
> million corrupt blocks) could lead to NameNode failover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13690) Improve error message when creating encryption zone while KMS is unreachable

2018-06-20 Thread Kitti Nanasi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kitti Nanasi updated HDFS-13690:

Description: 
In failure testing, we stopped the KMS and then tried to run some encryption 
related commands.

{{hdfs crypto -createZone}} will complain with a short "RemoteException: 
Connection refused." This message could be improved to explain that we cannot 
connect to the KMSClientProvider.

For example, {{hadoop key list}} while KMS is down will error:
{code}
 -bash-4.1$ hadoop key list
 Cannot list keys for KeyProvider: 
KMSClientProvider[http://hdfs-cdh5-vanilla-1.vpc.cloudera.com:16000/kms/v1/]: 
Connection refusedjava.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
 at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at sun.net.www.http.HttpClient.New(HttpClient.java:326)
 at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
 at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
 at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:186)
 at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:125)
 at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
 at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:312)
 at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:397)
 at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:392)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:392)
 at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.getKeys(KMSClientProvider.java:479)
 at org.apache.hadoop.crypto.key.KeyShell$ListCommand.execute(KeyShell.java:286)
 at org.apache.hadoop.crypto.key.KeyShell.run(KeyShell.java:79)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.crypto.key.KeyShell.main(KeyShell.java:513)
{code}

  was:
In failure testing, we stopped the KMS and then tried to run some encryption 
related commands.

{{hdfs crypto -createZone}} will complain with a short "RemoteException: 
Connection refused." This message could be improved to explain that we cannot 
connect to the KMSClientProvider.

For example, {{hadoop key list}} while KMS is down will error:
 -bash-4.1$ hadoop key list

{code}
 Cannot list keys for KeyProvider: 
KMSClientProvider[http://hdfs-cdh5-vanilla-1.vpc.cloudera.com:16000/kms/v1/]: 
Connection refusedjava.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
 at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at sun.net.www.http.HttpClient.New(HttpClient.java:326)
 at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
 at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
 at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:186)
 at 

[jira] [Work started] (HDFS-13690) Improve error message when creating encryption zone while KMS is unreachable

2018-06-20 Thread Kitti Nanasi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-13690 started by Kitti Nanasi.
---
> Improve error message when creating encryption zone while KMS is unreachable
> 
>
> Key: HDFS-13690
> URL: https://issues.apache.org/jira/browse/HDFS-13690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption, hdfs, kms
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Minor
>
> In failure testing, we stopped the KMS and then tried to run some encryption 
> related commands.
> {{hdfs crypto -createZone}} will complain with a short "RemoteException: 
> Connection refused." This message could be improved to explain that we cannot 
> connect to the KMSClientProvider.
> For example, {{hadoop key list}} while KMS is down will error:
> {code}
>  -bash-4.1$ hadoop key list
>  Cannot list keys for KeyProvider: 
> KMSClientProvider[http://hdfs-cdh5-vanilla-1.vpc.cloudera.com:16000/kms/v1/]: 
> Connection refusedjava.net.ConnectException: Connection refused
>  at java.net.PlainSocketImpl.socketConnect(Native Method)
>  at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>  at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>  at java.net.Socket.connect(Socket.java:579)
>  at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
>  at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>  at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>  at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
>  at sun.net.www.http.HttpClient.New(HttpClient.java:308)
>  at sun.net.www.http.HttpClient.New(HttpClient.java:326)
>  at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
>  at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
>  at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
>  at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:186)
>  at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:125)
>  at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
>  at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:312)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:397)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:392)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:392)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.getKeys(KMSClientProvider.java:479)
>  at 
> org.apache.hadoop.crypto.key.KeyShell$ListCommand.execute(KeyShell.java:286)
>  at org.apache.hadoop.crypto.key.KeyShell.run(KeyShell.java:79)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>  at org.apache.hadoop.crypto.key.KeyShell.main(KeyShell.java:513)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13690) Improve error message when creating encryption zone while KMS is unreachable

2018-06-20 Thread Kitti Nanasi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kitti Nanasi updated HDFS-13690:

Description: 
In failure testing, we stopped the KMS and then tried to run some encryption 
related commands.

{{hdfs crypto -createZone}} will complain with a short "RemoteException: 
Connection refused." This message could be improved to explain that we cannot 
connect to the KMSClientProvider.

For example, {{hadoop key list}} while KMS is down will error:
 -bash-4.1$ hadoop key list

{code}
 Cannot list keys for KeyProvider: 
KMSClientProvider[http://hdfs-cdh5-vanilla-1.vpc.cloudera.com:16000/kms/v1/]: 
Connection refusedjava.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
 at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:579)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
 at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
 at sun.net.www.http.HttpClient.New(HttpClient.java:308)
 at sun.net.www.http.HttpClient.New(HttpClient.java:326)
 at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
 at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
 at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
 at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:186)
 at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:125)
 at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
 at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:312)
 at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:397)
 at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:392)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:392)
 at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.getKeys(KMSClientProvider.java:479)
 at org.apache.hadoop.crypto.key.KeyShell$ListCommand.execute(KeyShell.java:286)
 at org.apache.hadoop.crypto.key.KeyShell.run(KeyShell.java:79)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.crypto.key.KeyShell.main(KeyShell.java:513)
{code}

  was:
In failure testing, we stopped the KMS and then tried to run some encryption 
related commands.

{{hdfs crypto -createZone}} will complain with a short "RemoteException: 
Connection refused." This message could be improved to explain that we cannot 
connect to the KMSClientProvider.

For example, {{hadoop key list}} while KMS is down will error:
-bash-4.1$ hadoop key list
Cannot list keys for KeyProvider: 
KMSClientProvider[http://hdfs-cdh5-vanilla-1.vpc.cloudera.com:16000/kms/v1/]: 
Connection refusedjava.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at 

[jira] [Created] (HDFS-13690) Improve error message when creating encryption zone while KMS is unreachable

2018-06-20 Thread Kitti Nanasi (JIRA)
Kitti Nanasi created HDFS-13690:
---

 Summary: Improve error message when creating encryption zone while 
KMS is unreachable
 Key: HDFS-13690
 URL: https://issues.apache.org/jira/browse/HDFS-13690
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: encryption, hdfs, kms
Reporter: Kitti Nanasi
Assignee: Kitti Nanasi


In failure testing, we stopped the KMS and then tried to run some encryption 
related commands.

{{hdfs crypto -createZone}} will complain with a short "RemoteException: 
Connection refused." This message could be improved to explain that we cannot 
connect to the KMSClientProvider.

For example, {{hadoop key list}} while KMS is down will error:
-bash-4.1$ hadoop key list
Cannot list keys for KeyProvider: 
KMSClientProvider[http://hdfs-cdh5-vanilla-1.vpc.cloudera.com:16000/kms/v1/]: 
Connection refusedjava.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:186)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:125)
at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:312)
at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:397)
at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:392)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:392)
at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.getKeys(KMSClientProvider.java:479)
at 
org.apache.hadoop.crypto.key.KeyShell$ListCommand.execute(KeyShell.java:286)
at org.apache.hadoop.crypto.key.KeyShell.run(KeyShell.java:79)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.crypto.key.KeyShell.main(KeyShell.java:513)
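
For illustration, one possible shape of the requested improvement, sketched 
under assumptions: catch the low-level ConnectException near the 
KMSClientProvider call site and rethrow it with the KMS URL and a remediation 
hint. The helper name and hint text are hypothetical, not the eventual patch.

{code}
// Hedged sketch of the kind of message improvement this issue asks for;
// wrapKmsFailure and the hint wording are hypothetical.
import java.io.IOException;
import java.net.ConnectException;

public final class KmsErrors {
  private KmsErrors() {
  }

  static IOException wrapKmsFailure(String kmsUrl, IOException cause) {
    if (cause instanceof ConnectException) {
      return new IOException("Could not connect to the KMS at " + kmsUrl
          + " while creating the encryption zone. Check that the KMS is"
          + " running and that hadoop.security.key.provider.path points to"
          + " the right address.", cause);
    }
    return cause;  // leave other failures untouched
  }
}
{code}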



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-170) Fix TestBlockDeletingService#testBlockDeletionTimeout

2018-06-20 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-170:
---
Status: Patch Available  (was: Open)

> Fix TestBlockDeletingService#testBlockDeletionTimeout
> -
>
> Key: HDDS-170
> URL: https://issues.apache.org/jira/browse/HDDS-170
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-170.001.patch
>
>
> TestBlockDeletingService#testBlockDeletionTimeout times out while waiting for 
> the expected error message.
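
For illustration, a hedged sketch of how such a test typically waits, using 
GenericTestUtils as is common in Hadoop tests; the logger handle, message 
text, and timeouts below are placeholders, not the actual test:

{code}
// Hedged sketch: poll captured log output until the expected message appears
// or the deadline passes. Logger handle, text, and timeouts are assumed.
GenericTestUtils.LogCapturer logs =
    GenericTestUtils.LogCapturer.captureLogs(BlockDeletingService.LOG);
GenericTestUtils.waitFor(
    () -> logs.getOutput().contains("Background task executes timed out"),
    100,       // check every 100 ms
    100_000);  // fail the test if the message is not seen within 100 s
{code}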



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-20 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517945#comment-16517945
 ] 

Lokesh Jain commented on HDDS-178:
--

The v3 patch fixes the unit test failures, findbugs, and license issues.

> DeleteBlocks should not be handled by open containers
> -
>
> Key: HDDS-178
> URL: https://issues.apache.org/jira/browse/HDDS-178
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-178.001.patch, HDDS-178.002.patch, 
> HDDS-178.003.patch
>
>
> In the case of open containers, the deleteBlocks command just adds an entry to 
> the log but does not delete the blocks. These blocks are deleted only when the 
> container is closed.
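
For illustration, a hedged sketch of the dispatch this issue argues for; the 
method, field, and helper names are hypothetical, not the actual handler:

{code}
// Hypothetical sketch: only closed containers process deleteBlocks; for an
// open container the command is deferred until the container is closed.
void handleDeleteBlocks(ContainerData container, List<Long> blockIds) {
  if (container.isOpen()) {
    // Deleting from an open container would race with in-flight writes,
    // so only a log entry is recorded and the work is deferred.
    LOG.info("Container {} is open, deferring deletion of {} blocks",
        container.getContainerID(), blockIds.size());
    return;
  }
  for (long blockId : blockIds) {
    deleteBlock(container, blockId);  // assumed helper
  }
}
{code}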



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-20 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-178:
-
Attachment: HDDS-178.003.patch

> DeleteBlocks should not be handled by open containers
> -
>
> Key: HDDS-178
> URL: https://issues.apache.org/jira/browse/HDDS-178
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-178.001.patch, HDDS-178.002.patch, 
> HDDS-178.003.patch
>
>
> In the case of open containers, the deleteBlocks command just adds an entry to 
> the log but does not delete the blocks. These blocks are deleted only when the 
> container is closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2018-06-20 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517908#comment-16517908
 ] 

genericqa commented on HDFS-10285:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HDFS-10285 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-10285 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12908083/HDFS-10285-consolidated-merge-patch-05.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24479/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to express the user's 
> preference for where the physical blocks should be stored. When the user sets 
> the storage policy before writing data, the blocks can take advantage of the 
> policy preferences and are placed accordingly. 
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written under the default 
> storage policy (namely DISK). The user then has to run the 'Mover tool' 
> explicitly, specifying all such file names as a list. In some distributed 
> scenarios (ex: HBase) it would be difficult to collect all the files and run 
> the tool, since different nodes can write files independently and the files 
> can have different paths.
> Another scenario: when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) into a 
> directory with a different effective storage policy, the inherited policy is 
> not copied from the source; the file takes its effective policy from the 
> destination parent. This rename is only a metadata change in the Namenode, so 
> the physical blocks still remain placed according to the source storage 
> policy.
> Tracking all such business-logic-driven file names from distributed nodes 
> (ex: region servers) and running the Mover tool is therefore difficult for 
> admins. The proposal here is to provide an API in the Namenode itself to 
> trigger storage policy satisfaction. A daemon thread inside the Namenode 
> should track such calls and send movement commands to the DNs. 
> Will post the detailed design thoughts document soon. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2018-06-20 Thread Uma Maheswara Rao G (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517906#comment-16517906
 ] 

Uma Maheswara Rao G commented on HDFS-10285:


Hi All,

After long offline discussions, I would like to summarize the current state of 
the arguments/approaches.

From [~andrew.wang]: Interested in running this process inside the NN to 
reduce maintenance cost etc. He also agreed to have it optionally run outside.

From [~anu]: He has no interest in running the process inside the NN; in fact 
he was the one who proposed starting this process outside the NN. We have 
worked so far to satisfy both arguments. In the offline discussion today, Anu 
proposed going ahead with the merge using the existing workable external SPS 
part in the first phase, while we continue to improve the feature along the 
proposed alternatives. This feature can be Alpha.

From [~chris.douglas]: He is fine with both options, and he proposed the 
context-based abstractions that we agreed on and have implemented so far.

From [~daryn]: He is fine with running this process outside. If we want to run 
it internal to the NN, he proposed coupling it with the RM instead of keeping 
the logic/queues in a separate daemon thread.

From Uma: Interested primarily in running it within the NN, but no major 
concerns about starting it as a separate process in order to move the project 
forward.

From [~rakeshr]: He has no concerns either way, as users can run it depending 
on their usage model.

To summarize my proposal: the current code supports both options, but the 
internal-to-NN variant does not depend on the RM in the current code base. 
Internal SPS will take time to integrate with the RM and to test, and there 
may not be much common code in any case. So, while we continue the discussion 
on internal SPS, and since there are no concerns with external SPS, how about 
we move forward with merging external SPS? If that works, we will make the 
necessary cleanups and go for the external SPS merge. However, we will not 
mark this feature as Stable until it has run for some time. It should be OK to 
keep improving the code incrementally rather than doing nothing and blocking 
each other on arguments. Thanks

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to express the user's 
> preference for where the physical blocks should be stored. When the user sets 
> the storage policy before writing data, the blocks can take advantage of the 
> policy preferences and are placed accordingly. 
> If the user sets the storage policy after the file has been written and 
> completed, the blocks will already have been written under the default 
> storage policy (namely DISK). The user then has to run the 'Mover tool' 
> explicitly, specifying all such file names as a list. In some distributed 
> scenarios (ex: HBase) it would be difficult to collect all the files and run 
> the tool, since different nodes can write files independently and the files 
> can have different paths.
> Another scenario: when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) into a 
> directory with a different effective storage policy, the inherited policy is 
> not copied from the source; the file takes its effective policy from the 
> destination parent. This rename is only a metadata change in the Namenode, so 
> the physical blocks still remain placed according to the source storage 
> policy.
> Tracking all such business-logic-driven file names from distributed nodes 
> (ex: region servers) and running the Mover tool is therefore difficult for 
> admins. The proposal here is to provide an API in the Namenode itself to 
> trigger storage policy satisfaction. A daemon thread inside the Namenode 
> should track such calls and send movement commands to the DNs. 
> Will post the detailed design thoughts document soon. 



--
This message was sent by