[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360628#comment-16360628
 ] 

genericqa commented on HDFS-13119:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 10m 
19s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 22s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}195m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.federation.router.TestRouterRpc |
|   | hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages |
|   | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13119 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910161/HDFS-13119.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 9b2e8486c464 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d02e42c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| findbugs | 

[jira] [Created] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDFS-13134:
--

 Summary: Ozone: Format open containers on datanode restart
 Key: HDFS-13134
 URL: https://issues.apache.org/jira/browse/HDFS-13134
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Lokesh Jain
Assignee: Lokesh Jain


Once a datanode is restarted, its open containers should be formatted. Only the 
open containers whose pipeline has a replication factor of three will need to 
be formatted. The format command is sent by SCM to the datanode after the 
corresponding containers have been successfully replicated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360641#comment-16360641
 ] 

genericqa commented on HDFS-13134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HDFS-13134 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910195/HDFS-13134.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23032/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13134.001.patch
>
>
> Once a datanode is restarted, its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reopened HDFS-8693:


> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693.02.patch, HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA support
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new
> one, so that I don't need to restart the data nodes. However, I got the
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to"
>         + " reconfigure the list of NNs.");
>   }
> }
> Looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, adding the new name node on a replacement instance is critical
> for auto-provisioning a Hadoop cluster with HDFS HA support; without this
> support, the HA feature cannot really be used. I also observed that the new
> standby name node on the replacement instance could get stuck in safe mode
> because no data nodes check in with it. Even with a rolling restart, it may
> take quite some time to restart all data nodes in a big cluster, for example
> one with 4000 data nodes, not to mention that restarting DNs is far too
> intrusive and not a preferable operation in production. It also increases the
> chance of a double failure because the standby name node is not really ready
> for a failover in case the current active name node fails.
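For context, here is a minimal, self-contained demonstration of why any change to the NN list trips the check quoted above. It uses Guava's real Sets utilities; the string addresses are made-up stand-ins for the NN socket addresses:

{code:java}
import com.google.common.collect.Sets;
import java.util.Set;

public class SymmetricDifferenceDemo {
    public static void main(String[] args) {
        // Addresses the DN's BPServiceActors are currently talking to (hypothetical).
        Set<String> oldAddrs = Sets.newHashSet("nn1:8020", "nn2:8020");
        // Refreshed list after replacing nn2 with nn3.
        Set<String> newAddrs = Sets.newHashSet("nn1:8020", "nn3:8020");

        // The symmetric difference is non-empty whenever any address was added
        // or removed -- exactly the condition that makes refreshNNList throw.
        System.out.println(Sets.symmetricDifference(oldAddrs, newAddrs)); // e.g. [nn2:8020, nn3:8020]
        System.out.println(Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()); // false
    }
}
{code}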



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12742) Add support for KSM --expunge command

2018-02-12 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-12742:
---
Attachment: HDFS-12742-HDFS-7240.003.patch

> Add support for KSM --expunge command
> -
>
> Key: HDFS-12742
> URL: https://issues.apache.org/jira/browse/HDFS-12742
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12742-HDFS-7240.001.patch, 
> HDFS-12742-HDFS-7240.002.patch, HDFS-12742-HDFS-7240.003.patch
>
>
> KSM --expunge will delete all the data from the data nodes for all the keys 
> in the KSM db. 
> The user will have no control over the deletion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages

2018-02-12 Thread Wei Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360654#comment-16360654
 ] 

Wei Zhou commented on HDFS-9608:


Hi,

I'm on vacation from 2/8/2018 and back on 2/22/2018. My email response may be 
delayed. Thanks!
Wish you a happy Chinese New Year!

Best wishes
Wei


> Disk IO imbalance in HDFS with heterogeneous storages
> -
>
> Key: HDFS-9608
> URL: https://issues.apache.org/jira/browse/HDFS-9608
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9608.01.patch, HDFS-9608.02.patch, 
> HDFS-9608.03.patch, HDFS-9608.04.patch, HDFS-9608.05.patch, 
> HDFS-9608.06.patch, HDFS-9608.07.patch
>
>
> Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes 
> in HDFS with heterogeneous storages; this leads to a non-round-robin choosing 
> mode for certain types of storage.
> Besides, it uses a shared lock for synchronization, which limits the 
> concurrency of the volume-choosing process. Volume-choosing threads operating 
> on different storage types should be able to run concurrently. 
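A minimal, self-contained sketch of the per-type round-robin idea described above (the class below is illustrative, not the actual HDFS patch): give each storage type its own rotation counter so that choosing a volume for one type never skews the rotation, or blocks the threads, of another.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerTypeRoundRobin<V> {
    // One independent counter per storage type (e.g. "DISK", "SSD"), so the
    // rotation for one type is not perturbed by choices made for another.
    private final Map<String, AtomicInteger> indices = new ConcurrentHashMap<>();

    public V choose(String storageType, List<V> volumes) {
        AtomicInteger index =
            indices.computeIfAbsent(storageType, t -> new AtomicInteger());
        // getAndIncrement is atomic, so threads choosing volumes for different
        // storage types never contend on a single shared lock.
        int i = Math.floorMod(index.getAndIncrement(), volumes.size());
        return volumes.get(i);
    }
}
{code}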



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12742) Add support for KSM --expunge command

2018-02-12 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-12742:
---
Status: Patch Available  (was: Open)

> Add support for KSM --expunge command
> -
>
> Key: HDFS-12742
> URL: https://issues.apache.org/jira/browse/HDFS-12742
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12742-HDFS-7240.001.patch, 
> HDFS-12742-HDFS-7240.002.patch, HDFS-12742-HDFS-7240.003.patch
>
>
> KSM --expunge will delete all the data from the data nodes for all the keys 
> in the KSM db. 
> The user will have no control over the deletion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360588#comment-16360588
 ] 

Vinayakumar B commented on HDFS-8693:
-

+1 for addendum

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, 
> HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA support
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new
> one, so that I don't need to restart the data nodes. However, I got the
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to"
>         + " reconfigure the list of NNs.");
>   }
> }
> Looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, adding the new name node on a replacement instance is critical
> for auto-provisioning a Hadoop cluster with HDFS HA support; without this
> support, the HA feature cannot really be used. I also observed that the new
> standby name node on the replacement instance could get stuck in safe mode
> because no data nodes check in with it. Even with a rolling restart, it may
> take quite some time to restart all data nodes in a big cluster, for example
> one with 4000 data nodes, not to mention that restarting DNs is far too
> intrusive and not a preferable operation in production. It also increases the
> chance of a double failure because the standby name node is not really ready
> for a failover in case the current active name node fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13132) Ozone: Handle datanode failures in Storage Container Manager

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360637#comment-16360637
 ] 

genericqa commented on HDFS-13132:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 0s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
0s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
12s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
3s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
44s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}184m 12s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}256m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13132 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910155/HDFS-13132-HDFS-7240.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 19970fc31057 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-12742) Add support for KSM --expunge command

2018-02-12 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360651#comment-16360651
 ] 

Shashikant Banerjee commented on HDFS-12742:


Patch v3 implements an API to get the list of containers owned by a KSM. Using 
this API, it deletes all the containers owned by this KSM, thereby deleting all 
data, when the KSM --expunge command is invoked.
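A minimal sketch of the flow this comment describes (all interface and method names below are hypothetical stand-ins, not the actual HDFS-12742 API): fetch the list of containers owned by the KSM, then delete each one, which removes all key data.

{code:java}
import java.util.List;

public class ExpungeSketch {
    /** Hypothetical stand-in for the container-listing API the patch adds. */
    interface ContainerClient {
        List<String> listContainersOwnedBy(String owner);
        void deleteContainer(String containerName);
    }

    /** Expunge: delete every container owned by this KSM, and with them all keys. */
    static void expunge(ContainerClient client, String ksmId) {
        for (String container : client.listContainersOwnedBy(ksmId)) {
            client.deleteContainer(container); // no per-key control, by design
        }
    }
}
{code}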

> Add support for KSM --expunge command
> -
>
> Key: HDFS-12742
> URL: https://issues.apache.org/jira/browse/HDFS-12742
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12742-HDFS-7240.001.patch, 
> HDFS-12742-HDFS-7240.002.patch, HDFS-12742-HDFS-7240.003.patch
>
>
> KSM --expunge will delete all the data from the data nodes for all the keys 
> in the KSM db. 
> The user will have no control over the deletion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-13134:
---
Attachment: HDFS-13134.001.patch

> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13134.001.patch
>
>
> Once a datanode is restarted, its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13118) SnapshotDiffReport should provide the INode type

2018-02-12 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-13118:
--
Status: Patch Available  (was: Open)

> SnapshotDiffReport should provide the INode type
> 
>
> Key: HDFS-13118
> URL: https://issues.apache.org/jira/browse/HDFS-13118
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Major
> Attachments: HDFS-13118.001.patch, HDFS-13118.002.patch
>
>
> Currently the snapshot diff report will list which inodes were added, 
> removed, renamed, etc. But to see what the INode actually is, we need to 
> access the underlying snapshot, and this is cumbersome to do programmatically 
> when the snapshot diff already has the information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13118) SnapshotDiffReport should provide the INode type

2018-02-12 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-13118:
--
Status: Open  (was: Patch Available)

> SnapshotDiffReport should provide the INode type
> 
>
> Key: HDFS-13118
> URL: https://issues.apache.org/jira/browse/HDFS-13118
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Major
> Attachments: HDFS-13118.001.patch, HDFS-13118.002.patch
>
>
> Currently the snapshot diff report will list which inodes were added, 
> removed, renamed, etc. But to see what the INode actually is, we need to 
> access the underlying snapshot, and this is cumbersome to do programmatically 
> when the snapshot diff already has the information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-8693:
---
Status: Patch Available  (was: Reopened)

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, 
> HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA support
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new
> one, so that I don't need to restart the data nodes. However, I got the
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to"
>         + " reconfigure the list of NNs.");
>   }
> }
> Looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, adding the new name node on a replacement instance is critical
> for auto-provisioning a Hadoop cluster with HDFS HA support; without this
> support, the HA feature cannot really be used. I also observed that the new
> standby name node on the replacement instance could get stuck in safe mode
> because no data nodes check in with it. Even with a rolling restart, it may
> take quite some time to restart all data nodes in a big cluster, for example
> one with 4000 data nodes, not to mention that restarting DNs is far too
> intrusive and not a preferable operation in production. It also increases the
> chance of a double failure because the standby name node is not really ready
> for a failover in case the current active name node fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-8693:
---
Attachment: HDFS-8693-03-addendum.patch

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, 
> HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA support
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new
> one, so that I don't need to restart the data nodes. However, I got the
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to"
>         + " reconfigure the list of NNs.");
>   }
> }
> Looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, adding the new name node on a replacement instance is critical
> for auto-provisioning a Hadoop cluster with HDFS HA support; without this
> support, the HA feature cannot really be used. I also observed that the new
> standby name node on the replacement instance could get stuck in safe mode
> because no data nodes check in with it. Even with a rolling restart, it may
> take quite some time to restart all data nodes in a big cluster, for example
> one with 4000 data nodes, not to mention that restarting DNs is far too
> intrusive and not a preferable operation in production. It also increases the
> chance of a double failure because the standby name node is not really ready
> for a failover in case the current active name node fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-13134:
---
Status: Patch Available  (was: Open)

> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13134.001.patch
>
>
> Once a datanode is restarted, its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread Lokesh Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360633#comment-16360633
 ] 

Lokesh Jain commented on HDFS-13134:


The v1 patch currently formats all the open containers in the datanode. After 
HDFS-13132, the format would need to be based on the pipeline: only containers 
in a pipeline whose replication factor is three need to be formatted, after 
they have been successfully replicated by SCM.
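A minimal, self-contained sketch of the selection rule this comment describes (the Container class is a hypothetical stand-in, not the actual Ozone container types): after HDFS-13132, only open containers whose pipeline has a replication factor of three would be picked for formatting.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class FormatOnRestartSketch {
    /** Hypothetical container view; not the actual Ozone container classes. */
    static class Container {
        final String name;
        final boolean open;
        final int pipelineReplicationFactor;
        Container(String name, boolean open, int pipelineReplicationFactor) {
            this.name = name;
            this.open = open;
            this.pipelineReplicationFactor = pipelineReplicationFactor;
        }
    }

    /** After a datanode restart, select only the open containers whose pipeline
     *  has a replication factor of three; SCM would send the format command for
     *  these once they have been re-replicated elsewhere. */
    static List<Container> containersToFormat(List<Container> all) {
        List<Container> toFormat = new ArrayList<>();
        for (Container c : all) {
            if (c.open && c.pipelineReplicationFactor == 3) {
                toFormat.add(c);
            }
        }
        return toFormat;
    }
}
{code}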

> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13134.001.patch
>
>
> Once a datanode is restarted, its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages

2018-02-12 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360653#comment-16360653
 ] 

Vinayakumar B commented on HDFS-9608:
-

Should this go to branch-2.8 (possibly 2.7) as well?

IMO, it should. Any other opinions?

> Disk IO imbalance in HDFS with heterogeneous storages
> -
>
> Key: HDFS-9608
> URL: https://issues.apache.org/jira/browse/HDFS-9608
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9608.01.patch, HDFS-9608.02.patch, 
> HDFS-9608.03.patch, HDFS-9608.04.patch, HDFS-9608.05.patch, 
> HDFS-9608.06.patch, HDFS-9608.07.patch
>
>
> Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes 
> in HDFS with heterogeneous storages; this leads to a non-round-robin choosing 
> mode for certain types of storage.
> Besides, it uses a shared lock for synchronization, which limits the 
> concurrency of the volume-choosing process. Volume-choosing threads operating 
> on different storage types should be able to run concurrently. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360538#comment-16360538
 ] 

Brahma Reddy Battula commented on HDFS-8693:


[~lindongdong] Thanks for verifying and reporting. Will upload an addendum 
patch to fix this.

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693.02.patch, HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA support
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new
> one, so that I don't need to restart the data nodes. However, I got the
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to"
>         + " reconfigure the list of NNs.");
>   }
> }
> Looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, adding the new name node on a replacement instance is critical
> for auto-provisioning a Hadoop cluster with HDFS HA support; without this
> support, the HA feature cannot really be used. I also observed that the new
> standby name node on the replacement instance could get stuck in safe mode
> because no data nodes check in with it. Even with a rolling restart, it may
> take quite some time to restart all data nodes in a big cluster, for example
> one with 4000 data nodes, not to mention that restarting DNs is far too
> intrusive and not a preferable operation in production. It also increases the
> chance of a double failure because the standby name node is not really ready
> for a failover in case the current active name node fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-11225) NameNode crashed because deleteSnapshot held FSNamesystem lock too long

2018-02-12 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDFS-11225:
--

Assignee: Shashikant Banerjee  (was: Manoj Govindassamy)

> NameNode crashed because deleteSnapshot held FSNamesystem lock too long
> ---
>
> Key: HDFS-11225
> URL: https://issues.apache.org/jira/browse/HDFS-11225
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
> Environment: CDH5.8.2, HA
>Reporter: Wei-Chiu Chuang
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: high-availability
> Attachments: Snaphot_Deletion_Design_Proposal.pdf
>
>
> The deleteSnapshot operation is synchronous. In certain situations this 
> operation may hold the FSNamesystem lock for too long, bringing almost every 
> NameNode operation to a halt.
> We have observed one incident where it took so long that ZKFC believed the 
> NameNode was down. All other IPC threads were waiting to acquire the 
> FSNamesystem lock. This specific deleteSnapshot took ~70 seconds. ZKFC has a 
> connection timeout of 45 seconds by default, and if all IPC threads wait for 
> the FSNamesystem lock and can't accept new incoming connections, ZKFC times 
> out and advances the epoch, and the NameNode therefore loses its active NN 
> role and then fails.
> Relevant log:
> {noformat}
> Thread 154 (IPC Server handler 86 on 8020):
>   State: RUNNABLE
>   Blocked count: 2753455
>   Waited count: 89201773
>   Stack:
> 
> org.apache.hadoop.hdfs.server.namenode.INode$BlocksMapUpdateInfo.addDeleteBlock(INode.java:879)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.destroyAndCollectBlocks(INodeFile.java:508)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.destroyAndCollectBlocks(INodeDirectory.java:763)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.destroyAndCollectBlocks(INodeReference.java:339)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.destroyAndCollectBlocks(INodeReference.java:606)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$ChildrenDiff.destroyDeletedList(DirectoryWithSnapshotFeature.java:119)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$ChildrenDiff.access$400(DirectoryWithSnapshotFeature.java:61)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.destroyDiffAndCollectBlocks(DirectoryWithSnapshotFeature.java:319)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.destroyDiffAndCollectBlocks(DirectoryWithSnapshotFeature.java:167)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:83)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:745)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:776)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:747)
> 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:747)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:776)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:747)
> 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:789)
> {noformat}
> After the ZKFC determined the NameNode was down and advanced the epoch, the 
> NN finished deleting the snapshot and sent the edit to the journal nodes, but 
> it was rejected because the epoch had been updated. See the following 
> stacktrace:
> {noformat}
> 10.0.16.21:8485: IPC's epoch 17 is less than the last promised epoch 18
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:429)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:457)
> at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:352)
> at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:149)
> at 
> 

[jira] [Updated] (HDFS-13052) WebHDFS: Add support for snapshot diff

2018-02-12 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-13052:
---
Attachment: HDFS-13052.006.patch

> WebHDFS: Add support for snapshot diff
> --
>
> Key: HDFS-13052
> URL: https://issues.apache.org/jira/browse/HDFS-13052
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13052.001.patch, HDFS-13052.002.patch, 
> HDFS-13052.003.patch, HDFS-13052.004.patch, HDFS-13052.005.patch, 
> HDFS-13052.006.patch
>
>
> This Jira aims to implement the snapshot diff operation for the WebHDFS filesystem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10453) ReplicationMonitor thread could get stuck for a long time due to the race between replication and delete of the same file in a large cluster.

2018-02-12 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-10453:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.4
   3.0.1
   2.9.1
   2.10.0
   3.1.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-3.0, branch-3.1, branch-2, branch-2.9, branch-2.8 
and branch-2.7.

Thanks [~hexiaoqiao] for reporting and fixing this for all the active release 
branches! And thanks to all the reviewers (too numerous to mention).

> ReplicationMonitor thread could get stuck for a long time due to the race 
> between replication and delete of the same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6
>
> Attachments: HDFS-10453-branch-2.001.patch, 
> HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, 
> HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, 
> HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, 
> HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, 
> HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, 
> HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, 
> HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, 
> HDFS-10453-trunk.002.patch, HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread could get stuck for a long time and, with low 
> probability, lose data. Consider the typical scenario:
> (1) create and close a file with the default replicas (3);
> (2) increase the replication (to 10) of the file;
> (3) delete the file while the ReplicationMonitor is scheduling blocks 
> belonging to that file for replication.
> If the ReplicationMonitor gets stuck, the NameNode will print logs like:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This is because two threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment.
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and then clears the 
> references in the blocksmap, needReplications, etc. The block's numBytes is 
> set to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does 
> not need an explicit ACK from the node.
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to 
> chooseTargets for the same blocks, and no node will be selected after 
> traversing the whole cluster because no node choice satisfies the goodness 
> criteria 
> 
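To make the race concrete, here is a minimal, self-contained sketch (illustrative only, not the actual HDFS-10453 patch): a concurrent delete marks the block with NO_ACK, and the monitor can detect that before choosing targets instead of scanning the whole cluster for placements that can never succeed.

{code:java}
public class ReplicationRaceSketch {
    // In HDFS, a deleted block's numBytes is set to NO_ACK (Long.MAX_VALUE).
    static final long NO_ACK = Long.MAX_VALUE;

    static class Block {
        volatile long numBytes;
        Block(long numBytes) { this.numBytes = numBytes; }
    }

    /** Guard the monitor could apply before chooseTargets: a block whose
     *  numBytes became NO_ACK was deleted after it was scheduled, so skip it. */
    static boolean shouldSkip(Block b) {
        return b.numBytes == NO_ACK;
    }

    public static void main(String[] args) {
        Block scheduled = new Block(128L * 1024 * 1024); // scheduled for replication
        scheduled.numBytes = NO_ACK;                     // concurrent delete marks it
        System.out.println(shouldSkip(scheduled));       // true -> do not chooseTargets
    }
}
{code}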

[jira] [Commented] (HDFS-13052) WebHDFS: Add support for snapshot diff

2018-02-12 Thread Lokesh Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360963#comment-16360963
 ] 

Lokesh Jain commented on HDFS-13052:


The v6 patch adds a function in JsonUtil to convert a SnapshotDiffReport object 
to a JSON string. Earlier I was directly using JsonUtil#MAPPER for the 
conversion, but a dedicated function is the way it has been done for other 
WebHDFS operations.
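A minimal, self-contained sketch of that kind of helper (DiffReport below is a hypothetical stand-in for SnapshotDiffReport; Jackson's ObjectMapper is the real API): wrap the object-to-JSON conversion in one dedicated static function rather than exposing the mapper directly.

{code:java}
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonUtilSketch {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Hypothetical stand-in for SnapshotDiffReport. */
    public static class DiffReport {
        public String snapshotRoot = "/data";
        public String fromSnapshot = "s1";
        public String toSnapshot = "s2";
    }

    /** Dedicated conversion function, mirroring the JsonUtil convention. */
    public static String toJsonString(DiffReport report) throws JsonProcessingException {
        return MAPPER.writeValueAsString(report);
    }

    public static void main(String[] args) throws JsonProcessingException {
        // Prints {"snapshotRoot":"/data","fromSnapshot":"s1","toSnapshot":"s2"}
        System.out.println(toJsonString(new DiffReport()));
    }
}
{code}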

> WebHDFS: Add support for snapshot diff
> --
>
> Key: HDFS-13052
> URL: https://issues.apache.org/jira/browse/HDFS-13052
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13052.001.patch, HDFS-13052.002.patch, 
> HDFS-13052.003.patch, HDFS-13052.004.patch, HDFS-13052.005.patch, 
> HDFS-13052.006.patch
>
>
> This Jira aims to implement the snapshot diff operation for the WebHDFS filesystem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12742) Add support for KSM --expunge command

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360927#comment-16360927
 ] 

genericqa commented on HDFS-12742:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
57s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
10s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 44s{color} | {color:orange} hadoop-hdfs-project: The patch generated 27 new 
+ 9 unchanged - 0 fixed = 36 total (was 9) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
27s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 4 new + 0 
unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
39s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 11s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}188m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Dead store to LOG in 
org.apache.hadoop.ozone.ksm.KSMExpunge.expunge(OzoneConfiguration)  At 
KSMExpunge.java:org.apache.hadoop.ozone.ksm.KSMExpunge.expunge(OzoneConfiguration)
  At KSMExpunge.java:[line 85] |
|  |  Dead store to completed in 
org.apache.hadoop.ozone.ksm.KSMExpunge.expunge(OzoneConfiguration)  At 
KSMExpunge.java:org.apache.hadoop.ozone.ksm.KSMExpunge.expunge(OzoneConfiguration)
  At KSMExpunge.java:[line 121] |
|  |  Dead store to exception in 

[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could get stuck for a long time due to the race between replication and delete of the same file in a large cluster.

2018-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360934#comment-16360934
 ] 

Hudson commented on HDFS-10453:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13645 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13645/])
HDFS-10453. ReplicationMonitor thread could stuck for long time due to (arp: 
rev 96bb6a51ec4a470e9b287c94e377444a9f97c410)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ErasureCodingWork.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ReplicationWork.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReconstructionWork.java


> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6
>
> Attachments: HDFS-10453-branch-2.001.patch, 
> HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, 
> HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, 
> HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, 
> HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, 
> HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, 
> HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, 
> HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, 
> HDFS-10453-trunk.002.patch, HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread could get stuck for a long time and, with low 
> probability, lose data. Consider the typical scenario:
> (1) create and close a file with the default replication (3);
> (2) increase the replication of the file (to 10);
> (3) delete the file while ReplicationMonitor is scheduling blocks belonging 
> to that file for replication.
> When the ReplicationMonitor gets stuck, the NameNode prints logs like:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This happens because two threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment.
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete blocks, then clears the 
> references in the blocks map, needReplications, etc. The block's NumBytes is 
> set to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does 
> not need an explicit ACK from the node. 
> (3) 
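A minimal standalone sketch of the guard this race calls for; the types below are stand-ins for illustration, not the actual ReplicationWork/ErasureCodingWork/BlockReconstructionWork changes from the commit:

{code:java}
// Sketch only: a block that a concurrent delete has marked with NO_ACK
// (Long.MAX_VALUE) must be skipped when replication work is scheduled,
// otherwise the ReplicationMonitor keeps retrying it forever.
class ReplicationGuardSketch {
  static final long NO_ACK = Long.MAX_VALUE;

  static final class Block {
    volatile long numBytes;
    Block(long numBytes) { this.numBytes = numBytes; }
  }

  /** Returns true if replication work should be scheduled for this block. */
  static boolean shouldSchedule(Block b) {
    return b.numBytes != NO_ACK;
  }

  public static void main(String[] args) {
    Block live = new Block(134_217_728L);   // a normal 128 MB block
    Block deleted = new Block(NO_ACK);      // marked by a concurrent delete
    System.out.println(shouldSchedule(live));    // true
    System.out.println(shouldSchedule(deleted)); // false
  }
}
{code}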

[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360719#comment-16360719
 ] 

genericqa commented on HDFS-8693:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 21s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.TestErasureCodingPolicies |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
|   | hadoop.hdfs.TestDFSStripedOutputStream |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure050 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-8693 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910180/HDFS-8693-03-addendum.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6aeb687b3fda 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Updated] (HDFS-12998) SnapshotDiff - Provide an iterator-based listing API for calculating snapshotDiff

2018-02-12 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-12998:
---
Attachment: HDFS-12998.001.patch

> SnapshotDiff - Provide an iterator-based listing API for calculating 
> snapshotDiff
> -
>
> Key: HDFS-12998
> URL: https://issues.apache.org/jira/browse/HDFS-12998
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-12998.001.patch
>
>
> Currently, SnapshotDiff computation happens over multiple RPC calls to the 
> namenode, depending on the number of snapshotDiff entries; each RPC call 
> returns at most 1000 entries by default. Each "getSnapshotDiffReportListing" 
> call to the namenode returns a partial snapshotDiffReportList, and these are 
> all combined and processed at the client side to generate the final 
> snapshotDiffReport. The SnapshotDiffReport can be huge, and in such 
> situations the RPC calls to the namenode should happen on demand at the 
> client side.






[jira] [Updated] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-13134:
---
Attachment: HDFS-13134-HDFS-7240.001.patch

> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13134-HDFS-7240.001.patch
>
>
> Once a datanode is restarted, its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.






[jira] [Commented] (HDFS-13052) WebHDFS: Add support for snasphot diff

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361294#comment-16361294
 ] 

genericqa commented on HDFS-13052:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 43s{color} | {color:orange} hadoop-hdfs-project: The patch generated 5 new + 
245 unchanged - 1 fixed = 250 total (was 246) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
22s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}119m 15s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13052 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910245/HDFS-13052.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 87d98ecb76e4 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361307#comment-16361307
 ] 

genericqa commented on HDFS-13108:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
37s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
34s{color} | {color:red} hadoop-tools/hadoop-ozone generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
23s{color} | {color:green} hadoop-ozone in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-ozone |
|  |  Dead store to path in 
org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(URI, Configuration)  At 
OzoneFileSystem.java:org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(URI, 
Configuration)  At OzoneFileSystem.java:[line 104] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13108 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910269/HDFS-13108-HDFS-7240.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c3d7761315f7 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-7240 / f3d07ef |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23037/artifact/out/new-findbugs-hadoop-tools_hadoop-ozone.html
 |
|  Test Results | 

[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361323#comment-16361323
 ] 

genericqa commented on HDFS-13134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
29s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
23s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
22s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 10 new + 2 unchanged - 0 fixed = 12 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
41s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m  6s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
25s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}151m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Possible null pointer dereference in 
org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(Configuration,
 List, DatanodeID) due to return value of called method  Dereferenced at 
ContainerManagerImpl.java:org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(Configuration,
 List, DatanodeID) due to return value of called method  Dereferenced at 
ContainerManagerImpl.java:[line 183] |
| Failed junit tests | hadoop.ozone.scm.node.TestPipelineRecovery |
|   | hadoop.ozone.scm.node.TestContainerPlacement |
|   | hadoop.ozone.scm.node.TestNodeManager |
|   | hadoop.ozone.container.common.TestDatanodeStateMachine |
|   | hadoop.hdfs.web.TestWebHDFS |
|   | hadoop.ozone.container.common.TestEndPoint |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13134 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-12 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360977#comment-16360977
 ] 

Gabor Bota commented on HDFS-11187:
---

[~jojochuang] thanks for the review! 
I'd like to respond to your second comment, because I think the first one 
applies to rev001. 

In FsDatasetImpl#finalizeReplica there's a call to 
FsVolumeImpl#addFinalizedBlock() at line 1734 of your trunk commit.
For branch-2 I moved the checksum-loading logic to the else branch in 
FsDatasetImpl#finalizeReplica, after the call to the unmodified 
FsVolumeImpl#addFinalizedBlock(). I think this will solve the issue you've 
pointed out (it will be rev003).
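To make the move concrete, here is a minimal standalone sketch of the control flow described above; all types below are simplified stand-ins, not the real FsDatasetImpl/FsVolumeImpl API:

{code:java}
// Sketch only: the last partial chunk checksum is loaded in the else branch,
// after the unmodified addFinalizedBlock() call has completed.
class FinalizeSketch {
  interface Volume {
    void addFinalizedBlock(Replica r) throws java.io.IOException;
    byte[] loadLastPartialChunkChecksum(Replica r) throws java.io.IOException;
  }

  static final class Replica {
    boolean onTransientStorage;
    byte[] lastPartialChunkChecksum; // cached in memory once loaded
  }

  Replica finalizeReplica(Replica replica, Volume volume) throws java.io.IOException {
    if (replica.onTransientStorage) {
      // RAM-disk replica: nothing extra to load in this sketch.
      return replica;
    } else {
      // Keep addFinalizedBlock() unmodified and call it first...
      volume.addFinalizedBlock(replica);
      // ...then load the checksum, which is the branch-2 move described above.
      replica.lastPartialChunkChecksum = volume.loadLastPartialChunkChecksum(replica);
      return replica;
    }
  }
}
{code}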

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new JIRA, because maintaining the 
> state of the in-memory checksum requires a lot more work.






[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-12 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Attachment: HDFS-11187-branch-2.003.patch

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new JIRA, because maintaining the 
> state of the in-memory checksum requires a lot more work.






[jira] [Commented] (HDFS-12998) SnapshotDiff - Provide an iterator-based listing API for calculating snapshotDiff

2018-02-12 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361066#comment-16361066
 ] 

Shashikant Banerjee commented on HDFS-12998:


Patch v1 implements a RemoteIterator class for fetching 
SnapshotDiffReportListingEntries on demand. 
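As a rough illustration of how such an iterator could be consumed (the entry-point method name and element type below are assumptions for this sketch, not necessarily what patch v1 exposes):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshotDiffReportListing.DiffReportListingEntry;

class SnapshotDiffIteratorSketch {
  // Prints diff entries between two snapshots; each hasNext()/next() pair may
  // transparently trigger the next getSnapshotDiffReportListing RPC batch
  // (up to 1000 entries by default) instead of pre-fetching the whole report.
  static void printDiff(DistributedFileSystem dfs, Path snapshotDir)
      throws IOException {
    RemoteIterator<DiffReportListingEntry> it =
        dfs.snapshotDiffReportListingRemoteIterator(snapshotDir, "s1", "s2");
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
{code}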

> SnapshotDiff - Provide an iterator-based listing API for calculating 
> snapshotDiff
> -
>
> Key: HDFS-12998
> URL: https://issues.apache.org/jira/browse/HDFS-12998
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-12998.001.patch
>
>
> Currently, SnapshotDiff computation happens over multiple RPC calls to the 
> namenode, depending on the number of snapshotDiff entries; each RPC call 
> returns at most 1000 entries by default. Each "getSnapshotDiffReportListing" 
> call to the namenode returns a partial snapshotDiffReportList, and these are 
> all combined and processed at the client side to generate the final 
> snapshotDiffReport. The SnapshotDiffReport can be huge, and in such 
> situations the RPC calls to the namenode should happen on demand at the 
> client side.






[jira] [Updated] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-12 Thread Lokesh Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDFS-13134:
---
Attachment: (was: HDFS-13134.001.patch)

> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> Once a datanode is restarted, its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.






[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-12 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Status: Open  (was: Patch Available)

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new JIRA, because maintaining the 
> state of the in-memory checksum requires a lot more work.






[jira] [Updated] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-8693:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-Addendum-branch-2.patch, 
> HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, HDFS-8693.03.patch, 
> HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby "
>         + "to a running DN. Please do a rolling restart of DNs to reconfigure "
>         + "the list of NNs.");
>   }
> }
> Looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, picking up a new name node on a replacement is critical for 
> auto-provisioning a Hadoop cluster with HDFS HA support; without it, the HA 
> feature cannot really be used. I also observed that the new standby name 
> node on the replacement instance could get stuck in safe mode because no 
> data nodes check in with it. Even with a rolling restart, it may take quite 
> some time to restart all data nodes in a big cluster, for example with 4000 
> data nodes, let alone that restarting DNs is far too intrusive and not a 
> preferable operation in production. It also increases the chance of a double 
> failure, because the standby name node is not really ready for a failover if 
> the current active name node fails. 
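For reference, a standalone sketch of the direction the fix takes; the real change lives in BPOfferService and is more involved, and everything below is a simplified stand-in:

{code:java}
// Sketch only: instead of rejecting any change to the NN list, compute which
// NNs were added or removed and adjust the actor list accordingly.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RefreshNNListSketch {
  static final class Actor {
    final String nnAddr;
    Actor(String nnAddr) { this.nnAddr = nnAddr; }
    void stop() { /* tear down the service thread in the real code */ }
  }

  final List<Actor> actors; // must be a mutable list in this sketch
  RefreshNNListSketch(List<Actor> actors) { this.actors = actors; }

  void refreshNNList(Set<String> newAddrs) {
    Set<String> oldAddrs = new HashSet<>();
    for (Actor a : actors) {
      oldAddrs.add(a.nnAddr);
    }
    // Start an actor for each newly added NN (e.g. a fresh standby)...
    for (String addr : newAddrs) {
      if (!oldAddrs.contains(addr)) {
        actors.add(new Actor(addr));
      }
    }
    // ...and stop the actors whose NN was removed.
    actors.removeIf(a -> {
      if (!newAddrs.contains(a.nnAddr)) {
        a.stop();
        return true;
      }
      return false;
    });
  }
}
{code}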






[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361206#comment-16361206
 ] 

genericqa commented on HDFS-11187:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
12s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 22s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 0s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-hdfs:20 |
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestRollingUpgradeDowngrade |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.TestDFSRename |
| Timed out junit tests | org.apache.hadoop.hdfs.TestFileAppend |
|   | org.apache.hadoop.hdfs.TestSafeMode |
|   | org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage |
|   | org.apache.hadoop.hdfs.TestFileCorruption |
|   | org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter |
|   | org.apache.hadoop.hdfs.TestDFSUpgrade |
|   | org.apache.hadoop.hdfs.web.TestWebHDFS |
|   | org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs |
|   | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr |
|   | org.apache.hadoop.hdfs.TestRenameWhileOpen |
|   | org.apache.hadoop.hdfs.TestPipelines |
|   | org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs |
|   | org.apache.hadoop.hdfs.TestFSOutputSummer |
|   | org.apache.hadoop.hdfs.web.TestWebHDFSForHA |
|   | org.apache.hadoop.hdfs.TestTrashWithEncryptionZones |
|   | org.apache.hadoop.hdfs.TestDFSShell |
|   | org.apache.hadoop.hdfs.TestReplaceDatanodeFailureReplication |
|   | org.apache.hadoop.hdfs.web.TestWebHDFSAcl |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:17213a0 |
| JIRA Issue | HDFS-11187 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910250/HDFS-11187-branch-2.003.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7378b7513160 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 

[jira] [Updated] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-12 Thread Elek, Marton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13108:

Attachment: HDFS-13108-HDFS-7240.002.patch

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch, 
> HDFS-13108-HDFS-7240.002.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket / volume should be defined in the defaultFS 
> (e.g. o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (e.g. 'dfs -ls /' lists all the 
> keys from bucket1)
> It works very well, but there are some limitations.
> B. Problem one
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' does not work.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which can transform an 
> unqualified URL into a qualified URL. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> The current implementation qualifies a URL by keeping the scheme (e.g. o3://) 
> and authority (e.g. datanode:9864) from the defaultFS and using the relative 
> path as the end of the qualified URL. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) 
> will return o3://datanode:9864/dir1/file, which is obviously wrong (the 
> correct result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried 
> to work around this with a custom makeQualified in the Ozone code; it worked 
> from the command line but not with Spark, which uses the Hadoop API and the 
> original makeQualified path.
> D. Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/ 
> This is similar to the s3a format where the pattern is s3a://bucket.region/ 
> We don't need to set the hostname of the datanode (or the KSM in case of 
> service discovery), but it would be configurable with additional Hadoop 
> configuration values such as 
> fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 (this is how 
> s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include a dot any more).
> ps: some Spark output
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but spark qualified the 
> name of the home directory.
>  
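For illustration, a short sketch of what client configuration might look like under the proposed schema; the key names follow the proposal above and are assumptions, not a committed API:

{code:java}
import org.apache.hadoop.conf.Configuration;

class OzoneFsConfigSketch {
  // Sketch only: wires up the proposed o3://bucket.volume/ schema. The
  // fs.o3.bucket.<bucket>.<volume>.address key is the proposal from this
  // description, not an existing configuration key.
  static Configuration proposedConf() {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "o3://bucket1.test/");
    conf.set("fs.o3.bucket.bucket1.test.address", "http://datanode:9864");
    return conf;
  }
}
{code}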






[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-12 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361204#comment-16361204
 ] 

Elek, Marton commented on HDFS-13108:
-

I uploaded the rebased version and retested it with the Spark word count job. 

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch, 
> HDFS-13108-HDFS-7240.002.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket / volume should be defined in the defaultFS 
> (e.g. o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (e.g. 'dfs -ls /' lists all the 
> keys from bucket1)
> It works very well, but there are some limitations.
> B. Problem one
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' does not work.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which can transform an 
> unqualified URL into a qualified URL. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> The current implementation qualifies a URL by keeping the scheme (e.g. o3://) 
> and authority (e.g. datanode:9864) from the defaultFS and using the relative 
> path as the end of the qualified URL. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) 
> will return o3://datanode:9864/dir1/file, which is obviously wrong (the 
> correct result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried 
> to work around this with a custom makeQualified in the Ozone code; it worked 
> from the command line but not with Spark, which uses the Hadoop API and the 
> original makeQualified path.
> D. Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/ 
> This is similar to the s3a format where the pattern is s3a://bucket.region/ 
> We don't need to set the hostname of the datanode (or the KSM in case of 
> service discovery), but it would be configurable with additional Hadoop 
> configuration values such as 
> fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 (this is how 
> s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include a dot any more).
> ps: some Spark output
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but spark qualified the 
> name of the home directory.
>  






[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361133#comment-16361133
 ] 

Íñigo Goiri commented on HDFS-13119:


* Haven't been able to go too deep but it seems like killing the mini cluster 
messes with the other unit tests.
* I think we could also test {{renewLease()}} to see if it succeeds when just 
one nameservice is down; in addition, we may want to put metrics for the number 
of retries and check them in the unit test.
* One thing I was considering was being able to try at least once if we don't 
have an ACTIVE NN; something like setting {{retryCount}} to the max value - 1.
* I think {{DFS_ROUTER_CLIENT_THREADS_SIZE_DEFAULT}} in {{DFSConfigKeys}} fits 
in one line.

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13119.001.patch
>
>
> When a federated cluster has one of its subclusters down, operations that 
> run in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all 
> the RPC connections.
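A toy model of that failure mode, as a standalone sketch (stand-in code, not RouterRpcClient itself): every client operation fans out to all subclusters, so a few operations against a dead subcluster pin every pooled connection and starve the healthy calls still queued behind them.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class InvokeAllSketch {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4); // "RPC connections"
    String[] subclusters = {"ns0", "ns1-down", "ns2", "ns3"};
    for (int op = 1; op <= 4; op++) {        // four client operations...
      for (String ns : subclusters) {        // ...each invoking all subclusters
        pool.submit(() -> {
          if (ns.endsWith("-down")) {
            Thread.sleep(Long.MAX_VALUE);    // the dead subcluster never answers
          }
          return ns;
        });
      }
    }
    pool.shutdown();
    // Prints false: the four stuck calls hold all four connections, so the
    // remaining healthy calls never get to run.
    System.out.println("drained: " + pool.awaitTermination(2, TimeUnit.SECONDS));
    pool.shutdownNow(); // interrupt the stuck calls for a clean exit
  }
}
{code}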






[jira] [Updated] (HDFS-13106) Need to exercise all HDFS APIs for EC

2018-02-12 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-13106:
-
Fix Version/s: 3.1.0

> Need to exercise all HDFS APIs for EC
> -
>
> Key: HDFS-13106
> URL: https://issues.apache.org/jira/browse/HDFS-13106
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Haibo Yan
>Assignee: Haibo Yan
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HDFS-13106.001.patch, HDFS-13106.002.patch, 
> HDFS-13106.003.patch
>
>
> Exercise the FileSystem API to make sure all APIs work as expected with the 
> Erasure Coding feature enabled






[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-12 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Status: Patch Available  (was: Open)

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new JIRA, because maintaining the 
> state of the in-memory checksum requires a lot more work.






[jira] [Updated] (HDFS-12917) Fix description errors in testErasureCodingConf.xml

2018-02-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-12917:
---

Committer of this patch, please set the *Fix Version/s* for this JIRA.

> Fix description errors in testErasureCodingConf.xml
> ---
>
> Key: HDFS-12917
> URL: https://issues.apache.org/jira/browse/HDFS-12917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: chencan
>Assignee: chencan
>Priority: Major
> Attachments: HADOOP-12917.002.patch, HADOOP-12917.patch
>
>
> In testErasureCodingConf.xml, there are two cases whose description may be 
> "getPolicy : get EC policy information at specified path, whick have an EC 
> Policy".






[jira] [Updated] (HDFS-13106) Need to exercise all HDFS APIs for EC

2018-02-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-13106:
---

Committer of this patch, please set the *Fix Version/s* for this JIRA.

> Need to exercise all HDFS APIs for EC
> -
>
> Key: HDFS-13106
> URL: https://issues.apache.org/jira/browse/HDFS-13106
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Haibo Yan
>Assignee: Haibo Yan
>Priority: Major
> Attachments: HDFS-13106.001.patch, HDFS-13106.002.patch, 
> HDFS-13106.003.patch
>
>
> Exercise the FileSystem API to make sure all APIs work as expected with the 
> Erasure Coding feature enabled






[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361093#comment-16361093
 ] 

Brahma Reddy Battula commented on HDFS-8693:


Test failures are unrelated.

Committed to trunk, branch-3.1, branch-3.0, branch-2, branch-2.9 and branch-2.8.

Thanks [~vinayrpet] for the quick review, and thanks [~lindongdong] for reporting.

Uploaded the committed branch-2 patch, which had a small conflict with the trunk patch.

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-Addendum-branch-2.patch, 
> HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, HDFS-8693.03.patch, 
> HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> {code}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to"
>         + " reconfigure the list of NNs.");
>   }
> }
> {code}
> It looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, recognizing a new name node on a replacement instance is 
> critical for auto-provisioning a Hadoop cluster with HDFS HA support; without 
> it, the HA feature cannot really be used. I also observed that the new 
> standby name node on the replacement instance could get stuck in safe mode 
> because no data nodes check in with it. Even with a rolling restart, it may 
> take quite some time to restart all data nodes in a big cluster, for example 
> one with 4000 data nodes; restarting DNs is also intrusive and not a 
> preferable operation in production. It also increases the chance of a double 
> failure, because the standby name node is not ready for a failover if the 
> current active name node fails.
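The committed fix (the Hudson comment below references BlockPoolManager) makes 
the DN reconcile a refreshed NN list instead of throwing. A rough sketch of 
that direction only; the helper methods here are hypothetical and this is not 
the actual patch:

{code}
// Hypothetical sketch: reconcile the refreshed NN list instead of rejecting it.
void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
  Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
  for (BPServiceActor actor : bpServices) {
    oldAddrs.add(actor.getNNSocketAddress());
  }
  Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
  for (InetSocketAddress addr : Sets.difference(newAddrs, oldAddrs)) {
    startNewActor(addr); // hypothetical helper: start an actor for the new NN
  }
  for (InetSocketAddress addr : Sets.difference(oldAddrs, newAddrs)) {
    stopActor(addr);     // hypothetical helper: stop actors for removed NNs
  }
}
{code}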



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361091#comment-16361091
 ] 

Hudson commented on HDFS-8693:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13646 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13646/])
HDFS-8693. Addendum patch to execute the command using UGI. Contributed 
(brahma: rev 35c17351cab645dcc72e0d2ae1608507aa787ffb)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolManager.java


> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-Addendum-branch-2.patch, 
> HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, HDFS-8693.03.patch, 
> HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> {code}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to"
>         + " reconfigure the list of NNs.");
>   }
> }
> {code}
> It looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, recognizing a new name node on a replacement instance is 
> critical for auto-provisioning a Hadoop cluster with HDFS HA support; without 
> it, the HA feature cannot really be used. I also observed that the new 
> standby name node on the replacement instance could get stuck in safe mode 
> because no data nodes check in with it. Even with a rolling restart, it may 
> take quite some time to restart all data nodes in a big cluster, for example 
> one with 4000 data nodes; restarting DNs is also intrusive and not a 
> preferable operation in production. It also increases the chance of a double 
> failure, because the standby name node is not ready for a failover if the 
> current active name node fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-8693:
---
Attachment: HDFS-8693-03-Addendum-branch-2.patch

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-Addendum-branch-2.patch, 
> HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, HDFS-8693.03.patch, 
> HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> {code}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby"
>         + " to a running DN. Please do a rolling restart of DNs to"
>         + " reconfigure the list of NNs.");
>   }
> }
> {code}
> It looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, recognizing a new name node on a replacement instance is 
> critical for auto-provisioning a Hadoop cluster with HDFS HA support; without 
> it, the HA feature cannot really be used. I also observed that the new 
> standby name node on the replacement instance could get stuck in safe mode 
> because no data nodes check in with it. Even with a rolling restart, it may 
> take quite some time to restart all data nodes in a big cluster, for example 
> one with 4000 data nodes; restarting DNs is also intrusive and not a 
> preferable operation in production. It also increases the chance of a double 
> failure, because the standby name node is not ready for a failover if the 
> current active name node fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-12 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-13136:
-

 Summary: Avoid taking FSN lock while doing group member lookup for 
FSD permission check
 Key: HDFS-13136
 URL: https://issues.apache.org/jira/browse/HDFS-13136
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


The Namenode has an FSN lock and an FSD lock. Most namenode operations need to 
take the FSN lock first and then the FSD lock. The permission check is done 
via FSPermissionChecker at the FSD layer, assuming the FSN lock is held.

The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
sometimes take seconds. There are external cache schemes such as SSSD and 
internal cache schemes for group lookup. However, the delay can still occur 
during a cache refresh, which causes severe FSN lock contention and 
unresponsive namenode issues.

Checking the current code, we found that getBlockLocations(..) does this 
right, but some methods such as getFileInfo(..) and getContentSummary(..) do 
it wrong. This ticket is opened to ensure the group lookup for the permission 
checker happens outside the FSN lock.
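For illustration, here is a self-contained sketch of the locking pattern at 
issue, with the FSN lock modeled as a plain ReentrantReadWriteLock (plain 
Java, not HDFS code):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockContentionSketch {
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  // The problematic pattern: the slow lookup runs while the lock is held, so
  // writers, and readers queued behind them, all wait for it to finish.
  void doingItWrong() {
    fsnLock.readLock().lock();
    try {
      slowGroupLookup(); // stand-in for callerUgi.getGroups()
      // ... FSD-level permission check and the actual operation ...
    } finally {
      fsnLock.readLock().unlock();
    }
  }

  // What this ticket asks for: resolve groups before taking the lock.
  void doingItRight() {
    slowGroupLookup();   // may still take seconds, but blocks nobody else
    fsnLock.readLock().lock();
    try {
      // ... FSD-level permission check and the actual operation ...
    } finally {
      fsnLock.readLock().unlock();
    }
  }

  private void slowGroupLookup() {
    // stand-in for a group lookup that may block on SSSD or a cache refresh
  }
}
{code}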
 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361517#comment-16361517
 ] 

Kihwal Lee commented on HDFS-13135:
---

From what I remember, a change (bug fix?) intentionally caused a lease to be 
removed from the LeaseManager but left it in the INode in the snapshot. I 
argue that this is not a correct design. Some blocks can be left in the 
under-construction state forever without any block recovery. This can cause 
data loss, since re-replication won't happen for those blocks. When the 
namenode is restarted, all leases will be restored based on the 
files-under-construction section and also on INodeUnderConstructionFeature.

> Lease not deleted when deleting INodeReference
> --
>
> Key: HDFS-13135
> URL: https://issues.apache.org/jira/browse/HDFS-13135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HDFS-13135.001.patch
>
>
> In troubleshooting an occurrence of HDFS-13115, it seemed that there was 
> another underlying root cause that should also be addressed. There was an 
> INodeReference that was deleted and the lease on it was not subsequently 
> deleted because it was never added to the reclaim context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361517#comment-16361517
 ] 

Kihwal Lee edited comment on HDFS-13135 at 2/12/18 10:06 PM:
-

From what I remember, a change (bug fix?) intentionally caused a lease to be 
removed from the LeaseManager but left it in the INode in the snapshot. I 
argue that this is not a correct design. Some blocks can be left in the 
under-construction state forever without any block recovery. This can cause 
data loss, since re-replication won't happen for those blocks. When the 
namenode is restarted, all leases will be restored based on the 
files-under-construction section and also on INodeUnderConstructionFeature.

I am not a snapshot expert, so I can't say what is the correct fix.  I've seen 
conflicting requirements on under-construction blocks in a snapshot. IMO, if an 
under-construction block ends up only in a snapshot (current view deleted while 
being written), a block recovery should somehow be done.


was (Author: kihwal):
From what I remember, a change (bug fix?) intentionally caused a lease to be 
removed from the LeaseManager but left it in the INode in the snapshot. I 
argue that this is not a correct design. Some blocks can be left in the 
under-construction state forever without any block recovery. This can cause 
data loss, since re-replication won't happen for those blocks. When the 
namenode is restarted, all leases will be restored based on the 
files-under-construction section and also on INodeUnderConstructionFeature.

I am not a snapshot expert, so I can't say what is the correct fix.  I've seen 
conflicting requirements on under-construction blocks in a snapshot. IMO, an 
under-construction block ends up only in a snapshot (current view deleted while 
being written), a block recovery should somehow be done.

> Lease not deleted when deleting INodeReference
> --
>
> Key: HDFS-13135
> URL: https://issues.apache.org/jira/browse/HDFS-13135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HDFS-13135.001.patch
>
>
> In troubleshooting an occurrence of HDFS-13115, it seemed that there was 
> another underlying root cause that should also be addressed. There was an 
> INodeReference that was deleted and the lease on it was not subsequently 
> deleted because it was never added to the reclaim context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HDFS-13135:
-
Attachment: HDFS-13135.001.patch

> Lease not deleted when deleting INodeReference
> --
>
> Key: HDFS-13135
> URL: https://issues.apache.org/jira/browse/HDFS-13135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HDFS-13135.001.patch
>
>
> In troubleshooting an occurrence of HDFS-13115, it seemed that there was 
> another underlying root cause that should also be addressed. There was an 
> INodeReference that was deleted and the lease on it was not subsequently 
> deleted because it was never added to the reclaim context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361517#comment-16361517
 ] 

Kihwal Lee edited comment on HDFS-13135 at 2/12/18 10:06 PM:
-

From what I remember, a change (bug fix?) intentionally caused a lease to be 
removed from the LeaseManager but left it in the INode in the snapshot. I 
argue that this is not a correct design. Some blocks can be left in the 
under-construction state forever without any block recovery. This can cause 
data loss, since re-replication won't happen for those blocks. When the 
namenode is restarted, all leases will be restored based on the 
files-under-construction section and also on INodeUnderConstructionFeature.

I am not a snapshot expert, so I can't say what is the correct fix.  I've seen 
conflicting requirements on under-construction blocks in a snapshot. IMO, an 
under-construction block ends up only in a snapshot (current view deleted while 
being written), a block recovery should somehow be done.


was (Author: kihwal):
From what I remember, a change (bug fix?) intentionally caused a lease to be 
removed from the LeaseManager but left it in the INode in the snapshot. I 
argue that this is not a correct design. Some blocks can be left in the 
under-construction state forever without any block recovery. This can cause 
data loss, since re-replication won't happen for those blocks. When the 
namenode is restarted, all leases will be restored based on the 
files-under-construction section and also on INodeUnderConstructionFeature.

> Lease not deleted when deleting INodeReference
> --
>
> Key: HDFS-13135
> URL: https://issues.apache.org/jira/browse/HDFS-13135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HDFS-13135.001.patch
>
>
> In troubleshooting an occurrence of HDFS-13115, it seemed that there was 
> another underlying root cause that should also be addressed. There was an 
> INodeReference that was deleted and the lease on it was not subsequently 
> deleted because it was never added to the reclaim context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13055) Aggregate usage statistics from datanodes

2018-02-12 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361388#comment-16361388
 ] 

Arpit Agarwal commented on HDFS-13055:
--

Hi [~ajayydv], since this is a large change, would you consider creating an 
[Apache Reviewboard|https://reviews.apache.org/r/] request?

> Aggregate usage statistics from datanodes
> -
>
> Key: HDFS-13055
> URL: https://issues.apache.org/jira/browse/HDFS-13055
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDFS-13055.001.patch, HDFS-13055.002.patch
>
>
> We collect a variety of statistics in DataNodes and expose them via JMX. 
> Aggregating some of the high-level statistics we are already collecting in 
> {{DataNodeMetrics}} (like bytesRead, bytesWritten, etc.) over a configurable 
> time window will create a central repository accessible via JMX and the UI.
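As a sketch of what aggregation over a configurable time window could look 
like (illustrative plain Java, not the attached patch):

{code}
// Illustrative rolling-window counter: sums the last N fixed-size intervals.
class RollingWindowCounter {
  private final long[] buckets;
  private final long intervalMs;
  private long bucketStartMs;
  private int current;

  RollingWindowCounter(int numBuckets, long intervalMs) {
    this.buckets = new long[numBuckets];
    this.intervalMs = intervalMs;
    this.bucketStartMs = System.currentTimeMillis();
  }

  synchronized void add(long delta) {
    advance();
    buckets[current] += delta;
  }

  synchronized long windowTotal() {
    advance();
    long total = 0;
    for (long b : buckets) {
      total += b;
    }
    return total;
  }

  // Rotate to the bucket covering "now", zeroing any intervals that elapsed.
  private void advance() {
    long now = System.currentTimeMillis();
    while (now - bucketStartMs >= intervalMs) {
      current = (current + 1) % buckets.length;
      buckets[current] = 0;
      bucketStartMs += intervalMs;
    }
  }
}
{code}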



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Sean Mackrory (JIRA)
Sean Mackrory created HDFS-13135:


 Summary: Lease not deleted when deleting INodeReference
 Key: HDFS-13135
 URL: https://issues.apache.org/jira/browse/HDFS-13135
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sean Mackrory
Assignee: Sean Mackrory


In troubleshooting an occurrence of HDFS-13115, it seemed that there was 
another underlying root cause that should also be addressed. There was an 
INodeReference that was deleted and the lease on it was not subsequently 
deleted because it was never added to the reclaim context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13135:
-
Status: Patch Available  (was: Open)

> Lease not deleted when deleting INodeReference
> --
>
> Key: HDFS-13135
> URL: https://issues.apache.org/jira/browse/HDFS-13135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HDFS-13135.001.patch
>
>
> In troubleshooting an occurrence of HDFS-13115, it seemed that there was 
> another underlying root cause that should also be addressed. There was an 
> INodeReference that was deleted and the lease on it was not subsequently 
> deleted because it was never added to the reclaim context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361465#comment-16361465
 ] 

Sean Mackrory commented on HDFS-13135:
--

So this test more or less reproduces what I was seeing. I'm still trying to 
get more info about the workload that did this, because it seems insane, but 
the same RPC client ID was appending to a file and then deleting it, and upon 
starting back up we got a NullPointerException because there was a lease for 
an inode that no longer existed.

I'm uncertain about whether or not the fix is correct here: a lot of the code 
this is calling is completely new to me, so it's entirely possible there are 
side effects I haven't considered (like whether this causes data that is still 
needed for the s0 snapshot to be deleted).
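For readers following along, the scenario described above corresponds roughly 
to the sequence below. This is a condensed, hypothetical reconstruction from 
the comment, not the attached test:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class LeaseOnDeletedReferenceSketch {
  public void repro() throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      DistributedFileSystem fs = cluster.getFileSystem();
      Path dir = new Path("/test");
      Path file = new Path(dir, "file");
      DFSTestUtil.createFile(fs, file, 1024L, (short) 1, 0L);
      fs.allowSnapshot(dir);
      fs.createSnapshot(dir, "s0");             // file is now referenced by s0
      FSDataOutputStream out = fs.append(file); // open for append -> lease
      fs.delete(file, true);                    // current view deleted while open
      cluster.restartNameNode();                // NPE here if the lease leaked
    } finally {
      cluster.shutdown();
    }
  }
}
{code}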

> Lease not deleted when deleting INodeReference
> --
>
> Key: HDFS-13135
> URL: https://issues.apache.org/jira/browse/HDFS-13135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HDFS-13135.001.patch
>
>
> In troubleshooting an occurrence of HDFS-13115, it seemed that there was 
> another underlying root cause that should also be addressed. There was an 
> INodeReference that was deleted and the lease on it was not subsequently 
> deleted because it was never added to the reclaim context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13133) Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be logged on ERROR level

2018-02-12 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13133:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the contribution, [~elek]. I have committed this to the feature 
branch.

> Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be 
> logged on ERROR level
> 
>
> Key: HDFS-13133
> URL: https://issues.apache.org/jira/browse/HDFS-13133
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13133-HDFS-7240.001.patch
>
>
> While testing OzoneFileSystem with Spark, I noticed ERROR messages multiple 
> times, something like this:
> {code}
> 2018-02-11 15:54:54 ERROR OzoneFileSystem:409 - Couldn't delete 
> o3://bucket1.test/user/hadoop/.sparkStaging/application_1518349702045_0008 - 
> does not exist
> {code}
> I checked the other implementations, and they use DEBUG level. I think it's 
> expected that the path sometimes points to a non-existing dir/file.
> To be consistent with the other implementations, I propose lowering the log 
> level to DEBUG.
> Examples from other file systems:
> S3AFileSystem:
> {code}
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", f);
>   instrumentation.errorIgnored();
>   return false;
> } catch (AmazonClientException e) {
>   throw translateException("delete", f, e);
> }
> {code}
> Aliyun:
> {code}
>try {
>   return innerDelete(getFileStatus(path), recursive);
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", path);
>   return false;
> }
> {code}
> SFTP:
> {code}
>} catch (FileNotFoundException e) {
>   // file not found, no need to delete, return true
>   return false;
> }
> {code}
> SwiftNativeFileSystem:
> {code}
> try {
>   return store.delete(path, recursive);
> } catch (FileNotFoundException e) {
>   //base path was not found.
>   return false;
> }
> {code}
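Applied to OzoneFileSystem, the proposed change would presumably mirror the 
S3A fragment above; a sketch only, based on the log message quoted in this 
issue rather than the committed patch:

{code}
} catch (FileNotFoundException e) {
  // A non-existing path is an expected case for delete(); log at DEBUG,
  // matching the other FileSystem implementations.
  LOG.debug("Couldn't delete {} - does not exist", f);
  return false;
}
{code}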



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361766#comment-16361766
 ] 

genericqa commented on HDFS-13136:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 252 unchanged - 1 fixed = 252 total (was 253) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 12s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.server.namenode.TestAuditLogger |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13136 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910302/HDFS-13136.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux df32cca368ce 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5a1db60 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23039/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23039/testReport/ |
| Max. 

[jira] [Commented] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361807#comment-16361807
 ] 

genericqa commented on HDFS-1686:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 14 new + 20 unchanged - 6 fixed = 34 total (was 26) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 18s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}137m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-1686 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910309/HDFS-1686.00.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cf61c4ea1920 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5a1db60 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23040/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23040/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23040/testReport/ |
| Max. process+thread count | 4180 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Updated] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-1686:
-
Attachment: HDFS-1686.00.patch

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: 4358946.patch, HDFS-1686.00.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.
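For reference, the topology in the description can be stood up with 
MiniDFSCluster roughly as follows. This is a hypothetical skeleton of the 
setup only; the actual test in the patch will differ:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.MiniDFSNNTopology;

public class TestBalancerWithFederationSketch {
  public void testAddDataNodes() throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .nnTopology(MiniDFSNNTopology.simpleFederatedTopology(3)) // 3 NNs
        .numDataNodes(4)                                          // 4 DNs
        .build();
    try {
      cluster.waitActive();
      // ... write blocks into each namespace so the DNs are unevenly used ...
      cluster.startDataNodes(conf, 2, true, null, null); // add 2 new Datanodes
      cluster.waitActive();
      // ... run the Balancer for every namespace and assert it converges ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}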



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-1686:
-
Attachment: (was: HDFS-1686.00.patch)

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: 4358946.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-1686:
-
Status: Patch Available  (was: Open)

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: 4358946.patch, HDFS-1686.00.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-1686:
-
Status: In Progress  (was: Patch Available)

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: 4358946.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-1686:
-
Attachment: HDFS-1686.00.patch

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: 4358946.patch, HDFS-1686.00.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-1686:
-
Status: Patch Available  (was: In Progress)

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Bharat Viswanadham
>Priority: Minor
> Attachments: 4358946.patch, HDFS-1686.00.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham reassigned HDFS-1686:


Assignee: Bharat Viswanadham  (was: Tsz Wo Nicholas Sze)

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Bharat Viswanadham
>Priority: Minor
> Attachments: 4358946.patch, HDFS-1686.00.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361709#comment-16361709
 ] 

Bharat Viswanadham commented on HDFS-1686:
--

Hi [~szetszwo]

I have assigned this Jira to myself, as after uploading the patch a 2nd time I 
was not able to change the status of the Jira to Patch Available.

I have taken your patch and updated it to add a few more test cases.

 

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Bharat Viswanadham
>Priority: Minor
> Attachments: 4358946.patch, HDFS-1686.00.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-1686) Federation: Add more Balancer tests with federation setting

2018-02-12 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361709#comment-16361709
 ] 

Bharat Viswanadham edited comment on HDFS-1686 at 2/13/18 2:02 AM:
---

Hi [~szetszwo]

I have assigned this Jira to myself, as after uploading the patch a 2nd time 
(to correct a few errors) I was not able to change the status of the Jira to 
Patch Available.

I have taken your patch and updated it to add a few more test cases.

 


was (Author: bharatviswa):
Hi [~szetszwo]

I have assigned this Jira to myself, as for 2nd time uploading the patch, not 
able to change the status of Jira to PATCH Available.

I have taken your patch, and updated to add few more test cases.

 

> Federation: Add more Balancer tests with federation setting
> ---
>
> Key: HDFS-1686
> URL: https://issues.apache.org/jira/browse/HDFS-1686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Bharat Viswanadham
>Priority: Minor
> Attachments: 4358946.patch, HDFS-1686.00.patch, h1686_20110303.patch
>
>
> A test that starts with 3 Namenodes and 4 Datanodes, and then adds 2 new 
> Datanodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-12 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13136:
--
Attachment: HDFS-13136.001.patch

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136.001.patch
>
>
> The Namenode has an FSN lock and an FSD lock. Most namenode operations need 
> to take the FSN lock first and then the FSD lock. The permission check is 
> done via FSPermissionChecker at the FSD layer, assuming the FSN lock is held.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookup. However, the delay can still occur 
> during a cache refresh, which causes severe FSN lock contention and 
> unresponsive namenode issues.
> Checking the current code, we found that getBlockLocations(..) does this 
> right, but some methods such as getFileInfo(..) and getContentSummary(..) do 
> it wrong. This ticket is opened to ensure the group lookup for the permission 
> checker happens outside the FSN lock.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-12 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361633#comment-16361633
 ] 

Xiaoyu Yao commented on HDFS-13136:
---

Attaching an initial patch to move getPermissionChecker() out of the FSN lock. 
Thanks to [~szetszwo] for the offline discussion.

This patch also removes the repeated group lookup from recursive calls such as 
FSDirStatAndListingOp#getContentSummaryInt(), which will help improve NN 
performance.
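The second point, removing the repeated lookup from recursive calls, is 
essentially hoisting the expensive construction out of the recursion. A 
generic illustration with invented names, not the patch itself:

{code}
// Illustrative only (all names invented): build the expensive checker once,
// outside the recursion, and pass it down on every recursive call.
class SummarySketch {
  interface PermissionChecker { void check(String path); }
  interface Dir {
    String path();
    long fileBytes();
    java.util.List<Dir> children();
  }

  static long contentSummary(Dir dir, PermissionChecker pc) {
    pc.check(dir.path());                 // reuse the checker built by the caller
    long total = dir.fileBytes();
    for (Dir child : dir.children()) {
      total += contentSummary(child, pc); // no per-call group lookup
    }
    return total;
  }
}
{code}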

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136.001.patch
>
>
> The Namenode has an FSN lock and an FSD lock. Most namenode operations need 
> to take the FSN lock first and then the FSD lock. The permission check is 
> done via FSPermissionChecker at the FSD layer, assuming the FSN lock is held.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookup. However, the delay can still occur 
> during a cache refresh, which causes severe FSN lock contention and 
> unresponsive namenode issues.
> Checking the current code, we found that getBlockLocations(..) does this 
> right, but some methods such as getFileInfo(..) and getContentSummary(..) do 
> it wrong. This ticket is opened to ensure the group lookup for the permission 
> checker happens outside the FSN lock.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-12 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13136:
--
Status: Patch Available  (was: Open)

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136.001.patch
>
>
> The Namenode has an FSN lock and an FSD lock. Most namenode operations need 
> to take the FSN lock first and then the FSD lock. The permission check is 
> done via FSPermissionChecker at the FSD layer, assuming the FSN lock is held.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookup. However, the delay can still occur 
> during a cache refresh, which causes severe FSN lock contention and 
> unresponsive namenode issues.
> Checking the current code, we found that getBlockLocations(..) does this 
> right, but some methods such as getFileInfo(..) and getContentSummary(..) do 
> it wrong. This ticket is opened to ensure the group lookup for the permission 
> checker happens outside the FSN lock.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361657#comment-16361657
 ] 

genericqa commented on HDFS-13135:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}128m 52s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}185m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap |
|   | hadoop.hdfs.server.namenode.TestQuotaByStorageType |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.namenode.TestSnapshotPathINodes |
|   | hadoop.hdfs.TestErasureCodingPolicyWithSnapshot |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy |
|   | hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13135 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910280/HDFS-13135.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux eb492af2bd36 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 

[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-12 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13119:
-
Status: Patch Available  (was: Open)

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13119.001.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may consume all the RPC 
> connections.
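
For illustration, a hedged sketch of the failure mode, using 
java.util.concurrent as a stand-in for the router's RPC fan-out (none of the 
names below come from the actual router code): a fan-out that waits on every 
subcluster pins a thread, and with it a connection, on the unavailable 
subcluster until the RPC timeout; bounding the wait is one way to keep the 
pool free.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FanOutSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4); // small pool

    Callable<String> healthy = () -> "ok";
    Callable<String> down = () -> {     // simulates an unavailable subcluster:
      TimeUnit.SECONDS.sleep(60);       // blocks until the "RPC timeout"
      return "unreachable";
    };

    // Bounding the wait cancels the call to the down subcluster instead of
    // letting every operation pin a thread for the full 60 seconds.
    List<Future<String>> results =
        pool.invokeAll(Arrays.asList(healthy, down), 5, TimeUnit.SECONDS);
    for (Future<String> f : results) {
      System.out.println(f.isCancelled() ? "timed out" : f.get());
    }
    pool.shutdownNow();
  }
}
{code}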



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-12 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360413#comment-16360413
 ] 

Yiqun Lin commented on HDFS-13119:
--

The comments make sense to me. Thanks, [~elgoiri].
Attaching the initial patch.

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13119.001.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may consume all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-12 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13119:
-
Attachment: HDFS-13119.001.patch

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13119.001.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may consume all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13135) Lease not deleted when deleting INodeReference

2018-02-12 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361913#comment-16361913
 ] 

Xiao Chen commented on HDFS-13135:
--

Thanks for reporting this and providing a patch, Sean! I appreciate Kihwal's 
comment too.

I tried to play with the code, but I was not able to make the test fail, with 
or without the fix... Does it fail for you locally?

Looking at the test case, I think this is very similar to HDFS-12369. The unit 
test of that one 
({{TestDeleteRace#testDeleteAndLeaseRecoveryHardLimitSnapshot}}) was basically 
the same: keep a file unclosed, then delete it. The only difference at that 
time was (IIRC) that a hard lease expiration was necessary to trigger the edits 
corruption. Not sure if this is the bug fix Kihwal was mentioning; hopefully 
not. But if it is, and we want to do a different fix, I'm open to it. Suffice 
it to say, I should quote the famous line: I'm not a snapshot expert. :)

> Lease not deleted when deleting INodeReference
> --
>
> Key: HDFS-13135
> URL: https://issues.apache.org/jira/browse/HDFS-13135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HDFS-13135.001.patch
>
>
> In troubleshooting an occurrence of HDFS-13115, it seemed that there was 
> another underlying root cause that should also be addressed. There was an 
> INodeReference that was deleted and the lease on it was not subsequently 
> deleted because it was never added to the reclaim context.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-12 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361903#comment-16361903
 ] 

Xiao Chen commented on HDFS-11187:
--

Thanks [~gabor.bota] for revving and [~jojochuang] for the review.

Mostly LGTM, just one thing looking at the patch:
Could you please handle append too? In the trunk patch, 
{{loadLastPartialChunkChecksum}} in append is fixed as well, by setting the 
checksum directly from the finalized replica. We should make the same fix in 
{{FsDatasetImpl#append}} for branch-2; a sketch of the idea follows.
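
A hedged, self-contained sketch of the idea only (the classes below are 
simplified stand-ins, not the real replica types): on append, copy the cached 
last partial chunk checksum from the finalized replica instead of re-reading 
the meta file from disk under the dataset lock.

{code:java}
// Simplified stand-in for a finalized replica that caches its last
// partial chunk checksum in memory.
class FinalizedReplicaSketch {
  private final byte[] lastPartialChunkChecksum;
  FinalizedReplicaSketch(byte[] checksum) {
    this.lastPartialChunkChecksum = checksum.clone();
  }
  byte[] getLastPartialChunkChecksum() {
    return lastPartialChunkChecksum.clone();
  }
}

// Simplified stand-in for the replica-being-written created by append.
class ReplicaBeingWrittenSketch {
  private byte[] lastChecksum;
  void setLastChecksum(byte[] checksum) {
    this.lastChecksum = checksum.clone();
  }
}

class AppendSketch {
  // The append path: no disk access is needed for the checksum, since it
  // is carried over from the finalized replica's in-memory copy.
  ReplicaBeingWrittenSketch append(FinalizedReplicaSketch finalized) {
    ReplicaBeingWrittenSketch rbw = new ReplicaBeingWrittenSketch();
    rbw.setLastChecksum(finalized.getLastPartialChunkChecksum());
    return rbw;
  }
}
{code}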

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13138) webhdfs of federated namenode does not work properly

2018-02-12 Thread KWON BYUNGCHANG (JIRA)
KWON BYUNGCHANG created HDFS-13138:
--

 Summary: webhdfs of federated namenode does not work properly
 Key: HDFS-13138
 URL: https://issues.apache.org/jira/browse/HDFS-13138
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 3.0.0, 2.7.1
Reporter: KWON BYUNGCHANG


My cluster has multiple namenodes using HDFS Federation.

WebHDFS against a namenode that is not the defaultFS does not work properly:
when I uploaded a file to a non-defaultFS namenode using WebHDFS, the uploaded
file was found on the defaultFS namenode.

I think the root cause is that the clientNamenodeAddress of a non-defaultFS
namenode is always fs.defaultFS:

[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L462]

 
{code:java}
/**
   * Set the namenode address that will be used by clients to access this
   * namenode or name service. This needs to be called before the config
   * is overriden.
   */
  public void setClientNamenodeAddress(Configuration conf) {
String nnAddr = conf.get(FS_DEFAULT_NAME_KEY);
if (nnAddr == null) {
  // default fs is not set.
  clientNamenodeAddress = null;
  return;
}

LOG.info("{} is {}", FS_DEFAULT_NAME_KEY, nnAddr);
URI nnUri = URI.create(nnAddr);

String nnHost = nnUri.getHost();
if (nnHost == null) {
  clientNamenodeAddress = null;
  return;
}

if (DFSUtilClient.getNameServiceIds(conf).contains(nnHost)) {
  // host name is logical
  clientNamenodeAddress = nnHost;
} else if (nnUri.getPort() > 0) {
  // physical address with a valid port
  clientNamenodeAddress = nnUri.getAuthority();
} else {
  // the port is missing or 0. Figure out real bind address later.
  clientNamenodeAddress = null;
  return;
}
LOG.info("Clients are to use {} to access"
+ " this namenode/service.", clientNamenodeAddress );
  }

{code}
 

So WebHDFS is redirected to a datanode with the wrong namenoderpcaddress 
parameter, and the file finally ends up on the namenode of fs.defaultFS.

A workaround is to configure fs.defaultFS of each namenode to its own 
nameservice, e.g.:

  hdfs://ns1  has fs.defaultFS=hdfs://ns1

  hdfs://ns2  has fs.defaultFS=hdfs://ns2
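
For illustration, a hedged sketch of that workaround for nameservice ns1 
(adjust per namenode; the property placement follows standard Hadoop 
configuration) in core-site.xml:

{code:xml}
<!-- Hypothetical core-site.xml for the namenode(s) of nameservice ns1:
     fs.defaultFS points at the namenode's own nameservice rather than
     the cluster-wide default, so clientNamenodeAddress is set correctly. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns1</value>
</property>
{code}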



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13137) Ozone: Ozonefs read fails because ChunkGroupInputStream#read does not iterate through all the blocks in the key

2018-02-12 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDFS-13137:


 Summary: Ozone: Ozonefs read fails because 
ChunkGroupInputStream#read does not iterate through all the blocks in the key
 Key: HDFS-13137
 URL: https://issues.apache.org/jira/browse/HDFS-13137
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh
 Fix For: HDFS-7240


OzoneFilesystem put is failing with the following exception. This happens 
because ChunkGroupInputStream#read does not iterate through all the blocks in 
the key.

{code}
[hdfs@y129 ~]$ time /opt/hadoop/hadoop-3.1.0-SNAPSHOT/bin/hdfs dfs -put test3 
/test3a
18/02/12 13:36:21 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
2018-02-12 13:36:22,211 [main] INFO   - Using 
org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.

18/02/12 13:37:25 WARN util.ShutdownHookManager: ShutdownHook 'ClientFinalizer' 
timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:68)
{code}
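
For context, a hedged, self-contained sketch of what a multi-block read loop 
looks like ({{streams}} and {{currentIndex}} are illustrative stand-ins, not 
the actual Ozone fields); the bug described above corresponds to stopping once 
the first block's stream is exhausted instead of advancing to the next one:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

class MultiBlockReadSketch {
  private final List<InputStream> streams; // one stream per block in the key
  private int currentIndex = 0;            // block currently being read

  MultiBlockReadSketch(List<InputStream> streams) {
    this.streams = streams;
  }

  // Reads up to len bytes, advancing across block streams as each one is
  // exhausted, so the whole key is readable rather than just its first block.
  int read(byte[] b, int off, int len) throws IOException {
    if (len == 0) {
      return 0;
    }
    int total = 0;
    while (len > 0 && currentIndex < streams.size()) {
      int n = streams.get(currentIndex).read(b, off, len);
      if (n < 0) {
        currentIndex++; // current block is done; move on to the next block
        continue;
      }
      total += n;
      off += n;
      len -= n;
    }
    return total == 0 ? -1 : total;
  }
}
{code}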



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-12 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: HDFS-13110-HDFS-10285-04.patch

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, 
> HDFS-13110-HDFS-10285-03.patch, HDFS-13110-HDFS-10285-04.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13040) Kerberized inotify client fails despite kinit properly

2018-02-12 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13040:
-
Attachment: HDFS-13040.half.test.patch

> Kerberized inotify client fails despite kinit properly
> --
>
> Key: HDFS-13040
> URL: https://issues.apache.org/jira/browse/HDFS-13040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, 
> HDFS-13040.half.test.patch, TestDFSInotifyEventInputStreamKerberized.java, 
> TransactionReader.java
>
>
> This issue is similar to HDFS-10799.
> HDFS-10799 turned out to be a client-side issue where the client is 
> responsible for actively renewing its Kerberos ticket.
> However, we found that in a slightly different setup, even if the client has 
> valid Kerberos credentials, inotify still fails.
> Suppose client uses principal h...@example.com, 
>  namenode 1 uses server principal hdfs/nn1.example@example.com
>  namenode 2 uses server principal hdfs/nn2.example@example.com
> *After the NameNodes have been running for longer than the Kerberos ticket 
> lifetime*, the client fails with the following error:
> {noformat}
> 18/01/19 11:23:02 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) 
> cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We 
> encountered an error reading 
> https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3,
>  
> https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 8683, but we thought we could read up to transaction 
> 8684.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> Typically, if the NameNode has an expired Kerberos ticket, the error handling 
> for the usual edit log tailing would let the NameNode relogin with its own 
> Kerberos principal. However, when inotify uses the same code path to retrieve 
> edits, since the current user is the inotify client's principal, the NameNode 
> can't do that on behalf of the client unless the client uses the same 
> principal as the NameNode.
> Therefore, a more appropriate approach is to use a proxy user so that the 
> NameNode can retrieve edits on behalf of the client.
> I will attach a patch to fix it. This patch has been verified to work on a 
> CDH5.10.2 cluster; however, it seems impossible to craft a unit test for this 
> fix because of the way Hadoop UGI handles Kerberos credentials (I can't have 
> a single process that logs in as two Kerberos principals simultaneously and 
> lets them establish a connection).
> A possible workaround is for the inotify client to use the active NameNode's 
> server principal. However, that's not going to work when there's a namenode 
> failover, because then the client's principal will not be consistent with the 
> active NN's one, and then fails to 

[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly

2018-02-12 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361857#comment-16361857
 ] 

Xiao Chen commented on HDFS-13040:
--

Worked on the test based on [~pifta]'s original.

I think one thing worth clarifying with Istvan is that, in the test, the UGI is 
just for setting up the client to do an NN RPC. So if I understand correctly, 
what you observed from the tests is all RPC behavior. The bug here happens 
after the client authenticates with the NN, when the NN is doing its part to 
get the edits; see below.

After some debugging, it seems this needs extra effort to make a real test 
that lets the NN read the edits via {{EditLogFileInputStream$URLLog}}. 
Currently {{DFSInotifyEventInputStream#poll}} would end up making the NN go 
through {{EditLogFileInputStream$FileLog}}, hence not triggering the 
authentication path reported. On the other hand, we can't really unit test the 
stream on its own, since we do need the context of an NN RPC and the context of 
the NN login user. Will continue working on this when I can find cycles.

[~daryn] / [~kihwal],
Does the fix itself make sense to you?
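
For reference, a hedged sketch of the proxy-user idea from the description; 
{{clientPrincipal}} and the method below are illustrative assumptions, not the 
actual patch:

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
  // Illustrative only: read edits on behalf of the inotify client while
  // the NameNode's login user supplies the real Kerberos credentials.
  static void readEditsOnBehalfOf(String clientPrincipal) throws Exception {
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser(clientPrincipal, realUser);
    proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
      // ... fetch the edit log segments from the JournalNodes here ...
      return null;
    });
  }
}
{code}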

> Kerberized inotify client fails despite kinit properly
> --
>
> Key: HDFS-13040
> URL: https://issues.apache.org/jira/browse/HDFS-13040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, 
> HDFS-13040.half.test.patch, TestDFSInotifyEventInputStreamKerberized.java, 
> TransactionReader.java
>
>
> This issue is similar to HDFS-10799.
> HDFS-10799 turned out to be a client-side issue where the client is 
> responsible for actively renewing its Kerberos ticket.
> However, we found that in a slightly different setup, even if the client has 
> valid Kerberos credentials, inotify still fails.
> Suppose client uses principal h...@example.com, 
>  namenode 1 uses server principal hdfs/nn1.example@example.com
>  namenode 2 uses server principal hdfs/nn2.example@example.com
> *After the NameNodes have been running for longer than the Kerberos ticket 
> lifetime*, the client fails with the following error:
> {noformat}
> 18/01/19 11:23:02 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) 
> cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We 
> encountered an error reading 
> https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3,
>  
> https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 8683, but we thought we could read up to transaction 
> 8684.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> Typically, if the NameNode has an expired Kerberos ticket, the error handling 
> for the usual edit log tailing would let the NameNode relogin with its own 
> Kerberos principal. However, when inotify uses the