[jira] [Updated] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.

2016-05-28 Thread He Xiaoqiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-10453:
---
Affects Version/s: 2.7.1

> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: He Xiaoqiao
> Attachments: HDFS-10453-branch-2.001.patch, HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread could get stuck for a long time and, with low 
> probability, lose data. Consider the typical scenario:
> (1) create and close a file with the default replication (3);
> (2) increase the replication of the file to 10;
> (3) delete the file while ReplicationMonitor is scheduling blocks belonging 
> to that file for replication.
> When the ReplicationMonitor gets stuck, the NameNode prints logs such as:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This happens because two threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and clears the 
> references in blocksMap, neededReplications, etc. The block's numBytes is set 
> to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does not 
> need an explicit ACK from the node.
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to 
> chooseTargets for the same blocks, and no node is selected even after 
> traversing the whole cluster, because no node satisfies the goodness criteria 
> (the remaining space would have to reach the required size Long.MAX_VALUE).
> During stage (3) the ReplicationMonitor is stuck for a long time, especially 
> in a large cluster. invalidateBlocks & neededReplications keep growing and 
> are not consumed; at worst this will lose data.
> This can mostly be avoided by skipping chooseTarget for a BlockCommand.NO_ACK 
> block and removing it from neededReplications, as sketched below.
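
A minimal, self-contained sketch of the proposed guard (the class and field 
names below are illustrative stand-ins, not the actual BlockManager patch; 
only NO_ACK == Long.MAX_VALUE comes from HDFS's BlockCommand):
{code:java}
import java.util.Iterator;
import java.util.List;

/**
 * Illustrative sketch of the proposed fix, not the actual patch: a block whose
 * length was set to NO_ACK (Long.MAX_VALUE) by a concurrent delete is skipped
 * before chooseTarget and dropped from the needed-replication queue, instead
 * of forcing a cluster-wide scan that can never find a target.
 */
class ReplicationSkipSketch {
  /** Mirrors BlockCommand.NO_ACK in HDFS. */
  static final long NO_ACK = Long.MAX_VALUE;

  /** Hypothetical stand-in for a block queued for replication. */
  static class PendingBlock {
    final long numBytes;
    PendingBlock(long numBytes) { this.numBytes = numBytes; }
  }

  /** Drop deleted (NO_ACK) blocks before any target selection happens. */
  static void pruneDeletedBlocks(List<PendingBlock> neededReplications) {
    for (Iterator<PendingBlock> it = neededReplications.iterator(); it.hasNext();) {
      if (it.next().numBytes == NO_ACK) {
        it.remove();  // never reaches chooseTarget, so no full-cluster traversal
      }
    }
  }
}
{code}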






[jira] [Commented] (HDFS-10433) Make retry also works well for Async DFS

2016-05-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305736#comment-15305736
 ] 

Hadoop QA commented on HDFS-10433:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 55s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 36s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 22s 
{color} | {color:red} root: The patch generated 44 new + 224 unchanged - 6 
fixed = 268 total (was 230) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 41s 
{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s 
{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 47s {color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.TestAsyncHDFSWithHA |
|   | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12806837/h10433_20160528b.patch
 |
| JIRA Issue | HDFS-10433 |
| Optional Tests |  asflicense  xml  compile  javac  javadoc  mvninstall  
mvnsite  unit  findbugs  checkstyle  |
| uname | Linux b56bdee95ca5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / f5ff05c |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 

[jira] [Commented] (HDFS-10467) Router-based HDFS federation

2016-05-28 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305716#comment-15305716
 ] 

He Tianyi commented on HDFS-10467:
--

Thanks for sharing.

I've implemented a similar approach as a separate project, see 
https://github.com/bytedance/nnproxy.
I am currently using it in production to back 2 namenodes with a mount table 
of 20+ entries, and it has worked well (about 12K TPS).

Looks like HDFS Router Federation includes more features. Shall we work 
together?

> Router-based HDFS federation
> 
>
> Key: HDFS-10467
> URL: https://issues.apache.org/jira/browse/HDFS-10467
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.7.2
>Reporter: Inigo Goiri
> Attachments: HDFS Router Federation.pdf
>
>
> Add a Router to provide a federated view of multiple HDFS clusters.






[jira] [Updated] (HDFS-10433) Make retry also works well for Async DFS

2016-05-28 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10433:
---
Attachment: h10433_20160528c.patch

h10433_20160528c.patch: fixes test failure.

> Make retry also works well for Async DFS
> 
>
> Key: HDFS-10433
> URL: https://issues.apache.org/jira/browse/HDFS-10433
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Xiaobing Zhou
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h10433_20160524.patch, h10433_20160525.patch, 
> h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch, 
> h10433_20160528b.patch, h10433_20160528c.patch
>
>
> In the current Async DFS implementation, file system calls are invoked and 
> immediately return a Future to clients. Clients call Future#get to retrieve 
> the final results. Future#get internally invokes a chain of callbacks 
> residing in ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and 
> ipc.Client. This callback path bypasses the original retry layer/logic 
> designed for the synchronous DFS. This issue proposes refactoring so that 
> retry also works for Async DFS.
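
As a purely generic illustration of that point (not the HDFS code: the class, 
the withRetry helper and the CompletableFuture-based flow below are 
assumptions for the example), the retry policy has to wrap each asynchronous 
attempt so the returned Future completes only after the retries succeed or are 
exhausted, rather than relying on the blocking get() path:
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

/** Generic sketch only, not the HDFS implementation. */
class AsyncRetrySketch {

  /** Retry an async operation up to maxAttempts times before failing the Future. */
  static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> op,
                                            int maxAttempts) {
    return op.get().handle((value, err) -> {
      if (err == null) {
        return CompletableFuture.completedFuture(value);
      }
      if (maxAttempts <= 1) {
        CompletableFuture<T> failed = new CompletableFuture<>();
        failed.completeExceptionally(err);
        return failed;
      }
      return withRetry(op, maxAttempts - 1);   // schedule another attempt
    }).thenCompose(f -> f);                    // flatten the nested future
  }

  public static void main(String[] args) {
    // Hypothetical flaky async call standing in for an RPC.
    int[] calls = {0};
    Supplier<CompletableFuture<Integer>> flaky = () ->
        CompletableFuture.supplyAsync(() -> {
          if (++calls[0] < 3) {
            throw new CompletionException(new RuntimeException("transient failure"));
          }
          return 42;
        });

    // Blocking only on the Future would surface the first failure; wrapping
    // the asynchronous attempts retries it transparently.
    System.out.println(withRetry(flaky, 5).join());  // prints 42 after 2 retries
  }
}
{code}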






[jira] [Commented] (HDFS-10433) Make retry also works well for Async DFS

2016-05-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305707#comment-15305707
 ] 

Hadoop QA commented on HDFS-10433:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 14s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 29s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 15s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 22s 
{color} | {color:red} root: The patch generated 44 new + 224 unchanged - 6 
fixed = 268 total (was 230) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 28s 
{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 8s 
{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 30s {color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPersistBlocks |
|   | hadoop.hdfs.TestAsyncHDFSWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12806837/h10433_20160528b.patch
 |
| JIRA Issue | HDFS-10433 |
| Optional Tests |  asflicense  xml  compile  javac  javadoc  mvninstall  
mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 990848a97abf 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / f5ff05c |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 

[jira] [Updated] (HDFS-10433) Make retry also works well for Async DFS

2016-05-28 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10433:
---
Attachment: h10433_20160528b.patch

h10433_20160528b.patch: makes testAsyncWithoutRetry run in both the HA and 
non-HA cases.

> Make retry also works well for Async DFS
> 
>
> Key: HDFS-10433
> URL: https://issues.apache.org/jira/browse/HDFS-10433
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Xiaobing Zhou
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h10433_20160524.patch, h10433_20160525.patch, 
> h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch, 
> h10433_20160528b.patch
>
>
> In the current Async DFS implementation, file system calls are invoked and 
> immediately return a Future to clients. Clients call Future#get to retrieve 
> the final results. Future#get internally invokes a chain of callbacks 
> residing in ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and 
> ipc.Client. This callback path bypasses the original retry layer/logic 
> designed for the synchronous DFS. This issue proposes refactoring so that 
> retry also works for Async DFS.






[jira] [Commented] (HDFS-10433) Make retry also works well for Async DFS

2016-05-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305650#comment-15305650
 ] 

Hadoop QA commented on HDFS-10433:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 39s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 13s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 11s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 20s 
{color} | {color:red} root: The patch generated 46 new + 224 unchanged - 6 
fixed = 270 total (was 230) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 49s 
{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s 
{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 14s {color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 132m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestErasureCodeBenchmarkThroughput |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12806827/h10433_20160528.patch 
|
| JIRA Issue | HDFS-10433 |
| Optional Tests |  asflicense  xml  compile  javac  javadoc  mvninstall  
mvnsite  unit  findbugs  checkstyle  |
| uname | Linux ebb0db7a1a3a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / f5ff05c |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15598/artifact/patchprocess/diff-checkstyle-root.txt
 |
| 

[jira] [Updated] (HDFS-10433) Make retry also works well for Async DFS

2016-05-28 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10433:
---
Attachment: h10433_20160528.patch

h10433_20160528.patch: sync'ed with trunk.

> Make retry also works well for Async DFS
> 
>
> Key: HDFS-10433
> URL: https://issues.apache.org/jira/browse/HDFS-10433
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Xiaobing Zhou
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h10433_20160524.patch, h10433_20160525.patch, 
> h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch
>
>
> In the current Async DFS implementation, file system calls are invoked and 
> immediately return a Future to clients. Clients call Future#get to retrieve 
> the final results. Future#get internally invokes a chain of callbacks 
> residing in ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and 
> ipc.Client. This callback path bypasses the original retry layer/logic 
> designed for the synchronous DFS. This issue proposes refactoring so that 
> retry also works for Async DFS.






[jira] [Commented] (HDFS-10426) TestPendingInvalidateBlock failed in trunk

2016-05-28 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305451#comment-15305451
 ] 

Masatake Iwasaki commented on HDFS-10426:
-

I found other intermittent failures in {{TestPendingInvalidateBlock}}. It 
would be nice if those were fixed here too.

{noformat}
Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 31.54 sec <<< 
FAILURE! - in 
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
testPendingDeletion(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock)
  Time elapsed: 12.213 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeletion(TestPendingInvalidateBlock.java:100)
{noformat}

{noformat}
testPendingDeleteUnknownBlocks(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock)
  Time elapsed: 18.993 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeleteUnknownBlocks(TestPendingInvalidateBlock.java:177)
{noformat}


> TestPendingInvalidateBlock failed in trunk
> --
>
> Key: HDFS-10426
> URL: https://issues.apache.org/jira/browse/HDFS-10426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10426.001.patch, HDFS-10426.002.patch
>
>
> The test {{TestPendingInvalidateBlock}} fails sometimes. The stack info:
> {code}
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
> testPendingDeletion(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock)
>   Time elapsed: 7.703 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeletion(TestPendingInvalidateBlock.java:92)
> {code}
> It looks like the {{invalidateBlock}} has been removed before we do the check:
> {code}
> // restart NN
> cluster.restartNameNode(true);
> dfs.delete(foo, true);
> Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal());
> Assert.assertEquals(REPLICATION, cluster.getNamesystem()
>     .getPendingDeletionBlocks());
> Assert.assertEquals(REPLICATION,
>     dfs.getPendingDeletionBlocksCount());
> {code}
> I looked into the related configurations and found that the property 
> {{dfs.namenode.replication.interval}} is set to just 1 second in this test. 
> If, after the delay of {{dfs.namenode.startup.delay.block.deletion.sec}}, the 
> delete operation is slow, this case will occur. As the stack info above 
> shows, the failed test took 7.7s, more than the 5+1 seconds.
> One way to improve this (see the configuration sketch below):
> * Increase the time of {{dfs.namenode.replication.interval}}
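
A minimal sketch of that suggestion, assuming the corresponding DFSConfigKeys 
constants and the test's Configuration setup (the scaffolding is illustrative, 
not the committed fix):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

class PendingInvalidateBlockConfSketch {
  static Configuration newTestConf() {
    Configuration conf = new Configuration();
    // startup delay before invalidation work starts (the test uses 5 seconds)
    conf.setLong(
        DFSConfigKeys.DFS_NAMENODE_STARTUP_DELAY_BLOCK_DELETION_SEC_KEY, 5L);
    // lengthen the ReplicationMonitor interval from 1 second so the assertions
    // run before the pending deletion blocks are consumed
    conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY, 10);
    return conf;
  }
}
{code}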






[jira] [Commented] (HDFS-10426) TestPendingInvalidateBlock failed in trunk

2016-05-28 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305449#comment-15305449
 ] 

Masatake Iwasaki commented on HDFS-10426:
-

bq. And I have tested the way as you comment. But I found it still not affect 
the cluster.getNamesystem().getPendingDeletionBlocks(), and the 
invalidateBlocks will still be removed.

Sorry, my assumption was wrong.

{code}
dfs.delete(foo, true);
Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal());
Assert.assertEquals(REPLICATION, cluster.getNamesystem()
    .getPendingDeletionBlocks());
{code}

I can see 2 causes for getting {{AssertionError: expected:<2> but was:<1>}} 
on the above.

# ReplicationMonitor kicks in between {{delete}} and the check of the pending 
deletion blocks count.
# {{delete}} is called before the NN processes the block report from one of 
the DNs.

Your 002 patch addresses #1, but I think it would be better to avoid adding 
test-only code to BlockManager.

In order to fix #2, we should wait until the replication count of the block 
reaches 2 before calling {{delete}}, e.g. as sketched below.
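
A minimal sketch of that wait ({{DFSTestUtil.waitReplication}} is the existing 
test utility; the exact placement in TestPendingInvalidateBlock shown here is 
illustrative):
{code}
// requires: import org.apache.hadoop.hdfs.DFSTestUtil;
// Block until both DNs have reported the file's block, so the pending
// deletion count observed after the delete is deterministic.
DFSTestUtil.waitReplication(dfs, foo, (short) 2);

dfs.delete(foo, true);
Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal());
Assert.assertEquals(REPLICATION, cluster.getNamesystem()
    .getPendingDeletionBlocks());
{code}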


> TestPendingInvalidateBlock failed in trunk
> --
>
> Key: HDFS-10426
> URL: https://issues.apache.org/jira/browse/HDFS-10426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10426.001.patch, HDFS-10426.002.patch
>
>
> The test {{TestPendingInvalidateBlock}} fails sometimes. The stack info:
> {code}
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
> testPendingDeletion(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock)
>   Time elapsed: 7.703 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeletion(TestPendingInvalidateBlock.java:92)
> {code}
> It looks like the {{invalidateBlock}} has been removed before we do the check:
> {code}
> // restart NN
> cluster.restartNameNode(true);
> dfs.delete(foo, true);
> Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal());
> Assert.assertEquals(REPLICATION, cluster.getNamesystem()
>     .getPendingDeletionBlocks());
> Assert.assertEquals(REPLICATION,
>     dfs.getPendingDeletionBlocksCount());
> {code}
> I looked into the related configurations and found that the property 
> {{dfs.namenode.replication.interval}} is set to just 1 second in this test. 
> If, after the delay of {{dfs.namenode.startup.delay.block.deletion.sec}}, the 
> delete operation is slow, this case will occur. As the stack info above 
> shows, the failed test took 7.7s, more than the 5+1 seconds.
> One way to improve this:
> * Increase the time of {{dfs.namenode.replication.interval}}






[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.

2016-05-28 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305222#comment-15305222
 ] 

He Xiaoqiao commented on HDFS-10453:


Hi [~shahrs87], thanks a lot for watching this issue. Your reviews and further 
suggestions would be helpful.

> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: He Xiaoqiao
> Attachments: HDFS-10453-branch-2.001.patch, HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread could get stuck for a long time and, with low 
> probability, lose data. Consider the typical scenario:
> (1) create and close a file with the default replication (3);
> (2) increase the replication of the file to 10;
> (3) delete the file while ReplicationMonitor is scheduling blocks belonging 
> to that file for replication.
> When the ReplicationMonitor gets stuck, the NameNode prints logs such as:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This happens because two threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and clears the 
> references in blocksMap, neededReplications, etc. The block's numBytes is set 
> to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does not 
> need an explicit ACK from the node.
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to 
> chooseTargets for the same blocks, and no node is selected even after 
> traversing the whole cluster, because no node satisfies the goodness criteria 
> (the remaining space would have to reach the required size Long.MAX_VALUE).
> During stage (3) the ReplicationMonitor is stuck for a long time, especially 
> in a large cluster. invalidateBlocks & neededReplications keep growing and 
> are not consumed; at worst this will lose data.
> This can mostly be avoided by skipping chooseTarget for a BlockCommand.NO_ACK 
> block and removing it from neededReplications.


