[jira] [Commented] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey

2016-08-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403408#comment-15403408
 ] 

Xiao Chen commented on HDFS-10643:
--

Thanks [~xyao] for revving!

The change LGTM too, but the test passes even without the fix. I think (not 
debugged, sorry if incorrect) this is because the NN warms up the EDEK cache 
after HDFS-9405, so the test never triggers the KMS ACL check. 
Why is {{createFile}} done 3 times in the test? Is it for cache draining? If 
so, I think we could set the cache size to 1 to make it fail. 
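
A minimal sketch of that cache shrinking, assuming {{KMSClientProvider}}'s 
cache-size property (untested):
{code}
// Hedged sketch: shrink the client-side EDEK cache so the warmed-up entries
// are exhausted after one createFile, forcing the next createFile to make a
// real GENERATE_EEK call to KMS and hit the ACL check.
Configuration conf = new Configuration();
conf.setInt("hadoop.security.kms.client.encrypted.key.cache.size", 1);
{code}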

Also a nit: in the test, can we remove this? 
{code}
try {
...
} catch (IOException e) {
throw new IOException(e);
}
{code}

> HDFS namenode should always use service user (hdfs) to generateEncryptedKey
> ---
>
> Key: HDFS-10643
> URL: https://issues.apache.org/jira/browse/HDFS-10643
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, 
> HDFS-10643.02.patch, HDFS-10643.03.patch, HDFS-10643.04.patch
>
>
> KMSClientProvider is designed to be shared by different KMS clients. When 
> the HDFS Namenode, as a KMS client, talks to KMS to generateEncryptedKey for 
> new file creation on behalf of a proxy user (hive, oozie), the proxy-user 
> handling in KMSClientProvider is unnecessary. It requires 1) an extra proxy 
> user configuration allowing the hdfs user to proxy its clients, and 2) KMS 
> ACLs that allow non-hdfs users for the GENERATE_EEK operation. 
> This ticket is opened to always use the HDFS namenode login user (hdfs) when 
> talking to KMS to generateEncryptedKey for new file creation. This way, we 
> get more secure KMS-based HDFS encryption (we can set kms-acls to allow only 
> the hdfs user for GENERATE_EEK) with less configuration hassle, since KMS no 
> longer needs to allow hdfs to proxy other users. 
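
A minimal sketch of the general approach (not necessarily the committed 
patch; it assumes {{provider}} is the NN's KeyProviderCryptoExtension and 
{{ezKeyName}} is the encryption zone key name):
{code}
// Hedged sketch: run generateEncryptedKey as the NN login user (hdfs),
// regardless of which RPC caller or proxy user triggered the file creation.
EncryptedKeyVersion edek = SecurityUtil.doAsLoginUser(
    new PrivilegedExceptionAction<EncryptedKeyVersion>() {
      @Override
      public EncryptedKeyVersion run() throws Exception {
        return provider.generateEncryptedKey(ezKeyName);
      }
    });
{code}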






[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403399#comment-15403399
 ] 

Fenghua Hu commented on HDFS-10682:
---

Arpit/Liang,
Looks like there is one JIRA (https://issues.apache.org/jira/browse/HDFS-9668) 
to address the big-lock issue; maybe we should relate them?



> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> This Jira proposes to replace the FsDatasetImpl object lock with a separate 
> lock object. Doing so will make it easier to measure lock statistics like 
> lock held time and warn about potential lock contention due to slow disk 
> operations.
> In the future we can also consider replacing the lock with a read-write lock.
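
A minimal sketch of the proposed shape (assuming a plain 
java.util.concurrent.locks.ReentrantLock; the actual patch may differ):
{code}
// Guard dataset state with an explicit lock object instead of
// synchronized(this), so acquire/held times can later be instrumented.
private final ReentrantLock datasetLock = new ReentrantLock();

void someDatasetOperation() {  // illustrative method name
  datasetLock.lock();
  try {
    // ... mutate volume/replica state ...
  } finally {
    datasetLock.unlock();
  }
}
{code}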






[jira] [Commented] (HDFS-10706) Add tool generating FSImage from external store

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403393#comment-15403393
 ] 

Hadoop QA commented on HDFS-10706:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
 8s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-tools {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
6s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m  
5s{color} | {color:red} hadoop-tools in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m  
4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m  
7s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m  7s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 6s{color} | {color:green} root: The patch generated 0 new + 0 unchanged - 6 
fixed = 0 total (was 6) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m  
6s{color} | {color:red} hadoop-tools in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m  
4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red}  0m  
6s{color} | {color:red} hadoop-tools in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red}  0m  
4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-tools {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m  
4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
7s{color} | {color:red} hadoop-tools in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m  
4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 14s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
58s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m  9s{color} 
| {color:red} hadoop-tools in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m  8s{color} 
| {color:red} hadoop-fs2img in the 

[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403341#comment-15403341
 ] 

Hadoop QA commented on HDFS-10678:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
32s{color} | {color:green} root: The patch generated 0 new + 140 unchanged - 1 
fixed = 140 total (was 141) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m  
9s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
41s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 62m 
47s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}118m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821489/HDFS-10678.002.patch |
| JIRA Issue | HDFS-10678 |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  
unit  findbugs  checkstyle  xml  |
| uname | Linux 83279432c1c9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403325#comment-15403325
 ] 

Konstantin Shvachko commented on HDFS-10301:


Unfortunately, there seems to be a problem with the patch: the storage report 
is not recognized in certain cases.
I will revert the commits.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report and 
> then sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing storages from different 
> reports. This screws up the blockReportId field, which makes the NameNode 
> think that some storages are zombie. Replicas from zombie storages are 
> immediately removed, causing missing blocks.






[jira] [Commented] (HDFS-10714) Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE

2016-08-01 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403322#comment-15403322
 ] 

Brahma Reddy Battula commented on HDFS-10714:
-

Thinking of solutions like this:

1) Remove both DNs in the checksum error case, i.e. DN2 and DN3.

2) Remove DN3 first and record DN2 as a suspect node. If the next pipeline 
still fails with a checksum error, then DN2 can be removed since it is the 
suspect.

I think the 2nd solution will be safer.

Any thoughts on this? cc [~kanaka]/[~vinayrpet]
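
A hypothetical sketch of option 2, with invented names, just to make the 
suspect-node bookkeeping concrete:
{code}
// On ERROR_CHECKSUM from the last node: evict the tail first, but remember
// its upstream as a suspect; evict the suspect only on a second failure.
if (ackStatus == Status.ERROR_CHECKSUM) {
  if (upstreamOfLast.equals(suspectNode)) {
    excludeFromPipeline(suspectNode);    // second strike: upstream is bad
    suspectNode = null;
  } else {
    excludeFromPipeline(lastInPipeline); // first strike: blame the tail (DN3)
    suspectNode = upstreamOfLast;        // record DN2 as suspect
  }
}
{code}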

> Issue in handling checksum errors in write pipeline when fault DN is 
> LAST_IN_PIPELINE
> -
>
> Key: HDFS-10714
> URL: https://issues.apache.org/jira/browse/HDFS-10714
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>
> We came across an issue where a write fails even though 7 DNs are available, 
> due to a network fault at one datanode which is LAST_IN_PIPELINE. It is 
> similar to HDFS-6937.
> Scenario: (DN3 has a N/W fault and min replication = 2.)
> Write pipeline:
> DN1->DN2->DN3  => DN3 gives an ERROR_CHECKSUM ack, so DN2 is marked as bad
> DN1->DN4->DN3  => DN3 gives an ERROR_CHECKSUM ack, so DN4 is marked as bad
> ….
> And so on (every time DN3 is LAST_IN_PIPELINE), continuing until no more 
> datanodes are left to construct the pipeline.






[jira] [Assigned] (HDFS-10714) Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE

2016-08-01 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned HDFS-10714:
---

Assignee: Brahma Reddy Battula

> Issue in handling checksum errors in write pipeline when fault DN is 
> LAST_IN_PIPELINE
> -
>
> Key: HDFS-10714
> URL: https://issues.apache.org/jira/browse/HDFS-10714
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>
> We came across an issue where a write fails even though 7 DNs are available, 
> due to a network fault at one datanode which is LAST_IN_PIPELINE. It is 
> similar to HDFS-6937.
> Scenario: (DN3 has a N/W fault and min replication = 2.)
> Write pipeline:
> DN1->DN2->DN3  => DN3 gives an ERROR_CHECKSUM ack, so DN2 is marked as bad
> DN1->DN4->DN3  => DN3 gives an ERROR_CHECKSUM ack, so DN4 is marked as bad
> ….
> And so on (every time DN3 is LAST_IN_PIPELINE), continuing until no more 
> datanodes are left to construct the pipeline.






[jira] [Commented] (HDFS-6937) Another issue in handling checksum errors in write pipeline

2016-08-01 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403313#comment-15403313
 ] 

Brahma Reddy Battula commented on HDFS-6937:


bq. If your problem is really a network issue, then your proposed solution 
sounds reasonable to me. However, it seems different than what HDFS-6937 
intends to solve, and I think we can create a new jira for your issue. Here is 
why:

I initially thought of handling it within this issue only. Thanks for the 
correction. Raised HDFS-10714 to handle it separately.

> Another issue in handling checksum errors in write pipeline
> ---
>
> Key: HDFS-6937
> URL: https://issues.apache.org/jira/browse/HDFS-6937
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 2.5.0
>Reporter: Yongjun Zhang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-6937.001.patch, HDFS-6937.002.patch
>
>
> Given a write pipeline:
> DN1 -> DN2 -> DN3
> DN3 detects a checksum error and terminates, and DN2 truncates its replica to 
> the ACKed size. Then a new pipeline is attempted as
> DN1 -> DN2 -> DN4
> DN4 detects a checksum error again. Later, when DN4 was replaced with DN5 
> (and so on), it failed for the same reason. This led to the observation that 
> DN2's data is corrupted. 
> Found that the software currently truncates DN2's replica to the ACKed size 
> after DN3 terminates, but it doesn't check the correctness of the data 
> already written to disk.
> So intuitively, a solution would be: when the downstream DN (DN3 here) finds 
> a checksum error, propagate this info back to the upstream DN (DN2 here); DN2 
> checks the correctness of the data already written to disk and truncates the 
> replica to MIN(correctDataSize, ACKedSize).
> Found this issue is similar to what was reported by HDFS-3875, and the 
> truncation at DN2 was actually introduced as part of the HDFS-3875 solution. 
> Filing this jira for the issue reported here. HDFS-3875 was filed by 
> [~tlipcon], and I found he proposed something similar there.
> {quote}
> if the tail node in the pipeline detects a checksum error, then it returns a 
> special error code back up the pipeline indicating this (rather than just 
> disconnecting)
> if a non-tail node receives this error code, then it immediately scans its 
> own block on disk (from the beginning up through the last acked length). If 
> it detects a corruption on its local copy, then it should assume that it is 
> the faulty one, rather than the downstream neighbor. If it detects no 
> corruption, then the faulty node is either the downstream mirror or the 
> network link between the two, and the current behavior is reasonable.
> {quote}
> Thanks.
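
A hedged sketch of that upstream self-check (the helper names here are purely 
illustrative, not existing DataNode APIs):
{code}
// On an ERROR_CHECKSUM ack from downstream, re-verify the local replica up
// to the last acked length before blaming the downstream node, then truncate.
long correctDataSize = verifyChecksumsUpTo(blockFile, metaFile, ackedSize);
long newLength = Math.min(correctDataSize, ackedSize); // MIN(correct, ACKed)
truncateReplica(blockFile, metaFile, newLength);
{code}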






[jira] [Created] (HDFS-10714) Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE

2016-08-01 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HDFS-10714:
---

 Summary: Issue in handling checksum errors in write pipeline when 
fault DN is LAST_IN_PIPELINE
 Key: HDFS-10714
 URL: https://issues.apache.org/jira/browse/HDFS-10714
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brahma Reddy Battula


We came across an issue where a write fails even though 7 DNs are available, 
due to a network fault at one datanode which is LAST_IN_PIPELINE. It is 
similar to HDFS-6937.

Scenario: (DN3 has a N/W fault and min replication = 2.)

Write pipeline:
DN1->DN2->DN3  => DN3 gives an ERROR_CHECKSUM ack, so DN2 is marked as bad
DN1->DN4->DN3  => DN3 gives an ERROR_CHECKSUM ack, so DN4 is marked as bad
….
And so on (every time DN3 is LAST_IN_PIPELINE), continuing until no more 
datanodes are left to construct the pipeline.







[jira] [Updated] (HDFS-10706) Add tool generating FSImage from external store

2016-08-01 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-10706:
-
Status: Patch Available  (was: Open)

> Add tool generating FSImage from external store
> ---
>
> Key: HDFS-10706
> URL: https://issues.apache.org/jira/browse/HDFS-10706
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, tools
>Reporter: Chris Douglas
> Attachments: HDFS-10706.001.patch
>
>
> To experiment with provided storage, this provides a tool to map an external 
> namespace to an FSImage/NN storage. By loading it in a NN, one can access the 
> remote FS using HDFS.






[jira] [Commented] (HDFS-10586) Erasure Code misfunctions when 3 DataNode down

2016-08-01 Thread gao shan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403291#comment-15403291
 ] 

gao shan commented on HDFS-10586:
-

Thanks, but I feel it is not caused by the network. The cluster consists of 10 
nodes (1 namenode and 9 datanodes), which are all virtual machines (15G memory 
per VM) created on the same physical server machine. The IPs of these 10 nodes 
are assigned in the same internal network segment (192.168.X.X).

> Erasure Code misfunctions when 3 DataNode down
> --
>
> Key: HDFS-10586
> URL: https://issues.apache.org/jira/browse/HDFS-10586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha1
> Environment: 9 DataNodes and 1 NameNode, erasure code policy is 
> set as "6-3". When 3 DataNodes are down, the erasure-coded read fails and an 
> exception is thrown.
>Reporter: gao shan
>
> The following are the steps to reproduce:
> 1) hadoop fs -mkdir /ec
> 2) Set the erasure code policy as "6-3".
> 3) Write data by: 
> time hadoop jar 
> /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar
>   TestDFSIO -D test.build.data=/ec -write -nrFiles 30 -fileSize 12288 
> -bufferSize 1073741824
> 4) Manually take down 3 nodes: kill the "datanode" and "nodemanager" 
> processes on 3 DataNodes.
> 5) Read the data back through the erasure code path by:
> time hadoop jar 
> /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar
>   TestDFSIO -D test.build.data=/ec -read -nrFiles 30 -fileSize 12288 
> -bufferSize 1073741824
> Then the failure occurs and the following exception is thrown:
> INFO mapreduce.Job: Task Id : attempt_1465445965249_0008_m_34_2, Status : 
> FAILED
> Error: java.io.IOException: 4 missing blocks, the stripe is: Offset=0, 
> length=8388608, fetchedChunksNum=0, missingChunksNum=4
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.checkMissingBlocks(DFSStripedInputStream.java:614)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readParityChunks(DFSStripedInputStream.java:647)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:762)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:316)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:450)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:941)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:531)
>   at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:508)
>   at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:134)
>   at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:37)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)






[jira] [Updated] (HDFS-10678) Documenting NNThroughputBenchmark tool

2016-08-01 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10678:
-
Attachment: HDFS-10678.002.patch

The v2 patch is to address [~iwasakims]'s last comment. Thanks for the 
suggestion.

{quote}
I think Benchmarking.md should be just a toc, and the doc of 
NNThroughputBenchmark should be under hadoop-hdfs-project/hadoop-hdfs/src/site 
as an independent page.
{quote}
Adding a toc file seems good for the long term, but it is a bit heavy/overkill 
because it would have only one link as content; the only benchmarking-related 
material for now is {{NNThroughputBenchmark}}.
Does it make sense to add a new *menu* to hadoop-project/src/site/site.xml 
instead? That way, we organize all benchmarking-related pages in one section 
while keeping the pages themselves independent and well placed.

> Documenting NNThroughputBenchmark tool
> --
>
> Key: HDFS-10678
> URL: https://issues.apache.org/jira/browse/HDFS-10678
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: benchmarks, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>  Labels: documentation
> Attachments: HDFS-10678.000.patch, HDFS-10678.001.patch, 
> HDFS-10678.002.patch
>
>
> The best (only) documentation for the NNThroughputBenchmark currently exists 
> as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, 
> especially since we no longer generate javadocs for HDFS as part of the build 
> process. I suggest we extract it into a separate markdown doc, or merge it 
> with other benchmarking materials (if any?) about HDFS.






[jira] [Commented] (HDFS-10467) Router-based HDFS federation

2016-08-01 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403257#comment-15403257
 ] 

Inigo Goiri commented on HDFS-10467:


Regarding the rebalancing operations, we are currently proposing to disallow 
write accesses through the Routers. The problem is that we then have to 
disallow direct accesses to the Namenodes to prevent writes at that level. For 
this reason, we could leverage the concept of immutable folders/files from 
HDFS-3154 and, more recently, HDFS-7568. Not sure how likely those efforts are 
to move forward, though.

> Router-based HDFS federation
> 
>
> Key: HDFS-10467
> URL: https://issues.apache.org/jira/browse/HDFS-10467
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.7.2
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS Router Federation.pdf, HDFS-10467.PoC.001.patch, 
> HDFS-10467.PoC.patch, HDFS-Router-Federation-Prototype.patch
>
>
> Add a Router to provide a federated view of multiple HDFS clusters.






[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403191#comment-15403191
 ] 

Hadoop QA commented on HDFS-10682:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 24s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 16 new + 109 unchanged - 11 fixed = 125 total (was 120) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 10s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
|   | hadoop.hdfs.server.datanode.TestBlockRecovery |
| Timed out junit tests | org.apache.hadoop.hdfs.TestReadWhileWriting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821459/HDFS-10682.006.patch |
| JIRA Issue | HDFS-10682 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 81bdc8af99e1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9f473cf |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16282/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16282/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16282/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16282/console |
| Powered 

[jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403169#comment-15403169
 ] 

Hadoop QA commented on HDFS-10702:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
5s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 49s{color} | {color:orange} root: The patch generated 9 new + 1030 unchanged 
- 2 fixed = 1039 total (was 1032) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
21s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
12s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 75m  
6s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821448/HDFS-10702.003.patch |
| JIRA Issue | HDFS-10702 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux 222ef60ae1e7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9f473cf |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16281/artifact/patchprocess/diff-checkstyle-root.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16281/artifact/patchprocess/whitespace-tabs.txt
 |
| 

[jira] [Updated] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-10682:
-
Description: 
This Jira proposes to replace the FsDatasetImpl object lock with a separate 
lock object. Doing so will make it easier to measure lock statistics like lock 
held time and warn about potential lock contention due to slow disk operations.

In the future we can also consider replacing the lock with a read-write lock.

  was:
This Jira proposes to replace the FsDatasetImpl object lock with a separate 
lock object. Doing so will allow us to measure lock statistics.

In the future we can also consider replacing the lock with a read-write lock.


> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> This Jira proposes to replace the FsDatasetImpl object lock with a separate 
> lock object. Doing so will make it easier to measure lock statistics like 
> lock held time and warn about potential lock contention due to slow disk 
> operations.
> In the future we can also consider replacing the lock with a read-write lock.






[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403126#comment-15403126
 ] 

Hadoop QA commented on HDFS-10712:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
51s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 50m 
30s{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_101. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:b59b8b7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821441/HDFS-10712.branch-2.patch
 |
| JIRA Issue | HDFS-10712 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ed9b865e076a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2 / 4ad2a73 |
| Default Java | 1.7.0_101 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_101 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 |
| findbugs | v3.0.0 |
| JDK v1.7.0_101  Test Results | 

[jira] [Updated] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-10682:
-
Description: 
This Jira proposes to replace the FsDatasetImpl object lock with a separate 
lock object. Doing so will allow us to measure lock statistics.

In the future we can also consider replacing the lock with a read-write lock.

  was:Add a metric to measure the time the lock of FSDataSetImpl is held by a 
thread. The goal is to expose this for users to identify operations that lock 
the dataset for a long time ("long" in some sense) and to be able to 
understand/reason/track the operation based on logs.


> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> This Jira proposes to replace the FsDatasetImpl object lock with a separate 
> lock object. Doing so will allow us to measure lock statistics.
> In the future we can also consider replacing the lock with a read-write lock.






[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403118#comment-15403118
 ] 

Arpit Agarwal commented on HDFS-10682:
--

bq. And I think we need dataset to expose lock acquire and release to those 
callers. What do you think?
Hi [~vagarychen], yes that's correct. We'd need FsDatasetImpl (and perhaps 
FsDatasetSpi) to expose locking routines.

bq. I was trying to take into account lock acquire time and lock release time 
there, which requires recording the time before acquiring the lock and after 
releasing it, and this is what the ThreadLocal is all about.
Yes, I agree with the idea: if we want to incorporate lock acquire/release 
time, we'd need either thread locals or the map you talked about. But we can 
skip measuring those values for simplicity and to avoid questions like 
thread-local overhead. We can consider adding more measurements later.

> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a 
> thread. The goal is to expose this for users to identify operations that 
> lock the dataset for a long time ("long" in some sense) and to be able to 
> understand/reason/track the operation based on logs.






[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403093#comment-15403093
 ] 

Chen Liang commented on HDFS-10682:
---

Thank you [~arpitagarwal] very much for the review! A few things I would like 
to clarify though:

1. by "fix the other locations", did you mean fixing places like such?:
synchronized(dataset) {
...
}
As dataset itself has been refactored with a separate lock, having this 
synchronized cal meanswe would have two locks here. I have been thinking about 
situations like this. And I think we need dataset to expose lock acquire and 
release to those callers. What do you think?

2. Regarding the ThreadLocal variables, you have a good point. But I was trying 
to take into account lock acquire time and lock release time there, which 
requires recording time before acquiring lock and after releasing lock and this 
is ThreadLocal is all about. Do you have any comments on this? e.g. are these 
values worth recording?

3. And you are totally right that I don't need the equals zero checks, thanks 
for pointing it out!


> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a 
> thread. The goal is to expose this for users to identify operations that 
> lock the dataset for a long time ("long" in some sense) and to be able to 
> understand/reason/track the operation based on logs.






[jira] [Commented] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403083#comment-15403083
 ] 

Hadoop QA commented on HDFS-10643:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 59m 
58s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821442/HDFS-10643.04.patch |
| JIRA Issue | HDFS-10643 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 6201e31dde22 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9f473cf |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16280/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16280/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> HDFS namenode should always use service user (hdfs) to generateEncryptedKey
> ---
>
> Key: HDFS-10643
> URL: https://issues.apache.org/jira/browse/HDFS-10643
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, 
> HDFS-10643.02.patch, HDFS-10643.03.patch, HDFS-10643.04.patch
>
>
> KMSClientProvider is designed to be 

[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403061#comment-15403061
 ] 

Arpit Agarwal commented on HDFS-10682:
--

Thanks for the updated patch [~vagarychen]! This is looking good. A couple of 
comments:
# We also need to fix other locations that are synchronizing on the 
FSDatasetImpl object e.g. 
[FsVolumeImpl|https://github.com/apache/hadoop/blob/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java#L307],
 
[DirectoryScanner|https://github.com/apache/hadoop/blob/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java#L586].
# Let's move the instrumentation changes to a separate Jira. We can repurpose 
this just for splitting out the lock. Comments on the instrumentation changes:
## We don't need a ThreadLocal or a threadID -> timestamp map. We are measuring
the lock held time, so we can save a timestamp just after getting the lock and
another timestamp just before releasing the lock, then diff them while the lock
is still held and log after releasing the lock. We may need a thread-local
approach later if we move to a read-write lock, in which case there can be
multiple concurrent lock holders.
## You don't need the {{if (start == 0 || start2 == 0)}} checks. These values
can be assumed to be correct now that they are initialized in the lock class.
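
For illustration, a minimal sketch of the measurement described above, assuming
a plain ReentrantLock wrapper (the class and field names are made up, not the
code in the attached patches):
{code}
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only. A single acquire timestamp (no ThreadLocal) is
// enough because a plain mutual-exclusion lock has at most one holder.
class InstrumentedLock {
  private final ReentrantLock lock = new ReentrantLock();
  private final long warnThresholdMs;
  private long acquireTimeMs; // written and read only while the lock is held

  InstrumentedLock(long warnThresholdMs) {
    this.warnThresholdMs = warnThresholdMs;
  }

  void lock() {
    lock.lock();
    acquireTimeMs = System.currentTimeMillis(); // timestamp just after acquiring
  }

  void unlock() {
    // diff while still holding the lock ...
    long heldMs = System.currentTimeMillis() - acquireTimeMs;
    lock.unlock();
    // ... and log after releasing it, so logging never extends the hold time
    if (heldMs > warnThresholdMs) {
      System.err.println("FsDatasetImpl lock held for " + heldMs + " ms");
    }
  }
}
{code}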


> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a
> thread. The goal is to expose this so users can identify operations that
> lock the dataset for a long time ("long" in some sense) and be able to
> understand/reason about/track the operation based on logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-10682:
-
Summary: Replace FsDatasetImpl object lock with a separate lock object  
(was: Add metric to measure lock held time in FSDataSetImpl)

> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a
> thread. The goal is to expose this so users can identify operations that
> lock the dataset for a long time ("long" in some sense) and be able to
> understand/reason about/track the operation based on logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403045#comment-15403045
 ] 

Konstantin Shvachko commented on HDFS-10301:


We are actively looking into a possible problem with this change. LMK if the
revert fixes the problem. Just to clarify: are you using per-storage reports on
your cluster?
In the meantime, answering your questions, Daryn.

??Why is this patch changing per-storage reports when it's the single-rpc 
report that is the problem???
The problem exists with both single-rpc and per-storage reports. In the
multi-rpc case DNs can send repeated RPCs for each storage, and this will cause
incorrect zombie detection if the RPCs are processed out of order.

??Is this change compatible???
Yes. The compatibility issues were discussed here above.

??What does an old NN do if it gets this pseudo-report???
According to [Rolling upgrade 
documentation|https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html]
 we first upgrade NameNodes, then DataNodes. So in practice new DNs don't talk 
to old NNs.

??What does a new NN do when it gets old style reports? Will it remove all but 
the last storage???
As mentioned in [this 
comment|https://issues.apache.org/jira/browse/HDFS-10301?focusedCommentId=15271737=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15271737]
 old DataNode reports will be processed as regular reports; only zombie
storages will not be removed until the DNs are upgraded.
During the upgrade no storages are removed.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-08-01 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Status: In Progress  (was: Patch Available)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a
> thread. The goal is to expose this so users can identify operations that
> lock the dataset for a long time ("long" in some sense) and be able to
> understand/reason about/track the operation based on logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-08-01 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Status: Patch Available  (was: In Progress)

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a
> thread. The goal is to expose this so users can identify operations that
> lock the dataset for a long time ("long" in some sense) and be able to
> understand/reason about/track the operation based on logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-08-01 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-10682:
--
Attachment: HDFS-10682.006.patch

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a
> thread. The goal is to expose this so users can identify operations that
> lock the dataset for a long time ("long" in some sense) and be able to
> understand/reason about/track the operation based on logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10713) Throttle FsNameSystem lock warnings

2016-08-01 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-10713:
-
Assignee: Hanisha Koneru

> Throttle FsNameSystem lock warnings
> ---
>
> Key: HDFS-10713
> URL: https://issues.apache.org/jira/browse/HDFS-10713
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Arpit Agarwal
>Assignee: Hanisha Koneru
>
> The NameNode logs a message if the FSNamesystem write lock is held by a
> thread for over 1 second. These messages can be throttled to at most one
> per x minutes to avoid potentially filling up NN logs. We can also log the
> number of suppressed notices since the last log message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10713) Throttle FsNameSystem lock warnings

2016-08-01 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-10713:
-
Target Version/s: 2.8.0

> Throttle FsNameSystem lock warnings
> ---
>
> Key: HDFS-10713
> URL: https://issues.apache.org/jira/browse/HDFS-10713
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging, namenode
>Reporter: Arpit Agarwal
>
> The NameNode logs a message if the FSNamesystem write lock is held by a
> thread for over 1 second. These messages can be throttled to at most one
> per x minutes to avoid potentially filling up NN logs. We can also log the
> number of suppressed notices since the last log message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey

2016-08-01 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402994#comment-15402994
 ] 

Jitendra Nath Pandey commented on HDFS-10643:
-

+1

> HDFS namenode should always use service user (hdfs) to generateEncryptedKey
> ---
>
> Key: HDFS-10643
> URL: https://issues.apache.org/jira/browse/HDFS-10643
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, 
> HDFS-10643.02.patch, HDFS-10643.03.patch, HDFS-10643.04.patch
>
>
> KMSClientProvider is designed to be shared by different KMS clients. When 
> HDFS Namenode as KMS client talks to KMS to generateEncryptedKey for new file 
> creation from proxy user (hive, oozie), the proxyuser handling for 
> KMSClientProvider in this case is unnecessary, which cause 1) an extra proxy 
> user configuration allowing hdfs user to proxy its clients and 2) KMS acls to 
> allow non-hdfs user for GENERATE_EEK operation. 
> This ticket is opened to always use HDFS namenode login user (hdfs) when 
> talking to KMS to generateEncryptedKey for new file creation. This way, we 
> have a more secure KMS based HDFS encryption (we can set kms-acls to allow 
> only hdfs user for GENERATE_EEK) with less configuration hassle for KMS to 
> allow hdfs to proxy other users. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10713) Throttle FsNameSystem lock warnings

2016-08-01 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-10713:


 Summary: Throttle FsNameSystem lock warnings
 Key: HDFS-10713
 URL: https://issues.apache.org/jira/browse/HDFS-10713
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: logging, namenode
Reporter: Arpit Agarwal


The NameNode logs a message if the FSNamesystem write lock is held by a thread
for over 1 second. These messages can be throttled to at most one per x
minutes to avoid potentially filling up NN logs. We can also log the number of
suppressed notices since the last log message.
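
For illustration, a minimal sketch of such a throttle, assuming a simple
time-window approach (the class and method names are made up, not Hadoop code):
{code}
// Illustrative log throttler: emits at most one warning per windowMs and
// reports how many similar warnings were suppressed since the last one.
class ThrottledLogger {
  private final long windowMs;
  private long lastLogTimeMs;
  private long suppressed;

  ThrottledLogger(long windowMs) {
    this.windowMs = windowMs;
  }

  synchronized void warn(String msg) {
    long now = System.currentTimeMillis();
    if (now - lastLogTimeMs >= windowMs) {
      System.err.println(msg + " (" + suppressed + " similar warnings suppressed)");
      lastLogTimeMs = now;
      suppressed = 0;
    } else {
      suppressed++; // counted, logged with the next emitted warning
    }
  }
}
{code}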



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

2016-08-01 Thread Jiayi Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Zhou updated HDFS-10702:
--
Attachment: HDFS-10702.003.patch

Fix checkstyle. Some of the style problems are intentionally preserved.

> Add a Client API and Proxy Provider to enable stale read from Standby
> -
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jiayi Zhou
>Assignee: Jiayi Zhou
>Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means the active NameNode could be a bottleneck
> for scalability. One way to solve this problem is to send read-only operations
> to the Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> the Standby, which gives the client the power to set the staleness restriction.
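
For illustration only, the kind of switch described above might look like the
following hypothetical interface (this API does not exist in Hadoop and is not
taken from the attached patches):
{code}
// Hypothetical sketch of a client-side stale-read toggle.
public interface StaleReadClient {
  /**
   * Allow read-only calls to be served by the Standby NameNode, as long as
   * its state is no more than maxStalenessMs behind the Active.
   */
  void enableStaleRead(long maxStalenessMs);

  /** Route all calls back to the Active NameNode. */
  void disableStaleRead();
}
{code}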



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402951#comment-15402951
 ] 

Daryn Sharp commented on HDFS-10301:


I've read this jira as I said I would, and I've looked at the patch.

Our nightly build & deploy for 2.7 is broken.  DNs claim to report thousands of 
blocks, NN says nope, -1.  This should be reason enough to revert until we get 
to the bottom of it.  We're reverting internally.  If that fixes it, I will
have someone help me revert tomorrow morning, if it hasn't been done already.

Why is this patch changing per-storage reports when it's the single-rpc report 
that is the problem?  Is this change compatible?
# What does an old NN do if it gets this pseudo-report?  Will it forget about 
all the blocks on the non-last storage?
# What does a new NN do when it gets old style reports?  Will it remove all but 
the last storage?

This zombie detection, report context, etc. is getting out of hand.  I don't
understand why the zombie detection isn't based on the healthy storages in the 
heartbeat.  Anything else gets flagged as failed and the heartbeat monitor 
disposes of them.


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey

2016-08-01 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10643:
--
Attachment: HDFS-10643.04.patch

Thanks [~jnp] for the review. Attached a patch that addresses the comments.

> HDFS namenode should always use service user (hdfs) to generateEncryptedKey
> ---
>
> Key: HDFS-10643
> URL: https://issues.apache.org/jira/browse/HDFS-10643
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, 
> HDFS-10643.02.patch, HDFS-10643.03.patch, HDFS-10643.04.patch
>
>
> KMSClientProvider is designed to be shared by different KMS clients. When 
> HDFS Namenode as KMS client talks to KMS to generateEncryptedKey for new file 
> creation from proxy user (hive, oozie), the proxyuser handling for 
> KMSClientProvider in this case is unnecessary, which cause 1) an extra proxy 
> user configuration allowing hdfs user to proxy its clients and 2) KMS acls to 
> allow non-hdfs user for GENERATE_EEK operation. 
> This ticket is opened to always use HDFS namenode login user (hdfs) when 
> talking to KMS to generateEncryptedKey for new file creation. This way, we 
> have a more secure KMS based HDFS encryption (we can set kms-acls to allow 
> only hdfs user for GENERATE_EEK) with less configuration hassle for KMS to 
> allow hdfs to proxy other users. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402938#comment-15402938
 ] 

Vinitha Reddy Gankidi commented on HDFS-10712:
--

Done.

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.branch-2.7.patch, HDFS-10712.branch-2.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: HDFS-10712.branch-2.patch

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.branch-2.7.patch, HDFS-10712.branch-2.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10712:
---
Status: Patch Available  (was: Open)

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.branch-2.7.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: HDFS-10712.branch-2.7.patch

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.branch-2.7.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: (was: HDFS-10712.001.patch)

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402893#comment-15402893
 ] 

Konstantin Shvachko commented on HDFS-10712:


The patch looks good.  I understand this is for branch-2.7; could you please
attach one for branch-2 as well?

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: HDFS-10712.001.patch

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: (was: HDFS-10712.001.patch)

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402881#comment-15402881
 ] 

Vinitha Reddy Gankidi commented on HDFS-10712:
--

[~shv] I have attached a patch. Can you please take a look? Thanks. 

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi reassigned HDFS-10712:


Assignee: Vinitha Reddy Gankidi

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinitha Reddy Gankidi updated HDFS-10712:
-
Attachment: HDFS-10712.001.patch

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
> Attachments: HDFS-10712.001.patch
>
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402826#comment-15402826
 ] 

Konstantin Shvachko commented on HDFS-10301:


Daryn, I do not understand what you disagree with. And what is the problem with
the implementation that you object to?
Nobody is taking away per-storage block reports.

If you don't have time to understand the jira and don't have time to look at
your own sandbox cluster, then how can I help you?

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10712:
---
Affects Version/s: 2.7.4
 Target Version/s: 2.7.4

> Fix TestDataNodeVolumeFailure on 2.* branches.
> --
>
> Key: HDFS-10712
> URL: https://issues.apache.org/jira/browse/HDFS-10712
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>
> {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
> {{BlockReportContext}}.
> This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.

2016-08-01 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created HDFS-10712:
--

 Summary: Fix TestDataNodeVolumeFailure on 2.* branches.
 Key: HDFS-10712
 URL: https://issues.apache.org/jira/browse/HDFS-10712
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Konstantin Shvachko


{{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null
{{BlockReportContext}}.
This has been fixed on trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402798#comment-15402798
 ] 

Daryn Sharp commented on HDFS-10301:


bq. If NN doesn't come out of safe mode, then wouldn't that be caught by unit 
tests.

You have more faith in the unit tests than I do. :)  I do not have time to 
fully debug why sandbox clusters are DOA when I object to the implementation 
anyway.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402795#comment-15402795
 ] 

Daryn Sharp commented on HDFS-10301:


Block report processing does not need to be so complicated.  Just ban
single-rpc reports and the problem goes away.  At most the DN is retransmitting
the same storage report, and reprocessing it should not be a problem.

If the only objection is that multiple RPCs are a scalability issue, I
completely disagree.
# A single RPC is not scalable.  It will not work on clusters with many
hundreds of millions of blocks.
# The size of the RPC quickly becomes an issue.  The memory pressure and
premature promotion rate - even with a huge young gen (8-16G) - are not
sustainable.
# The time to process the RPC becomes an issue.  The DN timing out and 
retransmitting (and causing this jira's bug) becomes an issue.

Per-storage block reports eliminated multiple full GCs (2-3 for 5-10mins each) 
during startup on large clusters.

Please revert or I'll grab someone here to help me do it.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402784#comment-15402784
 ] 

Konstantin Shvachko commented on HDFS-10301:


Looks like we need to fix {{TestDataNodeVolumeFailure}} for all 2.* branches.
Will open a jira for that promptly.
Sorry guys for breaking your build.

[~daryn], it seems that you are overreacting a bit. Only one test is broken. I
reran the other tests reported by Jenkins; they all pass.
Could you please elaborate on the problem with the sandbox cluster? If the NN
doesn't come out of safe mode, then wouldn't that be caught by unit tests?

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402747#comment-15402747
 ] 

Konstantin Shvachko commented on HDFS-10301:


And the rest of the tests are passing locally.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Vinitha Reddy Gankidi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402734#comment-15402734
 ] 

Vinitha Reddy Gankidi commented on HDFS-10301:
--

[~ebadger] Thanks for reporting this. TestDataNodeVolumeFailure does not call 
blockReport() with context=null on trunk. This was fixed as a part of 
HDFS-9260. We need to modify TestDataNodeVolumeFailure.testVolumeFailure() for 
branch-2.7 as well:
{code}
-cluster.getNameNodeRpc().blockReport(dnR, bpid, reports, null);
+cluster.getNameNodeRpc().blockReport(dnR, bpid, reports,
+new BlockReportContext(1, 0, System.nanoTime()));
{code}
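
As an annotation (not part of the patch), the three constructor arguments are,
as I read {{BlockReportContext}}: the total number of RPCs in the report, the
index of this RPC, and a report id:
{code}
// Sketch of the same call with the parameters labeled (names taken from the
// BlockReportContext constructor as I understand it; verify against branch-2.7):
// totalRpcs = 1: the whole report is sent in a single RPC
// curRpc    = 0: this is the first (and only) RPC of the report
// reportId  = System.nanoTime(): an id correlating the RPCs of one report
cluster.getNameNodeRpc().blockReport(dnR, bpid, reports,
    new BlockReportContext(1, 0, System.nanoTime()));
{code}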

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp reopened HDFS-10301:


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402697#comment-15402697
 ] 

Daryn Sharp commented on HDFS-10301:


-1.  This needs to be reverted, and I'm too git-ignorant to do it myself.  Our
sandbox clusters won't come out of safemode because the NN thinks the DNs are
reporting -1 blocks.  I see this patch is returning -1 blocks for a "storage
report".  I need to catch up on this jira, but in the meantime it must be
reverted.

I find it odd that this patch was committed with so many failed tests.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. Then
> it sends the block report again. The NameNode, while processing these two
> reports at the same time, can interleave processing of storages from different
> reports. This screws up the blockReportId field, which makes the NameNode
> think that some storages are zombies. Replicas from zombie storages are
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8780) Fetching live/dead datanode list with arg true for removeDecommissionNode,returns list with decom node.

2016-08-01 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402686#comment-15402686
 ] 

Eric Badger commented on HDFS-8780:
---

[~shv], the 2.7 patch that you committed here breaks 
TestHostsFiles.testHostsExcludeInUI. The failure is consistently reproducible 
and the associated stack trace is shown below.

{noformat}
java.lang.AssertionError: Live nodes should contain the decommissioned node
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.server.namenode.TestHostsFiles.testHostsExcludeInUI(TestHostsFiles.java:126)
{noformat}

> Fetching live/dead datanode list with arg true for 
> removeDecommissionNode,returns list with decom node.
> ---
>
> Key: HDFS-8780
> URL: https://issues.apache.org/jira/browse/HDFS-8780
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: J.Andreina
> Fix For: 2.7.4
>
> Attachments: HDFS-8780-branch-2.7.patch, HDFS-8780.1.patch, 
> HDFS-8780.2.patch, HDFS-8780.3.patch
>
>
> Current implementation: 
> ==
> In DatanodeManager#removeDecomNodeFromList(), a decommissioned node will be
> removed from the dead/live node list only if the conditions below are met:
>  I. The include list is not empty.
>  II. Neither the include nor the exclude list contains the node, and the
> node is in the decommissioned state.
> {code}
> if (!hostFileManager.hasIncludes()) {
>   return;
> }
> if ((!hostFileManager.isIncluded(node)) && (!hostFileManager.isExcluded(node))
>     && node.isDecommissioned()) {
>   // Include list is not empty, an existing datanode does not appear
>   // in both include or exclude lists and it has been decommissioned.
>   // Remove it from the node list.
>   it.remove();
> }
> {code}
> As mentioned in the javadoc, a datanode cannot be in an "already
> decommissioned" state.
> Following the steps mentioned in the javadoc, the datanode state is "dead"
> and not decommissioned.
> *Can we avoid the unnecessary checks, and simply remove the node from the
> node list when it is in the decommissioned state?*
> Please provide your feedback.
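
For illustration, a sketch of the simplification the reporter proposes
(assuming the same {{node}} and iterator {{it}} as the snippet above; this is
not the committed fix):
{code}
// Proposed simplification: remove the node based only on its decommission
// state, without consulting the include/exclude host lists.
if (node.isDecommissioned()) {
  it.remove();
}
{code}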



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey

2016-08-01 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402547#comment-15402547
 ] 

Jitendra Nath Pandey commented on HDFS-10643:
-

Minor comment:
The {{edek}} declaration and assignment could be done on the same line i.e.
{code}
EncryptedKeyVersion edek = SecurityUtil.doAs
{code}
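
For illustration, a combined declaration and assignment might look like the
following sketch (this assumes {{SecurityUtil.doAsLoginUser}} is the helper
behind the truncated snippet above, and that {{provider}} and {{keyName}} are
in scope; it is not the exact code in the patch):
{code}
// Illustrative sketch only -- names and the doAsLoginUser helper are assumptions.
EncryptedKeyVersion edek = SecurityUtil.doAsLoginUser(
    new PrivilegedExceptionAction<EncryptedKeyVersion>() {
      @Override
      public EncryptedKeyVersion run() throws Exception {
        // generate the EDEK as the NameNode login user (hdfs)
        return provider.generateEncryptedKey(keyName);
      }
    });
{code}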

> HDFS namenode should always use service user (hdfs) to generateEncryptedKey
> ---
>
> Key: HDFS-10643
> URL: https://issues.apache.org/jira/browse/HDFS-10643
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, 
> HDFS-10643.02.patch, HDFS-10643.03.patch
>
>
> KMSClientProvider is designed to be shared by different KMS clients. When 
> HDFS Namenode as KMS client talks to KMS to generateEncryptedKey for new file 
> creation from proxy user (hive, oozie), the proxyuser handling for 
> KMSClientProvider in this case is unnecessary, which cause 1) an extra proxy 
> user configuration allowing hdfs user to proxy its clients and 2) KMS acls to 
> allow non-hdfs user for GENERATE_EEK operation. 
> This ticket is opened to always use HDFS namenode login user (hdfs) when 
> talking to KMS to generateEncryptedKey for new file creation. This way, we 
> have a more secure KMS based HDFS encryption (we can set kms-acls to allow 
> only hdfs user for GENERATE_EEK) with less configuration hassle for KMS to 
> allow hdfs to proxy other users. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-08-01 Thread Chen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402481#comment-15402481
 ] 

Chen Liang commented on HDFS-10682:
---

Thanks [~arpitagarwal] for the comments! Will upload another patch fixing this 
soon.

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a 
> thread. The goal is to expose this for users to identify operations that 
> locks dataset for long time ("long" in some sense) and be able to 
> understand/reason/track the operation based on logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl

2016-08-01 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402478#comment-15402478
 ] 

Arpit Agarwal commented on HDFS-10682:
--

Hi [~vagarychen], thanks for taking this up. I recommend splitting the work 
into two parts:
# Refactor the code to synchronize on a new ReentrantLock instead of the 
FsDatasetImpl object (create a separate Jira for this). The advantage of a 
wrapper object for the lock is that callers won't need to add boilerplate code 
for instrumentation. We can also use try-with-resources instead of having to 
release the lock manually, as sketched below.
# In the second patch we can add instrumentation in just the acquire/close 
methods and expose it as a metric.
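
A minimal sketch of such a wrapper (hypothetical class and method names; the real refactoring may differ):
{code}
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical AutoCloseable wrapper: instrumentation only needs to live in
// acquire() and close(), and callers get automatic release via
// try-with-resources.
class AutoCloseableLock implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();

  AutoCloseableLock acquire() {
    lock.lock();        // hook: record the acquire timestamp here
    return this;
  }

  @Override
  public void close() {
    lock.unlock();      // hook: record the lock-held-time metric here
  }
}

// Usage (datasetLock is an assumed field):
// try (AutoCloseableLock l = datasetLock.acquire()) {
//   ... critical section over FsDatasetImpl state ...
// }   // close() runs even if the body throws
{code}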

> Add metric to measure lock held time in FSDataSetImpl
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch
>
>
> Add a metric to measure the time the lock of FSDataSetImpl is held by a 
> thread. The goal is to expose this for users to identify operations that 
> locks dataset for long time ("long" in some sense) and be able to 
> understand/reason/track the operation based on logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10674) Optimize creating a full path from an inode

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402443#comment-15402443
 ] 

Hadoop QA commented on HDFS-10674:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 293 unchanged - 4 fixed = 293 total (was 297) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 29s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12819495/HDFS-10674.patch |
| JIRA Issue | HDFS-10674 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux efda9c68b7fa 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9f473cf |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16277/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16277/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16277/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Optimize creating a full path from an inode
> ---
>
> Key: HDFS-10674
> URL: https://issues.apache.org/jira/browse/HDFS-10674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>

[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool

2016-08-01 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402427#comment-15402427
 ] 

Masatake Iwasaki commented on HDFS-10678:
-

{noformat}
+| OPERATION\_OPTION| Commands |
{noformat}

"Commands" should be "operation-specific parameters"?


> Documenting NNThroughputBenchmark tool
> --
>
> Key: HDFS-10678
> URL: https://issues.apache.org/jira/browse/HDFS-10678
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: benchmarks, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>  Labels: documentation
> Attachments: HDFS-10678.000.patch, HDFS-10678.001.patch
>
>
> The best (only) documentation for the NNThroughputBenchmark currently exists 
> as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, 
> especially since we no longer generate javadocs for HDFS as part of the build 
> process. I suggest we extract it into a separate markdown doc, or merge it 
> with other benchmarking materials (if any?) about HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10656) Optimize conversion of byte arrays back to path string

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402419#comment-15402419
 ] 

Hadoop QA commented on HDFS-10656:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 62m 
39s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 85m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12819146/HDFS-10656.patch |
| JIRA Issue | HDFS-10656 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ccd3bc7fa258 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9f473cf |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16278/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16278/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Optimize conversion of byte arrays back to path string
> --
>
> Key: HDFS-10656
> URL: https://issues.apache.org/jira/browse/HDFS-10656
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10656.patch
>
>
> {{DFSUtil.byteArray2PathString}} generates excessive object allocation.
> # each 

[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool

2016-08-01 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402397#comment-15402397
 ] 

Masatake Iwasaki commented on HDFS-10678:
-

Thanks for working on this, [~liuml07].

I think Benchmarking.md should be just a table of contents, and the 
NNThroughputBenchmark doc should live under 
hadoop-hdfs-project/hadoop-hdfs/src/site as an independent page.


> Documenting NNThroughputBenchmark tool
> --
>
> Key: HDFS-10678
> URL: https://issues.apache.org/jira/browse/HDFS-10678
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: benchmarks, test
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>  Labels: documentation
> Attachments: HDFS-10678.000.patch, HDFS-10678.001.patch
>
>
> The best (only) documentation for the NNThroughputBenchmark currently exists 
> as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, 
> especially since we no longer generate javadocs for HDFS as part of the build 
> process. I suggest we extract it into a separate markdown doc, or merge it 
> with other benchmarking materials (if any?) about HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-08-01 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402341#comment-15402341
 ] 

Eric Badger commented on HDFS-10301:


[~shv], this breaks TestDataNodeVolumeFailure.testVolumeFailure(): 
blockReport() is called with context = null, and inside blockReport() we then 
try to call methods on context while it is still null.

{noformat}
java.lang.NullPointerException: null
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1342)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:189)
{noformat}
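
A minimal sketch of the kind of guard that would surface this earlier (hypothetical; the actual fix may instead ensure callers always pass a non-null context):
{code}
// Hypothetical defensive check at the top of blockReport(); the real fix
// could equally make the test supply a valid BlockReportContext.
if (context == null) {
  throw new IllegalArgumentException("BlockReportContext must not be null");
}
// ...only after this check is it safe to dereference context...
{code}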

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Fix For: 2.7.4
>
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, 
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, 
> HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report and 
> then sends the block report again. While processing these two reports at the 
> same time, the NameNode can interleave processing storages from different 
> reports. This corrupts the blockReportId field, which makes the NameNode think 
> that some storages are zombie. Replicas from zombie storages are immediately 
> removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8901) Use ByteBuffer in striping positional read

2016-08-01 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402328#comment-15402328
 ] 

Kai Zheng commented on HDFS-8901:
-

Hi Youwei,

Thanks for your update. Please note we should support both direct and on-heap 
ByteBuffers, so calling aBuffer.array() isn't appropriate. I would suggest you 
resume this effort based on the previous patch instead of reworking it from 
scratch.
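
For illustration, a minimal example of why array() only works for on-heap buffers, with one portable alternative (class and variable names are made up):
{code}
import java.nio.ByteBuffer;

public class ByteBufferArrayDemo {
  public static void main(String[] args) {
    ByteBuffer onHeap = ByteBuffer.allocate(16);        // hasArray() == true
    ByteBuffer direct = ByteBuffer.allocateDirect(16);  // hasArray() == false

    // direct.array() would throw UnsupportedOperationException, so array()
    // cannot be relied on. A bulk get() into a scratch array works for both:
    byte[] scratch = new byte[direct.remaining()];
    direct.duplicate().get(scratch);  // duplicate() leaves the position intact

    System.out.println(onHeap.hasArray() + " " + direct.hasArray());
  }
}
{code}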

> Use ByteBuffer in striping positional read
> --
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Youwei Wang
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, 
> HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, 
> HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, 
> HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, 
> HDFS-8901.v13.patch, initial-poc.patch
>
>
> Native erasure coders prefer direct ByteBuffers for performance reasons. To 
> prepare for that, this change uses ByteBuffer throughout the code implementing 
> striping positional read. It also avoids unnecessary data copying between 
> striping read chunk buffers and decode input buffers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402266#comment-15402266
 ] 

Hudson commented on HDFS-10655:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #10188 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10188/])
HDFS-10655. Fix path related byte array conversion bugs. (daryn) (daryn: rev 
9f473cf903e586c556154abd56b3a3d820c6b028)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestPathComponents.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java


> Fix path related byte array conversion bugs
> ---
>
> Key: HDFS-10655
> URL: https://issues.apache.org/jira/browse/HDFS-10655
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0
>
> Attachments: HDFS-10655.patch, HDFS-10655.patch
>
>
> {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple 
> separators, nor does it handle relative paths correctly.
> {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless 
> the specified range is the entire component array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10674) Optimize creating a full path from an inode

2016-08-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-10674:
---
Status: Patch Available  (was: Open)

> Optimize creating a full path from an inode
> ---
>
> Key: HDFS-10674
> URL: https://issues.apache.org/jira/browse/HDFS-10674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10674.patch
>
>
> {{INode#getFullPathName}} walks up the inode tree, creates an INode[], and 
> converts each component's byte[] name to a String while building the path. 
> This involves many allocations, copies, and char conversions.
> The path should be built with a single byte[] allocation.
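
A minimal sketch of the single-allocation idea (simplified, with a hypothetical INode interface; the actual patch works against HDFS's internal inode classes):
{code}
import java.nio.charset.StandardCharsets;

interface INode {
  INode getParent();           // null for the root inode
  byte[] getLocalNameBytes();  // empty for the root inode
}

// Two passes: measure the total length, then fill one byte[] back to front.
static String fullPath(INode inode) {
  if (inode.getParent() == null) {
    return "/";                                  // root
  }
  int len = 0;
  for (INode n = inode; n.getParent() != null; n = n.getParent()) {
    len += 1 + n.getLocalNameBytes().length;     // '/' + component
  }
  byte[] path = new byte[len];
  int pos = len;
  for (INode n = inode; n.getParent() != null; n = n.getParent()) {
    byte[] name = n.getLocalNameBytes();
    pos -= name.length;
    System.arraycopy(name, 0, path, pos, name.length);
    path[--pos] = '/';
  }
  return new String(path, StandardCharsets.UTF_8);
}
{code}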



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10656) Optimize conversion of byte arrays back to path string

2016-08-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-10656:
---
Status: Patch Available  (was: Open)

> Optimize conversion of byte arrays back to path string
> --
>
> Key: HDFS-10656
> URL: https://issues.apache.org/jira/browse/HDFS-10656
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10656.patch
>
>
> {{DFSUtil.byteArray2PathString}} generates excessive object allocation.
> # each byte array component is encoded to a String (copy)
> # the string is appended to a builder, which extracts the chars from the 
> intermediate string (copy) and adds them to its own char array
> # the builder's char array is re-allocated if over 16 chars (copy)
> # the builder's toString creates another string (copy)
> Instead of allocating all these objects and performing multiple byte/char 
> encoding/decoding conversions, the byte array can be built in-place with a 
> single final conversion to a string.
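
A simplified sketch of that idea (not the actual patch; the root is represented as an empty leading component, as in HDFS path component arrays):
{code}
import java.nio.charset.StandardCharsets;

// Join path components with '/' using one byte[] and one final String.
static String joinComponents(byte[][] components) {
  int len = components.length - 1;      // one separator between components
  for (byte[] c : components) {
    len += c.length;
  }
  if (len <= 0) {
    return "/";                         // lone empty root component
  }
  byte[] path = new byte[len];
  int pos = 0;
  for (int i = 0; i < components.length; i++) {
    if (i > 0) {
      path[pos++] = '/';
    }
    System.arraycopy(components[i], 0, path, pos, components[i].length);
    pos += components[i].length;
  }
  return new String(path, StandardCharsets.UTF_8);
}
{code}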



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10655) Fix path related byte array conversion bugs

2016-08-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-10655:
---
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks Jing!

> Fix path related byte array conversion bugs
> ---
>
> Key: HDFS-10655
> URL: https://issues.apache.org/jira/browse/HDFS-10655
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0
>
> Attachments: HDFS-10655.patch, HDFS-10655.patch
>
>
> {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple 
> separators, nor does it handle relative paths correctly.
> {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless 
> the specified range is the entire component array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10711) Optimize FSPermissionChecker group membership check

2016-08-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-10711:
---
Attachment: (was: HDFS-10711.patch)

> Optimize FSPermissionChecker group membership check
> ---
>
> Key: HDFS-10711
> URL: https://issues.apache.org/jira/browse/HDFS-10711
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10711.patch
>
>
> HADOOP-13442 obviates the need for multiple group-related object allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10711) Optimize FSPermissionChecker group membership check

2016-08-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-10711:
---
Attachment: HDFS-10711.patch

> Optimize FSPermissionChecker group membership check
> ---
>
> Key: HDFS-10711
> URL: https://issues.apache.org/jira/browse/HDFS-10711
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10711.patch
>
>
> HADOOP-13442 obviates the need for multiple group-related object allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10711) Optimize FSPermissionChecker group membership check

2016-08-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-10711:
---
Attachment: HDFS-10711.patch

> Optimize FSPermissionChecker group membership check
> ---
>
> Key: HDFS-10711
> URL: https://issues.apache.org/jira/browse/HDFS-10711
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10711.patch
>
>
> HADOOP-13442 obviates the need for multiple group-related object allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10711) Optimize FSPermissionChecker group membership check

2016-08-01 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10711:
--

 Summary: Optimize FSPermissionChecker group membership check
 Key: HDFS-10711
 URL: https://issues.apache.org/jira/browse/HDFS-10711
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


HADOOP-13442 obviates the need for multiple group-related object allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10673) Optimize FSPermissionChecker's internal path usage

2016-08-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402090#comment-15402090
 ] 

Daryn Sharp commented on HDFS-10673:


[~jingzhao], please let me know if the latest patch is ok, or if I should revert 
the subdir access check back to calling the inode attr provider with just the 
components of the original subdir.

> Optimize FSPermissionChecker's internal path usage
> --
>
> Key: HDFS-10673
> URL: https://issues.apache.org/jira/browse/HDFS-10673
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-10673.1.patch, HDFS-10673.patch
>
>
> The INodeAttributeProvider and AccessControlEnforcer features degrade 
> performance and generate excessive garbage even when neither is used.  Main 
> issues:
> # A byte[][] of components is unnecessarily created.  Each path component 
> lookup converts a subrange of the byte[][] to a new String[] that is then not 
> used by the default attribute provider.
> # Subaccess checks are insanely expensive.  The full path of every subdir is 
> created by walking up the inode tree, creating an INode[], and building a 
> string by converting each inode's byte[] name to a string, etc., which will 
> only be used if there's an exception.
> The expense of #1 should only be incurred when using the provider/enforcer 
> feature.  For #2, paths should be created on-demand for exceptions.
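
A minimal sketch of the on-demand idea for #2 (hypothetical method and helper names; the real change lives inside FSPermissionChecker):
{code}
// Hypothetical: the path string is materialized only on the failure path,
// so successful subaccess checks never pay for path construction.
private void checkSubAccess(INode subdir, FsAction access)
    throws AccessControlException {
  if (!hasPermission(subdir, access)) {   // hasPermission() is assumed
    throw new AccessControlException(
        "Permission denied: " + subdir.getFullPathName());  // built lazily
  }
}
{code}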



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10700) I increased the value of GC_OPTS on the namenode. After I modified the value, the namenode failed to start.

2016-08-01 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402007#comment-15402007
 ] 

Weiwei Yang commented on HDFS-10700:


Hi [~biggersell]

Can you upload the related namenode log to help with the diagnosis? Thanks!

> I increased the value of GC_OPTS on the namenode. After I modified the value, 
> the namenode failed to start.
> ---
>
> Key: HDFS-10700
> URL: https://issues.apache.org/jira/browse/HDFS-10700
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.2
> Environment: Linux Suse 11 SP3
>Reporter: Liu Guannan
>
> I increased the value of GC_OPTS on the namenode. After I modified the value, 
> the namenode failed to start. The reason is that the datanodes reported block 
> status to the namenode, the namenode was slow to update the block status, and 
> the namenode then failed to start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read with the lock held

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401907#comment-15401907
 ] 

Hadoop QA commented on HDFS-10710:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m  6s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
|   | hadoop.hdfs.TestDFSShell |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821300/HDFS-10710.1.patch |
| JIRA Issue | HDFS-10710 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 04b97c4c9717 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 770b5eb |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16276/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16276/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16276/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block 
> counts should be read with the lock held
> 

[jira] [Commented] (HDFS-10602) TestBalancer runs timeout intermittently

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401798#comment-15401798
 ] 

Hadoop QA commented on HDFS-10602:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 43s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestDFSShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821292/HDFS-10602.002.patch |
| JIRA Issue | HDFS-10602 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux da9ab7982318 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 770b5eb |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16275/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16275/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16275/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestBalancer runs timeout intermittently
> 
>
> Key: HDFS-10602
> URL: https://issues.apache.org/jira/browse/HDFS-10602
> Project: Hadoop HDFS
>  Issue Type: Bug
>

[jira] [Updated] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read with the lock held

2016-08-01 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-10710:
---
Status: Patch Available  (was: Open)

> In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block 
> counts should be read with the lock held
> -
>
> Key: HDFS-10710
> URL: https://issues.apache.org/jira/browse/HDFS-10710
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: GAO Rui
>Assignee: GAO Rui
> Attachments: HDFS-10710.1.patch
>
>
> In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
> counts should be read with the lock held. Otherwise, log records like "-1 
> blocks are removed", which indicate that a negative number of blocks was 
> removed, can be generated. 
> For example, consider the following scenario:
> 1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
> getPostponedMisreplicatedBlocksCount();}} and 
> startPostponedMisReplicatedBlocksCount gets the value 20.
> 2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
> postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount 
> is now 21.
> 3. thread1 ends the iteration without removing any postponed block, so after 
> running {{long endPostponedMisReplicatedBlocksCount = 
> getPostponedMisreplicatedBlocksCount();}}, 
> endPostponedMisReplicatedBlocksCount gets the value 21.
> 4. thread1 generates the log:
> {noformat}
>   LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
>   (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) 
> +
>   " msecs. " + endPostponedMisReplicatedBlocksCount +
>   " blocks are left. " + (startPostponedMisReplicatedBlocksCount -
>   endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
> {noformat}
> Then we'll get a log record like "-1 blocks are removed."
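
A minimal sketch of the ordering the summary suggests (assumed from the description, not the actual patch): sample both counters while the write lock is held.
{code}
// Hypothetical restructuring: both samples taken inside the lock, so the
// "blocks removed" arithmetic cannot go negative via a concurrent update.
namesystem.writeLock();
long start, end;
try {
  start = getPostponedMisreplicatedBlocksCount();
  // ... iterate postponedMisreplicatedBlocks and remove resolved entries ...
  end = getPostponedMisreplicatedBlocksCount();
} finally {
  namesystem.writeUnlock();
}
LOG.info(end + " blocks are left. " + (start - end) + " blocks are removed.");
{code}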



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read with the lock held

2016-08-01 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-10710:
---
Attachment: HDFS-10710.1.patch

> In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block 
> counts should be read with the lock held
> -
>
> Key: HDFS-10710
> URL: https://issues.apache.org/jira/browse/HDFS-10710
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: GAO Rui
>Assignee: GAO Rui
> Attachments: HDFS-10710.1.patch
>
>
> In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
> counts should be read with the lock held. Otherwise, log records like "-1 
> blocks are removed", which indicate that a negative number of blocks was 
> removed, can be generated. 
> For example, consider the following scenario:
> 1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
> getPostponedMisreplicatedBlocksCount();}} and 
> startPostponedMisReplicatedBlocksCount gets the value 20.
> 2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
> postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount 
> is now 21.
> 3. thread1 ends the iteration without removing any postponed block, so after 
> running {{long endPostponedMisReplicatedBlocksCount = 
> getPostponedMisreplicatedBlocksCount();}}, 
> endPostponedMisReplicatedBlocksCount gets the value 21.
> 4. thread1 generates the log:
> {noformat}
>   LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
>   (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) 
> +
>   " msecs. " + endPostponedMisReplicatedBlocksCount +
>   " blocks are left. " + (startPostponedMisReplicatedBlocksCount -
>   endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
> {noformat}
> Then we'll get a log record like "-1 blocks are removed."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read with the lock held

2016-08-01 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-10710:
---
Description: 
In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
counts should be read with the lock held. Otherwise, log records like "-1 
blocks are removed", which indicate that a negative number of blocks was 
removed, can be generated. 

For example, consider the following scenario:
1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}} and 
startPostponedMisReplicatedBlocksCount gets the value 20.
2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is 
now 21.
3. thread1 ends the iteration without removing any postponed block, so after 
running {{long endPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount 
gets the value 21.
4. thread1 generates the log:
{noformat}
  LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
  (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) +
  " msecs. " + endPostponedMisReplicatedBlocksCount +
  " blocks are left. " + (startPostponedMisReplicatedBlocksCount -
  endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
{noformat}
Then we'll get a log record like "-1 blocks are removed."

  was:
In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
counts should be read with the lock held. Otherwise, log records like "-1 
blocks are removed", which indicate that a negative number of blocks was 
removed, can be generated. 

For example, consider the following scenario:
1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}} and 
startPostponedMisReplicatedBlocksCount gets the value 20.
2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is 
now 21.
3. thread1 ends the iteration without removing any postponed block, so after 
running {{long endPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount 
gets the value 21.
4. thread1 generates the log:
{code}
LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
(Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) +
" msecs. " + endPostponedMisReplicatedBlocksCount +
" blocks are left. " + (startPostponedMisReplicatedBlocksCount -
endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
{code}
Then we'll get a log record like "-1 blocks are removed."


> In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block 
> counts should be read with the lock held
> -
>
> Key: HDFS-10710
> URL: https://issues.apache.org/jira/browse/HDFS-10710
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: GAO Rui
>Assignee: GAO Rui
>
> In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
> counts should be read with the lock held. Otherwise, log records like "-1 
> blocks are removed", which indicate that a negative number of blocks was 
> removed, can be generated. 
> For example, consider the following scenario:
> 1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
> getPostponedMisreplicatedBlocksCount();}} and 
> startPostponedMisReplicatedBlocksCount gets the value 20.
> 2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
> postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount 
> is now 21.
> 3. thread1 ends the iteration without removing any postponed block, so after 
> running {{long endPostponedMisReplicatedBlocksCount = 
> getPostponedMisreplicatedBlocksCount();}}, 
> endPostponedMisReplicatedBlocksCount gets the value 21.
> 4. thread1 generates the log:
> {noformat}
>   LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
>   (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) 
> +
>   " msecs. " + endPostponedMisReplicatedBlocksCount +
>   " blocks are left. " + (startPostponedMisReplicatedBlocksCount -
>   endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
> {noformat}
> Then we'll get a log record like "-1 blocks are removed."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read with the lock held

2016-08-01 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-10710:
---
Description: 
In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
counts should be read with the lock held. Otherwise, log records like "-1 
blocks are removed", which indicate that a negative number of blocks was 
removed, can be generated. 

For example, consider the following scenario:
1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}} and 
startPostponedMisReplicatedBlocksCount gets the value 20.
2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is 
now 21.
3. thread1 ends the iteration without removing any postponed block, so after 
running {{long endPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount 
gets the value 21.
4. thread1 generates the log:
{code}
LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
(Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) +
" msecs. " + endPostponedMisReplicatedBlocksCount +
" blocks are left. " + (startPostponedMisReplicatedBlocksCount -
endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
{code}
Then we'll get a log record like "-1 blocks are removed."

  was:
In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
counts should be read with the lock held. Otherwise, log records like "-1 
blocks are removed", which indicate that a negative number of blocks was 
removed, can be generated. 

For example, consider the following scenario:
1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}} and 
startPostponedMisReplicatedBlocksCount gets the value 20.
2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is 
now 21.
3. thread1 ends the iteration without removing any postponed block, so after 
running {{long endPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount 
gets the value 21.
4. thread1 generates the log:
```
LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
(Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) +
" msecs. " + endPostponedMisReplicatedBlocksCount +
" blocks are left. " + (startPostponedMisReplicatedBlocksCount -
endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
```
Then we'll get a log record like "-1 blocks are removed."


> In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block 
> counts should be read with the lock held
> -
>
> Key: HDFS-10710
> URL: https://issues.apache.org/jira/browse/HDFS-10710
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: GAO Rui
>Assignee: GAO Rui
>
> In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
> counts should be read with the lock held. Otherwise, log records like "-1 
> blocks are removed", which indicate that a negative number of blocks was 
> removed, can be generated. 
> For example, consider the following scenario:
> 1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
> getPostponedMisreplicatedBlocksCount();}} and 
> startPostponedMisReplicatedBlocksCount gets the value 20.
> 2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
> postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount 
> is now 21.
> 3. thread1 ends the iteration without removing any postponed block, so after 
> running {{long endPostponedMisReplicatedBlocksCount = 
> getPostponedMisreplicatedBlocksCount();}}, 
> endPostponedMisReplicatedBlocksCount gets the value 21.
> 4. thread1 generates the log:
> {code}
> LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
> (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) +
> " msecs. " + endPostponedMisReplicatedBlocksCount +
> " blocks are left. " + (startPostponedMisReplicatedBlocksCount -
> endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
> {code}
> Then we'll get a log record like "-1 blocks are removed."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read with the lock held

2016-08-01 Thread GAO Rui (JIRA)
GAO Rui created HDFS-10710:
--

 Summary: In BlockManager#rescanPostponedMisreplicatedBlocks(), 
start and end block counts should be read with the lock held
 Key: HDFS-10710
 URL: https://issues.apache.org/jira/browse/HDFS-10710
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: GAO Rui
Assignee: GAO Rui


In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block 
counts should be read with the lock held. Otherwise, log records like "-1 
blocks are removed", which indicate that a negative number of blocks was 
removed, can be generated. 

For example, consider the following scenario:
1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}} and 
startPostponedMisReplicatedBlocksCount gets the value 20.
2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments 
postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is 
now 21.
3. thread1 ends the iteration without removing any postponed block, so after 
running {{long endPostponedMisReplicatedBlocksCount = 
getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount 
gets the value 21.
4. thread1 generates the log:
{code}
LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
(Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) +
" msecs. " + endPostponedMisReplicatedBlocksCount +
" blocks are left. " + (startPostponedMisReplicatedBlocksCount -
endPostponedMisReplicatedBlocksCount) + " blocks are removed.");
{code}
Then we'll get a log record like "-1 blocks are removed."
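
To make the race concrete, here is a self-contained toy reproduction (my 
sketch, using plain {{java.util.concurrent}} rather than NameNode internals): 
snapshot a shared counter, let another thread increment it, snapshot again, 
and the "removed" delta goes negative:
{code}
import java.util.concurrent.atomic.AtomicLong;

public class NegativeDeltaDemo {
  static final AtomicLong postponed = new AtomicLong(20);

  public static void main(String[] args) throws InterruptedException {
    long start = postponed.get();              // step 1: reads 20
    Thread t2 = new Thread(postponed::incrementAndGet);
    t2.start();                                // step 2: another thread adds 1
    t2.join();
    // step 3: the rescan removed nothing, so the end snapshot reads 21
    long end = postponed.get();
    // step 4: prints "-1 blocks are removed."
    System.out.println((start - end) + " blocks are removed.");
  }
}
{code}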



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8901) Use ByteBuffer in striping positional read

2016-08-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401762#comment-15401762
 ] 

Hadoop QA commented on HDFS-8901:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 28s{color} | {color:orange} hadoop-hdfs-project: The patch generated 4 new + 
89 unchanged - 0 fixed = 93 total (was 89) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 34s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.TestEditLog |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821280/HDFS-8901.v14.patch |
| JIRA Issue | HDFS-8901 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ad8d02b861b3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 34ccaa8 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16274/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16274/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/16274/testReport/ |
| 

[jira] [Created] (HDFS-10709) hdfs shell du command does not match OS du command

2016-08-01 Thread Jiahongchao (JIRA)
Jiahongchao created HDFS-10709:
--

 Summary: hdfs shell du command does not match OS du command
 Key: HDFS-10709
 URL: https://issues.apache.org/jira/browse/HDFS-10709
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.0
 Environment: centos 6.7, jdk 1.7
Reporter: Jiahongchao
Priority: Minor


I have files created by Solr on HDFS, but the reported size differs between 
HDFS du and CentOS du.
[apd@dev186 ~]$ hdfs dfs -du /solr/fileSizeTest/core_node1/data/tlog
46  402653184  /solr/fileSizeTest/core_node1/data/tlog/tlog.002
[apd@dev186 ~]$ hdfs dfs -ls /solr/fileSizeTest/core_node1/data/tlog
Found 1 items
-rw-r--r--   3 solr solr 46 2016-08-01 13:18 
/solr/fileSizeTest/core_node1/data/tlog/tlog.002

After downloading this file using get:
[apd@dev186 ~]$ ll -h  tlog.002 
-rw-r--r-- 1 apd apd 8.5M Aug  1 15:48 tlog.002

So what is hdfs dfs -du actually reporting? And why are the two values so 
different: 46 vs 402653184?
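
For what it's worth, the second column printed by {{hdfs dfs -du}} is the disk 
space consumed including all replicas, and the value above factors exactly as 
replication × the 128 MB default block size, which would fit a still-open 
block being accounted at the full preferred block size. A quick arithmetic 
check (the class name and the block-size assumption are mine):
{code}
public class DuSecondColumnCheck {
  public static void main(String[] args) {
    long replication = 3;                        // from the ls output above
    long defaultBlockSize = 128L * 1024 * 1024;  // assumed: 134217728 bytes
    // Prints 402653184 -- the second du column in the report above.
    System.out.println(replication * defaultBlockSize);
  }
}
{code}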



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10602) TestBalancer times out intermittently

2016-08-01 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10602:
-
Attachment: HDFS-10602.002.patch

> TestBalancer times out intermittently
> 
>
> Key: HDFS-10602
> URL: https://issues.apache.org/jira/browse/HDFS-10602
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10602.001.patch, HDFS-10602.002.patch, fail.log, 
> pass.log
>
>
> As JIRA HDFS-10336 mentioned, the unit test 
> {{TestBalancer#testBalancerWithKeytabs}} sometimes runs too slowly, and that 
> leads to the timeout. The test {{TestBalancer#testUnknownDatanodeSimple}} 
> also has this problem. Both tests use the method {{testUnknownDatanode}}. We 
> can do some optimization for this method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10602) TestBalancer times out intermittently

2016-08-01 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401742#comment-15401742
 ] 

Yiqun Lin commented on HDFS-10602:
--

Thanks [~xiaochen] for the comments and thanks [~liuml07] for providing the 
logs.

{quote}
The 3rd DN is excluded for the test case.
{quote}
Hi Xiao, it seems that the 3rd DN is not excluded in the test case. The 
balancer always moves data between 2 specific DNs, so the 3rd DN never gets to 
move any data. 

{quote}
It seems the code has been refactored so no getBlockList exists in trunk now
{quote}
Yes, the getBlockList method in {{Balancer.java}} has been changed; it now 
lives in {{Dispatcher#Source}}. I have tested this: getBlockList does return 
the srcBlocks in the failing case.

So the problem comes down to why the balancer always moves data between the 
same 2 DNs. I added some code to print the details.
{code}
private PendingMove chooseNextMove() {
  for (Iterator<Task> i = tasks.iterator(); i.hasNext();) {
final Task task = i.next();
final DDatanode target = task.target.getDDatanode();
final PendingMove pendingBlock = new PendingMove(this, task.target);
if (target.addPendingBlock(pendingBlock)) {
  // target is not busy, so do a tentative block allocation
  if (pendingBlock.chooseBlockAndProxy()) {
long blockSize = pendingBlock.reportedBlock.getNumBytes(this);
incScheduledSize(-blockSize);
task.size -= blockSize;
// Print the scheduled size for test
LOG.info("TargetNode: " + target.getDatanodeInfo().getXferPort()
+ ", bytes scheduled to move, after: " + task.size
+ ", before: " + (task.size + blockSize));
if (task.size == 0) {
  LOG.info("TargetNode removed.");
  i.remove();
}
LOG.info("Return pendingBlock for target node "
+ target.getDatanodeInfo().getXferPort());
return pendingBlock;
...
{code}
Here, task.size (the bytes scheduled to move) will not always reduce to 
exactly 0. The method then returns this pendingBlock, and the task for the 
next target node is ignored. In the test, I saw that the 3rd DN is always the 
second targetNode here, and the method just returns while dealing with the 
first target node. These are my local logs:
{code}
2016-08-01 16:51:53,466 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to 
move, after: -1067, before: -967
2016-08-01 16:51:53,466 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(806)) - Return pendingBlock for target node 
58798
2016-08-01 16:51:53,466 [pool-50-thread-10] INFO  balancer.Dispatcher 
(Dispatcher.java:dispatch(322)) - Start moving blk_1073741833_1009 with 
size=100 from 127.0.0.1:58794:DISK to 127.0.0.1:58798:DISK through 
127.0.0.1:58794
2016-08-01 16:51:53,467 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to 
move, after: -1167, before: -1067
2016-08-01 16:51:53,467 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(806)) - Return pendingBlock for target node 
58798
2016-08-01 16:51:53,467 [pool-50-thread-11] INFO  balancer.Dispatcher 
(Dispatcher.java:dispatch(322)) - Start moving blk_1073741834_1010 with 
size=100 from 127.0.0.1:58794:DISK to 127.0.0.1:58798:DISK through 
127.0.0.1:58794
2016-08-01 16:51:53,468 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to 
move, after: -1267, before: -1167
2016-08-01 16:51:53,468 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(806)) - Return pendingBlock for target node 
58798
2016-08-01 16:51:53,468 [pool-50-thread-12] INFO  balancer.Dispatcher 
(Dispatcher.java:dispatch(322)) - Start moving blk_1073741835_1011 with 
size=100 from 127.0.0.1:58794:DISK to 127.0.0.1:58798:DISK through 
127.0.0.1:58794
2016-08-01 16:51:53,468 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to 
move, after: -1367, before: -1267
2016-08-01 16:51:53,468 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(806)) - Return pendingBlock for target node 
58798
2016-08-01 16:51:53,469 [pool-50-thread-13] INFO  balancer.Dispatcher 
(Dispatcher.java:dispatch(322)) - Start moving blk_1073741836_1012 with 
size=100 from 127.0.0.1:58794:DISK to 127.0.0.1:58798:DISK through 
127.0.0.1:58794
2016-08-01 16:51:53,469 [pool-49-thread-1] INFO  balancer.Dispatcher 
(Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to 
move, after: -1467, before: -1367
2016-08-01 16:51:53,469 [pool-49-thread-1] INFO  ...
{code}

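Given that task.size has clearly gone negative in these logs, a minimal sketch 
of one possible remedy (my illustration, not the committed patch) is to treat 
any non-positive remaining size as exhausted, so the task is removed and the 
next target node gets a turn:
{code}
// Hypothetical tweak inside chooseNextMove() -- not the committed fix.
// Remove the task once its scheduled size is used up, even when block
// sizes did not divide it evenly and task.size has dropped below zero.
if (task.size <= 0) {
  LOG.info("TargetNode removed.");
  i.remove();
}
{code}
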
[jira] [Updated] (HDFS-8901) Use ByteBuffer in striping positional read

2016-08-01 Thread Youwei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Youwei Wang updated HDFS-8901:
--
Attachment: (was: HDFS-8901.v14.patch)

> Use ByteBuffer in striping positional read
> --
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Youwei Wang
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, 
> HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, 
> HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, 
> HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, 
> HDFS-8901.v13.patch, initial-poc.patch
>
>
> The native erasure coder prefers direct ByteBuffers for performance 
> reasons. To prepare for that, this change uses ByteBuffer throughout the 
> code implementing striped positional read. It also avoids unnecessary data 
> copying between striped-read chunk buffers and decode input buffers.
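
For context, a generic illustration (my sketch, independent of the patch) of 
why native coders prefer direct ByteBuffers: a direct buffer lives outside the 
Java heap, so JNI code can reach it in place via GetDirectBufferAddress, while 
a heap buffer's backing array generally has to be copied across the JNI 
boundary:
{code}
import java.nio.ByteBuffer;

public class DirectBufferDemo {
  public static void main(String[] args) {
    // Heap buffer: backed by a Java byte[]; JNI typically copies it.
    ByteBuffer heap = ByteBuffer.allocate(64 * 1024);
    // Direct buffer: off-heap memory; native code can read and write
    // it without an extra copy.
    ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024);
    System.out.println("heap:   isDirect=" + heap.isDirect()
        + ", hasArray=" + heap.hasArray());
    System.out.println("direct: isDirect=" + direct.isDirect()
        + ", hasArray=" + direct.hasArray());
  }
}
{code}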



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-8901) Use ByteBuffer in striping positional read

2016-08-01 Thread Youwei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Youwei Wang updated HDFS-8901:
--
Comment: was deleted

(was: New patch submitted.
File name: HDFS-8901.v14.patch
Based on commit id: 34ccaa8367f048ed9f56038efe7b3202c436b6e6
Comment: A small revision for the test class: TestDFSStripedInputStream.java)

> Use ByteBuffer in striping positional read
> --
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Youwei Wang
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, 
> HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, 
> HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, 
> HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, 
> HDFS-8901.v13.patch, initial-poc.patch
>
>
> The native erasure coder prefers direct ByteBuffers for performance 
> reasons. To prepare for that, this change uses ByteBuffer throughout the 
> code implementing striped positional read. It also avoids unnecessary data 
> copying between striped-read chunk buffers and decode input buffers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8901) Use ByteBuffer in striping positional read

2016-08-01 Thread Youwei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Youwei Wang updated HDFS-8901:
--
Attachment: HDFS-8901.v14.patch

New patch submitted.
File name: HDFS-8901.v14.patch
Based on commit id: 34ccaa8367f048ed9f56038efe7b3202c436b6e6
Comment: A small revision for the test class: TestDFSStripedInputStream.java

> Use ByteBuffer in striping positional read
> --
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Youwei Wang
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, 
> HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, 
> HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, 
> HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, 
> HDFS-8901.v13.patch, HDFS-8901.v14.patch, initial-poc.patch
>
>
> The native erasure coder prefers direct ByteBuffers for performance 
> reasons. To prepare for that, this change uses ByteBuffer throughout the 
> code implementing striped positional read. It also avoids unnecessary data 
> copying between striped-read chunk buffers and decode input buffers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org