[jira] [Commented] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205838#comment-15205838
 ] 

Hadoop QA commented on HDFS-9483:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 10m 20s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12776616/HDFS-9483.001.patch |
| JIRA Issue | HDFS-9483 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 8c9669470813 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / e7ed05e |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14887/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured 
> WebHDFS.
> -
>
> Key: HDFS-9483
> URL: https://issues.apache.org/jira/browse/HDFS-9483
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9483.001.patch, HDFS-9483.patch
>
>
> If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in 
> a URL to access it.  The current documentation does not state this anywhere.
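
For a quick illustration of the scheme (the namenode host, HTTPS port, and SSL trust
configuration below are placeholders, not taken from this issue), a minimal sketch of a
client listing a directory over SSL-secured WebHDFS:

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SWebHdfsListing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "swebhdfs" selects the SSL-secured WebHDFS client; a valid truststore
    // for the namenode's certificate must be configured for this to work.
    FileSystem fs =
        FileSystem.get(URI.create("swebhdfs://nn.example.com:50470/"), conf);
    for (FileStatus st : fs.listStatus(new Path("/"))) {
      System.out.println(st.getPath());
    }
    fs.close();
  }
}
{code}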



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly

2016-03-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10185:
-
Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

Closed the jira, because this is not a problem.

> TestHFlushInterrupted verifies interrupt state incorrectly
> --
>
> Key: HDFS-10185
> URL: https://issues.apache.org/jira/browse/HDFS-10185
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10185.001.patch
>
>
> In unit test {{TestHFlush#testHFlushInterrupted}}, there are some places that 
> verify the interrupt state incorrectly. For example:
> {code}
>   Thread.currentThread().interrupt();
>   try {
>     stm.hflush();
>     // If we made it past the hflush(), then that means that the ack made it
>     // back from the pipeline before we got to the wait() call. In that case
>     // we should still have interrupted status.
>     assertTrue(Thread.interrupted());
>   } catch (InterruptedIOException ie) {
>     System.out.println("Got expected exception during flush");
>   }
> {code}
> When stm does the {{hflush}} operation, it throws an interrupted exception and 
> the {{assertTrue(Thread.interrupted())}} is never executed. And if the check is 
> moved before the {{hflush}}, it clears the interrupted state and the expected 
> exception is not thrown. A similar problem also appears after {{stm.close}}.
> So we should use a way to read the state without clearing it, such as 
> {{Thread.currentThread().isInterrupted()}}.
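
As a side note for readers, a tiny standalone sketch of the distinction described
above: {{Thread.interrupted()}} reads and clears the flag, while
{{Thread.currentThread().isInterrupted()}} only reads it.

{code}
public class InterruptFlagDemo {
  public static void main(String[] args) {
    Thread.currentThread().interrupt();
    // isInterrupted() only reads the flag, so it stays set.
    System.out.println(Thread.currentThread().isInterrupted()); // true
    System.out.println(Thread.currentThread().isInterrupted()); // still true
    // interrupted() reads AND clears the flag.
    System.out.println(Thread.interrupted()); // true
    System.out.println(Thread.interrupted()); // false, the flag was cleared
  }
}
{code}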



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9959) add log when block removed from last live datanode

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205802#comment-15205802
 ] 

Hadoop QA commented on HDFS-9959:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 
14 unchanged - 0 fixed = 17 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 49s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 22s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 33s 
{color} | {color:red} Patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 115m 15s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles |
|   | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| JDK v1.8.0_74 Timed out junit tests | 

[jira] [Commented] (HDFS-7648) Verify that HDFS blocks are in the correct datanode directories

2016-03-21 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205769#comment-15205769
 ] 

Rakesh R commented on HDFS-7648:


Thanks [~szetszwo] for the reply.
bq. Why it needs a separated sub-task but not just updating the patch here?
bq. The two chunks of code above are duplicated. Please add a helper method to 
avoid the duplication. Thanks.
The attached patches in this jira are quite old. Please ignore them, as the 
proposed changes have been implemented in HDFS-7819. I raised this sub-task 
based on the 
[discussion|https://issues.apache.org/jira/browse/HDFS-7648?focusedCommentId=14332305=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14332305].
 Also, please refer to 
[DirectoryScanner.java#L916|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java#L916]
 to see the code changes. Initially the discussion was about identifying the 
wrongly placed items and automatically fixing them, but we couldn't reach a 
conclusion. Since we haven't fully implemented the original idea of fixing the 
wrongly placed blocks, I thought of keeping this jira open (one can refer to 
the jira comments for the background) and fixing smaller parts separately 
through sub-tasks.

Should I delete the attached old patches from this jira to avoid any confusion?

> Verify that HDFS blocks are in the correct datanode directories
> ---
>
> Key: HDFS-7648
> URL: https://issues.apache.org/jira/browse/HDFS-7648
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-7648-3.patch, HDFS-7648-4.patch, HDFS-7648-5.patch, 
> HDFS-7648.patch, HDFS-7648.patch
>
>
> HDFS-6482 changed datanode layout to use block ID to determine the directory 
> to store the block.  We should have some mechanism to verify it.  Either 
> DirectoryScanner or block report generation could do the check.
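
For readers unfamiliar with the HDFS-6482 layout, a toy sketch of the idea follows;
the real mapping lives in the datanode code, and the shift/mask constants below are
illustrative only, not the actual layout values.

{code}
import java.io.File;

public class BlockDirSketch {
  // Toy version of "the block ID determines the directory": two levels of
  // subdirectories derived from bits of the block ID.
  static File idToBlockDir(File finalizedRoot, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);
    int d2 = (int) ((blockId >> 8) & 0x1F);
    return new File(finalizedRoot,
        "subdir" + d1 + File.separator + "subdir" + d2);
  }

  public static void main(String[] args) {
    // A verifier (e.g. DirectoryScanner) would compare a block file's actual
    // parent directory against the directory computed from its block ID.
    File root = new File("/data/dn/current/BP-1/current/finalized");
    System.out.println(idToBlockDir(root, 1073741825L));
  }
}
{code}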



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10190) Expose FSDatasetImpl lock metrics

2016-03-21 Thread John Zhuge (JIRA)
John Zhuge created HDFS-10190:
-

 Summary: Expose FSDatasetImpl lock metrics
 Key: HDFS-10190
 URL: https://issues.apache.org/jira/browse/HDFS-10190
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: John Zhuge
Assignee: John Zhuge


Expose FSDatasetImpl lock metrics:
* Number of lock calls
* Contention rate
* Average wait time

Locks of interest:
* FsDatasetImpl intrinsic lock
* FsDatasetImpl.statsLock
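
A hedged sketch of one way such numbers could be collected (the wrapper name and
approach here are assumptions for illustration, not from any patch): wrap lock
acquisition, count calls, and accumulate wait time for contended acquisitions.

{code}
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: a lock wrapper tracking calls, contention and wait time. */
public class InstrumentedLock {
  private final ReentrantLock lock = new ReentrantLock();
  private final LongAdder lockCalls = new LongAdder();
  private final LongAdder contendedCalls = new LongAdder();
  private final LongAdder waitNanos = new LongAdder();

  public void lock() {
    lockCalls.increment();
    if (lock.tryLock()) {
      return;                          // uncontended, no wait time recorded
    }
    contendedCalls.increment();
    long start = System.nanoTime();
    lock.lock();                       // blocks until the lock is acquired
    waitNanos.add(System.nanoTime() - start);
  }

  public void unlock() {
    lock.unlock();
  }

  public double contentionRate() {     // contended acquisitions / total acquisitions
    long calls = lockCalls.sum();
    return calls == 0 ? 0.0 : (double) contendedCalls.sum() / calls;
  }

  public double avgWaitMillis() {      // average wait of contended acquisitions
    long contended = contendedCalls.sum();
    return contended == 0 ? 0.0 : waitNanos.sum() / 1.0e6 / contended;
  }
}
{code}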



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly

2016-03-21 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205713#comment-15205713
 ] 

John Zhuge commented on HDFS-10185:
---

Close it then. Thanks for attempting to fix this unit test though, it is 
definitely flaky.

> TestHFlushInterrupted verifies interrupt state incorrectly
> --
>
> Key: HDFS-10185
> URL: https://issues.apache.org/jira/browse/HDFS-10185
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10185.001.patch
>
>
> In unit test {{TestHFlush#testHFlushInterrupted}}, there are some places that 
> verify the interrupt state incorrectly. For example:
> {code}
>   Thread.currentThread().interrupt();
>   try {
>     stm.hflush();
>     // If we made it past the hflush(), then that means that the ack made it
>     // back from the pipeline before we got to the wait() call. In that case
>     // we should still have interrupted status.
>     assertTrue(Thread.interrupted());
>   } catch (InterruptedIOException ie) {
>     System.out.println("Got expected exception during flush");
>   }
> {code}
> When stm does the {{hflush}} operation, it throws an interrupted exception and 
> the {{assertTrue(Thread.interrupted())}} is never executed. And if the check is 
> moved before the {{hflush}}, it clears the interrupted state and the expected 
> exception is not thrown. A similar problem also appears after {{stm.close}}.
> So we should use a way to read the state without clearing it, such as 
> {{Thread.currentThread().isInterrupted()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly

2016-03-21 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205707#comment-15205707
 ] 

Lin Yiqun commented on HDFS-10185:
--

{quote}
However, do you think this comment is no longer true?
{quote}
The comment is true, and it means this issue is not really a problem. So shall 
I invalidate this jira, or do you have other comments on this, [~jzhuge]?

> TestHFlushInterrupted verifies interrupt state incorrectly
> --
>
> Key: HDFS-10185
> URL: https://issues.apache.org/jira/browse/HDFS-10185
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10185.001.patch
>
>
> In unit test {{TestHFlush#testHFlushInterrupted}}, there are some places that 
> verify the interrupt state incorrectly. For example:
> {code}
>   Thread.currentThread().interrupt();
>   try {
>     stm.hflush();
>     // If we made it past the hflush(), then that means that the ack made it
>     // back from the pipeline before we got to the wait() call. In that case
>     // we should still have interrupted status.
>     assertTrue(Thread.interrupted());
>   } catch (InterruptedIOException ie) {
>     System.out.println("Got expected exception during flush");
>   }
> {code}
> When stm does the {{hflush}} operation, it throws an interrupted exception and 
> the {{assertTrue(Thread.interrupted())}} is never executed. And if the check is 
> moved before the {{hflush}}, it clears the interrupted state and the expected 
> exception is not thrown. A similar problem also appears after {{stm.close}}.
> So we should use a way to read the state without clearing it, such as 
> {{Thread.currentThread().isInterrupted()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9559) Add haadmin command to get HA state of all the namenodes

2016-03-21 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205701#comment-15205701
 ] 

Surendra Singh Lilhore commented on HDFS-9559:
--

Could someone please review this jira?

This command is useful in clusters with multiple namenodes (HDFS-6440).

> Add haadmin command to get HA state of all the namenodes
> 
>
> Key: HDFS-9559
> URL: https://issues.apache.org/jira/browse/HDFS-9559
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9559.01.patch
>
>
> Currently we have one command to get the state of a namenode:
> {code}
> ./hdfs haadmin -getServiceState 
> {code}
> It would be good to have a command which gives the state of all the namenodes.
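
For context, a hedged sketch of the enumeration side only, using the standard HA
configuration keys; the actual state query would go through the same
HAServiceProtocol path that -getServiceState already uses, which is omitted here.

{code}
import org.apache.hadoop.conf.Configuration;

public class ListNameNodes {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // e.g. dfs.nameservices=ns1 and dfs.ha.namenodes.ns1=nn1,nn2
    for (String ns : conf.getTrimmedStrings("dfs.nameservices")) {
      for (String nnId : conf.getTrimmedStrings("dfs.ha.namenodes." + ns)) {
        String rpcAddr = conf.get("dfs.namenode.rpc-address." + ns + "." + nnId);
        // A "get all states" command would ask each of these addresses for its
        // HA state; here we only print the targets that would be queried.
        System.out.println(ns + "." + nnId + " -> " + rpcAddr);
      }
    }
  }
}
{code}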



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.

2016-03-21 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205678#comment-15205678
 ] 

Surendra Singh Lilhore commented on HDFS-9483:
--

[~cnauroth], could you please review?

> Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured 
> WebHDFS.
> -
>
> Key: HDFS-9483
> URL: https://issues.apache.org/jira/browse/HDFS-9483
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Chris Nauroth
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9483.001.patch, HDFS-9483.patch
>
>
> If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in 
> a URL to access it.  The current documentation does not state this anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205674#comment-15205674
 ] 

Vinayakumar B commented on HDFS-9847:
-

bq. Can we use the nothrow approach, but still print a log message about the 
lost precision to let users know?
This seems to be better.

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> HDFS-9821 talks about letting existing keys use friendly units, e.g. 60s, 5m, 
> 1d, 6w, etc. But some configuration key names contain a time unit name, like 
> {{dfs.blockreport.intervalMsec}}, so we can make the other configurations, 
> which have no time unit in their names, accept friendly time units. The time 
> unit {{seconds}} is frequently used in hdfs, so we can update those 
> configurations first.
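
For context, a minimal sketch of the kind of parsing being discussed, assuming
Hadoop's {{Configuration.getTimeDuration}} API from HDFS-9821 (the key name below
is made up):

{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class FriendlyUnitsDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("my.test.interval", "5m");   // hypothetical key with a friendly unit
    // A value with no suffix is interpreted in the default unit (SECONDS here).
    long seconds = conf.getTimeDuration("my.test.interval", 30, TimeUnit.SECONDS);
    System.out.println(seconds);          // 300
  }
}
{code}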



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-2043) TestHFlush failing intermittently

2016-03-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-2043:

Attachment: HDFS-2043.003.patch

Updated the patch, adding a sleep in each retry. Testing {{TestHFlush}} again; 
the patch will look effective if the result passes again. Thanks for reviewing 
the patch.
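
(For readers, a generic sketch of the retry-with-sleep pattern being described; 
the helper below is made up for illustration and is not the patch itself.)

{code}
import java.util.function.BooleanSupplier;

public class RetryWithSleep {
  /** Illustrative only: poll a condition, sleeping between attempts. */
  static void waitFor(BooleanSupplier condition, int retries, long sleepMs)
      throws InterruptedException {
    for (int i = 0; i < retries; i++) {
      if (condition.getAsBoolean()) {
        return;                    // condition met, stop retrying
      }
      Thread.sleep(sleepMs);       // back off before the next attempt
    }
    throw new AssertionError("condition not met after " + retries + " retries");
  }
}
{code}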

> TestHFlush failing intermittently
> -
>
> Key: HDFS-2043
> URL: https://issues.apache.org/jira/browse/HDFS-2043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aaron T. Myers
>Assignee: Lin Yiqun
> Attachments: HDFS-2043.002.patch, HDFS-2043.003.patch, HDFS.001.patch
>
>
> I can't reproduce this failure reliably, but it seems like TestHFlush has 
> been failing intermittently, with the frequency increasing of late.
> Note the following two pre-commit test runs from different JIRAs where 
> TestHFlush seems to have failed spuriously:
> https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/
> https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-03-21 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205658#comment-15205658
 ] 

Brahma Reddy Battula commented on HDFS-9917:


Uploaded a patch. Kindly review. Can we limit the number of IBRs to the 
standby, where the DN keeps accumulating IBRs and uses a lot of memory?

> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9917.patch
>
>
> The SNN was down for some time for various reasons. After restarting the SNN, 
> it became unresponsive because:
> - 29 DNs were each sending around 5 million IBRs (most of them delete IBRs), 
> whereas each datanode had only ~2.5 million blocks.
> - GC can't reclaim these objects since they all sit in the RPC queue.
> To recover (to clear these objects), we restarted all the DNs one by one. This 
> issue happened in 2.4.1, where splitting of the block report was not 
> available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-03-21 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-9917:
---
Status: Patch Available  (was: Open)

> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9917.patch
>
>
> The SNN was down for some time for various reasons. After restarting the SNN, 
> it became unresponsive because:
> - 29 DNs were each sending around 5 million IBRs (most of them delete IBRs), 
> whereas each datanode had only ~2.5 million blocks.
> - GC can't reclaim these objects since they all sit in the RPC queue.
> To recover (to clear these objects), we restarted all the DNs one by one. This 
> issue happened in 2.4.1, where splitting of the block report was not 
> available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-03-21 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-9917:
---
Attachment: HDFS-9917.patch

> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9917.patch
>
>
> The SNN was down for some time for various reasons. After restarting the SNN, 
> it became unresponsive because:
> - 29 DNs were each sending around 5 million IBRs (most of them delete IBRs), 
> whereas each datanode had only ~2.5 million blocks.
> - GC can't reclaim these objects since they all sit in the RPC queue.
> To recover (to clear these objects), we restarted all the DNs one by one. This 
> issue happened in 2.4.1, where splitting of the block report was not 
> available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205644#comment-15205644
 ] 

Lin Yiqun commented on HDFS-9847:
-

{quote}
Yes. Similar to the discussion on casting the result to int, the caller is 
responsible for precision.
{quote}
Can we use the nothrow approach, but still print a log message about the lost 
precision to let users know?

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> HDFS-9821 talks about letting existing keys use friendly units, e.g. 60s, 5m, 
> 1d, 6w, etc. But some configuration key names contain a time unit name, like 
> {{dfs.blockreport.intervalMsec}}, so we can make the other configurations, 
> which have no time unit in their names, accept friendly time units. The time 
> unit {{seconds}} is frequently used in hdfs, so we can update those 
> configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor

2016-03-21 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205621#comment-15205621
 ] 

Lin Yiqun commented on HDFS-9951:
-

Thanks [~cmccabe] for the commit!

> Use string constants for XML tags in OfflineImageReconstructor
> --
>
> Key: HDFS-9951
> URL: https://issues.apache.org/jira/browse/HDFS-9951
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, 
> HDFS-9551.003.patch, HDFS-9551.004.patch
>
>
> The class {{OfflineImageReconstructor}} uses many {{SectionProcessors}} to 
> process XML files and load the subtree of the XML into a Node structure. But 
> in lots of places a node removes a key by writing the string value directly 
> in the method rather than defining it first. Like this:
> {code}
> Node expiration = directive.removeChild("expiration");
> {code}
> We could improve this by defining the names in Node and invoking them like 
> this:
> {code}
> Node expiration=directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION);
> {code}
> This will also make it easier to manage node key names in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.

2016-03-21 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205591#comment-15205591
 ] 

Rushabh S Shah commented on HDFS-9874:
--

Sure. Will update the patch shortly.

> Long living DataXceiver threads cause volume shutdown to block.
> ---
>
> Key: HDFS-9874
> URL: https://issues.apache.org/jira/browse/HDFS-9874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, 
> HDFS-9874-trunk.patch
>
>
> One of the failed volume shutdowns took 3 days to complete.
> Below are the relevant datanode logs while shutting down a volume (due to 
> disk failure)
> {noformat}
> 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing 
> failed volume volumeA/current: 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not 
> writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized
> at 
> org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194)
> at 
> org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing 
> scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23)
> 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting.
> 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b.
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp 
> (Read-only file system)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
> 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to 
> delete old dfsUsed file in 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current
> 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to 
> write dfsUsed to 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed (Read-only 
> file system)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.(FileOutputStream.java:213)
>   at java.io.FileOutputStream.(FileOutputStream.java:162)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:247)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:698)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:815)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:328)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:250)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
>   at 

[jira] [Commented] (HDFS-9959) add log when block removed from last live datanode

2016-03-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205585#comment-15205585
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9959:
---

For the new patch, please call init() in heartbeatCheck(), not in add(..). 
Otherwise, blocks will be added in other cases, such as when a block is deleted 
from the last datanode.

> ... Which solution is better: ...

Let's add only the first thousand blocks, i.e. check missing.size() before 
adding the block. We should probably print the log message in multiple lines, 
say 10 blocks per line.

Thanks for the update.
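
A hedged sketch of that suggestion (cap the collected list, then log it in 
chunks); the names below are illustrative, not the patch:

{code}
import java.util.ArrayList;
import java.util.List;

public class CappedBlockLogSketch {
  static final int MAX_BLOCKS = 1000;     // remember only the first thousand
  static final int BLOCKS_PER_LINE = 10;  // log, say, 10 blocks per line

  static void logMissing(String dn, List<String> missing) {
    for (int i = 0; i < missing.size(); i += BLOCKS_PER_LINE) {
      List<String> chunk =
          missing.subList(i, Math.min(i + BLOCKS_PER_LINE, missing.size()));
      System.out.println("After removing " + dn
          + ", no live nodes contain blocks: " + chunk);
    }
  }

  public static void main(String[] args) {
    List<String> missing = new ArrayList<>();
    for (long id = 0; id < 25; id++) {
      if (missing.size() < MAX_BLOCKS) {  // check the size before adding
        missing.add("blk_" + id);
      }
    }
    logMissing("127.0.0.1:65341", missing);
  }
}
{code}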

> add log when block removed from last live datanode
> --
>
> Key: HDFS-9959
> URL: https://issues.apache.org/jira/browse/HDFS-9959
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Minor
> Attachments: HDFS-9959.1.patch, HDFS-9959.2.patch, HDFS-9959.patch
>
>
> Adding a log like "BLOCK* No live nodes contain block blk_1073741825_1001, last 
> datanode contain it is node: 127.0.0.1:65341" to BlockStateChange should help 
> identify which datanode should be fixed first to recover missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9118) Add logging system for libhdfs++

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205581#comment-15205581
 ] 

Hadoop QA commented on HDFS-9118:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 7m 35s 
{color} | {color:red} Docker failed to build yetus/hadoop:0cf5e66. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794615/HDFS-9118.HDFS-8707.003.patch
 |
| JIRA Issue | HDFS-9118 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14884/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Add logging system for libhdfs++
> 
>
> Key: HDFS-9118
> URL: https://issues.apache.org/jira/browse/HDFS-9118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Attachments: HDFS-9118.HDFS-8707.000.patch, 
> HDFS-9118.HDFS-8707.001.patch, HDFS-9118.HDFS-8707.002.patch, 
> HDFS-9118.HDFS-8707.003.patch
>
>
> With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
> library are going to have their own logging infrastructure that we're going 
> to want to provide data to.  
> libhdfs++ should have a logging library that:
> * Is overridable and can provide sufficient information to work well with 
> common C++ logging frameworks
> * Has a rational default implementation 
> * Is performant



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10189) PacketResponder toString is built incorrectly

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205469#comment-15205469
 ] 

Hadoop QA commented on HDFS-10189:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 17s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 33s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 132m 36s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.TestHFlush |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.TestDistributedFileSystem |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794590/HDFS-10189.patch |
| JIRA Issue | HDFS-10189 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux db8a1a3624b4 3.13.0-36-lowlatency #63-Ubuntu SMP 

[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205403#comment-15205403
 ] 

Devaraj Das commented on HDFS-3702:
---

Just FYI - the balancer work is being tracked in HBASE-8549.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205395#comment-15205395
 ] 

Arpit Agarwal commented on HDFS-3702:
-

If the region server has write permissions on /hbase/.logs, which I assume it 
does, it should be able to set policies on that directory. The ability for 
administrators to do so upfront would be a nice benefit but not a must.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9959) add log when block removed from last live datanode

2016-03-21 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated HDFS-9959:

Attachment: HDFS-9959.2.patch

How about this one?

In an extreme case, for example when all DataNodes are dead, there may be more 
than a hundred thousand missing blocks, and logging all of them might take a 
very long time.
{code}
NameNode.blockStateChangeLog.warn("After removed " + dn +
  ", no live nodes contain the following " + missing.size() +
  " blocks: " + missing);
{code}

Which solution is better:
 1. add only the first thousand blocks and ignore the others, or
 2. if more than a thousand blocks are missing, log only the first thousand?

> add log when block removed from last live datanode
> --
>
> Key: HDFS-9959
> URL: https://issues.apache.org/jira/browse/HDFS-9959
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Minor
> Attachments: HDFS-9959.1.patch, HDFS-9959.2.patch, HDFS-9959.patch
>
>
> Adding a log like "BLOCK* No live nodes contain block blk_1073741825_1001, last 
> datanode contain it is node: 127.0.0.1:65341" to BlockStateChange should help 
> identify which datanode should be fixed first to recover missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205380#comment-15205380
 ] 

Nicolas Liochon commented on HDFS-3702:
---

bq. The issue was opened in July 2012 so we're not holding our breath
If we're not holding our breath, it's also because we put a hack in HBase 
(HBASE-6435). However, this hack is not perfect: it does not help on the write 
path (we write and flush 3 times while two would provide the same level of 
safety), and we still try to do a recoverLease on a dead node when there is a 
server crash.
bq. Yeah, vendors could ensure installers set the attribute.
imho, it's not an optional behavior for HBase (compared to favoredNode which 
was supposed to be a power-user configuration only): out of the box, HBase WALs 
should be written to 2 remote nodes by default, and never to the local node. So 
it would be much better to have the right behavior without requiring any extra 
work, scripts to run or code to deploy on the hdfs namenode (it's too easy to 
mess things up).
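
To make the desired client behavior concrete, a hedged sketch of how a WAL writer 
might ask for it once such a create flag exists; {{NO_LOCAL_WRITE}} and the path 
below are assumptions for illustration, not necessarily what the attached patch 
defines:

{code}
// Sketch only: "fs" is an already-opened FileSystem pointing at HDFS, and
// CreateFlag.NO_LOCAL_WRITE is an assumed name for the proposed hint flag.
EnumSet<CreateFlag> flags =
    EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE, CreateFlag.NO_LOCAL_WRITE);
FSDataOutputStream out = fs.create(
    new Path("/hbase/WALs/rs1/wal.0001"),  // hypothetical WAL path
    FsPermission.getFileDefault(),
    flags,
    4096,                                  // buffer size
    (short) 3,                             // replication
    fs.getDefaultBlockSize(),
    null);                                 // no Progressable
out.close();
{code}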

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-03-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10175:
-
Status: Patch Available  (was: In Progress)

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.
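
A hedged sketch of the general shape being proposed (not the attached patch): a 
thread-safe map from operation name to a counter, kept alongside the existing 
totals.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Illustrative only: per-operation counters for a FileSystem client. */
public class PerOpStatistics {
  private final Map<String, LongAdder> opCounts = new ConcurrentHashMap<>();

  public void incrementOp(String op) {
    opCounts.computeIfAbsent(op, k -> new LongAdder()).increment();
  }

  public long getOpCount(String op) {
    LongAdder c = opCounts.get(op);
    return c == null ? 0 : c.sum();
  }

  public static void main(String[] args) {
    PerOpStatistics stats = new PerOpStatistics();
    stats.incrementOp("mkdirs");   // e.g. bumped from the client's mkdirs() call
    stats.incrementOp("create");
    stats.incrementOp("mkdirs");
    System.out.println("mkdirs = " + stats.getOpCount("mkdirs"));  // 2
  }
}
{code}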



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205369#comment-15205369
 ] 

Chris Douglas commented on HDFS-9847:
-

bq. Sorry I didn't understand your proposal to handle compatibility in the last 
para, agreed with everything before that.

For params without units we warn, which is consistent with our current strategy 
for deprecating config knobs. If the impl increases precision, then nothing 
distinguishes throw/nothrow. If we decrease precision, the nothrow 
implementation can decide what to do: discard lost precision (default), check 
the old precision and warn, or throw; if {{getTimeDuration}} throws, then old 
configs will throw (default) unless the impl checks the old precision and 
warns, or discards.

w.r.t. compatibility, the only difference is the default for decreasing 
precision. The caller is explicitly saying they want the unit in 
seconds/hours/days, and following the semantics of {{TimeUnit}} seems 
preferable to an error. 

I see your point about our existing cases: we can't switch from 
{{some.config.seconds}} to {{some.config}} with a default in millis by aliasing 
the former to the latter. We need to first warn that {{some.config.seconds}} is 
deprecated in a release, then drop it in the next. Where {{some.config}} has no 
units in the key, we can't convert the API to use {{getTimeDuration}} and 
increase precision in the same release. To be fair, this isn't less flexibility 
than we have now: we don't specify {{some.config\[.seconds\]}} as a float.

The opacity of discarded precision creating the possibility for misattribution 
(e.g., "increasing the hb interval from 1s to 1500ms improved stability" when 
it's noise) is also a concern, but I'd argue that's not a failure of the API.

bq. To continue the example of heartbeat interval do you mean we continue to 
query it with a unit of seconds for now and discard precision without throwing?

Yes. Similar to the discussion on casting the result to int, the caller is 
responsible for precision.

All that said, while I think nothrow is preferable, I won't insist on it and am 
+0 on the latest patch.
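
A small standalone illustration of the "discard lost precision" default described 
above, using plain {{TimeUnit}} semantics (not Hadoop code):

{code}
import java.util.concurrent.TimeUnit;

public class PrecisionLossDemo {
  public static void main(String[] args) {
    // Converting to a coarser unit silently truncates, per TimeUnit semantics.
    System.out.println(TimeUnit.SECONDS.convert(1500, TimeUnit.MILLISECONDS)); // 1
    // A nothrow implementation can detect the loss by converting back and
    // comparing, then decide whether to discard, warn, or throw.
    long back = TimeUnit.MILLISECONDS.convert(1, TimeUnit.SECONDS);            // 1000
    System.out.println(back != 1500);  // true -> precision was lost
  }
}
{code}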

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> HDFS-9821 talks about letting existing keys use friendly units, e.g. 60s, 5m, 
> 1d, 6w, etc. But some configuration key names contain a time unit name, like 
> {{dfs.blockreport.intervalMsec}}, so we can make the other configurations, 
> which have no time unit in their names, accept friendly time units. The time 
> unit {{seconds}} is frequently used in hdfs, so we can update those 
> configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-03-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10175:
-
Attachment: HDFS-10175.001.patch

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205361#comment-15205361
 ] 

stack commented on HDFS-3702:
-

bq. ...could you comment on the usability of providing node lists to this API? 

Usually the nodes and the NN can agree on what they call machines, but we've 
all seen plenty of clusters where this is not so. Both HDFS and HBase have 
their own means of insulating themselves against dodgily named setups, and 
these mechanisms are not in alignment.

bq. My impression was that tracking this in HBase was onerous, and is part of 
why favored nodes fell out of favor.

No. It was never fully plumbed in HBase (it was plumbed into a balancer that no 
one used and that would not be swapped into place because the default was 
featureful). Regarding the FB experience, we need to get them to do us a 
post-mortem.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205343#comment-15205343
 ] 

stack commented on HDFS-3702:
-

bq. Hi stack, the attribute could be set by an installer script or an API call 
at process startup

[~arpitagarwal] Thanks. Yeah, vendors could ensure installers set the 
attribute. There is a significant set of installs where HBase shows up 
post-HDFS install and/or where HBase does not have sufficient permissions to 
set attributes on HDFS. I don't know the percentage. It would just be easier 
all around if it could be managed internally by HBase, with no need to get 
scripts and/or operators involved.

bq. ...so if you think HBase needs a solution now, ...

Smile. The issue was opened in July 2012, so we're not holding our breath (smile). 
It would be cool if we could ask HDFS to not write local. Anyone doing WAL-on-HDFS 
will appreciate this in HDFS.

Thanks [~arpitagarwal]

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205337#comment-15205337
 ] 

stack commented on HDFS-3702:
-

bq. How would DFSClient know which nodes are disfavored nodes? How could it 
enforce disfavored nodes?

You postulated an application that wanted to 'distribute its files 
uniformly in a cluster.' I was just trying to suggest that users would prefer 
that HDFS just do it for them. HDFS would know how to do it better, being 
the arbiter of what is happening in the cluster; an application will do a poor 
job by comparison. 'Distribute its files uniformly...' sounds like a good feature to 
implement with a block placement policy.

bq. Since we already have favoredNodes, adding disfavoredNodes seems more 
natural than adding a flag.

As noted above at 'stack added a comment - 12/Mar/16 15:20', favoredNodes is an 
unexercised feature that has actually been disavowed by the originators of the 
idea, FB, because it proved broken in practice. I'd suggest we not build more 
atop a feature-under-review, as adding disfavoredNodes would (or at least until 
we hear of successful use of favoredNodes -- apparently our friends at Y! are trying it).

bq. In addition, the new FileSystem CreateFlag does not look clean to me since 
it is too specific to HDFS. How would other FileSystems such as LocalFileSystem 
implement it?

The flag added by the attached patch is qualified throughout as a 'hint'. When 
set against LFS, it'll just be ignored. No harm done. The 'hint' didn't take.

If we went your suggested route and added a disfavoredNodes parameter, things get a 
bit interesting when HBase, say, passes localhost. What'll happen? Does the 
user now have to check the FS implementation type before they select which DFSClient 
method to call?

I don't think you are objecting to the passing of flags on create, given this 
seems pretty standard fare in FSs.
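
To make the 'hint' semantics concrete, a sketch of the WAL-writer call site; this assumes the flag proposed by the patch ends up named something like NO_LOCAL_WRITE, and the path and sizes are illustrative:
{code}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWriteSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // The hint: please don't place a replica on the writer's own datanode.
    // A FileSystem that doesn't understand it (e.g. LocalFileSystem) just ignores it.
    EnumSet<CreateFlag> flags =
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE, CreateFlag.NO_LOCAL_WRITE);
    FSDataOutputStream out = fs.create(new Path("/hbase/WALs/rs1/wal.1"),
        FsPermission.getFileDefault(), flags,
        4096,                 // buffer size
        (short) 3,            // replication
        128L * 1024 * 1024,   // block size
        null);                // no progressable
    out.close();
  }
}
{code}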

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205306#comment-15205306
 ] 

Arpit Agarwal commented on HDFS-9847:
-

Sorry, I didn't understand your proposal to handle compatibility in the last 
para; I agreed with everything before that. To continue the example of the heartbeat 
interval: do you mean we continue to query it with a unit of seconds for now and 
discard precision without throwing?

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about the issue of letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations that have no time unit in their name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205275#comment-15205275
 ] 

Arpit Agarwal commented on HDFS-3702:
-

bq. No need for an admin operator to remember to set attributes on specific 
dirs (99% won't);
Hi [~stack], the attribute could be set by an installer script or an API call 
at process startup. Since you agree pluggable policies are good to have 
eventually, CreateFlag becomes a stopgap. However this will take more time so 
if you think HBase needs a solution now, I'm -0. Thanks.

AddBlockFlag should be tagged as {{@InterfaceAudience.Private}} if we proceed 
with the .008 patch.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10189) PacketResponder toString is built incorrectly

2016-03-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205233#comment-15205233
 ] 

Colin Patrick McCabe commented on HDFS-10189:
-

+1 pending jenkins. Thanks, [~jpallas].

> PacketResponder toString is built incorrectly
> -
>
> Key: HDFS-10189
> URL: https://issues.apache.org/jira/browse/HDFS-10189
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Joe Pallas
>Assignee: Joe Pallas
>Priority: Minor
> Attachments: HDFS-10189.patch
>
>
> The constructor for {{BlockReceiver.PacketResponder}} says
> {code}
>   final StringBuilder b = new StringBuilder(getClass().getSimpleName())
>   .append(": ").append(block).append(", type=").append(type);
>   if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
> b.append(", downstreams=").append(downstreams.length)
> .append(":").append(Arrays.asList(downstreams));
>   }
> {code}
> So it includes the list of downstreams only when it has no downstreams.  The 
> {{if}} test should be for equality.
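
For reference, a sketch of the intended one-character fix (not necessarily the committed patch):
{code}
// Append the downstream list only when there actually are downstreams.
if (type == PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
  b.append(", downstreams=").append(downstreams.length)
      .append(":").append(Arrays.asList(downstreams));
}
{code}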



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9118) Add logging system for libdhfs++

2016-03-21 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9118:
--
Attachment: HDFS-9118.HDFS-8707.003.patch

New patch; it incorporates a lot of [~bobhansen]'s suggestions.

-rebased onto head
-log macros to avoid creating LogMessage objects if they won't be logged
-add a separate public header for the C logging data structures and defines so 
that the implementation doesn't have to drag in the whole public API
-get rid of lock macros
-more helpful output when null pointers are passed to operator<<(const char*) 
and operator<<(std::string*)
-remove some garbage that slipped into the last patch (stuff commented out but 
not deleted)

> Add logging system for libdhfs++
> 
>
> Key: HDFS-9118
> URL: https://issues.apache.org/jira/browse/HDFS-9118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Attachments: HDFS-9118.HDFS-8707.000.patch, 
> HDFS-9118.HDFS-8707.001.patch, HDFS-9118.HDFS-8707.002.patch, 
> HDFS-9118.HDFS-8707.003.patch
>
>
> With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
> library are going to have their own logging infrastructure that we're going 
> to want to provide data to.  
> libhdfs++ should have a logging library that:
> * Is overridable and can provide sufficient information to work well with 
> common C++ logging frameworks
> * Has a rational default implementation 
> * Is performant



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205215#comment-15205215
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3702:
---

> For uniform distribution of files over a cluster, I think users would prefer 
> that DFSClient managed it for them (a new flag on CreateFlag?) ...

How would DFSClient know which nodes are disfavored nodes?  How could it 
enforce disfavored nodes?

> ... disfavoredNodes seems like a more intrusive and roundabout route – with 
> its overrides, possible builders, and global interpretation of 'localhost' 
> string – to the clean flag this patch carries?

I disagree. Since we already have favoredNodes, adding disfavoredNodes seems 
more natural than adding a flag.

In addition, the new FileSystem CreateFlag does not look clean to me since it 
is too specific to HDFS.  How would other FileSystems such as LocalFileSystem 
implement it?

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205191#comment-15205191
 ] 

Chris Douglas commented on HDFS-9847:
-

Sorry, I overloaded "precision". By providing another parameter, it might 
provide semantics like "give me the value of this parameter in this unit, 
unless it sheds more than (x% | x | whatever) of the configured value". 
If the caller cares this much, they should ask for the var with high precision 
and make the decision to throw in that context.

bq. Without the precision check Configuration silently changes values without 
the caller knowing about it.

Isn't that the intent behind configuring with units? Throwing for loss of 
precision requires the operator to know the default precision (no improvement 
over the current approach, which embeds the unit in the config key), or go 
through at least one round of errors. To make this less painful for users, the 
caller should only use {{getTimeDuration}} with high precision, which gets us 
back to specifying everything but the string value in millis, internally. If 
{{getTimeDuration}} throws, then why is it better?

I'd argue handling backwards compatibility is easier without throwing for loss 
of precision. This warns for vars without units, which we've decided is 
sufficient for deprecated config vars. New versions can increase precision 
changing only the timeunit param, and old client configs will still work. If it 
wants to decrease precision, then the caller can warn/throw for values that 
lose precision (whatever's appropriate for its context), and migrate to the new 
unit in the next version.
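
A sketch of the caller-side handling being argued for here, using the existing {{getTimeDuration}} API; the key name and warning policy are illustrative:
{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class HeartbeatIntervalSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("dfs.heartbeat.interval", "1500ms");

    // Read at high precision, then let the caller decide what loss of
    // precision means in its own context instead of Configuration throwing.
    long millis = conf.getTimeDuration("dfs.heartbeat.interval",
        3000, TimeUnit.MILLISECONDS);
    long seconds = TimeUnit.MILLISECONDS.toSeconds(millis);
    if (TimeUnit.SECONDS.toMillis(seconds) != millis) {
      System.err.println("dfs.heartbeat.interval loses sub-second precision; "
          + "rounding down to " + seconds + "s");
    }
  }
}
{code}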

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about the issue of letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations that have no time unit in their name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205142#comment-15205142
 ] 

stack commented on HDFS-3702:
-

For uniform distribution of files over a cluster, I think users would prefer 
that DFSClient managed it for them (a new flag on CreateFlag?) rather than do the 
calculation of figuring out how to populate favoredNodes and disfavoredNodes using 
imperfect knowledge of the cluster, something the NN will always do better at.

Unless you have other possible uses, disfavoredNodes seems like a more 
intrusive and roundabout route -- with its overrides, possible builders, and 
global interpretation of 'localhost' string -- to the clean flag this patch 
carries?

What you think [~szetszwo]? Thanks Nicolas.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.

2016-03-21 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205137#comment-15205137
 ] 

Wei-Chiu Chuang commented on HDFS-9874:
---

Thanks for looking into it. Maybe the NPE is unrelated.
I'm not able to make the test fail; it could be an intermittently flaky test.
In any case, it would be great if you could improve the test diagnostics 
using {{GenericTestUtils#assertExceptionContains}}. This utility method prints 
the stack trace if the exception message doesn't match the expected value.
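
Something along these lines inside the test, as a sketch (the expected message substring is illustrative and the stream setup is elided):
{code}
import static org.junit.Assert.fail;

import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.test.GenericTestUtils;

public class VolumeRemovalAssertionSketch {
  // 'out' writes to a volume that the test has just removed (setup elided).
  static void expectVolumeRemovedFailure(OutputStream out) {
    try {
      out.write(1);
      out.flush();
      fail("Expected an IOException after the volume was removed");
    } catch (IOException ioe) {
      // On a mismatch this prints the whole stack trace, which is far more
      // useful in a precommit log than a bare assertion failure.
      GenericTestUtils.assertExceptionContains("volume", ioe);
    }
  }
}
{code}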

> Long living DataXceiver threads cause volume shutdown to block.
> ---
>
> Key: HDFS-9874
> URL: https://issues.apache.org/jira/browse/HDFS-9874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, 
> HDFS-9874-trunk.patch
>
>
> One of the failed volume shutdown took 3 days to complete.
> Below are the relevant datanode logs while shutting down a volume (due to 
> disk failure)
> {noformat}
> 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing 
> failed volume volumeA/current: 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not 
> writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized
> at 
> org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194)
> at 
> org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing 
> scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23)
> 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting.
> 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b.
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp 
> (Read-only file system)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
> 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to 
> delete old dfsUsed file in 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current
> 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to 
> write dfsUsed to 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed (Read-only 
> file system)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:247)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:698)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:815)
>   at 
> 

[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205136#comment-15205136
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3702:
---

> ... we can also add another flag for fully random distribution.

It does not seem like a good idea to keep adding flags. BTW, fully random distribution 
is not the same as uniform distribution.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205111#comment-15205111
 ] 

Andrew Wang commented on HDFS-3702:
---

[~stack] from a downstream perspective, could you comment on the usability of 
providing node lists to this API? This always felt hacky to me, since 
ultimately the NN is the one who knows about the cluster state and DN names and 
block location constraints. My impression was that tracking this in HBase was 
onerous, and is part of why favored nodes fell out of favor.

bq. For example, some application may want to distribute its files uniformly in 
a cluster

The main reason for skew I've seen is the local writer case, which this patch 
attempts to address. It'll still bias to the local rack, but I doubt that'll be 
an issue in practice, and if it is we can also add another flag for fully 
random distribution.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204945#comment-15204945
 ] 

Chris Douglas edited comment on HDFS-9847 at 3/21/16 8:53 PM:
--

The current patch changes {{Configuration::getTimeDuration}} to throw when 
losing precision:
{noformat}
// Configuration.java
+long raw = Long.parseLong(timeStr);
+long converted = unit.convert(raw, vUnit.unit());
+if (vUnit.unit().convert(converted, unit) != raw) {
+  throw new HadoopIllegalArgumentException("Loss of precision converting "
+  + timeStr + vUnit.suffix() + " to " + unit);
 }
-return unit.convert(Long.parseLong(vStr), vUnit.unit());
+return converted;

// TestConfiguration
 Configuration conf = new Configuration(false);
 conf.setTimeDuration("test.time.a", 7L, SECONDS);
 assertEquals("7s", conf.get("test.time.a"));
-assertEquals(0L, conf.getTimeDuration("test.time.a", 30, MINUTES));
{noformat}

This changes the contract. In the current version, the caller determines the 
precision (as detailed in the javadoc). This is correct; the caller knows 
reasonable precision, and can perform checks (e.g., unexpected value of 0) in 
its context. The {{Configuration}} has no context, and providing the expected 
precision as an argument overengineers the interface. If I want to give a 
heartbeat interval in microseconds that's my right as a lunatic, but the caller 
should not throw because it only cares about the value in seconds.

(aside) Why {{HadoopIllegalArgumentException}} instead of 
{{java.lang.IllegalArgumentException}}? Not why it is thrown here, but why does 
it exist?


was (Author: chris.douglas):
The current patch changes {{Configuration::getTimeDuration}} to throw when 
losing precision:
{{noformat}}
// Configuration.java
+long raw = Long.parseLong(timeStr);
+long converted = unit.convert(raw, vUnit.unit());
+if (vUnit.unit().convert(converted, unit) != raw) {
+  throw new HadoopIllegalArgumentException("Loss of precision converting "
+  + timeStr + vUnit.suffix() + " to " + unit);
 }
-return unit.convert(Long.parseLong(vStr), vUnit.unit());
+return converted;

// TestConfiguration
 Configuration conf = new Configuration(false);
 conf.setTimeDuration("test.time.a", 7L, SECONDS);
 assertEquals("7s", conf.get("test.time.a"));
-assertEquals(0L, conf.getTimeDuration("test.time.a", 30, MINUTES));
{{noformat}}

This changes the contract. In the current version, the caller determines the 
precision (as detailed in the javadoc). This is correct; the caller knows 
reasonable precision, and can perform checks (e.g., unexpected value of 0) in 
its context. The {{Configuration}} has no context, and providing the expected 
precision as an argument overengineers the interface. If I want to give a 
heartbeat interval in microseconds that's my right as a lunatic, but the caller 
should not throw because it only cares about the value in seconds.

(aside) Why {{HadoopIllegalArgumentException}} instead of 
{{java.lang.IllegalArgumentException}}? Not why it is thrown here, but why does 
it exist?

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about the issue of letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations that have no time unit in their name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.

2016-03-21 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205092#comment-15205092
 ] 

Rushabh S Shah commented on HDFS-9874:
--

The NPE is expected.
{quote}
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1714)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdownBlockPool(FsDatasetImpl.java:2591)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:1479)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:411)
{quote}
This is getting called while shutting down the cluster.
This is expected since I have triggered only part of the checkDiskError thread.

{code:title=DataNode.java|borderStyle=solid}
 private void checkDiskError() {
Set<File> unhealthyDataDirs = data.checkDataDir();
if (unhealthyDataDirs != null && !unhealthyDataDirs.isEmpty()) {
  try {
// Remove all unhealthy volumes from DataNode.
removeVolumes(unhealthyDataDirs, false);
  } catch (IOException e) { 
LOG.warn("Error occurred when removing unhealthy storage dirs: "
+ e.getMessage(), e);
  }
  StringBuilder sb = new StringBuilder("DataNode failed volumes:");
  for (File dataDir : unhealthyDataDirs) {
sb.append(dataDir.getAbsolutePath() + ";");
  }
  handleDiskError(sb.toString());
}
  }
{code}

I have only called the first line of the above function in the test case since 
I don't want the test case to wait for DataNode#checkDiskErrorInterval (which 
is 5 secs by default).
That's why it will not execute removeVolumes(unhealthyDataDirs, false), hence the NPE.
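
In other words, the test does roughly the following instead of going through the full checkDiskError path (sketch; the MiniDFSCluster setup and the {{dataNode}} handle come from the test fixture):
{code}
// Trigger only the volume check, not removeVolumes(), so the test does not
// have to wait out DataNode#checkDiskErrorInterval (5 seconds by default).
Set<File> unhealthyDataDirs = dataNode.getFSDataset().checkDataDir();
assertFalse("expected the failed volume to be reported",
    unhealthyDataDirs == null || unhealthyDataDirs.isEmpty());
{code}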

I am not able to reproduce the test failure on my local machine on JDK 7 
or JDK 8.
[~jojochuang]: does it fail on your machine?

> Long living DataXceiver threads cause volume shutdown to block.
> ---
>
> Key: HDFS-9874
> URL: https://issues.apache.org/jira/browse/HDFS-9874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, 
> HDFS-9874-trunk.patch
>
>
> One of the failed volume shutdown took 3 days to complete.
> Below are the relevant datanode logs while shutting down a volume (due to 
> disk failure)
> {noformat}
> 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing 
> failed volume volumeA/current: 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not 
> writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized
> at 
> org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194)
> at 
> org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing 
> scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23)
> 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting.
> 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b.
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp 
> (Read-only file system)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at 
> 

[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-03-21 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205086#comment-15205086
 ] 

Mingliang Liu commented on HDFS-10175:
--

Thanks for your comment, [~andrew.wang]. I was aware of the thread-local 
statistics data structure, and was in favor of following the same approach. The 
new operation map is still per-thread. The ConcurrentHashMap was used because, 
when aggregating, we have to make sure the map is not modified. Its 
functionality is similar to the "volatile" keyword for the other primitive 
statistic data.

Anyway, I will revise the code and will update the patch if ConcurrentHashMap 
turns out to be unnecessary, for the sake of performance. Before that, the next patch 
will first resolve the conflicts from trunk caused by [HDFS-9579].
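
A rough sketch of the role the ConcurrentHashMap plays here (class and method names are made up; only the concurrency shape matters):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Each thread owns one OpData; an aggregator thread reads all of them while
// the owners keep writing, so the map must tolerate concurrent access.
class OpData {
  final Map<String, Long> ops = new ConcurrentHashMap<>();

  void record(String op) {
    ops.merge(op, 1L, Long::sum);          // owner thread only
  }
}

class OpAggregator {
  long total(Iterable<OpData> perThread, String op) {
    long sum = 0;
    for (OpData d : perThread) {
      sum += d.ops.getOrDefault(op, 0L);   // safe concurrent read
    }
    return sum;
  }
}
{code}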

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205079#comment-15205079
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3702:
---

> So, the suggestion is changing public signatures to add a new parameter (Or 
> adding a new override where there are already 6)?

For compatibility reasons, we probably have to add a new override.  For better 
usability, we may add a Builder.

> For a client to make effective use of disfavoredNodes, they would have to 
> figure the exact name the NN is using and volunteer it in this 
> disfavoredNodes list? Or could they just write 'localhost' and let NN figure 
> it out?

We should support 'localhost' in the API.  DFSClient or NN may replace 
'localhost' with the corresponding name.

> Do you foresee any other use for this disfavoredNodes parameter other than 
> for the exclusion of 'localnode'?

Yes, disfavoredNodes seems useful.  For example, some application may want to 
distribute its files uniformly in a cluster.  Then, it could specify the 
previously allocated DNs as the disfavoredNodes.
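
None of the following exists today; it is purely to illustrate the Builder shape being suggested, with made-up names:
{code}
// Hypothetical builder-style create, showing how disfavoredNodes could be
// added without yet another positional override. Nothing here is real API.
FSDataOutputStream out = dfsClient.newCreateBuilder("/hbase/WALs/rs1/wal.1")
    .replication((short) 3)
    .disfavoredNodes("localhost")   // client or NN resolves this to the DN name
    .build();
{code}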

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205020#comment-15205020
 ] 

Arpit Agarwal commented on HDFS-9847:
-

The precision check was my idea so I'll try to answer.

bq. The Configuration has no context, and providing the expected precision as 
an argument overengineers the interface.
{{getTimeDuration}} already takes a precision parameter, so this patch didn't 
add to the interface. Without the precision check, Configuration silently 
changes values without the caller knowing about it, e.g. this test passes today.
{code}
conf.set("xyz", "1500ms");
assertEquals(conf.getTimeDuration("xyz", 0, TimeUnit.SECONDS), 1);
{code}
The alternative is for the caller to always pass TimeUnit as microseconds (or 
nanoseconds?), but handling backwards compatibility is harder. How will the 
caller differentiate {{1us}} from plain {{1}} without looking at the raw value?

bq. Why HadoopIllegalArgumentException instead of 
java.lang.IllegalArgumentException?
Added by HADOOP-6537, looks like the intent was to differentiate 
IllegalArgumentException thrown by Hadoop vs the JDK.

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about the issue of letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations that have no time unit in their name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10189) PacketResponder toString is built incorrectly

2016-03-21 Thread Joe Pallas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Pallas updated HDFS-10189:
--
Attachment: HDFS-10189.patch

> PacketResponder toString is built incorrectly
> -
>
> Key: HDFS-10189
> URL: https://issues.apache.org/jira/browse/HDFS-10189
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Joe Pallas
>Assignee: Joe Pallas
>Priority: Minor
> Attachments: HDFS-10189.patch
>
>
> The constructor for {{BlockReceiver.PacketResponder}} says
> {code}
>   final StringBuilder b = new StringBuilder(getClass().getSimpleName())
>   .append(": ").append(block).append(", type=").append(type);
>   if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
> b.append(", downstreams=").append(downstreams.length)
> .append(":").append(Arrays.asList(downstreams));
>   }
> {code}
> So it includes the list of downstreams only when it has no downstreams.  The 
> {{if}} test should be for equality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10189) PacketResponder toString is built incorrectly

2016-03-21 Thread Joe Pallas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Pallas updated HDFS-10189:
--
Assignee: Joe Pallas
  Status: Patch Available  (was: Open)

> PacketResponder toString is built incorrectly
> -
>
> Key: HDFS-10189
> URL: https://issues.apache.org/jira/browse/HDFS-10189
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Joe Pallas
>Assignee: Joe Pallas
>Priority: Minor
>
> The constructor for {{BlockReceiver.PacketResponder}} says
> {code}
>   final StringBuilder b = new StringBuilder(getClass().getSimpleName())
>   .append(": ").append(block).append(", type=").append(type);
>   if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
> b.append(", downstreams=").append(downstreams.length)
> .append(":").append(Arrays.asList(downstreams));
>   }
> {code}
> So it includes the list of downstreams only when it has no downstreams.  The 
> {{if}} test should be for equality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204989#comment-15204989
 ] 

stack commented on HDFS-3702:
-

[~szetszwo] So, the suggestion is changing public signatures to add a new 
parameter (or adding a new override where there are already 6)? For a client to 
make effective use of disfavoredNodes, they would have to figure out the exact name 
the NN is using and volunteer it in this disfavoredNodes list? Or could they 
just write 'localhost' and let the NN figure it out? Do you foresee any other use 
for this disfavoredNodes parameter other than the exclusion of 'localnode'? 
Thanks Nicolas.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.

2016-03-21 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204987#comment-15204987
 ] 

Rushabh S Shah commented on HDFS-9874:
--

[~jojochuang]: Thanks for reporting. Taking a look now.

> Long living DataXceiver threads cause volume shutdown to block.
> ---
>
> Key: HDFS-9874
> URL: https://issues.apache.org/jira/browse/HDFS-9874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, 
> HDFS-9874-trunk.patch
>
>
> One of the failed volume shutdown took 3 days to complete.
> Below are the relevant datanode logs while shutting down a volume (due to 
> disk failure)
> {noformat}
> 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing 
> failed volume volumeA/current: 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not 
> writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized
> at 
> org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194)
> at 
> org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing 
> scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23)
> 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting.
> 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b.
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp 
> (Read-only file system)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
> 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to 
> delete old dfsUsed file in 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current
> 2016-02-24 16:05:53,286 [Thread-49277] WARN impl.FsDatasetImpl: Failed to 
> write dfsUsed to 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp-1351700107344/current/dfsUsed (Read-only 
> file system)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed(BlockPoolSlice.java:247)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.shutdown(BlockPoolSlice.java:698)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:815)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:328)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:250)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)

[jira] [Commented] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers

2016-03-21 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204985#comment-15204985
 ] 

Wei-Chiu Chuang commented on HDFS-10187:


Test failures are all known bugs and unrelated.
The checkstyle warning needs to be addressed, though.

> Add a "list" refresh handler to list all registered refresh 
> identifiers/handlers
> 
>
> Key: HDFS-10187
> URL: https://issues.apache.org/jira/browse/HDFS-10187
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.7.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: commandline, supportability
> Attachments: HDFS-10187.001.patch
>
>
> HADOOP-10376 added a new feature to register handlers for refreshing daemon 
> configurations, because name node properties can be reconfigured without 
> restarting the daemon. This can be a very useful generic interface, but so 
> far no real handlers have been registered using this interface. I added a new 
> 'list' handler to list all registered handlers. My plan is to add more 
> handlers in the future using this interface.
> Another minor fix is to return a more explicit error message to the client if a 
> handler is not registered. (It is currently logged on the namenode side, but the 
> client side only gets a "Failed to get response." message without knowing why.)
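
As a sketch of how a handler is registered under HADOOP-10376 (assuming the registry API looks roughly like this; the identifier and message are illustrative):
{code}
import org.apache.hadoop.ipc.RefreshHandler;
import org.apache.hadoop.ipc.RefreshRegistry;
import org.apache.hadoop.ipc.RefreshResponse;

public class ListHandlersSketch implements RefreshHandler {
  @Override
  public RefreshResponse handleRefresh(String identifier, String[] args) {
    // A real 'list' handler would enumerate the registry's identifiers here.
    return new RefreshResponse(0, "registered identifiers: ...");
  }

  public static void register() {
    RefreshRegistry.defaultRegistry().register("listHandlers", new ListHandlersSketch());
  }
}
{code}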



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9874) Long living DataXceiver threads cause volume shutdown to block.

2016-03-21 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204980#comment-15204980
 ] 

Wei-Chiu Chuang commented on HDFS-9874:
---

It seems this patch is buggy.

In a precommit job, this test threw NPE:
https://builds.apache.org/job/PreCommit-HDFS-Build/14881/testReport/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestFsDatasetImpl/testCleanShutdownOfVolume/

Exception in thread "DataNode: 
[[[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/data/data1/,
 
[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/dfs/data/data2/]]
  heartbeating to localhost/127.0.0.1:39740" java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1714)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdownBlockPool(FsDatasetImpl.java:2591)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:1479)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:411)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:494)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:749)
at java.lang.Thread.run(Thread.java:745)

And the precommit record shows it has failed three times in a row.

> Long living DataXceiver threads cause volume shutdown to block.
> ---
>
> Key: HDFS-9874
> URL: https://issues.apache.org/jira/browse/HDFS-9874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Critical
> Fix For: 2.7.3
>
> Attachments: HDFS-9874-trunk-1.patch, HDFS-9874-trunk-2.patch, 
> HDFS-9874-trunk.patch
>
>
> One of the failed volume shutdown took 3 days to complete.
> Below are the relevant datanode logs while shutting down a volume (due to 
> disk failure)
> {noformat}
> 2016-02-21 10:12:55,333 [Thread-49277] WARN impl.FsDatasetImpl: Removing 
> failed volume volumeA/current: 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not 
> writable: volumeA/current/BP-1788428031-nnIp-1351700107344/current/finalized
> at 
> org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:194)
> at 
> org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:786)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:242)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:2011)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3145)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.access$800(DataNode.java:243)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$7.run(DataNode.java:3178)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-21 10:12:55,334 [Thread-49277] INFO datanode.BlockScanner: Removing 
> scanner for volume volumeA (StorageID DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23)
> 2016-02-21 10:12:55,334 [VolumeScannerThread(volumeA)] INFO 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23) exiting.
> 2016-02-21 10:12:55,335 [VolumeScannerThread(volumeA)] WARN 
> datanode.VolumeScanner: VolumeScanner(volumeA, 
> DS-cd2ea223-bab3-4361-a567-5f3f27a5dd23): error saving 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl@4169ad8b.
> java.io.FileNotFoundException: 
> volumeA/current/BP-1788428031-nnIp-1351700107344/scanner.cursor.tmp 
> (Read-only file system)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl$BlockIteratorImpl.save(FsVolumeImpl.java:669)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.saveBlockIterator(VolumeScanner.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
> 2016-02-24 16:05:53,285 [Thread-49277] WARN impl.FsDatasetImpl: Failed to 
> delete old dfsUsed file in 
> 

[jira] [Commented] (HDFS-9405) Warmup NameNode EDEK caches in background thread

2016-03-21 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204977#comment-15204977
 ] 

Xiao Chen commented on HDFS-9405:
-

Thanks Andrew and all for the review! (Sorry about the missed {{final}}.)

> Warmup NameNode EDEK caches in background thread
> 
>
> Key: HDFS-9405
> URL: https://issues.apache.org/jira/browse/HDFS-9405
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption, namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Xiao Chen
> Fix For: 2.8.0
>
> Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, 
> HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, 
> HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, 
> HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, 
> HDFS-9405.12.patch, HDFS-9405.13.patch
>
>
> {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation 
> to the key provider, which could be slow or cause a timeout. It should be done 
> in a separate thread so that a proper error message can be returned to the RPC caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204981#comment-15204981
 ] 

stack commented on HDFS-3702:
-

[~arpiagariu] You'd like to purge all of the CreateFlag parameters, Arpit? 
CreateFlag seems to be how other filesystems color a particular creation, 
and this patch was able to make use of it and save changing a bunch of method 
signatures. Seems kinda useful? And it seems like we could get more flags on 
CreateFlag down the road (ASYNC?).

bq.  What do you think of per-target block placement policies as proposed in 
this comment e.g. set a custom placement policy for /hbase/.logs/. 

Seems like a grand idea (then and now), being able to do it for a whole class of 
files based off their location in HDFS. Would this be instead of this patch's 
decoration on CreateFlag? I'd suggest not. I like this patch. It gets us what 
we want nicely. No need for an admin operator to remember to set attributes on 
specific dirs (99% won't); we can just do the code change in hbase (and rip out 
the hacks we have had in place for years now as our workaround in the absence of 
this patch).
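
To illustrate the client side, here is a rough sketch, not the attached patch, of 
how an hbase WAL writer could pass the proposed flag through {{FileSystem.create}}; 
the flag name {{NO_LOCAL_WRITE}} and the WAL path are assumptions taken from the 
proposal and may differ from the final change.
{code}
// Rough sketch only. Assumes the patch exposes a flag along the lines of
// CreateFlag.NO_LOCAL_WRITE; the flag name and the path are illustrative.
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWalExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path wal = new Path("/hbase/.logs/example-wal");
    // CREATE plus the proposed "don't place the first replica locally" flag.
    EnumSet<CreateFlag> flags =
        EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCAL_WRITE);
    FSDataOutputStream out = fs.create(wal, FsPermission.getFileDefault(),
        flags, conf.getInt("io.file.buffer.size", 4096),
        fs.getDefaultReplication(wal), fs.getDefaultBlockSize(wal), null);
    out.close();
  }
}
{code}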

Thanks



> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10189) PacketResponder toString is built incorrectly

2016-03-21 Thread Joe Pallas (JIRA)
Joe Pallas created HDFS-10189:
-

 Summary: PacketResponder toString is built incorrectly
 Key: HDFS-10189
 URL: https://issues.apache.org/jira/browse/HDFS-10189
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.1
Reporter: Joe Pallas
Priority: Minor


The constructor for {{BlockReceiver.PacketResponder}} says
{code}
  final StringBuilder b = new StringBuilder(getClass().getSimpleName())
  .append(": ").append(block).append(", type=").append(type);
  if (type != PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
b.append(", downstreams=").append(downstreams.length)
.append(":").append(Arrays.asList(downstreams));
  }
{code}
So it includes the list of downstreams only when it has no downstreams.  The 
{{if}} test should be for equality.
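The fix being asked for is just to flip the condition; a sketch built only from the 
quoted code above (not a committed patch):
{code}
// Sketch of the suggested fix: append the downstream list only when the
// responder actually has downstreams in the pipeline.
final StringBuilder b = new StringBuilder(getClass().getSimpleName())
    .append(": ").append(block).append(", type=").append(type);
if (type == PacketResponderType.HAS_DOWNSTREAM_IN_PIPELINE) {
  b.append(", downstreams=").append(downstreams.length)
      .append(":").append(Arrays.asList(downstreams));
}
{code}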



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor

2016-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204956#comment-15204956
 ] 

Hudson commented on HDFS-9951:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9483 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9483/])
HDFS-9951. Use string constants for XML tags in (cmccabe: rev 
680716f31e120f4d3ee70b095e4db46c05b891d9)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java


> Use string constants for XML tags in OfflineImageReconstructor
> --
>
> Key: HDFS-9951
> URL: https://issues.apache.org/jira/browse/HDFS-9951
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, 
> HDFS-9551.003.patch, HDFS-9551.004.patch
>
>
> In class {{OfflineImageReconstructor}}, many {{SectionProcessors}} are used to 
> process XML files and load the subtree of the XML into a Node structure. But 
> there are lots of places where a node removes a key by writing the tag name 
> directly in the method rather than defining it as a constant first. Like this:
> {code}
> Node expiration = directive.removeChild("expiration");
> {code}
> We could improve this by defining the tag names in Node and invoking them like this:
> {code}
> Node expiration=directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION);
> {code}
> That will also make it easier to manage node key names in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204945#comment-15204945
 ] 

Chris Douglas commented on HDFS-9847:
-

The current patch changes {{Configuration::getTimeDuration}} to throw when 
losing precision:
{noformat}
// Configuration.java
+long raw = Long.parseLong(timeStr);
+long converted = unit.convert(raw, vUnit.unit());
+if (vUnit.unit().convert(converted, unit) != raw) {
+  throw new HadoopIllegalArgumentException("Loss of precision converting "
+  + timeStr + vUnit.suffix() + " to " + unit);
 }
-return unit.convert(Long.parseLong(vStr), vUnit.unit());
+return converted;

// TestConfiguration
 Configuration conf = new Configuration(false);
 conf.setTimeDuration("test.time.a", 7L, SECONDS);
 assertEquals("7s", conf.get("test.time.a"));
-assertEquals(0L, conf.getTimeDuration("test.time.a", 30, MINUTES));
{noformat}

This changes the contract. In the current version, the caller determines the 
precision (as detailed in the javadoc). This is correct; the caller knows what 
precision is reasonable, and can perform checks (e.g., for an unexpected value of 0) 
in its own context. The {{Configuration}} has no context, and providing the expected 
precision as an argument overengineers the interface. If I want to give a 
heartbeat interval in microseconds, that's my right as a lunatic, but the call 
should not throw just because the caller only cares about the value in seconds.
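
To make the current contract concrete, a minimal sketch against the standard 
{{Configuration}} API; the key name here is made up for illustration:
{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class TimeDurationContractExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // "dfs.example.heartbeat.interval" is an illustrative key, not a real one.
    conf.set("dfs.example.heartbeat.interval", "500ms");
    // The caller picks the precision; under the current contract the value is
    // silently converted (and truncated) to 0 rather than throwing, and the
    // caller decides in its own context whether 0 is acceptable.
    long seconds = conf.getTimeDuration("dfs.example.heartbeat.interval",
        30, TimeUnit.SECONDS);
    System.out.println(seconds); // prints 0
  }
}
{code}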

(aside) Why {{HadoopIllegalArgumentException}} instead of 
{{java.lang.IllegalArgumentException}}? Not why it is thrown here, but why does 
it exist at all?

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> HDFS-9821 discusses letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names already 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> the other configurations, whose names carry no time unit, accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update those configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204949#comment-15204949
 ] 

Hadoop QA commented on HDFS-10187:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s 
{color} | {color:red} root: patch generated 6 new + 230 unchanged - 5 fixed = 
236 total (was 235) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 22s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 39s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 32s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 22s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 33s 
{color} | {color:red} Patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 232m 46s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 

[jira] [Updated] (HDFS-9809) Abstract implementation-specific details from the datanode

2016-03-21 Thread Virajith Jalaparti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-9809:
-
Component/s: fs
 datanode

> Abstract implementation-specific details from the datanode
> --
>
> Key: HDFS-9809
> URL: https://issues.apache.org/jira/browse/HDFS-9809
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode, fs
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-9809.001.patch
>
>
> Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.) 
> implicitly assume that blocks are stored in java.io.File(s) and that volumes 
> are divided into directories. We propose to abstract these details, which 
> would help in supporting other storages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9809) Abstract implementation-specific details from the datanode

2016-03-21 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204943#comment-15204943
 ] 

Virajith Jalaparti commented on HDFS-9809:
--

Hi [~zhz], Thanks for the comment. 

One concrete example where a different sub-class of {{ReplicaInfo}} might be 
used is Ozone (HDFS-7240) where the replica can be stored in the underlying 
key-value store. Similar to the {{DatasetSpi}}/{{FsDatasetSpi}}, we could have 
{{ReplicaInfo}}/{{FsReplicaInfo}}. We are still going over the changes proposed 
by HDFS-7240 and in the process of understanding if a different sub-class of 
{{ReplicaInfo}} makes sense there. 

We are currently working on the design document for HDFS-9806 and will post it 
soon (within the week). If it ends up being the case that this work is useful 
only in the context of HDFS-9806, we will make it a subtask. 



> Abstract implementation-specific details from the datanode
> --
>
> Key: HDFS-9809
> URL: https://issues.apache.org/jira/browse/HDFS-9809
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-9809.001.patch
>
>
> Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.) 
> implicitly assume that blocks are stored in java.io.File(s) and that volumes 
> are divided into directories. We propose to abstract these details, which 
> would help in supporting other storages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204925#comment-15204925
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3702:
---

Please do not commit the patch yet.  Some ideas below.

In DistributedFileSystem, we already have create(..) and append(..) methods that 
support favoredNodes.  How about we also add a new parameter, disfavoredNodes?  
That gives a more general API -- we could set disfavoredNodes to one or more 
hosts, and the BPP can fall back to these nodes if the other nodes are unavailable.
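
To make the idea concrete, a purely hypothetical signature sketch; no such overload 
exists today, and it simply mirrors the existing favoredNodes create(..) overload 
with one extra parameter:
{code}
// Hypothetical sketch only -- this overload does not exist in
// DistributedFileSystem; it mirrors the current favoredNodes variant.
public HdfsDataOutputStream create(Path f, FsPermission permission,
    boolean overwrite, int bufferSize, short replication, long blockSize,
    Progressable progress, InetSocketAddress[] favoredNodes,
    InetSocketAddress[] disfavoredNodes) throws IOException {
  // The block placement policy would avoid disfavoredNodes and fall back to
  // them only when no other datanodes are available.
  throw new UnsupportedOperationException("sketch only");
}
{code}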

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor

2016-03-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9951:
---
  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
Tags: tools
  Status: Resolved  (was: Patch Available)

Committed to 2.8

> Use string constants for XML tags in OfflineImageReconstructor
> --
>
> Key: HDFS-9951
> URL: https://issues.apache.org/jira/browse/HDFS-9951
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, 
> HDFS-9551.003.patch, HDFS-9551.004.patch
>
>
> In class {{OfflineImageReconstructor}}, many {{SectionProcessors}} are used to 
> process XML files and load the subtree of the XML into a Node structure. But 
> there are lots of places where a node removes a key by writing the tag name 
> directly in the method rather than defining it as a constant first. Like this:
> {code}
> Node expiration = directive.removeChild("expiration");
> {code}
> We could improve this by defining the tag names in Node and invoking them like this:
> {code}
> Node expiration=directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION);
> {code}
> That will also make it easier to manage node key names in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9951) Use string constants for XML tags in OfflineImageReconstructor

2016-03-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204862#comment-15204862
 ] 

Colin Patrick McCabe commented on HDFS-9951:


Thanks for fixing the patch.  +1

> Use string constants for XML tags in OfflineImageReconstructor
> --
>
> Key: HDFS-9951
> URL: https://issues.apache.org/jira/browse/HDFS-9951
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Attachments: HDFS-9551.001.patch, HDFS-9551.002.patch, 
> HDFS-9551.003.patch, HDFS-9551.004.patch
>
>
> In class {{OfflineImageReconstructor}}, many {{SectionProcessors}} are used to 
> process XML files and load the subtree of the XML into a Node structure. But 
> there are lots of places where a node removes a key by writing the tag name 
> directly in the method rather than defining it as a constant first. Like this:
> {code}
> Node expiration = directive.removeChild("expiration");
> {code}
> We could improve this by defining the tag names in Node and invoking them like this:
> {code}
> Node expiration=directive.removeChild(Node.CACHE_MANAGER_SECTION_EXPIRATION);
> {code}
> That will also make it easier to manage node key names in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9405) Warmup NameNode EDEK caches in background thread

2016-03-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204863#comment-15204863
 ] 

Hudson commented on HDFS-9405:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9482 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9482/])
HDFS-9405. Warmup NameNode EDEK caches in background thread. Contributed (wang: 
rev e3bb38d62567eafe57d16b78deeba1b71c58e41c)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/ValueQueue.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZonesWithKMS.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/TestValueQueue.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java


> Warmup NameNode EDEK caches in background thread
> 
>
> Key: HDFS-9405
> URL: https://issues.apache.org/jira/browse/HDFS-9405
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption, namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Xiao Chen
> Fix For: 2.8.0
>
> Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, 
> HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, 
> HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, 
> HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, 
> HDFS-9405.12.patch, HDFS-9405.13.patch
>
>
> {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation 
> to the key provider, which could be slow or cause a timeout. It should be done 
> in a separate thread so that a proper error message can be returned to the RPC caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9405) Warmup NameNode EDEK caches in background thread

2016-03-21 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-9405:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Pushed to trunk, branch-2, and branch-2.8! Thanks Xiao for the patch and Arun for 
the reviews!

> Warmup NameNode EDEK caches in background thread
> 
>
> Key: HDFS-9405
> URL: https://issues.apache.org/jira/browse/HDFS-9405
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption, namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Xiao Chen
> Fix For: 2.8.0
>
> Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, 
> HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, 
> HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, 
> HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, 
> HDFS-9405.12.patch, HDFS-9405.13.patch
>
>
> {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation 
> to the key provider, which could be slow or cause a timeout. It should be done 
> in a separate thread so that a proper error message can be returned to the RPC caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9405) Warmup NameNode EDEK caches in background thread

2016-03-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204847#comment-15204847
 ] 

Andrew Wang commented on HDFS-9405:
---

One teensy nit is that I think we could make {{initialDelay}} and {{retryInterval}} 
final, but that's not a blocker. Will commit shortly, thanks Xiao!
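
For readers following the thread, a generic sketch of the warm-up pattern those two 
fields belong to; this is not the committed patch, and the names 
({{EdekCacheWarmer}}, {{warmUpEdekCache}}) are illustrative:
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Generic sketch: a single background thread pre-fills a cache after an
// initial delay and retries on failure, so RPC callers never block on it.
class EdekCacheWarmer {
  private final long initialDelay;   // ms before the first attempt
  private final long retryInterval;  // ms between failed attempts

  EdekCacheWarmer(long initialDelay, long retryInterval) {
    this.initialDelay = initialDelay;
    this.retryInterval = retryInterval;
  }

  void start(final Runnable warmUpEdekCache) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.execute(new Runnable() {
      @Override
      public void run() {
        try {
          Thread.sleep(initialDelay);
          while (true) {
            try {
              warmUpEdekCache.run();  // one round trip to the key provider
              return;                 // success: stop retrying
            } catch (RuntimeException e) {
              Thread.sleep(retryInterval);
            }
          }
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        }
      }
    });
    executor.shutdown();  // no new tasks; the running warm-up task continues
  }
}
{code}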

> Warmup NameNode EDEK caches in background thread
> 
>
> Key: HDFS-9405
> URL: https://issues.apache.org/jira/browse/HDFS-9405
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption, namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Xiao Chen
> Attachments: HDFS-9405.01.patch, HDFS-9405.02.patch, 
> HDFS-9405.03.patch, HDFS-9405.04.patch, HDFS-9405.05.patch, 
> HDFS-9405.06.patch, HDFS-9405.07.patch, HDFS-9405.08.patch, 
> HDFS-9405.09.patch, HDFS-9405.10.patch, HDFS-9405.11.patch, 
> HDFS-9405.12.patch, HDFS-9405.13.patch
>
>
> {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation 
> to the key provider, which could be slow or cause a timeout. It should be done 
> in a separate thread so that a proper error message can be returned to the RPC caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9917) IBR accumulate more objects when SNN was down for sometime.

2016-03-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204843#comment-15204843
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9917:
---

[~brahmareddy], your proposal on reRegister() sounds great, thanks.

> IBR accumulate more objects when SNN was down for sometime.
> ---
>
> Key: HDFS-9917
> URL: https://issues.apache.org/jira/browse/HDFS-9917
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>
> SNN was down for some time for various reasons. After restarting the SNN, it 
> became unresponsive because:
> - 29 DNs were each sending IBRs of about 5 million entries (most of them delete IBRs), 
> whereas each datanode had only ~2.5 million blocks.
> - GC can't reclaim these objects since they all sit in the RPC queue. 
> To recover (i.e. to clear these objects), we restarted all the DNs one by 
> one. This issue happened in 2.4.1, where splitting of the block report was not 
> available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify that HDFS blocks are in the correct datanode directories

2016-03-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204822#comment-15204822
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7648:
---

> ... I've raised separate sub-task HDFS-10186 to improve logging part. ...

Why does it need a separate sub-task instead of just updating the patch here?

> I failed to understand it, could you please provide few more details.

{code}
+// Check whether the actual directory location of block file
+// is block ID-based layout
+File blockDir = DatanodeUtil.idToBlockDir(bpFinalizedDir, blockId);
+File actualBlockDir = files[i].getParentFile();
+if (actualBlockDir.compareTo(blockDir) != 0) {
+  LOG.warn("Block: " + blockId
+  + " has to be upgraded to block ID-based layout");
+}
 report.add(new ScanInfo(blockId, null, files[i], vol));
   }
   continue;
@@ -646,6 +655,14 @@ public ScanInfoPerBlockPool call() throws Exception {
 break;
   }
 }
+// Check whether the actual directory location of block file
+// is block ID-based layout
+File blockDir = DatanodeUtil.idToBlockDir(bpFinalizedDir, blockId);
+File actualBlockDir = blockFile.getParentFile();
+if (actualBlockDir.compareTo(blockDir) != 0) {
+  LOG.warn("Block: " + blockId
+  + " has to be upgraded to block ID-based layout");
+}
 report.add(new ScanInfo(blockId, blockFile, metaFile, vol));
{code}
The two chunks of code above are duplicated.  Please add a helper method to 
avoid the duplication.  Thanks.
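
Something along these lines would do (the helper name is illustrative; everything 
else comes from the chunks quoted above):
{code}
// Sketch of a helper that both call sites could share.
private void verifyBlockDirLayout(File bpFinalizedDir, long blockId,
    File blockFile) {
  // Check whether the actual directory location of the block file follows
  // the block ID-based layout.
  File expectedBlockDir = DatanodeUtil.idToBlockDir(bpFinalizedDir, blockId);
  File actualBlockDir = blockFile.getParentFile();
  if (actualBlockDir.compareTo(expectedBlockDir) != 0) {
    LOG.warn("Block: " + blockId
        + " has to be upgraded to block ID-based layout");
  }
}
{code}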

> Verify that HDFS blocks are in the correct datanode directories
> ---
>
> Key: HDFS-7648
> URL: https://issues.apache.org/jira/browse/HDFS-7648
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-7648-3.patch, HDFS-7648-4.patch, HDFS-7648-5.patch, 
> HDFS-7648.patch, HDFS-7648.patch
>
>
> HDFS-6482 changed datanode layout to use block ID to determine the directory 
> to store the block.  We should have some mechanism to verify it.  Either 
> DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9616) libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests.

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204776#comment-15204776
 ] 

Hadoop QA commented on HDFS-9616:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 38s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
59s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 39s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 32s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 4s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 4s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 51s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794531/HDFS-9616.HDFS-8707.002.patch
 |
| JIRA Issue | HDFS-9616 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 4d8b97f384b4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / 7751507 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_74 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14882/artifact/patchprocess/whitespace-eol.txt
 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14882/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14882/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> libhdfs++ Add runtime hooks to allow a client application to add low level 
> monitoring and tests.
> 

[jira] [Updated] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level

2016-03-21 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HDFS-9579:
--
Fix Version/s: 2.9.0

Committed the branch-2 patch onto branch-2. I am not committing it to 
branch-2.8 because I understand 2.8 needs to be stabilized and I don't think it 
is critical that this makes 2.8.0. Do let me know if you feel strongly that it 
should be in 2.8.0.

> Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
> -
>
> Key: HDFS-9579
> URL: https://issues.apache.org/jira/browse/HDFS-9579
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 3.0.0, 2.9.0
>
> Attachments: HDFS-9579-10.patch, HDFS-9579-2.patch, 
> HDFS-9579-3.patch, HDFS-9579-4.patch, HDFS-9579-5.patch, HDFS-9579-6.patch, 
> HDFS-9579-7.patch, HDFS-9579-8.patch, HDFS-9579-9.patch, 
> HDFS-9579-branch-2.patch, HDFS-9579.patch, MR job counters.png
>
>
> For cross DC distcp or other applications, it becomes useful to have insight 
> as to the traffic volume for each network distance to distinguish cross-DC 
> traffic, local-DC-remote-rack, etc.
> FileSystem's existing {{bytesRead}} metric tracks all the bytes read. To 
> provide additional metrics for each network distance, we can add additional 
> metrics at the FileSystem level and have {{DFSInputStream}} update the values 
> based on the network distance between the client and the datanode.
> {{DFSClient}} will resolve the client machine's network location as part of its 
> initialization. It doesn't need to resolve the datanode's network location for 
> each read, as {{DatanodeInfo}} already has that info.
> There are existing HDFS-specific metrics such as {{ReadStatistics}} and 
> {{DFSHedgedReadMetrics}}, but those metrics are only accessible via 
> {{DFSClient}} or {{DFSInputStream}}, not something that application frameworks 
> such as MR and Tez can get to. That is the benefit of storing these new 
> metrics in FileSystem.Statistics.
> This jira only covers metrics generation by HDFS. The consumption of these 
> metrics in MR and Tez will be tracked by separate jiras.
> We can add similar metrics for the HDFS write path later if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204734#comment-15204734
 ] 

Lei (Eddy) Xu commented on HDFS-3702:
-

Hey, Guys

Based on [~stack] and [~andrew.wang]'s +1s, I will commit this by end of day if 
there is no further comment.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level

2016-03-21 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204679#comment-15204679
 ] 

Ming Ma commented on HDFS-9579:
---

The branch-2 version also applies to branch-2.8. Unless others ask for it, skip 
the work for branch-2.7 and branch-2.6 given this is a feature enhancement and 
requires extra effort.

> Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
> -
>
> Key: HDFS-9579
> URL: https://issues.apache.org/jira/browse/HDFS-9579
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 3.0.0
>
> Attachments: HDFS-9579-10.patch, HDFS-9579-2.patch, 
> HDFS-9579-3.patch, HDFS-9579-4.patch, HDFS-9579-5.patch, HDFS-9579-6.patch, 
> HDFS-9579-7.patch, HDFS-9579-8.patch, HDFS-9579-9.patch, 
> HDFS-9579-branch-2.patch, HDFS-9579.patch, MR job counters.png
>
>
> For cross DC distcp or other applications, it becomes useful to have insight 
> as to the traffic volume for each network distance to distinguish cross-DC 
> traffic, local-DC-remote-rack, etc.
> FileSystem's existing {{bytesRead}} metric tracks all the bytes read. To 
> provide additional metrics for each network distance, we can add additional 
> metrics at the FileSystem level and have {{DFSInputStream}} update the values 
> based on the network distance between the client and the datanode.
> {{DFSClient}} will resolve the client machine's network location as part of its 
> initialization. It doesn't need to resolve the datanode's network location for 
> each read, as {{DatanodeInfo}} already has that info.
> There are existing HDFS-specific metrics such as {{ReadStatistics}} and 
> {{DFSHedgedReadMetrics}}, but those metrics are only accessible via 
> {{DFSClient}} or {{DFSInputStream}}, not something that application frameworks 
> such as MR and Tez can get to. That is the benefit of storing these new 
> metrics in FileSystem.Statistics.
> This jira only covers metrics generation by HDFS. The consumption of these 
> metrics in MR and Tez will be tracked by separate jiras.
> We can add similar metrics for the HDFS write path later if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10184) Introduce unit tests framework for HDFS UI

2016-03-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204670#comment-15204670
 ] 

Haohui Mai commented on HDFS-10184:
---

Unless we move to Java 8, this is a non-starter. The UI is heavily driven by 
HTML 5. Rhino, the JavaScript engine in Java 7, is at least 10x slower than 
Nashorn and node.js.

I have written a JavaScript version of dfsadmin using Rhino and the performance 
is very unsatisfactory. Given the scope of the tests I'm not fully convinced it 
is a good idea.

I believe that the new Web UI in YARN is adopting npm / node.js -- these tools 
will be integrated with the current Jenkins workflow so the integration should 
not be an issue.

> Introduce unit tests framework for HDFS UI
> --
>
> Key: HDFS-10184
> URL: https://issues.apache.org/jira/browse/HDFS-10184
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Haohui Mai
>
> The current HDFS UI is based on HTML5 and it does not have unit tests yet. 
> Occasionally things break and we can't catch it. We should investigate and 
> introduce unit test frameworks such as Mocha for the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204661#comment-15204661
 ] 

Lei (Eddy) Xu commented on HDFS-3702:
-

[~arpitagarwal] and [~szetszwo]. Thanks for these useful suggestions.

I had a "{{per-block}} block placement hint" and "putting the local node into the 
excludedNodes" in patch 001 and 002, respectively. But because of the performance 
concerns and the fallback capability mentioned in the previous comments, I changed 
the patch to the current solution, which re-uses the fallback code in the BPP. 




> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2043) TestHFlush failing intermittently

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204615#comment-15204615
 ] 

Hadoop QA commented on HDFS-2043:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 52s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 46s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 199m 40s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.TestEncryptionZones |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204613#comment-15204613
 ] 

Arpit Agarwal commented on HDFS-9847:
-

Thanks [~linyiqun]. I am +1 for the latest patch with a minor issue - the 
following line should also include the name of the configuration setting.
{code}
throw new HadoopIllegalArgumentException("Loss of precision converting "
    + timeStr + vUnit.suffix() + " to " + unit);
{code}
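
In other words, something like the following sketch (not the committed change; 
{{name}} is assumed to be the configuration key parameter already in scope in 
{{getTimeDuration}}):
{code}
// Sketch of the suggested tweak: include the configuration key in the message
// so the operator can tell which setting is misconfigured.
throw new HadoopIllegalArgumentException("Loss of precision converting "
    + timeStr + vUnit.suffix() + " to " + unit + " for " + name);
{code}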

I will hold off committing for now in case [~chris.douglas] or 
[~ste...@apache.org] have comments. Steve - I think the only open question was 
whether to add week/year suffixes. I don't have an opinion either way.

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, timeduration-w-y.patch
>
>
> HDFS-9821 discusses letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names already 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> the other configurations, whose names carry no time unit, accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update those configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9959) add log when block removed from last live datanode

2016-03-21 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204604#comment-15204604
 ] 

yunjiong zhao commented on HDFS-9959:
-

+1 for this.

> add log when block removed from last live datanode
> --
>
> Key: HDFS-9959
> URL: https://issues.apache.org/jira/browse/HDFS-9959
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Minor
> Attachments: HDFS-9959.1.patch, HDFS-9959.patch
>
>
> Adding a log like "BLOCK* No live nodes contain block blk_1073741825_1001, last 
> datanode contain it is node: 127.0.0.1:65341" to BlockStateChange should help 
> identify which datanode should be fixed first to recover missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level

2016-03-21 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9579:
--
Attachment: HDFS-9579-branch-2.patch

Thank you [~sjlee0], [~liuml07] and [~cmccabe]! Here is the patch for branch-2 
which passed all tests under hadoop-common-project and hadoop-hdfs-project 
locally.

> Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
> -
>
> Key: HDFS-9579
> URL: https://issues.apache.org/jira/browse/HDFS-9579
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 3.0.0
>
> Attachments: HDFS-9579-10.patch, HDFS-9579-2.patch, 
> HDFS-9579-3.patch, HDFS-9579-4.patch, HDFS-9579-5.patch, HDFS-9579-6.patch, 
> HDFS-9579-7.patch, HDFS-9579-8.patch, HDFS-9579-9.patch, 
> HDFS-9579-branch-2.patch, HDFS-9579.patch, MR job counters.png
>
>
> For cross DC distcp or other applications, it becomes useful to have insight 
> as to the traffic volume for each network distance to distinguish cross-DC 
> traffic, local-DC-remote-rack, etc.
> FileSystem's existing {{bytesRead}} metric tracks all the bytes read. To 
> provide additional metrics for each network distance, we can add additional 
> metrics at the FileSystem level and have {{DFSInputStream}} update the values 
> based on the network distance between the client and the datanode.
> {{DFSClient}} will resolve the client machine's network location as part of its 
> initialization. It doesn't need to resolve the datanode's network location for 
> each read, as {{DatanodeInfo}} already has that info.
> There are existing HDFS-specific metrics such as {{ReadStatistics}} and 
> {{DFSHedgedReadMetrics}}, but those metrics are only accessible via 
> {{DFSClient}} or {{DFSInputStream}}, not something that application frameworks 
> such as MR and Tez can get to. That is the benefit of storing these new 
> metrics in FileSystem.Statistics.
> This jira only covers metrics generation by HDFS. The consumption of these 
> metrics in MR and Tez will be tracked by separate jiras.
> We can add similar metrics for the HDFS write path later if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9616) libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests.

2016-03-21 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9616:
-
Assignee: Bob Hansen  (was: James Clampffer)

> libhdfs++ Add runtime hooks to allow a client application to add low level 
> monitoring and tests.
> 
>
> Key: HDFS-9616
> URL: https://issues.apache.org/jira/browse/HDFS-9616
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Bob Hansen
> Attachments: HDFS-9616.HDFS-8707.002.patch
>
>
> It would be nice to have a set of callable objects and corresponding event 
> hooks in useful places that can be set by a client application at runtime.  
> This is intended to provide a scalable mechanism for implementing counters 
> (#retries, #namenode requests) or application-specific testing, e.g. simulating 
> a dropped connection when the test system running the client application 
> requests it.
> Current implementation plan is a struct full of callbacks (std::functions) 
> owned by the FileSystemImpl.  A callback could be set (or left as a no-op) 
> and when the code hits the corresponding event it will be invoked with a 
> reference to the object (for context) and each method argument by reference.  
> The callback returns a bool: true to continue execution or false to bail out 
> of the calling method.
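
Purely as an illustration of that hook pattern (the actual implementation is C++, using a struct of {{std::function}} members owned by {{FileSystemImpl}}), a hypothetical Java sketch where a no-op hook can be swapped at runtime and a {{false}} return bails out of the calling method:
{code}
import java.util.function.BiFunction;

/** Names here are invented for the sketch; only the pattern matches the proposal. */
public class ConnectEvents {
  /** Hook invoked before each connect attempt: (host, retryCount) -> keep going? */
  private volatile BiFunction<String, Integer, Boolean> preConnectHook =
      (host, retryCount) -> true;   // default: no-op, always continue

  /** A test or monitoring harness installs its own hook at runtime. */
  public void setPreConnectHook(BiFunction<String, Integer, Boolean> hook) {
    this.preConnectHook = hook;
  }

  /** Library code calls the hook when it reaches the corresponding event. */
  boolean connect(String host, int retryCount) {
    if (!preConnectHook.apply(host, retryCount)) {
      return false;   // hook asked us to bail out, e.g. simulating a dropped connection
    }
    // ... real connection logic would go here ...
    return true;
  }
}
{code}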



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9616) libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests.

2016-03-21 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9616:
-
Attachment: HDFS-9616.HDFS-8707.002.patch

Added callbacks that provide context (cluster, file, and event) and a channel 
back to the libhdfspp code (currently they can simulate errors in a debug build).

> libhdfs++ Add runtime hooks to allow a client application to add low level 
> monitoring and tests.
> 
>
> Key: HDFS-9616
> URL: https://issues.apache.org/jira/browse/HDFS-9616
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-9616.HDFS-8707.002.patch
>
>
> It would be nice to have a set of callable objects and corresponding event 
> hooks in useful places that can be set by a client application at runtime.  
> This is intended to provide a scalable mechanism for implementing counters 
> (#retries, #namenode requests) or application-specific testing, e.g. simulating 
> a dropped connection when the test system running the client application 
> requests it.
> Current implementation plan is a struct full of callbacks (std::functions) 
> owned by the FileSystemImpl.  A callback could be set (or left as a no-op) 
> and when the code hits the corresponding event it will be invoked with a 
> reference to the object (for context) and each method argument by reference.  
> The callback returns a bool: true to continue execution or false to bail out 
> of the calling method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9616) libhdfs++ Add runtime hooks to allow a client application to add low level monitoring and tests.

2016-03-21 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9616:
-
Status: Patch Available  (was: Open)

> libhdfs++ Add runtime hooks to allow a client application to add low level 
> monitoring and tests.
> 
>
> Key: HDFS-9616
> URL: https://issues.apache.org/jira/browse/HDFS-9616
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-9616.HDFS-8707.002.patch
>
>
> It would be nice to have a set of callable objects and corresponding event 
> hooks in useful places that can be set by a client application at runtime.  
> This is intended to provide a scalable mechanism for implementing counters 
> (#retries, #namenode requests) or application-specific testing, e.g. simulating 
> a dropped connection when the test system running the client application 
> requests it.
> Current implementation plan is a struct full of callbacks (std::functions) 
> owned by the FileSystemImpl.  A callback could be set (or left as a no-op) 
> and when the code hits the corresponding event it will be invoked with a 
> reference to the object (for context) and each method argument by reference.  
> The callback returns a bool: true to continue execution or false to bail out 
> of the calling method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10183) Prevent race condition during class initialization

2016-03-21 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204495#comment-15204495
 ] 

Sangjin Lee commented on HDFS-10183:


I agree that JLS makes it clear that a memory barrier is required (by the JVM) 
and is expected from the user standpoint. This is something we should be able 
to rely on safely, or we have a bigger problem. And I don't think there is 
anything special about {{ThreadLocal}}.

I think it is a good idea to make these static variables final for a semantic 
reason and possibly to work around a JVM bug. However, for the record, we 
should be able to rely on any initial values of (non-final) static fields in 
general.

I'm +1 on the patch with that note.
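
As a concrete illustration of the pattern the patch enforces (the class and field below are made up, not the actual patched code), the {{final}} modifier adds the JMM final-field guarantee on top of the normal class-initialization semantics:
{code}
/** Illustrative only: the shape of the fix, not the patched classes themselves. */
public class SomeUtil {
  // Under the race described above, a non-final static field may be observed
  // as null by a thread that obtained the Class reference during initialization:
  //   private static ThreadLocal<StringBuilder> BUILDER = ThreadLocal.withInitial(StringBuilder::new);

  // With final, any thread that can see the field sees it fully initialized.
  private static final ThreadLocal<StringBuilder> BUILDER =
      ThreadLocal.withInitial(StringBuilder::new);

  public static String render(Object o) {
    StringBuilder sb = BUILDER.get();
    sb.setLength(0);                 // the per-thread builder is reused across calls
    return sb.append(o).toString();
  }
}
{code}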

> Prevent race condition during class initialization
> --
>
> Key: HDFS-10183
> URL: https://issues.apache.org/jira/browse/HDFS-10183
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.9.0
>Reporter: Pavel Avgustinov
>Assignee: Pavel Avgustinov
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch
>
>
> In HADOOP-11969, [~busbey] tracked down a non-deterministic 
> {{NullPointerException}} to an oddity in the Java memory model: When multiple 
> threads trigger the loading of a class at the same time, one of them wins and 
> creates the {{java.lang.Class}} instance; the others block during this 
> initialization, but once it is complete they may obtain a reference to the 
> {{Class}} which has non-{{final}} fields still containing their default (i.e. 
> {{null}}) values. This leads to runtime failures that are hard to debug or 
> diagnose.
> HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are 
> very likely to be accessed from multiple threads, and thus the problem is 
> particularly severe there. Consequently, the patch removed all occurrences of 
> the issue in the code base.
> Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a 
> refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151],
>  and introduced a [new instance of the 
> problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43].
> The attached patch addresses the issue by adding the missing {{final}} 
> modifier in these two cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers

2016-03-21 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10187:
---
Status: Patch Available  (was: Open)

> Add a "list" refresh handler to list all registered refresh 
> identifiers/handlers
> 
>
> Key: HDFS-10187
> URL: https://issues.apache.org/jira/browse/HDFS-10187
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.7.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: commandline, supportability
> Attachments: HDFS-10187.001.patch
>
>
> HADOOP-10376 added a new feature to register handlers for refreshing daemon 
> configurations, because name node properties can be reconfigured without 
> restarting the daemon. This can be a very useful generic interface, but so 
> far no real handlers have been registered using this interface. I added a new 
> 'list' handler to list all registered handlers. My plan is to add more 
> handlers in the future using this interface.
> Another minor fix is to return a more explicit error message to the client if a 
> handler is not registered. (It is currently logged on the namenode side, but the 
> client side only gets a "Failed to get response." message without knowing why.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly

2016-03-21 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204427#comment-15204427
 ] 

John Zhuge commented on HDFS-10185:
---

bq. Yes, my new code is used to check if {{Thread.currentThread().interrupt()}} 
works. Because the original code {{assertTrue(Thread.interrupted())}} of 
checking this is not executed.

Sorry, I don't understand: hasn't the Java Thread package's own unit testing 
already covered that?

If you think the code {{assertTrue(Thread.interrupted())}} can never be 
reached, use {{Assert.fail}}:
{code}
  Thread.currentThread().interrupt();
  try {
    stm.hflush();
    Assert.fail("Not interrupted as expected");
  } catch (InterruptedIOException ie) {
    System.out.println("Got expected exception during flush");
  }
{code}

However, do you think this comment is no longer true?
{quote}
// If we made it past the hflush(), then that means that the ack made it back
// from the pipeline before we got to the wait() call. In that case we should
// still have interrupted status.
{quote}
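
For reference, the distinction the two assertions hinge on is standard {{java.lang.Thread}} behaviour: {{interrupted()}} reads and clears the flag, while {{isInterrupted()}} only reads it. A stand-alone snippet:
{code}
public class InterruptFlagDemo {
  public static void main(String[] args) {
    Thread.currentThread().interrupt();

    // isInterrupted() only observes the flag; it stays set.
    System.out.println(Thread.currentThread().isInterrupted()); // true
    System.out.println(Thread.currentThread().isInterrupted()); // still true

    // interrupted() observes the flag AND clears it.
    System.out.println(Thread.interrupted()); // true
    System.out.println(Thread.interrupted()); // false - the first call cleared it
  }
}
{code}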

> TestHFlushInterrupted verifies interrupt state incorrectly
> --
>
> Key: HDFS-10185
> URL: https://issues.apache.org/jira/browse/HDFS-10185
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10185.001.patch
>
>
> In the unit test {{TestHFlush#testHFlushInterrupted}}, there are some places 
> that verify the interrupt state incorrectly. As follows:
> {code}
>   Thread.currentThread().interrupt();
>   try {
>     stm.hflush();
>     // If we made it past the hflush(), then that means that the ack made it back
>     // from the pipeline before we got to the wait() call. In that case we should
>     // still have interrupted status.
>     assertTrue(Thread.interrupted());
>   } catch (InterruptedIOException ie) {
>     System.out.println("Got expected exception during flush");
>   }
> {code}
> When stm performs the {{hflush}} operation, it throws an interrupted exception 
> and {{assertTrue(Thread.interrupted())}} is never executed. And if the check is 
> moved before the {{hflush}}, {{Thread.interrupted()}} clears the interrupted 
> state and the expected exception is not thrown. A similar problem also appears 
> after {{stm.close}}.
> So we should use a way to read the state without clearing it, such as 
> {{Thread.currentThread().isInterrupted()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10188) libhdfs++: Implement debug allocators

2016-03-21 Thread James Clampffer (JIRA)
James Clampffer created HDFS-10188:
--

 Summary: libhdfs++: Implement debug allocators
 Key: HDFS-10188
 URL: https://issues.apache.org/jira/browse/HDFS-10188
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: James Clampffer
Assignee: James Clampffer


I propose implementing a set of paired new/delete operators with additional 
checking to detect double deletes, reads after delete, and writes after delete, 
to help debug resource ownership issues and prevent new ones from entering the 
library.

One of the most common issues we have is use-after-free. The continuation 
pattern makes these really tricky to debug, because by the time a SIGSEGV is 
raised the context of what caused the error is long gone.

The plan is to add allocators that can be turned on to do the following, in 
order of increasing runtime cost:
1: no-op, forward through to the default new/delete
2: memset freed memory to 0
3: implement operator new with mmap, and lock that region of memory once it has 
been deleted; obviously this can't be left to run forever because the memory is 
never unmapped

This should also put some groundwork in place for implementing specialized 
allocators for tiny objects that we churn through, like std::string.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers

2016-03-21 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10187:
---
Labels: commandline supportability  (was: supportability)

> Add a "list" refresh handler to list all registered refresh 
> identifiers/handlers
> 
>
> Key: HDFS-10187
> URL: https://issues.apache.org/jira/browse/HDFS-10187
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.7.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: commandline, supportability
> Attachments: HDFS-10187.001.patch
>
>
> HADOOP-10376 added a new feature to register handlers for refreshing daemon 
> configurations, because name node properties can be reconfigured without 
> restarting the daemon. This can be a very useful generic interface, but so 
> far no real handlers have been registered using this interface. I added a new 
> 'list' handler to list all registered handlers. My plan is to add more 
> handlers in the future using this interface.
> Another minor fix is to return a more explicit error message to the client if a 
> handler is not registered. (It is currently logged on the namenode side, but the 
> client side only gets a "Failed to get response." message without knowing why.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers

2016-03-21 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10187:
---
Attachment: HDFS-10187.001.patch

Rev01: added the list refresh handler, a test case to verify it is added by 
default, command line help/usage, and updated documentation.

> Add a "list" refresh handler to list all registered refresh 
> identifiers/handlers
> 
>
> Key: HDFS-10187
> URL: https://issues.apache.org/jira/browse/HDFS-10187
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.7.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-10187.001.patch
>
>
> HADOOP-10376 added a new feature to register handlers for refreshing daemon 
> configurations, because name node properties can be reconfigured without 
> restarting the daemon. This can be a very useful generic interface, but so 
> far no real handlers have been registered using this interface. I added a new 
> 'list' handler to list all registered handlers. My plan is to add more 
> handlers in the future using this interface.
> Another minor fix is to return a more explicit error message to the client if a 
> handler is not registered. (It is currently logged on the namenode side, but the 
> client side only gets a "Failed to get response." message without knowing why.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10187) Add a "list" refresh handler to list all registered refresh identifiers/handlers

2016-03-21 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-10187:
--

 Summary: Add a "list" refresh handler to list all registered 
refresh identifiers/handlers
 Key: HDFS-10187
 URL: https://issues.apache.org/jira/browse/HDFS-10187
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 2.7.2
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang


HADOOP-10376 added a new feature to register handlers for refreshing daemon 
configurations, because name node properties can be reconfigured without 
restarting the daemon. This can be a very useful generic interface, but so far 
no real handlers have been registered using this interface. I added a new 
'list' handler to list all registered handlers. My plan is to add more handlers 
in the future using this interface.

Another minor fix is to return a more explicit error message to the client if a 
handler is not registered. (It is currently logged on the namenode side, but the 
client side only gets a "Failed to get response." message without knowing why.)
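
Sketch only: the interface below is a simplified stand-in for the HADOOP-10376 refresh API (the real signatures differ), just to show the shape of a "list" handler and a friendlier unknown-identifier message:
{code}
import java.util.Map;
import java.util.TreeMap;

public class ListRefreshHandlerSketch {

  /** Simplified stand-in for the refresh handler interface. */
  interface RefreshHandler {
    String handleRefresh(String identifier, String[] args);
  }

  // identifier -> handler, sorted so the listing is stable.
  private final Map<String, RefreshHandler> registry = new TreeMap<>();

  public ListRefreshHandlerSketch() {
    // The "list" handler simply reports every registered identifier.
    register("list", (identifier, args) ->
        "Registered refresh identifiers: " + String.join(", ", registry.keySet()));
  }

  public void register(String identifier, RefreshHandler handler) {
    registry.put(identifier, handler);
  }

  public String refresh(String identifier, String[] args) {
    RefreshHandler h = registry.get(identifier);
    if (h == null) {
      // Explicit error back to the caller instead of a bare "Failed to get response."
      return "No refresh handler registered for identifier '" + identifier + "'";
    }
    return h.handleRefresh(identifier, args);
  }
}
{code}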



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10183) Prevent race condition during class initialization

2016-03-21 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204345#comment-15204345
 ] 

Daryn Sharp commented on HDFS-10183:


My reading of the JLS and its footnotes pretty clearly indicates that memory 
barriers are required. I suspect HADOOP-11969 discovered a (hopefully since-fixed) 
JVM bug, so this patch is probably cosmetic, but it certainly doesn't hurt anything.

> Prevent race condition during class initialization
> --
>
> Key: HDFS-10183
> URL: https://issues.apache.org/jira/browse/HDFS-10183
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.9.0
>Reporter: Pavel Avgustinov
>Assignee: Pavel Avgustinov
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch
>
>
> In HADOOP-11969, [~busbey] tracked down a non-deterministic 
> {{NullPointerException}} to an oddity in the Java memory model: When multiple 
> threads trigger the loading of a class at the same time, one of them wins and 
> creates the {{java.lang.Class}} instance; the others block during this 
> initialization, but once it is complete they may obtain a reference to the 
> {{Class}} which has non-{{final}} fields still containing their default (i.e. 
> {{null}}) values. This leads to runtime failures that are hard to debug or 
> diagnose.
> HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are 
> very likely to be accessed from multiple threads, and thus the problem is 
> particularly severe there. Consequently, the patch removed all occurrences of 
> the issue in the code base.
> Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a 
> refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151],
>  and introduced a [new instance of the 
> problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43].
> The attached patch addresses the issue by adding the missing {{final}} 
> modifier in these two cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-2043) TestHFlush failing intermittently

2016-03-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-2043:

Attachment: HDFS-2043.002.patch

> TestHFlush failing intermittently
> -
>
> Key: HDFS-2043
> URL: https://issues.apache.org/jira/browse/HDFS-2043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aaron T. Myers
>Assignee: Lin Yiqun
> Attachments: HDFS-2043.002.patch, HDFS.001.patch
>
>
> I can't reproduce this failure reliably, but it seems like TestHFlush has 
> been failing intermittently, with the frequency increasing of late.
> Note the following two pre-commit test runs from different JIRAs where 
> TestHFlush seems to have failed spuriously:
> https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/
> https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2043) TestHFlush failing intermittently

2016-03-21 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204126#comment-15204126
 ] 

Lin Yiqun commented on HDFS-2043:
-

As [~kihwal] pointed out, it seems to be an actual race. The two exception logs 
in HDFS-10181 also indicate that other operations were executed on the stream 
while {{DataStreamer.run}} was still in progress. I updated the patch to catch 
the potential exceptions in the stream write operations and add a few chances to 
retry. Assigned this jira to myself; pending jenkins.
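
Roughly the shape of that change (a sketch of the retry idea only, not the actual patch): wrap the flaky write/flush in a small retry loop so a transient pipeline error does not fail the whole test.
{code}
import java.io.IOException;
import java.io.OutputStream;

/** Sketch only; the real patch modifies TestHFlush rather than adding a helper. */
public final class RetryingWrite {
  public static void writeWithRetries(OutputStream stm, byte[] data, int maxAttempts)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        stm.write(data);
        stm.flush();
        return;                       // success
      } catch (IOException ioe) {
        last = ioe;                   // e.g. the stream was hit by a concurrent operation
        Thread.sleep(100L * attempt); // brief back-off before retrying
      }
    }
    throw last;                       // give up after maxAttempts tries
  }
}
{code}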

> TestHFlush failing intermittently
> -
>
> Key: HDFS-2043
> URL: https://issues.apache.org/jira/browse/HDFS-2043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aaron T. Myers
> Attachments: HDFS-2043.002.patch, HDFS.001.patch
>
>
> I can't reproduce this failure reliably, but it seems like TestHFlush has 
> been failing intermittently, with the frequency increasing of late.
> Note the following two pre-commit test runs from different JIRAs where 
> TestHFlush seems to have failed spuriously:
> https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/
> https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-2043) TestHFlush failing intermittently

2016-03-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-2043:

Status: Patch Available  (was: Open)

> TestHFlush failing intermittently
> -
>
> Key: HDFS-2043
> URL: https://issues.apache.org/jira/browse/HDFS-2043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aaron T. Myers
>Assignee: Lin Yiqun
> Attachments: HDFS-2043.002.patch, HDFS.001.patch
>
>
> I can't reproduce this failure reliably, but it seems like TestHFlush has 
> been failing intermittently, with the frequency increasing of late.
> Note the following two pre-commit test runs from different JIRAs where 
> TestHFlush seems to have failed spuriously:
> https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/
> https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-2043) TestHFlush failing intermittently

2016-03-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-2043:
---

Assignee: Lin Yiqun

> TestHFlush failing intermittently
> -
>
> Key: HDFS-2043
> URL: https://issues.apache.org/jira/browse/HDFS-2043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aaron T. Myers
>Assignee: Lin Yiqun
> Attachments: HDFS-2043.002.patch, HDFS.001.patch
>
>
> I can't reproduce this failure reliably, but it seems like TestHFlush has 
> been failing intermittently, with the frequency increasing of late.
> Note the following two pre-commit test runs from different JIRAs where 
> TestHFlush seems to have failed spuriously:
> https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/
> https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10184) Introduce unit tests framework for HDFS UI

2016-03-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204086#comment-15204086
 ] 

Steve Loughran commented on HDFS-10184:
---

Can you see if you can get away with HtmlUnit first? It uses the JVM's own JS 
engine and so doesn't need a browser, which means it can run under Jenkins. As 
it's designed to run under JUnit, it'll be maintainable by the current set of 
Java developers without anyone having to learn node.js.
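
Something along these lines, for example (a sketch; the port, page, and assertion are assumptions about the NameNode web UI rather than an actual test from the build):
{code}
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.junit.Assert;
import org.junit.Test;

public class TestNameNodeUiSketch {
  @Test
  public void dfshealthPageLoads() throws Exception {
    WebClient client = new WebClient();   // executes the page's JS in-process, no browser needed
    try {
      HtmlPage page = client.getPage("http://localhost:50070/dfshealth.html");
      Assert.assertTrue(page.getTitleText().length() > 0);
    } finally {
      client.close();                     // assumes a reasonably recent HtmlUnit release
    }
  }
}
{code}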

> Introduce unit tests framework for HDFS UI
> --
>
> Key: HDFS-10184
> URL: https://issues.apache.org/jira/browse/HDFS-10184
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Haohui Mai
>
> The current HDFS UI is based on HTML5 and it does not have unit tests yet. 
> Occasionally things break and we can't catch it. We should investigate and 
> introduce unit test frameworks such as Mocha for the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10186) DirectoryScanner: Improve logs by adding full path of both actual and expected block directories

2016-03-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203998#comment-15203998
 ] 

Kai Zheng commented on HDFS-10186:
--

The patch LGTM. Thanks [~rakeshr] for the nice work.

> DirectoryScanner: Improve logs by adding full path of both actual and 
> expected block directories
> 
>
> Key: HDFS-10186
> URL: https://issues.apache.org/jira/browse/HDFS-10186
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Minor
> Attachments: HDFS-10186-001.patch
>
>
> As per the 
> [discussion|https://issues.apache.org/jira/browse/HDFS-7648?focusedCommentId=15195908=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15195908],
>  this jira is to improve directory scanner log by adding the wrong and 
> correct directory path so that admins can take necessary actions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly

2016-03-21 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203975#comment-15203975
 ] 

Lin Yiqun commented on HDFS-10185:
--

The failure of the unit test {{hadoop.hdfs.TestHFlush}} is not related to the 
change in this patch. I also ran it locally and it passed.

> TestHFlushInterrupted verifies interrupt state incorrectly
> --
>
> Key: HDFS-10185
> URL: https://issues.apache.org/jira/browse/HDFS-10185
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10185.001.patch
>
>
> In the unit test {{TestHFlush#testHFlushInterrupted}}, there are some places 
> that verify the interrupt state incorrectly. As follows:
> {code}
>   Thread.currentThread().interrupt();
>   try {
>     stm.hflush();
>     // If we made it past the hflush(), then that means that the ack made it back
>     // from the pipeline before we got to the wait() call. In that case we should
>     // still have interrupted status.
>     assertTrue(Thread.interrupted());
>   } catch (InterruptedIOException ie) {
>     System.out.println("Got expected exception during flush");
>   }
> {code}
> When stm performs the {{hflush}} operation, it throws an interrupted exception 
> and {{assertTrue(Thread.interrupted())}} is never executed. And if the check is 
> moved before the {{hflush}}, {{Thread.interrupted()}} clears the interrupted 
> state and the expected exception is not thrown. A similar problem also appears 
> after {{stm.close}}.
> So we should use a way to read the state without clearing it, such as 
> {{Thread.currentThread().isInterrupted()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9555) LazyPersistFileScrubber should still sleep if there are errors in the clear progress

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203881#comment-15203881
 ] 

Hadoop QA commented on HDFS-9555:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 33s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 18s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 143m 14s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777436/9555-v1.patch |
| JIRA Issue | HDFS-9555 |
| Optional Tests |  asflicense  

[jira] [Commented] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-03-21 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203879#comment-15203879
 ] 

Weiwei Yang commented on HDFS-6489:
---

This issue happened to us as well. We have some clients that continually append 
data to HDFS; this causes the dfs usage to jump up to a very high value and gives 
an insufficient-disk-space error just like this one. It looks like HDFS only 
refreshes the space usage at an interval (the fs.du.interval property, default 
600000 ms, i.e. 10 minutes), which means the dfs usage can be inaccurate and 
cause a lot of operations to fail for up to 10 minutes (even after the client has 
closed the streams). We really should have this fixed.

> DFS Used space is not correct computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: stanley shi
> Attachments: HDFS6489.java
>
>
> The current implementation of the Datanode increases the DFS used space on 
> each block write operation. This is correct in most scenarios (creating a new 
> file), but it sometimes behaves incorrectly (appending small amounts of data to 
> a large block).
> For example, I have a file with only one block (say, 60M). Then I append to it 
> very frequently, but each time I append only 10 bytes.
> On each append, dfs used is increased by the length of the block (60M), not the 
> actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a large 
> number of files (1000+). Assume the block size is 32M (half of the default 
> value); then the dfs used will be increased by 1000*32M = 32G on each append to 
> the files, but I actually write only 10K bytes. This causes the datanode to 
> report insufficient disk space on data writes.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}
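
To make the arithmetic in the example above concrete (numbers taken directly from the description):
{code}
public class DfsUsedOverCountSketch {
  public static void main(String[] args) {
    long blockSize = 32L * 1024 * 1024;  // 32M blocks, as in the example
    long perAppend = 10;                 // 10 bytes actually appended per file
    long files = 1000;                   // 1000 files appended to concurrently

    long reported = files * blockSize;   // dfsUsed is bumped by a full block per append
    long actual   = files * perAppend;   // data really written

    System.out.println("reported increase: " + (reported >> 20) + " MB"); // 32000 MB, the "32G" above
    System.out.println("actual increase:   " + actual + " bytes");        // 10000 bytes, the ~10K above
  }
}
{code}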



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10186) DirectoryScanner: Improve logs by adding full path of both actual and expected block directories

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203867#comment-15203867
 ] 

Hadoop QA commented on HDFS-10186:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 12s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 50s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.TestGetFileChecksum |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestHFlush |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794472/HDFS-10186-001.patch |
| JIRA Issue | HDFS-10186 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 

[jira] [Commented] (HDFS-10185) TestHFlushInterrupted verifies interrupt state incorrectly

2016-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203857#comment-15203857
 ] 

Hadoop QA commented on HDFS-10185:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 45s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 16s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 163m 38s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.TestHFlush |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794463/HDFS-10185.001.patch |
| JIRA Issue | HDFS-10185 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux def2ab8f11bf 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ed1e23f |
| Default Java