[jira] [Comment Edited] (HDFS-8161) Both Namenodes are in standby State

2016-04-15 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243980#comment-15243980
 ] 

Harsh J edited comment on HDFS-8161 at 4/16/16 3:26 AM:


[~brahmareddy] - was this encountered on virtual machine hosts, or physical 
ones? Asking because 
https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19#.v3hx212ne
 (H/T [~daisuke.kobayashi])


was (Author: qwertymaniac):
[~brahmareddy] - was this encountered on virtual machine hosts, or physical 
ones? Asking because 
https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19#.v3hx212ne

> Both Namenodes are in standby State
> ---
>
> Key: HDFS-8161
> URL: https://issues.apache.org/jira/browse/HDFS-8161
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.6.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: ACTIVEBreadcumb and StandbyElector.txt
>
>
> Suspected Scenario:
> 
> Start the cluster with three nodes.
> Reboot the machine where no ZKFC is running. (The active NN's ZKFC should have 
> an open session with the ZK on this machine.)
> Now the active NN's ZKFC session expires and it tries to re-establish the 
> connection with another ZK. By that time, the standby NN's ZKFC will try to 
> fence the old active, create the active breadcrumb, and transition the standby 
> NN to the active state.
> But immediately it is fenced back to the standby state. (Here is the doubt.)
> Hence both will be in the standby state.
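For context, the breadcrumb handoff described above looks roughly like this at 
the ZooKeeper level (a sketch with hypothetical helpers; the real logic lives in 
{{ActiveStandbyElector}}):

{code}
// Standby NN's ZKFC wins the election: read the old active's breadcrumb,
// fence that node, replace the breadcrumb, then transition to active.
byte[] oldActive = zk.getData(parentZnode + "/ActiveBreadcrumb", false, null);
fenceOldActive(oldActive);   // hypothetical helper (e.g. sshfence/shell fencer)
zk.delete(parentZnode + "/ActiveBreadcrumb", -1);
zk.create(parentZnode + "/ActiveBreadcrumb", myNodeInfo,
    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
becomeActive();              // hypothetical helper: make the local NN active
{code}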



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8161) Both Namenodes are in standby State

2016-04-15 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243980#comment-15243980
 ] 

Harsh J commented on HDFS-8161:
---

[~brahmareddy] - was this encountered on virtual machine hosts, or physical 
ones? Asking because 
https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19#.v3hx212ne

> Both Namenodes are in standby State
> ---
>
> Key: HDFS-8161
> URL: https://issues.apache.org/jira/browse/HDFS-8161
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.6.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: ACTIVEBreadcumb and StandbyElector.txt
>
>
> Suspected Scenario:
> 
> Start the cluster with three nodes.
> Reboot the machine where no ZKFC is running. (The active NN's ZKFC should have 
> an open session with the ZK on this machine.)
> Now the active NN's ZKFC session expires and it tries to re-establish the 
> connection with another ZK. By that time, the standby NN's ZKFC will try to 
> fence the old active, create the active breadcrumb, and transition the standby 
> NN to the active state.
> But immediately it is fenced back to the standby state. (Here is the doubt.)
> Hence both will be in the standby state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10284) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode fails intermittently

2016-04-15 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243977#comment-15243977
 ] 

Brahma Reddy Battula commented on HDFS-10284:
-

[~liuml07] thanks for reporting this. Yes, we should separate them out... I will 
look into this.

> o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode 
> fails intermittently
> -
>
> Key: HDFS-10284
> URL: https://issues.apache.org/jira/browse/HDFS-10284
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Minor
> Attachments: HDFS-10284.000.patch, HDFS-10284.001.patch
>
>
> *Stacktrace*
> {code}
> org.mockito.exceptions.misusing.UnfinishedStubbingException: 
> Unfinished stubbing detected here:
> -> at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> E.g. thenReturn() may be missing.
> Examples of correct stubbing:
> when(mock.isOk()).thenReturn(true);
> when(mock.isOk()).thenThrow(exception);
> doThrow(exception).when(mock).someVoidMethod();
> Hints:
>  1. missing thenReturn()
>  2. although stubbed methods may return mocks, you cannot inline mock 
> creation (mock()) call inside a thenReturn method (see issue 53)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:169)
> {code}
> Sample failing pre-commit UT: 
> https://builds.apache.org/job/PreCommit-HDFS-Build/15153/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/
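For reference, the misuse pattern named in the hints boils down to the following 
(illustrative types, not from {{TestBlockManagerSafeMode}}):

{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class UnfinishedStubbingExample {
  interface Worker { boolean isOk(); }
  interface Container { Worker getWorker(); }

  @Test
  public void stubbing() {
    Container container = mock(Container.class);

    // Broken: inlining mock() inside thenReturn() interleaves two mocking
    // operations, so Mockito later reports UnfinishedStubbingException:
    //   when(container.getWorker()).thenReturn(mock(Worker.class));

    // Fixed: create the nested mock first, then finish the stubbing.
    Worker worker = mock(Worker.class);
    when(container.getWorker()).thenReturn(worker);
    when(worker.isOk()).thenReturn(true);
  }
}
{code}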



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243970#comment-15243970
 ] 

Mingliang Liu commented on HDFS-10175:
--

Thanks [~cmccabe] very much for the discussion.

# The v5 patch makes the detailed statistics optional. As the statistics 
are shared among file system objects, it's hard to enable/disable them using a 
per-file-system config key. The patch adds a new API to {{Statistics}} to 
enable/disable this feature, depending on the tradeoff between reduced cost and 
detailed per-op counters. The extra overhead is avoided if the enum map is not 
constructed.
# I filed [HADOOP-13031] to track the discussion and effort of refactoring the 
code that maintains rack-aware counters. Specifically, I also think it's not 
good to expose the internal composite data structure of distance-aware bytes 
read. Use cases that iterate over all the distances will call 
{{getBytesReadByDistance(int distance)}} multiple times, which internally 
triggers aggregation across all threads' statistics data each time. To address 
this, perhaps they can use {{getData()}} to get all the statistics data at once. 
I reviewed the current patch of [MAPREDUCE-6660], which employs 
bytes-read-by-distance, and found it uses {{getData()}} as I expected.
# Given the current FileSystem design, we see no better choice than putting 
these counters in FileSystem$Statistics, whether for the distance-aware read 
counters (HDFS-specific) or the per-operation counters (many of which are also 
HDFS-specific). For now, when the detailed statistics are missing (e.g. 
{{S3AFileSystem#append()}}), we treat them as zero. If some operations' 
statistics differ, they can update the statistics accordingly (e.g. 
{{S3AFileSystem#rename}}), as the counters are populated in the concrete file 
system operations. Another point: for the existing {{readOps/writeOps}} 
counters, we have a similar scenario (and challenges). I will file follow-up 
JIRAs if we have specific cases to handle.
# I created a new JIRA, [HADOOP-13032], to track the effort of moving the 
{{Statistics}} class out of {{FileSystem}} for shorter source code and a simpler 
class structure, though it is an incompatible change.
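
To illustrate point 2, a minimal consumer-side sketch (assuming the 
{{StatisticsData}} distance getters on trunk; this is not code from the patch):

{code}
// Sketch: aggregate once instead of once per distance.
FileSystem.Statistics stats =
    FileSystem.getStatistics("hdfs", DistributedFileSystem.class);

// Costly: every call re-aggregates all per-thread statistics.
//   long local = stats.getBytesReadByDistance(0);
//   long rack  = stats.getBytesReadByDistance(2);

// Cheaper: one aggregation, then plain field reads from the snapshot.
FileSystem.Statistics.StatisticsData snapshot = stats.getData();
long local = snapshot.getBytesReadLocalHost();
long rack = snapshot.getBytesReadDistanceOfOneOrTwo();
{code}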

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-15 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10175:
-
Attachment: HDFS-10175.005.patch

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243927#comment-15243927
 ] 

Hadoop QA commented on HDFS-8986:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 45s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s 
{color} | {color:red} root: patch generated 9 new + 177 unchanged - 0 fixed = 
186 total (was 177) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
53s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 55s 
{color} | {color:red} hadoop-common-project_hadoop-common-jdk1.8.0_77 with JDK 
v1.8.0_77 generated 12 new + 1 unchanged - 0 fixed = 13 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 39s 
{color} | {color:red} hadoop-common-project_hadoop-common-jdk1.7.0_95 with JDK 
v1.7.0_95 generated 12 new + 13 unchanged - 0 fixed = 25 total (was 13) {color} 
|
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 24s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | 

[jira] [Commented] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243887#comment-15243887
 ] 

Hadoop QA commented on HDFS-10207:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 35s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 8s 
{color} | {color:red} root: patch generated 7 new + 417 unchanged - 0 fixed = 
424 total (was 417) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 46s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 41s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 55s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 28s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 143m 33s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK 

[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-15 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243878#comment-15243878
 ] 

Lei (Eddy) Xu commented on HDFS-9543:
-

Hi, [~anu] Thanks for the patches

A few comments:

Could you put the {{getNextBlock()}} logic into a separate iterator and make it 
{{Closeable}}? It would subsume {{getBlockToCopy()}}, {{openPoolIters()}}, 
{{getNextBlock()}}, and {{closePoolIters()}}. There are a few drawbacks to 
separating them into different functions: 1) the state (i.e. {{poolIndex}}) is 
stored outside these functions, so the caller needs to maintain it; 2) 
{{poolIndex}} is never initialized and cannot be reset.
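Something like this rough sketch (class and method shapes are mine, not from the 
patch):

{code}
/** Sketch: one Closeable object owns the pool iterators and the poolIndex. */
class BlockSource implements Closeable {
  private final List<FsVolumeSpi.BlockIterator> poolIters;
  private int poolIndex = 0;  // initialized here; reset by recreating the object

  BlockSource(List<FsVolumeSpi.BlockIterator> poolIters) {
    this.poolIters = poolIters;
  }

  /** Round-robin across pools; returns null once every pool is exhausted. */
  ExtendedBlock getNextBlock() throws IOException {
    for (int i = 0; i < poolIters.size(); i++) {
      FsVolumeSpi.BlockIterator it = poolIters.get(poolIndex);
      poolIndex = (poolIndex + 1) % poolIters.size();
      if (!it.atEnd()) {
        return it.nextBlock();
      }
    }
    return null;
  }

  @Override
  public void close() throws IOException {
    for (FsVolumeSpi.BlockIterator it : poolIters) {
      it.close();
    }
  }
}
{code}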

{code}
} catch (IOException e) {
  item.incErrorCount();
}
{code}
Please always log the IOEs. I also think it is better to throw the {{IOE}} here, 
as well as in many other places.
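For example, a sketch of the shape I mean, reusing the surrounding {{LOG}} and 
{{item}}:

{code}
} catch (IOException e) {
  LOG.error("Copy failed for {}.", item, e);
  item.incErrorCount();
  throw e;  // surface the failure instead of silently swallowing it
}
{code}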

{code}
private void openPoolIters();
{code}
Can it be a {{private static List openPoolIters()}}?

{code}
// Check for the max error count constraint.
if (item.getErrorCount() > getMaxError(item)) {
  LOG.error("Exceeded the max error count. source {}, dest: {} " +
      "error count: {}", source.getBasePath(), dest.getBasePath(),
      item.getErrorCount());
  this.setExitFlag();
  continue;
}
{code}

In a few such places, should we actually {{break}} out of the while loop? 
Wouldn't {{continue}} here just generate a lot of logs and waste CPU cycles?


Why do you need to change {{float}} to {{double}}? In this case, wouldn't 
{{float}} be good enough? I think a 5% error is OK for these tasks.

Thanks very much!

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10289) Balancer configures DNs directly

2016-04-15 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243816#comment-15243816
 ] 

Ravi Prakash commented on HDFS-10289:
-

Fair enough! Thanks for the responses Kihwal and Ming! I do recognize that this 
particular JIRA is a little bit orthogonal, so I welcome all the improvements 
intended. 

> Balancer configures DNs directly
> 
>
> Key: HDFS-10289
> URL: https://issues.apache.org/jira/browse/HDFS-10289
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Critical
>
> Balancer directly configures the 2 balance-related properties 
> (bandwidthPerSec and concurrentMoves) on the DNs involved.
> Details:
> * Before each balancing iteration, set the properties on all DNs involved in 
> the current iteration.
> * The DN property changes will not survive restart.
> * Balancer gets the property values from command line or its config file.
> * Need new DN APIs to query and set the 2 properties.
> * No need to edit the config file on each DN or run {{hdfs dfsadmin 
> -setBalancerBandwidth}} to configure every DN in the cluster.
> Pros:
> * Improve ease of use because all configuration is done in one place, the 
> balancer. We have seen that many customers often forget to set concurrentMoves 
> properly since it is required on both the DN and the Balancer.
> * Support new DNs added between iterations.
> * Handle DN restarts between iterations.
> * May be able to dynamically adjust the thresholds in different iterations; 
> we don't know how useful that would be, though.
> Cons:
> * New DN property API
> * A malicious or misconfigured balancer may overwhelm DNs. {{hdfs dfsadmin 
> -setBalancerBandwidth}} has the same issue. Also, the Balancer can only be run 
> by an admin.
> Questions:
> * Can we create {{BalancerConcurrentMovesCommand}} similar to 
> {{BalancerBandwidthCommand}}? Can Balancer use them directly without going 
> through NN?
> One proposal to implement HDFS-7466 calls for an API to query DN properties. 
> The DN Conf Servlet returns all config properties. It does not return an 
> individual property, and it does not return the value set by {{hdfs dfsadmin 
> -setBalancerBandwidth}}.
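A hypothetical shape for the new DN APIs in question (illustration only; none of 
these methods exists today):

{code}
/** Hypothetical DN-side API for the two balance-related properties. */
public interface BalancerTuningProtocol {
  /** Set the bandwidth (bytes/second) this DN may use for balancing. */
  void setBalancerBandwidth(long bytesPerSecond) throws IOException;

  /** Set the max number of concurrent block moves on this DN. */
  void setBalancerConcurrentMoves(int moves) throws IOException;

  /** Query the currently effective values, including runtime overrides. */
  long getBalancerBandwidth() throws IOException;
  int getBalancerConcurrentMoves() throws IOException;
}
{code}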



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10299) libhdfs++: File length doesn't always count the last block if it's being written to

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243771#comment-15243771
 ] 

Hadoop QA commented on HDFS-10299:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
33s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 44s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 50s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 1s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 38s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 36s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 42s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12799019/HDFS-10299.HDFS-8707.000.patch
 |
| JIRA Issue | HDFS-10299 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux b8c951f9f489 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / 0828600 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_77 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15174/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15174/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> libhdfs++: File length doesn't always count the last block if it's being 
> written to
> ---
>
> Key: HDFS-10299
> URL: 

[jira] [Commented] (HDFS-10289) Balancer configures DNs directly

2016-04-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243740#comment-15243740
 ] 

Ming Ma commented on HDFS-10289:


There was some discussion about moving the balancer into the namenode in 
https://issues.apache.org/jira/browse/HDFS-1431. Maybe we can address the issue 
[~kihwal] brought up, say, by having some lightweight balancer CLI send the NN 
balancer commands via RPC. Even with that, there will be complexity in moving 
the balancer into the namenode. Thus we would like to try out the "block 
movement scheduling inside the NN" idea as part of the migrator work in 
HDFS-8789 that [~ctrezzo] is working on.

> Balancer configures DNs directly
> 
>
> Key: HDFS-10289
> URL: https://issues.apache.org/jira/browse/HDFS-10289
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Critical
>
> Balancer directly configures the 2 balance-related properties 
> (bandwidthPerSec and concurrentMoves) on the DNs involved.
> Details:
> * Before each balancing iteration, set the properties on all DNs involved in 
> the current iteration.
> * The DN property changes will not survive restart.
> * Balancer gets the property values from command line or its config file.
> * Need new DN APIs to query and set the 2 properties.
> * No need to edit the config file on each DN or run {{hdfs dfsadmin 
> -setBalancerBandwidth}} to configure every DN in the cluster.
> Pros:
> * Improve ease of use because all configuration is done in one place, the 
> balancer. We have seen that many customers often forget to set concurrentMoves 
> properly since it is required on both the DN and the Balancer.
> * Support new DNs added between iterations.
> * Handle DN restarts between iterations.
> * May be able to dynamically adjust the thresholds in different iterations; 
> we don't know how useful that would be, though.
> Cons:
> * New DN property API
> * A malicious or misconfigured balancer may overwhelm DNs. {{hdfs dfsadmin 
> -setBalancerBandwidth}} has the same issue. Also, the Balancer can only be run 
> by an admin.
> Questions:
> * Can we create {{BalancerConcurrentMovesCommand}} similar to 
> {{BalancerBandwidthCommand}}? Can Balancer use them directly without going 
> through NN?
> One proposal to implement HDFS-7466 calls for an API to query DN properties. 
> The DN Conf Servlet returns all config properties. It does not return an 
> individual property, and it does not return the value set by {{hdfs dfsadmin 
> -setBalancerBandwidth}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10208) Addendum for HDFS-9579: to handle the case when client machine can't resolve network path

2016-04-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243738#comment-15243738
 ] 

Ming Ma commented on HDFS-10208:


TestReadStripedFileWithDecoding and TestWriteReadStripedFile aren't related. 
All other tests passed locally.

> Addendum for HDFS-9579: to handle the case when client machine can't resolve 
> network path
> -
>
> Key: HDFS-10208
> URL: https://issues.apache.org/jira/browse/HDFS-10208
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-10208-2.patch, HDFS-10208-3.patch, 
> HDFS-10208-4.patch, HDFS-10208-5.patch, HDFS-10208.patch
>
>
> If the DFSClient runs on a machine that can't resolve the network path, 
> {{DNSToSwitchMapping}} will return {{DEFAULT_RACK}}. In addition, if 
> {{dnsToSwitchMapping.resolve}} somehow returns null, that will cause an 
> exception when it tries to create {{clientNode}}. In either case, there is no 
> need to create {{clientNode}}, and we should treat its network distance to any 
> datanode as Integer.MAX_VALUE.
> {noformat}
> clientNode = new NodeBase(clientHostName,
> dnsToSwitchMapping.resolve(nodes).get(0));
> {noformat}
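A minimal sketch of the guard described above (placement and null handling are 
assumptions, not the attached patch):

{code}
// Sketch: only create clientNode when resolution yields a real rack.
List<String> racks = dnsToSwitchMapping.resolve(nodes);
if (racks != null && !racks.isEmpty()
    && !NetworkTopology.DEFAULT_RACK.equals(racks.get(0))) {
  clientNode = new NodeBase(clientHostName, racks.get(0));
} else {
  // Unresolvable: leave clientNode null and treat the network distance
  // to any datanode as Integer.MAX_VALUE.
  clientNode = null;
}
{code}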



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-15 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243716#comment-15243716
 ] 

Xiaobing Zhou commented on HDFS-10207:
--

[~xyao] thanks for the review. I posted patch v004, which addresses your comments.

> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch, 
> HDFS-10207-HDFS-9000.003.patch, HDFS-10207-HDFS-9000.004.patch
>
>
> It would be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart, to protect the namenode from being overloaded.
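One possible shape, via Hadoop's {{ReconfigurableBase}} hooks (a sketch with a 
hypothetical setter; the attached patches may differ):

{code}
@Override  // ReconfigurableBase
protected String reconfigurePropertyImpl(String property, String newVal)
    throws ReconfigurationException {
  if (property.startsWith("ipc.") && property.endsWith(".backoff.enable")) {
    boolean enable = Boolean.parseBoolean(newVal);
    // Hypothetical setter on the NN's client RPC server.
    rpcServer.getClientRpcServer().setClientBackoffEnabled(enable);
    return Boolean.toString(enable);
  }
  throw new ReconfigurationException(property, newVal, getConf().get(property));
}
{code}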



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-15 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10207:
-
Attachment: HDFS-10207-HDFS-9000.004.patch

> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch, 
> HDFS-10207-HDFS-9000.003.patch, HDFS-10207-HDFS-9000.004.patch
>
>
> It would be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart, to protect the namenode from being overloaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-04-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-8986:

Target Version/s: 2.8.0

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Attachments: HDFS-8986.01.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8986) Add option to -du to calculate directory space usage excluding snapshots

2016-04-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-8986:

Status: Patch Available  (was: Open)

> Add option to -du to calculate directory space usage excluding snapshots
> 
>
> Key: HDFS-8986
> URL: https://issues.apache.org/jira/browse/HDFS-8986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Gautam Gopalakrishnan
>Assignee: Xiao Chen
> Attachments: HDFS-8986.01.patch
>
>
> When running {{hadoop fs -du}} on a snapshotted directory (or one of its 
> children), the report includes space consumed by blocks that are only present 
> in the snapshots. This is confusing for end users.
> {noformat}
> $  hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -createSnapshot /tmp/parent snap1
> Created snapshot /tmp/parent/.snapshot/snap1
> $ hadoop fs -rm -skipTrash /tmp/parent/sub1/*
> ...
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 799.7 M  2.3 G  /tmp/parent
> 799.7 M  2.3 G  /tmp/parent/sub1
> $ hdfs dfs -deleteSnapshot /tmp/parent snap1
> $ hadoop fs -du -h -s /tmp/parent /tmp/parent/*
> 0  0  /tmp/parent
> 0  0  /tmp/parent/sub1
> {noformat}
> It would be helpful if we had a flag, say -X, to exclude any snapshot-related 
> disk usage from the output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243691#comment-15243691
 ] 

Hadoop QA commented on HDFS-10297:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 
393 unchanged - 2 fixed = 395 total (was 395) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 42s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 18s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 135m 38s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  

[jira] [Commented] (HDFS-9016) Display upgrade domain information in fsck

2016-04-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243689#comment-15243689
 ] 

Ming Ma commented on HDFS-9016:
---

TestReadStripedFileWithDecoding and TestWriteReadStripedFile aren't related. 
All other tests passed locally.

> Display upgrade domain information in fsck
> --
>
> Key: HDFS-9016
> URL: https://issues.apache.org/jira/browse/HDFS-9016
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9016.patch
>
>
> This will make it easy for people to use fsck to check block placement when 
> upgrade domain is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10299) libhdfs++: File length doesn't always count the last block if it's being written to

2016-04-15 Thread Xiaowei Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Zhu updated HDFS-10299:
---
Attachment: HDFS-10299.HDFS-8707.000.patch

> libhdfs++: File length doesn't always count the last block if it's being 
> written to
> ---
>
> Key: HDFS-10299
> URL: https://issues.apache.org/jira/browse/HDFS-10299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Xiaowei Zhu
> Attachments: HDFS-10299.HDFS-8707.000.patch
>
>
> It looks like we aren't factoring the last block of files that are still being 
> written to, or haven't been closed yet, into the length of the file.
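For reference, the Java client computes the visible length roughly as in this 
sketch (the libhdfs++ fix would presumably do the equivalent):

{code}
// Sketch: include the under-construction last block in the file length.
long length = locatedBlocks.getFileLength();   // completed blocks only
if (!locatedBlocks.isLastBlockComplete()) {
  LocatedBlock last = locatedBlocks.getLastLocatedBlock();
  if (last != null) {
    // Approximation: the real client asks a datanode for the exact
    // visible length of a block that is still being written.
    length += last.getBlockSize();
  }
}
{code}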



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10299) libhdfs++: File length doesn't always count the last block if it's being written to

2016-04-15 Thread Xiaowei Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Zhu updated HDFS-10299:
---
Status: Patch Available  (was: In Progress)

> libhdfs++: File length doesn't always count the last block if it's being 
> written to
> ---
>
> Key: HDFS-10299
> URL: https://issues.apache.org/jira/browse/HDFS-10299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Xiaowei Zhu
> Attachments: HDFS-10299.HDFS-8707.000.patch
>
>
> It looks like we aren't factoring the last block of files that are still being 
> written to, or haven't been closed yet, into the length of the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243675#comment-15243675
 ] 

Hadoop QA commented on HDFS-9670:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} hadoop-tools/hadoop-distcp: patch generated 0 new + 
18 unchanged - 1 fixed = 18 total (was 19) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 32s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 25s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 33s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12799010/HDFS-9670.002.patch |
| JIRA Issue | HDFS-9670 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux db40902e1ce5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 69f3d42 |
| Default Java | 

[jira] [Updated] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-15 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9543:
---
Description: This patch adds the actual mover logic to the datanode.  (was: 
This patch adds the RPCs and mover logic that allows data to be moved from one 
storage partition to another)

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-10299) libhdfs++: File length doesn't always count the last block if it's being written to

2016-04-15 Thread Xiaowei Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-10299 started by Xiaowei Zhu.
--
> libhdfs++: File length doesn't always count the last block if it's being 
> written to
> ---
>
> Key: HDFS-10299
> URL: https://issues.apache.org/jira/browse/HDFS-10299
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Xiaowei Zhu
>
> It looks like we aren't factoring the last block of files that are still being 
> written to, or haven't been closed yet, into the length of the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-15 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243672#comment-15243672
 ] 

Anu Engineer commented on HDFS-9543:


Test failures are not related to this patch.

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch
>
>
> This patch adds the RPCs and mover logic that allows data to be moved from 
> one storage partition to another



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243664#comment-15243664
 ] 

Hadoop QA commented on HDFS-9543:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
50s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s 
{color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
45s {color} | {color:green} HDFS-1312 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} HDFS-1312 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 43s 
{color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: patch generated 0 
new + 180 unchanged - 1 fixed = 180 total (was 181) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 13s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 23s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 119m 6s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.tools.TestDFSAdmin |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| JDK v1.8.0_77 

[jira] [Commented] (HDFS-10300) TestDistCpSystem should share MiniDFSCluster

2016-04-15 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243650#comment-15243650
 ] 

John Zhuge commented on HDFS-10300:
---

It is the old JUnit 3 style. Switch to the JUnit 4 annotation style.
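
A minimal sketch of the switch, with the shared cluster hoisted into class-level 
hooks (the class body below is an illustration, not the actual 
{{TestDistCpSystem}} code):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestDistCpSystem {
  private static MiniDFSCluster cluster;

  // JUnit 4 class-level hooks let every test method share one cluster,
  // instead of the per-method setUp()/tearDown() of JUnit 3's TestCase.
  @BeforeClass
  public static void beforeClass() throws Exception {
    cluster = new MiniDFSCluster.Builder(new Configuration())
        .numDataNodes(2).build();
    cluster.waitActive();
  }

  @AfterClass
  public static void afterClass() {
    if (cluster != null) {
      cluster.shutdown();
    }
  }

  @Test
  public void testSomething() throws Exception {
    // ... exercise distcp against cluster.getFileSystem() ...
  }
}
{code}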

> TestDistCpSystem should share MiniDFSCluster
> 
>
> Key: HDFS-10300
> URL: https://issues.apache.org/jira/browse/HDFS-10300
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Trivial
>
> The test cases in this class should share MiniDFSCluster if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10300) TestDistCpSystem should share MiniDFSCluster

2016-04-15 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243647#comment-15243647
 ] 

John Zhuge commented on HDFS-10300:
---

I do not understand why {{TestDistCpSystem}} extends {{TestCase}}.

> TestDistCpSystem should share MiniDFSCluster
> 
>
> Key: HDFS-10300
> URL: https://issues.apache.org/jira/browse/HDFS-10300
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Trivial
>
> The test cases in this class should share MiniDFSCluster if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-15 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243556#comment-15243556
 ] 

John Zhuge commented on HDFS-9670:
--

Created HDFS-10300 "TestDistCpSystem should share MiniDFSCluster".

> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch, HDFS-9670.002.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>                                      DistCpOptions options) throws IOException {
>     Path target = options.getTargetPath();
>     FileSystem targetFS = target.getFileSystem(getConf());
>     final boolean targetPathExists = options.getTargetPathExists();
>     boolean solitaryFile = options.getSourcePaths().size() == 1
>         && !sourceStatus.isDirectory();
>     if (solitaryFile) {
>       if (targetFS.isFile(target) || !targetPathExists) {
>         return sourceStatus.getPath();
>       } else {
>         return sourceStatus.getPath().getParent();
>       }
>     } else {
>       boolean specialHandling = (options.getSourcePaths().size() == 1 &&
>           !targetPathExists) || options.shouldSyncFolder() ||
>           options.shouldOverwrite();
>       // getParent() returns null when the path is the root "/", which is
>       // what triggers the NPE in getRelativePath() above.
>       return specialHandling && sourceStatus.isDirectory() ?
>           sourceStatus.getPath() : sourceStatus.getPath().getParent();
>     }
>   }
> {code}
> We can see that it can return null at the end, when
> {{sourceStatus.getPath().getParent()}} is called on the root path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10300) TestDistCpSystem should share MiniDFSCluster

2016-04-15 Thread John Zhuge (JIRA)
John Zhuge created HDFS-10300:
-

 Summary: TestDistCpSystem should share MiniDFSCluster
 Key: HDFS-10300
 URL: https://issues.apache.org/jira/browse/HDFS-10300
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Affects Versions: 2.6.0
Reporter: John Zhuge
Assignee: John Zhuge
Priority: Trivial


The test cases in this class should share MiniDFSCluster if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9670) DistCp throws NPE when source is root

2016-04-15 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-9670:
-
Attachment: HDFS-9670.002.patch

Patch 002:
* Use one mini cluster in the unit test, because the issue is reproducible with 
both source and target on the same cluster (see the sketch below).
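
A minimal sketch of that shape (class and path names are illustrative, not the 
exact patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.util.ToolRunner;

public class SingleClusterDistCpSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      fs.mkdirs(new Path("/src"));
      // Source is the root of the same cluster that hosts the target;
      // that alone is enough to hit the getParent() NPE on "/".
      String source = fs.getUri() + "/";
      String target = fs.getUri() + "/target";
      int rc = ToolRunner.run(conf, new DistCp(conf, null),
          new String[] {source, target});
      System.out.println("distcp exit code: " + rc);
    } finally {
      cluster.shutdown();
    }
  }
}
{code}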

> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch, HDFS-9670.002.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>                                      DistCpOptions options) throws IOException {
>     Path target = options.getTargetPath();
>     FileSystem targetFS = target.getFileSystem(getConf());
>     final boolean targetPathExists = options.getTargetPathExists();
>     boolean solitaryFile = options.getSourcePaths().size() == 1
>         && !sourceStatus.isDirectory();
>     if (solitaryFile) {
>       if (targetFS.isFile(target) || !targetPathExists) {
>         return sourceStatus.getPath();
>       } else {
>         return sourceStatus.getPath().getParent();
>       }
>     } else {
>       boolean specialHandling = (options.getSourcePaths().size() == 1 &&
>           !targetPathExists) || options.shouldSyncFolder() ||
>           options.shouldOverwrite();
>       // getParent() returns null when the path is the root "/", which is
>       // what triggers the NPE in getRelativePath() above.
>       return specialHandling && sourceStatus.isDirectory() ?
>           sourceStatus.getPath() : sourceStatus.getPath().getParent();
>     }
>   }
> {code}
> We can see that it can return null at the end, when
> {{sourceStatus.getPath().getParent()}} is called on the root path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10291) TestShortCircuitLocalRead failing

2016-04-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243553#comment-15243553
 ] 

Mingliang Liu commented on HDFS-10291:
--

Thanks for the detailed investigation, [~ste...@apache.org]. I prefer the first 
choice (fixing the test only). IMHO, throwing an exception is better-defined 
behavior than having HDFS itself shrink the read length, as this is not really a 
necessary feature.

> TestShortCircuitLocalRead failing
> -
>
> Key: HDFS-10291
> URL: https://issues.apache.org/jira/browse/HDFS-10291
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> {{TestShortCircuitLocalRead}} failing as length of read is considered off end 
> of buffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-15 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243547#comment-15243547
 ] 

John Zhuge commented on HDFS-9670:
--

Thanks, will fix it.

> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>                                      DistCpOptions options) throws IOException {
>     Path target = options.getTargetPath();
>     FileSystem targetFS = target.getFileSystem(getConf());
>     final boolean targetPathExists = options.getTargetPathExists();
>     boolean solitaryFile = options.getSourcePaths().size() == 1
>         && !sourceStatus.isDirectory();
>     if (solitaryFile) {
>       if (targetFS.isFile(target) || !targetPathExists) {
>         return sourceStatus.getPath();
>       } else {
>         return sourceStatus.getPath().getParent();
>       }
>     } else {
>       boolean specialHandling = (options.getSourcePaths().size() == 1 &&
>           !targetPathExists) || options.shouldSyncFolder() ||
>           options.shouldOverwrite();
>       // getParent() returns null when the path is the root "/", which is
>       // what triggers the NPE in getRelativePath() above.
>       return specialHandling && sourceStatus.isDirectory() ?
>           sourceStatus.getPath() : sourceStatus.getPath().getParent();
>     }
>   }
> {code}
> We can see that it can return null at the end, when
> {{sourceStatus.getPath().getParent()}} is called on the root path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-15 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243522#comment-15243522
 ] 

John Zhuge commented on HDFS-10297:
---

Thanks [~jingzhao].

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10297.001.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.
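
Both properties can also be overridden programmatically; a minimal sketch using 
the standard {{DFSConfigKeys}} constants (values as proposed above):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class BalancerDefaultsSketch {
  public static Configuration tunedConf() {
    Configuration conf = new Configuration();
    // 10 MB/s per DN; a GbE link carries about 125 MB/s, so this is
    // roughly 10% of the link.
    conf.setLong(DFSConfigKeys.DFS_DATANODE_BALANCE_BANDWIDTHPERSEC_KEY,
        10L * 1024 * 1024);
    conf.setInt(DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_KEY,
        50);
    return conf;
  }
}
{code}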



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10289) Balancer configures DNs directly

2016-04-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243518#comment-15243518
 ] 

Kihwal Lee commented on HDFS-10289:
---

bq.  Does anyone remember why the Balancer was a separate process from the 
Namenode, rather than just a thread in it?
Sometimes you want to stop it and restart it later.  It is not impossible to do 
it inside the NN, but it is easier if it's separate.

> Balancer configures DNs directly
> 
>
> Key: HDFS-10289
> URL: https://issues.apache.org/jira/browse/HDFS-10289
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Critical
>
> Balancer directly configures the 2 balance-related properties 
> (bandwidthPerSec and concurrentMoves) on the DNs involved.
> Details:
> * Before each balancing iteration, set the properties on all DNs involved in 
> the current iteration.
> * The DN property changes will not survive restart.
> * Balancer gets the property values from command line or its config file.
> * Need new DN APIs to query and set the 2 properties.
> * No need to edit the config file on each DN or run {{hdfs dfsadmin 
> -setBalancerBandwidth}} to configure every DN in the cluster.
> Pros:
> * Improve ease of use, because all configuration is done in one place, the 
> balancer. We have seen many customers forget to set concurrentMoves properly, 
> since it is required on both the DN and the Balancer.
> * Support new DNs added between iterations
> * Handle DN restarts between iterations
> * May be able to dynamically adjust the thresholds in different iterations. 
> Don't know how useful that would be, though.
> Cons:
> * New DN property API
> * A malicious/misconfigured balancer may overwhelm DNs. {{hdfs dfsadmin 
> -setBalancerBandwidth}} has the same issue. Also Balancer can only be run by 
> admin.
> Questions:
> * Can we create {{BalancerConcurrentMovesCommand}} similar to 
> {{BalancerBandwidthCommand}}? Can Balancer use them directly without going 
> through NN?
> One proposal to implement HDFS-7466 calls for an API to query DN properties. 
> DN Conf Servlet returns all config properties. It does not return an individual 
> property, and it does not return the value set by {{hdfs dfsadmin 
> -setBalancerBandwidth}}.
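
To make the proposed API concrete, a hypothetical sketch of the per-DN surface 
it would need (the interface and method names below are purely illustrative; no 
such protocol exists today):

{code}
import java.io.IOException;

// Hypothetical DN-facing protocol for this proposal; names are illustrative only.
public interface BalancerTuningProtocol {
  /** Query the DN's current balancing bandwidth, in bytes per second. */
  long getBalancerBandwidth() throws IOException;

  /** Set the balancing bandwidth on the DN; would not survive a restart. */
  void setBalancerBandwidth(long bytesPerSecond) throws IOException;

  /** Query the DN's current number of concurrent balancing moves. */
  int getConcurrentMoves() throws IOException;

  /** Set the number of concurrent balancing moves; would not survive a restart. */
  void setConcurrentMoves(int concurrentMoves) throws IOException;
}
{code}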



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10256) Use GenericTestUtils.getTestDir method in tests for temporary directories

2016-04-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243502#comment-15243502
 ] 

Kihwal Lee commented on HDFS-10256:
---

We have seen cases where a mini dfs cluster startup fails because it is unable 
to delete the {{data_dir}} in {{initMiniDFSCluster()}}.  Depending on when the 
build machine gets busy, it hits random test cases. If we make it sleep a few 
seconds and try again, it works most of the time.  The surefire doc says,

{quote}
After the test-set has completed, the process executes java.lang.System.exit(0) 
which starts shutdown hooks. At this point the process may run next 30 seconds 
until all non daemon Threads die. After the period of time has elapsed, the 
process kills itself by java.lang.Runtime.halt(0).
{quote}

{{MiniDFSCluster#shutdown()}} registers {{base_dir}} to be deleted on shutdown. 
If this gets slow, the next test JVM will start to run before the shutdown hook 
completes.  But forcing every test to call {{shutdown(true)}} can slow things 
down. Instead, each instance should get a random {{base_dir}}, so that the 
deletion through the shutdown hook and the subsequent new test setup can overlap.

[~steve_l] mentioned this in HADOOP-12984.
bq. many buildups of test dirs now use something random, rather than a 
hard-coded path like "dfs". This includes minidfs cluster...which should 
improve parallelism on test runs.

Can we actually make sure each MiniDFSCluster gets a unique base directory?
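
For what it's worth, a minimal sketch of one way to do that (assuming the 
existing {{MiniDFSCluster.HDFS_MINIDFS_BASEDIR}} property keeps being honored):

{code}
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public final class IsolatedMiniCluster {
  /** Build a MiniDFSCluster whose base dir no other instance can share. */
  public static MiniDFSCluster newIsolatedCluster() throws Exception {
    Configuration conf = new Configuration();
    // A per-instance random base dir lets a slow shutdown-hook deletion
    // from the previous JVM overlap safely with the next test's setup.
    conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR,
        System.getProperty("test.build.data", "target/test/data")
            + "/minidfs-" + UUID.randomUUID());
    return new MiniDFSCluster.Builder(conf).build();
  }
}
{code}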

> Use GenericTestUtils.getTestDir method in tests for temporary directories
> -
>
> Key: HDFS-10256
> URL: https://issues.apache.org/jira/browse/HDFS-10256
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243482#comment-15243482
 ] 

Yongjun Zhang commented on HDFS-9670:
-

Hi [~jzhuge],

For the new test you added, it seems creating one cluster would be sufficient. 
Would you please look into it?

Then we can consider a future jira for consolidating the set of tests.

Thanks.


> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>                                      DistCpOptions options) throws IOException {
>     Path target = options.getTargetPath();
>     FileSystem targetFS = target.getFileSystem(getConf());
>     final boolean targetPathExists = options.getTargetPathExists();
>     boolean solitaryFile = options.getSourcePaths().size() == 1
>         && !sourceStatus.isDirectory();
>     if (solitaryFile) {
>       if (targetFS.isFile(target) || !targetPathExists) {
>         return sourceStatus.getPath();
>       } else {
>         return sourceStatus.getPath().getParent();
>       }
>     } else {
>       boolean specialHandling = (options.getSourcePaths().size() == 1 &&
>           !targetPathExists) || options.shouldSyncFolder() ||
>           options.shouldOverwrite();
>       // getParent() returns null when the path is the root "/", which is
>       // what triggers the NPE in getRelativePath() above.
>       return specialHandling && sourceStatus.isDirectory() ?
>           sourceStatus.getPath() : sourceStatus.getPath().getParent();
>     }
>   }
> {code}
> We can see that it can return null at the end, when
> {{sourceStatus.getPath().getParent()}} is called on the root path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-15 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243416#comment-15243416
 ] 

John Zhuge commented on HDFS-9670:
--

Very good point!
* File a separate jira to consolidate the mini clusters in this test class?
* Or bundle the change in this patch?

> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>                                      DistCpOptions options) throws IOException {
>     Path target = options.getTargetPath();
>     FileSystem targetFS = target.getFileSystem(getConf());
>     final boolean targetPathExists = options.getTargetPathExists();
>     boolean solitaryFile = options.getSourcePaths().size() == 1
>         && !sourceStatus.isDirectory();
>     if (solitaryFile) {
>       if (targetFS.isFile(target) || !targetPathExists) {
>         return sourceStatus.getPath();
>       } else {
>         return sourceStatus.getPath().getParent();
>       }
>     } else {
>       boolean specialHandling = (options.getSourcePaths().size() == 1 &&
>           !targetPathExists) || options.shouldSyncFolder() ||
>           options.shouldOverwrite();
>       // getParent() returns null when the path is the root "/", which is
>       // what triggers the NPE in getRelativePath() above.
>       return specialHandling && sourceStatus.isDirectory() ?
>           sourceStatus.getPath() : sourceStatus.getPath().getParent();
>     }
>   }
> {code}
> We can see that it can return null at the end, when
> {{sourceStatus.getPath().getParent()}} is called on the root path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10283) o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending fails intermittently

2016-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243399#comment-15243399
 ] 

Hudson commented on HDFS-10283:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9619 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9619/])
HDFS-10283. (jing9: rev 89a838769ff5b6c64565e6949b14d7fed05daf54)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java


> o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending
>  fails intermittently
> --
>
> Key: HDFS-10283
> URL: https://issues.apache.org/jira/browse/HDFS-10283
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-10283.000.patch
>
>
> The test fails with an exception like the following: 
> {code}
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]],
>  
> original=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)
> {code}
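
For reference, small test clusters often sidestep this dead end by relaxing the 
policy named in the exception text; a sketch of that knob (not the committed 
change, which touches only the test file):

{code}
Configuration conf = new Configuration();
// With only two or three DNs there may be no spare node to swap in, so
// tell the client not to demand a replacement DN during pipeline recovery.
conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
{code}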



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9539) libhdfs++: enable default configuration files

2016-04-15 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243402#comment-15243402
 ] 

James Clampffer commented on HDFS-9539:
---

So the idea is to just grab a default config file, turn it into a giant C string 
literal, and link that in?  That sounds good to me.
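
For context, the Java side wires in its bundled defaults roughly like this (a 
sketch in Java of the mechanism libhdfs++ would be emulating):

{code}
// core-default.xml and hdfs-default.xml ship inside the Hadoop jars;
// registering them as default resources makes their values the fallback
// for every org.apache.hadoop.conf.Configuration created afterwards.
Configuration.addDefaultResource("core-default.xml");
Configuration.addDefaultResource("hdfs-default.xml");
{code}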

> libhdfs++: enable default configuration files
> -
>
> Key: HDFS-9539
> URL: https://issues.apache.org/jira/browse/HDFS-9539
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
>
> In the Java implementation of config files, the Hadoop jars include default 
> core-default.xml and hdfs-default.xml files that provide default values 
> for the run-time configurations.  libhdfs++ should honor that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10293) StripedFileTestUtil#readAll flaky

2016-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243401#comment-15243401
 ] 

Hudson commented on HDFS-10293:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9619 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9619/])
HDFS-10293. StripedFileTestUtil#readAll flaky. Contributed by Mingliang (jing9: 
rev 55e19b7f0c1243090dff2d08ed785cefd420b009)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/StripedFileTestUtil.java


> StripedFileTestUtil#readAll flaky
> -
>
> Key: HDFS-10293
> URL: https://issues.apache.org/jira/browse/HDFS-10293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-10293.000.patch
>
>
> The flaky test helper method causes several unit tests to fail intermittently. 
> For example, the 
> {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}}
>  timed out in a recent run (see 
> [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]),
>  which can be easily reproduced locally.
> Debugging the code suggests that the helper method gets stuck in an 
> infinite loop. We need a fix to make the test robust.
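
For reference, a sketch of the general fix pattern (not the committed patch): 
the read loop has to treat end-of-stream as a terminating condition instead of 
retrying forever.

{code}
// Sketch only: read up to expectedLength bytes, stopping at EOF instead
// of spinning forever when the stream is shorter than expected.
static int readAll(java.io.InputStream in, byte[] buf, int expectedLength)
    throws java.io.IOException {
  int total = 0;
  while (total < expectedLength) {
    int n = in.read(buf, total, expectedLength - total);
    if (n < 0) {
      break;  // EOF reached; caller can compare total against expectedLength
    }
    total += n;
  }
  return total;
}
{code}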



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10299) libhdfs++: File length doesn't always include the last block if it's being written to

2016-04-15 Thread James Clampffer (JIRA)
James Clampffer created HDFS-10299:
--

 Summary: libhdfs++: File length doesn't always include the last 
block if it's being written to
 Key: HDFS-10299
 URL: https://issues.apache.org/jira/browse/HDFS-10299
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: James Clampffer
Assignee: Xiaowei Zhu


It looks like we aren't factoring the last block of files that are still being 
written to, or haven't been closed yet, into the length of the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243389#comment-15243389
 ] 

Yongjun Zhang commented on HDFS-9670:
-

Hi [~jzhuge],

Thanks for working on this issue. The solution looks good to me. One comment 
about the test code here: starting a mini cluster is expensive, so ideally we 
would reuse the same cluster for the whole set of tests. In this case, can we at 
least create a single cluster and do the distcp within that same cluster?

Thanks.





> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>                                      DistCpOptions options) throws IOException {
>     Path target = options.getTargetPath();
>     FileSystem targetFS = target.getFileSystem(getConf());
>     final boolean targetPathExists = options.getTargetPathExists();
>     boolean solitaryFile = options.getSourcePaths().size() == 1
>         && !sourceStatus.isDirectory();
>     if (solitaryFile) {
>       if (targetFS.isFile(target) || !targetPathExists) {
>         return sourceStatus.getPath();
>       } else {
>         return sourceStatus.getPath().getParent();
>       }
>     } else {
>       boolean specialHandling = (options.getSourcePaths().size() == 1 &&
>           !targetPathExists) || options.shouldSyncFolder() ||
>           options.shouldOverwrite();
>       // getParent() returns null when the path is the root "/", which is
>       // what triggers the NPE in getRelativePath() above.
>       return specialHandling && sourceStatus.isDirectory() ?
>           sourceStatus.getPath() : sourceStatus.getPath().getParent();
>     }
>   }
> {code}
> We can see that it can return null at the end, when
> {{sourceStatus.getPath().getParent()}} is called on the root path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-15 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9543:
---
Attachment: HDFS-9543-HDFS-1312.001.patch

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch
>
>
> This patch adds the RPCs and mover logic that allows data to be moved from 
> one storage partition to another.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-15 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9543:
---
Status: Patch Available  (was: Open)

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch
>
>
> This patch adds the RPCs and mover logic that allows data to be moved from 
> one storage partition to another.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10289) Balancer configures DNs directly

2016-04-15 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243385#comment-15243385
 ] 

Ravi Prakash commented on HDFS-10289:
-

Thanks for trying to improve the Balancer, John! Does anyone remember why the 
Balancer was a separate process from the Namenode, rather than just a thread in 
it?

> Balancer configures DNs directly
> 
>
> Key: HDFS-10289
> URL: https://issues.apache.org/jira/browse/HDFS-10289
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Critical
>
> Balancer directly configures the 2 balance-related properties 
> (bandwidthPerSec and concurrentMoves) on the DNs involved.
> Details:
> * Before each balancing iteration, set the properties on all DNs involved in 
> the current iteration.
> * The DN property changes will not survive restart.
> * Balancer gets the property values from command line or its config file.
> * Need new DN APIs to query and set the 2 properties.
> * No need to edit the config file on each DN or run {{hdfs dfsadmin 
> -setBalancerBandwidth}} to configure every DN in the cluster.
> Pros:
> * Improve ease of use, because all configuration is done in one place, the 
> balancer. We have seen many customers forget to set concurrentMoves properly, 
> since it is required on both the DN and the Balancer.
> * Support new DNs added between iterations
> * Handle DN restarts between iterations
> * May be able to dynamically adjust the thresholds in different iterations. 
> Don't know how useful that would be, though.
> Cons:
> * New DN property API
> * A malicious/misconfigured balancer may overwhelm DNs. {{hdfs dfsadmin 
> -setBalancerBandwidth}} has the same issue. Also Balancer can only be run by 
> admin.
> Questions:
> * Can we create {{BalancerConcurrentMovesCommand}} similar to 
> {{BalancerBandwidthCommand}}? Can Balancer use them directly without going 
> through NN?
> One proposal to implement HDFS-7466 calls for an API to query DN properties. 
> DN Conf Servlet returns all config properties. It does not return an individual 
> property, and it does not return the value set by {{hdfs dfsadmin 
> -setBalancerBandwidth}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-15 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243340#comment-15243340
 ] 

Jing Zhao commented on HDFS-10297:
--

[~jzhuge], thanks for reporting the issue and providing a patch. I noticed you 
set the "Fix Version" to 2.8.0. We usually update "Fix Version" only after the 
patch has been committed, so I have removed that field for you for now.

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10297.001.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243338#comment-15243338
 ] 

Hadoop QA commented on HDFS-10220:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 
601 unchanged - 4 fixed = 602 total (was 605) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 107m 30s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 98m 50s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 236m 9s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate |
|   | hadoop.hdfs.server.namenode.TestNestedEncryptionZones |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
|   | 

[jira] [Updated] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-10297:
-
Fix Version/s: (was: 2.8.0)

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10297.001.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-15 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-10297:
--
Fix Version/s: 2.8.0
   Status: Patch Available  (was: Open)

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-10297.001.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-15 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-10297:
--
Attachment: HDFS-10297.001.patch

Patch 001:
* Change values in {{hdfs-default.xml}}
* Change values of {{DFS_DATANODE_BALANCE_BANDWIDTHPERSEC_DEFAULT}} and 
{{DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_DEFAULT}}

Test output:
{noformat}
$ ( cd hadoop-hdfs-project/hadoop-hdfs ; mvn test 
-Dtest=TestBalancer#testBalancer2 )
$ grep 'concurrent\.moves\|Number threads for balancing' 
hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/*output*
2016-04-15 10:46:58,563 [Thread-0] INFO  datanode.DataNode 
(DataXceiverServer.java:(79)) - Number threads for balancing is 50
2016-04-15 10:46:58,781 [Thread-0] INFO  datanode.DataNode 
(DataXceiverServer.java:(79)) - Number threads for balancing is 50
2016-04-15 10:46:59,942 [Thread-0] INFO  datanode.DataNode 
(DataXceiverServer.java:(79)) - Number threads for balancing is 50
2016-04-15 10:47:00,153 [Thread-0] INFO  balancer.Balancer 
(Balancer.java:getInt(240)) - dfs.datanode.balance.max.concurrent.moves = 50 
(default=50)
2016-04-15 10:47:04,205 [Thread-0] INFO  balancer.Balancer 
(Balancer.java:getInt(240)) - dfs.datanode.balance.max.concurrent.moves = 50 
(default=50)
[jzhuge@jzhuge-MBP hadoop](trunk *)$ grep 'bandwidthPer\|Balancing bandwidth' 
hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/*output*
2016-04-15 10:46:58,563 [Thread-0] INFO  datanode.DataNode 
(DataXceiverServer.java:(78)) - Balancing bandwidth is 10485760 bytes/s
2016-04-15 10:46:58,781 [Thread-0] INFO  datanode.DataNode 
(DataXceiverServer.java:(78)) - Balancing bandwidth is 10485760 bytes/s
2016-04-15 10:46:59,942 [Thread-0] INFO  datanode.DataNode 
(DataXceiverServer.java:(78)) - Balancing bandwidth is 10485760 bytes/s
{noformat}

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10297.001.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10293) StripedFileTestUtil#readAll flaky

2016-04-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243326#comment-15243326
 ] 

Mingliang Liu commented on HDFS-10293:
--

Thanks for your review and commit, [~jing9].

> StripedFileTestUtil#readAll flaky
> -
>
> Key: HDFS-10293
> URL: https://issues.apache.org/jira/browse/HDFS-10293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-10293.000.patch
>
>
> The flaky test helper method causes several unit tests to fail intermittently. 
> For example, the 
> {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}}
>  timed out in a recent run (see 
> [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]),
>  which can be easily reproduced locally.
> Debugging the code suggests that the helper method gets stuck in an 
> infinite loop. We need a fix to make the test robust.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10293) StripedFileTestUtil#readAll flaky

2016-04-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-10293:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk. Thanks for the fix, [~liuml07]!

> StripedFileTestUtil#readAll flaky
> -
>
> Key: HDFS-10293
> URL: https://issues.apache.org/jira/browse/HDFS-10293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 3.0.0
>
> Attachments: HDFS-10293.000.patch
>
>
> The flaky test helper method causes several unit tests to fail intermittently. 
> For example, the 
> {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}}
>  timed out in a recent run (see 
> [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]),
>  which can be easily reproduced locally.
> Debugging the code suggests that the helper method gets stuck in an 
> infinite loop. We need a fix to make the test robust.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10283) o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending fails intermittently

2016-04-15 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243319#comment-15243319
 ] 

Mingliang Liu commented on HDFS-10283:
--

Thank you [~jingzhao] for your review and commit.

> o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending
>  fails intermittently
> --
>
> Key: HDFS-10283
> URL: https://issues.apache.org/jira/browse/HDFS-10283
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-10283.000.patch
>
>
> The test fails with an exception like the following: 
> {code}
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]],
>  
> original=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)
> {code}
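
For tests on tiny clusters, one common mitigation (a sketch of a general 
option, not necessarily the fix taken in this patch) is to relax the 
replacement policy named in the exception, so pipeline recovery does not 
demand a spare datanode:

{code}
import org.apache.hadoop.conf.Configuration;

public class TestConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Sketch: on a 2-3 DN MiniDFSCluster there is no spare node to swap in,
    // so tell the client never to attempt datanode replacement on failure.
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
    System.out.println(conf.get("dfs.client.block.write.replace-datanode-on-failure.policy"));
  }
}
{code}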



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10283) o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending fails intermittently

2016-04-15 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-10283:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2. Thanks for the contribution, 
[~liuml07]!

> o.a.h.hdfs.server.namenode.TestFSImageWithSnapshot#testSaveLoadImageWithAppending
>  fails intermittently
> --
>
> Key: HDFS-10283
> URL: https://issues.apache.org/jira/browse/HDFS-10283
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HDFS-10283.000.patch
>
>
> The test fails with an exception like the following: 
> {code}
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]],
>  
> original=[DatanodeInfoWithStorage[127.0.0.1:47227,DS-dd109c14-79e5-4380-ac5e-4434cd7e25b5,DISK],
>  
> DatanodeInfoWithStorage[127.0.0.1:56949,DS-6c0be75e-a78c-41b9-bfd0-7ee0cdefaa0e,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
>   at 
> org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243216#comment-15243216
 ] 

Steve Loughran commented on HDFS-9732:
--

Getting close, just some details.

In {{DelegationTokenFetcher}}:
# Good to see you are using StringBuilder; can you split the append() sequence 
into one call per entry (see the sketch below)? That's in {{toStringStable}} 
and {{printTokensToString}}. If you use IntelliJ IDEA, it'll do that 
automatically if you ask it nicely.
# Line 78: how about we make the text just "print verbose output"?

In {{TestDelegationTokenFetcher}}
# The line 141 assert statement should build up a string to print on failure. 
Imagine: everything you'd need to understand the problem from a Jenkins test 
failure.
# Lines 142-143 should use SLF4J logging APIs, not System.out.
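
For illustration, roughly the formatting being asked for in point 1 above 
(the field names here are hypothetical, not the actual {{toStringStable}} 
contents):

{code}
public class TokenToStringSketch {
  // Sketch: one append() per entry, each on its own line, so reviews and
  // diffs of individual fields stay readable.
  static String toStringStable(String owner, String renewer, long issueDate,
      long maxDate, int sequenceNumber) {
    StringBuilder sb = new StringBuilder();
    sb.append("owner=").append(owner);
    sb.append(", renewer=").append(renewer);
    sb.append(", issueDate=").append(issueDate);
    sb.append(", maxDate=").append(maxDate);
    sb.append(", sequenceNumber=").append(sequenceNumber);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toStringStable("hdfs", "yarn", 0L, 0L, 1));
  }
}
{code}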

> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostics info, 
> owner, sequence number. But its superclass,  
> {{AbstractDelegationTokenIdentifier}} contains a lot more information, 
> including token issue and expiry times.
> Because  {{DelegationTokenIdentifier.toString()}} doesn't include this data,
> information that is potentially useful for kerberos diagnostics is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10291) TestShortCircuitLocalRead failing

2016-04-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-10291:
--
Description: {{TestShortCircuitLocalRead}} failing as length of read is 
considered off end of buffer.  (was: {{TestShortCircuitLocalRead}} failing as 
length of read is considered off end of buffer. There's an off-by-one error 
somewhere in the test or the new validation code)

> TestShortCircuitLocalRead failing
> -
>
> Key: HDFS-10291
> URL: https://issues.apache.org/jira/browse/HDFS-10291
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> {{TestShortCircuitLocalRead}} failing as length of read is considered off end 
> of buffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-334) dfsadmin -metasave should also log corrupt replicas info

2016-04-15 Thread Denis Bolshakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Bolshakov reassigned HDFS-334:


Assignee: Denis Bolshakov

> dfsadmin -metasave should also log corrupt replicas info
> 
>
> Key: HDFS-334
> URL: https://issues.apache.org/jira/browse/HDFS-334
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Lohit Vijayarenu
>Assignee: Denis Bolshakov
>Priority: Minor
>  Labels: newbie
>
> _hadoop dfsadmin -metasave _ should also dump information about the 
> corrupt replicas map. This could help in telling whether pending replication 
> was due to corrupt replicas. 
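
For context, a typical invocation (the output file lands in the NameNode's 
log directory, per {{hadoop.log.dir}}; the report filename is just an example):

{noformat}
hdfs dfsadmin -metasave metasave-report.txt
{noformat}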



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-334) dfsadmin -metasave should also log corrupt replicas info

2016-04-15 Thread Andras Bokor (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243145#comment-15243145
 ] 

Andras Bokor commented on HDFS-334:
---

Not at all. Go ahead.

> dfsadmin -metasave should also log corrupt replicas info
> 
>
> Key: HDFS-334
> URL: https://issues.apache.org/jira/browse/HDFS-334
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Lohit Vijayarenu
>Priority: Minor
>  Labels: newbie
>
> _hadoop dfsadmin -metasave _ should also dump information about the 
> corrupt replicas map. This could help in telling whether pending replication 
> was due to corrupt replicas. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-334) dfsadmin -metasave should also log corrupt replicas info

2016-04-15 Thread Denis Bolshakov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243137#comment-15243137
 ] 

Denis Bolshakov commented on HDFS-334:
--

[~boky01] Do you mind if I take care of this issue? If not, I will assign it 
to myself.

> dfsadmin -metasave should also log corrupt replicas info
> 
>
> Key: HDFS-334
> URL: https://issues.apache.org/jira/browse/HDFS-334
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Lohit Vijayarenu
>Priority: Minor
>  Labels: newbie
>
> _hadoop dfsadmin -metasave _ should also dump information about the 
> corrupt replicas map. This could help in telling whether pending replication 
> was due to corrupt replicas. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9940) Balancer should not use property dfs.datanode.balance.max.concurrent.moves

2016-04-15 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-9940:
-
Summary: Balancer should not use property 
dfs.datanode.balance.max.concurrent.moves  (was: Balancer should not use 
property name dfs.datanode.balance.max.concurrent.moves)

> Balancer should not use property dfs.datanode.balance.max.concurrent.moves
> --
>
> Key: HDFS-9940
> URL: https://issues.apache.org/jira/browse/HDFS-9940
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.8.0
>
>
> It is very confusing for both the Balancer and the Datanode to use the same 
> property {{dfs.datanode.balance.max.concurrent.moves}}. It is especially 
> confusing for the Balancer because the property has "datanode" in its name. 
> Many customers forget to set the property for the Balancer.
> Change the Balancer to use a new property 
> {{dfs.balancer.max.concurrent.moves}}.
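
A sketch of what the proposal implies for configuration lookup (the fallback 
to the old DN-named key is illustrative, not necessarily what the patch does):

{code}
import org.apache.hadoop.conf.Configuration;

public class BalancerMovesSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Sketch: the Balancer reads its own dedicated key and falls back to
    // the old key, preserving behavior for existing deployments.
    int maxConcurrentMoves = conf.getInt("dfs.balancer.max.concurrent.moves",
        conf.getInt("dfs.datanode.balance.max.concurrent.moves", 5));
    System.out.println("max.concurrent.moves = " + maxConcurrentMoves);
  }
}
{code}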



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-15 Thread Nicolas Fraison (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Fraison updated HDFS-10220:
---
Status: Patch Available  (was: Open)

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by 
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked on a 
> lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when leases 
> must be released. Due to the very large number of leases to be released, the 
> namenode took too long to release them, blocking all other tasks and making 
> the zkfc think that the namenode was not available/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check, so the lock won't be held for too long a period.
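
The idea translates to a small, generic sketch (hypothetical names and cap; 
the real patch works inside {{LeaseManager}}): release at most a fixed number 
of leases per Monitor pass and leave the rest for the next iteration, so the 
lock is only held briefly each time.

{code}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

public class LeaseBatchSketch {
  // Sketch: drain at most maxPerCheck expired leases per pass; the caller
  // re-invokes on the next Monitor tick, holding the lock only briefly.
  static <T> int releaseBatch(Queue<T> expired, int maxPerCheck, Consumer<T> release) {
    int released = 0;
    while (released < maxPerCheck && !expired.isEmpty()) {
      release.accept(expired.poll());
      released++;
    }
    return released;
  }

  public static void main(String[] args) {
    Queue<String> expired = new ArrayDeque<>();
    for (int i = 0; i < 10; i++) {
      expired.add("lease-" + i);
    }
    // Three short passes of at most 4 releases each, instead of one long pass.
    System.out.println(releaseBatch(expired, 4, l -> {}));
    System.out.println(releaseBatch(expired, 4, l -> {}));
    System.out.println(releaseBatch(expired, 4, l -> {}));
  }
}
{code}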



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-15 Thread Nicolas Fraison (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Fraison updated HDFS-10220:
---
Status: Open  (was: Patch Available)

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by 
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked on a 
> lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when leases 
> must be released. Due to the very large number of leases to be released, the 
> namenode took too long to release them, blocking all other tasks and making 
> the zkfc think that the namenode was not available/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check, so the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-15 Thread Nicolas Fraison (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Fraison updated HDFS-10220:
---
Attachment: HADOOP-10220.004.patch

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by 
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked on a 
> lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when leases 
> must be released. Due to the very large number of leases to be released, the 
> namenode took too long to release them, blocking all other tasks and making 
> the zkfc think that the namenode was not available/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check, so the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2016-04-15 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242516#comment-15242516
 ] 

Rakesh R commented on HDFS-7859:


bq. I thought many considerations originally targeted for the issue have 
already been implemented elsewhere, therefore the only thing left is custom 
codec and schema support. I don't think there is a strong requirement for this 
feature but we can implement it perhaps in phase II I guess.
Thanks for making it clear, [~drankye].

> Erasure Coding: Persist erasure coding policies in NameNode
> ---
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Xinwei Qin 
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, 
> HDFS-7859.001.patch, HDFS-7859.002.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we 
> persist EC schemas in NameNode centrally and reliably, so that EC zones can 
> reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2016-04-15 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242515#comment-15242515
 ] 

Rakesh R commented on HDFS-7859:


bq. For the builtin schema and policies, IIRC, there was a consideration that 
we still need to persist the schema and policy to indicate the software 
upgrades (so the builtin ones may be changed).
Yes, changing a built-in schema is an interesting case. If we ever end up 
needing to change the default one, then persisting would be required. I think 
we can proceed to persist the EC policy details in the fsimage and editlog. 
I'm just adding a thought to understand more: perhaps we could explore whether 
the layout version can be utilized to handle this kind of situation.

The patch needs to be rebased on the latest code. Would you mind rebasing it, 
[~xinwei]?

> Erasure Coding: Persist erasure coding policies in NameNode
> ---
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Xinwei Qin 
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, 
> HDFS-7859.001.patch, HDFS-7859.002.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we 
> persist EC schemas in NameNode centrally and reliably, so that EC zones can 
> reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10298) Document the usage of distcp -diff option

2016-04-15 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-10298:


 Summary: Document the usage of distcp -diff option
 Key: HDFS-10298
 URL: https://issues.apache.org/jira/browse/HDFS-10298
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, documentation
Affects Versions: 2.8.0
Reporter: Akira AJISAKA


The distcp -diff option is currently documented as "Use snapshot diff report 
to identify the difference between source and target.", but its usage is not 
documented.
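
For reference while the documentation is pending, the usage is roughly as 
follows (snapshot names s1/s2 are examples; the source directory must be 
snapshottable, -diff must be combined with -update, and the target must not 
have been modified since it was synced against s1):

{noformat}
# create snapshots on the source as it changes
hdfs dfs -createSnapshot /src s1
# ... source is modified ...
hdfs dfs -createSnapshot /src s2
# copy only the changes between s1 and s2 to the target
hadoop distcp -update -diff s1 s2 /src /dst
{noformat}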



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)