[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++

2015-11-06 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994679#comment-14994679
 ] 

Bob Hansen commented on HDFS-9117:
--

[~wheat9] - do you agree that we should be reading in XML streams that follow 
the conventions of the hdfs-config.xml files?  e.g. configuration, property, 
name, value, and final stanzas?  

bq. The functionality is definitely helpful, but it can be provided as a 
utility helper instead of baking it into the main contract of libhdfs++.

That was my intention in writing this class: a utility helper that 
encapsulates reading config files from the field and produces a libhdfs++ 
Options object.  That's what each version has done.

I can strip it down to the API you provided, but I wonder what use case it will 
be serving then.
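
For reference, a minimal sketch of the stanza conventions in question (the 
property name is real; the host and port are illustrative):

{code:xml}
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
    <!-- "final" marks the value as non-overridable by later resources -->
    <final>true</final>
  </property>
</configuration>
{code}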

> Config file reader / options classes for libhdfs++
> --
>
> Key: HDFS-9117
> URL: https://issues.apache.org/jira/browse/HDFS-9117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9117.HDFS-8707.001.patch, 
> HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, 
> HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, 
> HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, 
> HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, 
> HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, 
> HDFS-9117.HDFS-9288.007.patch
>
>
> For environmental compatibility with HDFS installations, libhdfs++ should be 
> able to read the configurations from Hadoop XML files and behave in line with 
> the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML 
> configuration files.
> Similarly, an internal Options architecture for libhdfs++ should be developed 
> to efficiently transport the configuration information within the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-2261) AOP unit tests are not getting compiled or run

2015-11-06 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-2261:
-
Status: Patch Available  (was: Open)

> AOP unit tests are not getting compiled or run 
> ---
>
> Key: HDFS-2261
> URL: https://issues.apache.org/jira/browse/HDFS-2261
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.4-alpha, 2.0.0-alpha
> Environment: 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console
> -compile-fault-inject ant target 
>Reporter: Giridharan Kesavan
>Priority: Minor
> Attachments: HDFS-2261.000.patch, hdfs-2261.patch
>
>
> The tests in src/test/aop are not getting compiled or run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-2261) AOP unit tests are not getting compiled or run

2015-11-06 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-2261:
-
Attachment: HDFS-2261.000.patch

> AOP unit tests are not getting compiled or run 
> ---
>
> Key: HDFS-2261
> URL: https://issues.apache.org/jira/browse/HDFS-2261
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha, 2.0.4-alpha
> Environment: 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console
> -compile-fault-inject ant target 
>Reporter: Giridharan Kesavan
>Priority: Minor
> Attachments: HDFS-2261.000.patch, hdfs-2261.patch
>
>
> The tests in src/test/aop are not getting compiled or run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9397) Fix typo for readChecksum() LOG.warn in BlockSender.java

2015-11-06 Thread Enrique Flores (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enrique Flores updated HDFS-9397:
-
Attachment: HDFS-9397.patch

Attaching proposed fix.

> Fix typo for readChecksum() LOG.warn in BlockSender.java
> 
>
> Key: HDFS-9397
> URL: https://issues.apache.org/jira/browse/HDFS-9397
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Enrique Flores
>Priority: Trivial
> Attachments: HDFS-9397.patch
>
>
> typo for word "verify" found in: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L647
>  
> {code}
>   LOG.warn(" Could not read or failed to veirfy checksum for data"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package

2015-11-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994743#comment-14994743
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8971:
---

Sure, please file a JIRA to revert the change.  Thanks!

> Remove guards when calling LOG.debug() and LOG.trace() in client package
> 
>
> Key: HDFS-8971
> URL: https://issues.apache.org/jira/browse/HDFS-8971
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-8971.000.patch, HDFS-8971.001.patch
>
>
> We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to 
> {{hadoop-hdfs-client}} module in JIRA 
> [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and 
> [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and 
> {{BlockReader}} in 
> [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we 
> also replaced the _log4j_ logger with the _slf4j_ logger. There was existing 
> code in the client package to guard the log calls to {{LOG.debug()}} and 
> {{LOG.trace()}}; e.g. in {{ShortCircuitCache.java}}, we have code like this:
> {code:title=Trace with guards|borderStyle=solid}
> 724if (LOG.isTraceEnabled()) {
> 725  LOG.trace(this + ": found waitable for " + key);
> 726}
> {code}
> In _slf4j_, this kind of guard is not necessary. We should clean up the code 
> by removing the guards from the client package.
> {code:title=Trace without guards|borderStyle=solid}
> 724LOG.trace("{}: found waitable for {}", this, key);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9397) Fix typo for readChecksum() LOG.warn in BlockSender.java

2015-11-06 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated HDFS-9397:
---
Assignee: Nicole Pazmany

> Fix typo for readChecksum() LOG.warn in BlockSender.java
> 
>
> Key: HDFS-9397
> URL: https://issues.apache.org/jira/browse/HDFS-9397
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Enrique Flores
>Assignee: Nicole Pazmany
>Priority: Trivial
> Attachments: HDFS-9397.patch
>
>
> typo for word "verify" found in: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L647
>  
> {code}
>   LOG.warn(" Could not read or failed to veirfy checksum for data"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9398) Make ByteArrayManager log message in one-line format

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994868#comment-14994868
 ] 

Hadoop QA commented on HDFS-9398:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 32s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 7s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 20s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-07 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771143/HDFS-9398.000.patch |
| JIRA Issue | HDFS-9398 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux 0a08e6ac7939 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/patchprocess/apache-yetus-ee5baeb/precommit/personality/hadoop.sh
 |
| git revision | trunk / bf6aa30 |
| Default Java | 1.7.0_79 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_60 

[jira] [Updated] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.

2015-11-06 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9267:

Attachment: HDFS-9267.03.patch

Hi, [~cmccabe]. 

I updated the patch to provide a {{ReplicaIterator}} class and refactored 
{{BlockPoolSlice}} to use it. The reason for using {{ReplicaIterator}} instead 
of {{java.util.Iterator}} is that it can throw {{IOException}} in {{next()}}.

Could you give some feedback? Thanks a lot.
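
A minimal sketch of the idea, with hypothetical names (the actual patch may 
shape the interface differently):

{code:java}
import java.io.IOException;

/**
 * Iterator-like abstraction whose methods may throw IOException, which
 * java.util.Iterator cannot declare on hasNext()/next().
 */
public interface ReplicaIterator<T> {
  boolean hasNext() throws IOException;
  T next() throws IOException;
}
{code}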

> TestDiskError should get stored replicas through FsDatasetTestUtils.
> 
>
> Key: HDFS-9267
> URL: https://issues.apache.org/jira/browse/HDFS-9267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-9267.00.patch, HDFS-9267.01.patch, 
> HDFS-9267.02.patch, HDFS-9267.03.patch
>
>
> {{TestDiskError#testReplicationError}} scans local directories to verify 
> blocks and metadata files, which leaks the details of the {{FsDataset}} 
> implementation. 
> This JIRA will abstract the "scanning" operation to {{FsDatasetTestUtils}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9397) Fix typo for readChecksum() LOG.warn in BlockSender.java

2015-11-06 Thread Enrique Flores (JIRA)
Enrique Flores created HDFS-9397:


 Summary: Fix typo for readChecksum() LOG.warn in BlockSender.java
 Key: HDFS-9397
 URL: https://issues.apache.org/jira/browse/HDFS-9397
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Enrique Flores
Priority: Trivial


typo for word "verify" found in: 

https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L647
 

{code}
  LOG.warn(" Could not read or failed to veirfy checksum for data"
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994771#comment-14994771
 ] 

Hadoop QA commented on HDFS-9394:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
8s {color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} branch-2 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} branch-2 passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} branch-2 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2 has 1 extant 
Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 46s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client in branch-2 has 5 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} branch-2 passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s 
{color} | {color:green} branch-2 passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 40s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 11s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 10s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 43s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.hdfs.server.namenode.TestCacheDirectives |
|   | hadoop.hdfs.TestRollingUpgradeRollback |
|   | hadoop.hdfs.TestDistributedFileSystem |
|   | 

[jira] [Updated] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes

2015-11-06 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9379:

  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.8.0
Target Version/s:   (was: 2.8.0)
  Status: Resolved  (was: Patch Available)

Committed for 2.8.0. Thanks for the contribution [~liuml07].

> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
> --
>
> Key: HDFS-9379
> URL: https://issues.apache.org/jira/browse/HDFS-9379
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9379.000.patch
>
>
> Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on 
> the {{datanodes}} array being sorted in the lexicographical order of the 
> datanodes' {{xferAddr}}.
> * There is an assertion of datanode's {{xferAddr}} lexicographical order when 
> filling the {{datanodes}}, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152].
> * When searching the datanode by {{DatanodeInfo}}, it uses binary search 
> against the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187]
> In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In 
> {{NNThroughputBenchmark}}, the port is simply _the index of the tiny 
> datanode_ plus one.
> The problem here is that, when there are more than 9 tiny datanodes 
> ({{numThreads}}), the assumed lexicographical order of the datanodes' 
> {{xferAddr}} breaks down, because the string form of the datanode index no 
> longer sorts in numeric order. For example, 
> {code}
> ...
> 192.168.54.40:8
> 192.168.54.40:9
> 192.168.54.40:10
> 192.168.54.40:11
> ...
> {code}
> {{192.168.54.40:9}} is greater than {{192.168.54.40:10}}. The assertion will 
> fail and the binary search won't work.
> The simple fix is to calculate the datanode index by port directly, instead 
> of using binary search.
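
The ordering problem is easy to reproduce with plain string comparison; a 
minimal illustration:

{code:java}
public class XferAddrOrder {
  public static void main(String[] args) {
    // Lexicographically, ':' + '9' sorts after ':' + '1', so :9 > :10.
    System.out.println("192.168.54.40:9".compareTo("192.168.54.40:10") > 0);
    // prints: true
  }
}
{code}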



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9318) considerLoad factor can be improved

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994597#comment-14994597
 ] 

Hudson commented on HDFS-9318:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1371 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1371/])
HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: 
rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java


> considerLoad factor can be improved
> ---
>
> Key: HDFS-9318
> URL: https://issues.apache.org/jira/browse/HDFS-9318
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch
>
>
> Currently considerLoad avoids choosing nodes that are too active, so it helps 
> level the HDFS load across the cluster. Under normal conditions, this is 
> desired. However, when a cluster has a large percentage of nearly full nodes, 
> this can make it difficult to find good targets because the placement policy 
> wants to avoid the full nodes, but considerLoad wants to avoid the busy 
> less-full nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994595#comment-14994595
 ] 

Hudson commented on HDFS-9236:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1371 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1371/])
HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: 
rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java


> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Fix For: 2.8.0
>
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, 
> HDFS-9236.006.patch, HDFS-9236.007.patch
>
>
> Ran into an issue while running tests against faulty datanode code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rState (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can either be 
> caused by faulty DN code or stale/modified/corrupted files on DN. When this 
> happens, the DN will end up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even with the correct length) will cause the block to be 
> marked as corrupted. Since this block could be the last block of the file, 
> if this happens and the client goes away, the NN won't be able to recover 
> the lease and close the file because the last block is under-replicated.
> I believe we need to have a sanity check for block size on both the DN and 
> NN to prevent such a case from happening.
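
A minimal sketch of the kind of NN-side guard being proposed (hypothetical 
code; variable names follow the quoted {{commitBlockSynchronization}} above, 
and the committed patch may validate differently):

{code:java}
// Hypothetical check before storedBlock.setNumBytes(newlength):
// reject the Long.MAX_VALUE sentinel left over when no replica on the
// coordinating DN matched bestState.
if (newlength < 0 || newlength == Long.MAX_VALUE) {
  throw new IOException("Invalid block length " + newlength
      + " in commitBlockSynchronization for " + oldBlock);
}
{code}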



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2261) AOP unit tests are not getting compiled or run

2015-11-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994720#comment-14994720
 ] 

Karthik Kambatla commented on HDFS-2261:


+1, pending Jenkins. Thanks for taking this up, [~wheat9]. 

> AOP unit tests are not getting compiled or run 
> ---
>
> Key: HDFS-2261
> URL: https://issues.apache.org/jira/browse/HDFS-2261
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha, 2.0.4-alpha
> Environment: 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console
> -compile-fault-inject ant target 
>Reporter: Giridharan Kesavan
>Priority: Minor
> Attachments: HDFS-2261.000.patch, hdfs-2261.patch
>
>
> The tests in src/test/aop are not getting compiled or run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9318) considerLoad factor can be improved

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994727#comment-14994727
 ] 

Hudson commented on HDFS-9318:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #638 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/638/])
HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: 
rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java


> considerLoad factor can be improved
> ---
>
> Key: HDFS-9318
> URL: https://issues.apache.org/jira/browse/HDFS-9318
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch
>
>
> Currently considerLoad avoids choosing nodes that are too active, so it helps 
> level the HDFS load across the cluster. Under normal conditions, this is 
> desired. However, when a cluster has a large percentage of nearly full nodes, 
> this can make it difficult to find good targets because the placement policy 
> wants to avoid the full nodes, but considerLoad wants to avoid the busy 
> less-full nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994726#comment-14994726
 ] 

Hudson commented on HDFS-9236:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #638 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/638/])
HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: 
rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Fix For: 2.8.0
>
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, 
> HDFS-9236.006.patch, HDFS-9236.007.patch
>
>
> Ran into an issue while running tests against faulty datanode code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rState (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can either be 
> caused by faulty DN code or stale/modified/corrupted files on DN. When this 
> happens, the DN will end up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even with the correct length) will cause the block to be 
> marked as corrupted. Since this block could be the last block of the file, 
> if this happens and the client goes away, the NN won't be able to recover 
> the lease and close the file because the last block is under-replicated.
> I believe we need to have a sanity check for block size on both the DN and 
> NN to prevent such a case from happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.

2015-11-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994814#comment-14994814
 ] 

Mingliang Liu commented on HDFS-9394:
-

Test {{org.apache.hadoop.hdfs.TestRollingUpgradeRollback}} fails in branch-2. 
All other tests pass locally. Seems unrelated?

> branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader 
> initialization, because HftpFileSystem is missing.
> 
>
> Key: HDFS-9394
> URL: https://issues.apache.org/jira/browse/HDFS-9394
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Chris Nauroth
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-9394.000.branch-2.patch
>
>
> On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor 
> that lists {{HftpFileSystem}} and {{HsftpFileSystem}}.  These classes do not 
> reside in hadoop-hdfs-client.  Instead, they reside in hadoop-hdfs.  If the 
> application has hadoop-hdfs-client.jar on the classpath, but not 
> hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}.
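
For context, the failure follows from how {{java.util.ServiceLoader}} works: 
the provider-configuration file in hadoop-hdfs-client.jar names classes that 
must be loadable at initialization time. Schematically (entries illustrative 
of the branch-2 layout described above):

{code}
# META-INF/services/org.apache.hadoop.fs.FileSystem (hadoop-hdfs-client.jar)
org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.HftpFileSystem    # lives in hadoop-hdfs.jar
org.apache.hadoop.hdfs.web.HsftpFileSystem   # lives in hadoop-hdfs.jar
{code}

If hadoop-hdfs.jar is absent from the classpath, loading either of the last 
two entries throws a {{ServiceConfigurationError}}.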



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9364) Unnecessary DNS resolution attempts when creating NameNodeProxies

2015-11-06 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9364:

Attachment: HDFS-9364.004.patch

> Unnecessary DNS resolution attempts when creating NameNodeProxies
> -
>
> Key: HDFS-9364
> URL: https://issues.apache.org/jira/browse/HDFS-9364
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, performance
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9364.001.patch, HDFS-9364.002.patch, 
> HDFS-9364.003.patch, HDFS-9364.004.patch
>
>
> When creating NameNodeProxies, we always try to DNS-resolve namenode URIs. 
> This is unnecessary if the URI is logical, and may be significantly slow if 
> the DNS is having problems. 
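
A minimal sketch of the intended short-circuit (hypothetical helper; the 
actual patch wires this into {{NameNodeProxies}}):

{code:java}
import java.net.InetSocketAddress;
import java.net.URI;

public class ProxyAddressSketch {
  // For a logical (HA) URI the authority is a nameservice ID, not a
  // resolvable hostname, so defer resolution to the failover proxy provider.
  static InetSocketAddress toAddress(URI nnUri, boolean isLogicalUri) {
    if (isLogicalUri) {
      return InetSocketAddress.createUnresolved(nnUri.getHost(), nnUri.getPort());
    }
    return new InetSocketAddress(nnUri.getHost(), nnUri.getPort()); // DNS lookup
  }
}
{code}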



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes

2015-11-06 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994954#comment-14994954
 ] 

Arpit Agarwal commented on HDFS-9379:
-

Thanks for confirming you tested it manually. I will commit this shortly.

> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
> --
>
> Key: HDFS-9379
> URL: https://issues.apache.org/jira/browse/HDFS-9379
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9379.000.patch
>
>
> Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on 
> the {{datanodes}} array being sorted in the lexicographical order of the 
> datanodes' {{xferAddr}}.
> * There is an assertion of datanode's {{xferAddr}} lexicographical order when 
> filling the {{datanodes}}, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152].
> * When searching the datanode by {{DatanodeInfo}}, it uses binary search 
> against the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187]
> In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In 
> {{NNThroughputBenchmark}}, the port is simply _the index of the tiny 
> datanode_ plus one.
> The problem here is that, when there are more than 9 tiny datanodes 
> ({{numThreads}}), the assumed lexicographical order of the datanodes' 
> {{xferAddr}} breaks down, because the string form of the datanode index no 
> longer sorts in numeric order. For example, 
> {code}
> ...
> 192.168.54.40:8
> 192.168.54.40:9
> 192.168.54.40:10
> 192.168.54.40:11
> ...
> {code}
> {{192.168.54.40:9}} is greater than {{192.168.54.40:10}}. The assertion will 
> fail and the binary search won't work.
> The simple fix is to calculate the datanode index by port directly, instead 
> of using binary search.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++

2015-11-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994598#comment-14994598
 ] 

Haohui Mai commented on HDFS-9117:
--

bq. As an example, let's say we are writing a native replacement for the dfs 
tool using the native libhdfs++ codebase (not the libhdfs compatibility layer) 
that can do "-ls" and "-copyFromLocal", etc. To provide Least Astonishment for 
our consumers, they would expect that a properly configured Hadoop node [with 
the HADOOP_HOME pointing to /etc/hadoop-2.9.9 and its config files] could run 
"hdfspp -ls /tmp" and have it automatically find the NN and configure the 
communications parameters correctly to talk to their cluster.

Unfortunately the assumption is broken in many ways -- it is fully 
implementation-defined. For example, there are questions about whether 
{{HADOOP_HOME}} or {{HADOOP_PREFIX}} should be chosen. Configuration files are 
only required to be specified in {{CLASSPATH}}, not necessarily in the 
{{HADOOP_HOME}} directory. Different vendors might have changed their scripts 
and put the configuration in different places. Scripts evolve across versions; 
we have very different scripts between trunk and branch-2.

While it is definitely useful in the libhdfs compatibility layer, I'm doubtful 
it should be added into the core part of the library due to all this 
complexity.

Therefore I believe that the focus of the library should be providing 
mechanisms to interact with HDFS, not concrete policies (e.g., the location of 
the configuration) on how to interact. We don't yet have any libraries 
implementing the protocols and mechanisms to interact with HDFS (which is the 
reusable part). The policy is highly customized in different environments but 
can be worked around easily (which is the less reusable part).

bq. given this context, do you agree that we need to support libhdfs++ 
compatibility with the hdfs-site.xml files that are already deployed at 
customer 

There are two levels of APIs when you talk about libhdfs++ APIs. The core API 
focuses on providing mechanisms to interact with HDFS, such as implementing 
Hadoop RPC and the DataTransferProtocol. The API that you're referring to 
might be a convenience API for libhdfs++. The functionality is definitely 
helpful, but it can be provided as a utility helper instead of baking it into 
the main contract of libhdfs++.

My suggestion is the following:

1. Focus in this jira on the code that parses XML from strings (which is the 
core functionality of parsing configuration). It should not contain any file 
operations.
2. Separate the tasks of searching through paths, reading files, etc. into 
different jiras. For now it makes sense to put them along with the {{libhdfs}} 
compatibility layer. Since it's an implementation detail I believe we can 
quickly go through it. At a later point in time we can promote the code to a 
common library once we have a proposal for how the libhdfs++ convenience APIs 
should look.
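
In that spirit, the core contract might be a pure string-to-Options 
transformation (hypothetical signature, sketched in Java for brevity even 
though libhdfs++ itself is C++):

{code:java}
import java.io.IOException;

// Hypothetical "mechanism-only" contract: the parser consumes XML text that
// is already in memory; locating and reading files stays in a separate
// utility helper.
public interface ConfigurationParser {
  /** Merge one hdfs-site.xml-style document into the given defaults. */
  Options parse(String xmlData, Options defaults) throws IOException;
}
{code}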


> Config file reader / options classes for libhdfs++
> --
>
> Key: HDFS-9117
> URL: https://issues.apache.org/jira/browse/HDFS-9117
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9117.HDFS-8707.001.patch, 
> HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, 
> HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, 
> HDFS-9117.HDFS-8707.006.patch, HDFS-9117.HDFS-8707.008.patch, 
> HDFS-9117.HDFS-8707.009.patch, HDFS-9117.HDFS-8707.010.patch, 
> HDFS-9117.HDFS-8707.011.patch, HDFS-9117.HDFS-8707.012.patch, 
> HDFS-9117.HDFS-9288.007.patch
>
>
> For environmental compatibility with HDFS installations, libhdfs++ should be 
> able to read the configurations from Hadoop XML files and behave in line with 
> the Java implementation.
> Most notably, machine names and ports should be readable from Hadoop XML 
> configuration files.
> Similarly, an internal Options architecture for libhdfs++ should be developed 
> to efficiently transport the configuration information within the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9369) Use ctest to run tests for hadoop-hdfs-native-client

2015-11-06 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994676#comment-14994676
 ] 

Jing Zhao commented on HDFS-9369:
-

The change looks good to me. +1

> Use ctest to run tests for hadoop-hdfs-native-client
> 
>
> Key: HDFS-9369
> URL: https://issues.apache.org/jira/browse/HDFS-9369
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-9369.000.patch
>
>
> Currently we write special rules in {{pom.xml}} to run tests in 
> {{hadoop-hdfs-native-client}}. This jira proposes to run these tests using 
> ctest to simplify {{pom.xml}} and improve portability.
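
For context, a sketch of the standard CMake/ctest flow being proposed (target 
and file names illustrative):

{code}
# CMakeLists.txt: register a native test binary with CMake
add_executable(test_libhdfs_threaded test_libhdfs_threaded.c)
add_test(NAME test_libhdfs_threaded COMMAND test_libhdfs_threaded)

# maven then only needs to shell out to:
#   ctest --output-on-failure
{code}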



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994756#comment-14994756
 ] 

Hudson commented on HDFS-6481:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2578 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2578/])
HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 
0b18e5e8c69b40c9a446fff448d38e0dd10cb45e)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java


> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.3
>
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> 

[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994755#comment-14994755
 ] 

Hudson commented on HDFS-9236:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2578 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2578/])
HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: 
rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java


> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Fix For: 2.8.0
>
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, 
> HDFS-9236.006.patch, HDFS-9236.007.patch
>
>
> Ran into an issue while running tests against faulty datanode code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rState (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can either be 
> caused by faulty DN code or stale/modified/corrupted files on DN. When this 
> happens, the DN will end up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even with the correct length) will cause the block to be 
> marked as corrupted. Since this block could be the last block of the file, 
> if this happens and the client goes away, the NN won't be able to recover 
> the lease and close the file because the last block is under-replicated.
> I believe we need to have a sanity check for block size on both the DN and 
> NN to prevent such a case from happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9318) considerLoad factor can be improved

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994757#comment-14994757
 ] 

Hudson commented on HDFS-9318:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2578 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2578/])
HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: 
rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> considerLoad factor can be improved
> ---
>
> Key: HDFS-9318
> URL: https://issues.apache.org/jira/browse/HDFS-9318
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch
>
>
> Currently considerLoad avoids choosing nodes that are too active, so it helps 
> level the HDFS load across the cluster. Under normal conditions, this is 
> desired. However, when a cluster has a large percentage of nearly full nodes, 
> this can make it difficult to find good targets because the placement policy 
> wants to avoid the full nodes, but considerLoad wants to avoid the busy 
> less-full nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy

2015-11-06 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8905:

Status: Open  (was: Patch Available)

> Refactor DFSInputStream#ReaderStrategy
> --
>
> Key: HDFS-8905
> URL: https://issues.apache.org/jira/browse/HDFS-8905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v2.patch
>
>
> The DFSInputStream#ReaderStrategy family doesn't look very good. This issue 
> refactors the classes a little bit to make them more sensible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy

2015-11-06 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8905:

Fix Version/s: (was: HDFS-7285)

> Refactor DFSInputStream#ReaderStrategy
> --
>
> Key: HDFS-8905
> URL: https://issues.apache.org/jira/browse/HDFS-8905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v2.patch
>
>
> The DFSInputStream#ReaderStrategy family doesn't look very good. This issue 
> refactors the classes a little bit to make them more sensible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy

2015-11-06 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8905:

Status: Patch Available  (was: Open)

> Refactor DFSInputStream#ReaderStrategy
> --
>
> Key: HDFS-8905
> URL: https://issues.apache.org/jira/browse/HDFS-8905
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HDFS-8905-HDFS-7285-v1.patch, HDFS-8905-v2.patch
>
>
> The DFSInputStream#ReaderStrategy family doesn't look very good. This issue 
> refactors the classes a little bit to make them more sensible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9318) considerLoad factor can be improved

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994589#comment-14994589
 ] 

Hudson commented on HDFS-9318:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #648 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/648/])
HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: 
rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256)
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java


> considerLoad factor can be improved
> ---
>
> Key: HDFS-9318
> URL: https://issues.apache.org/jira/browse/HDFS-9318
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch
>
>
> Currently considerLoad avoids choosing nodes that are too active, so it helps 
> level the HDFS load across the cluster. Under normal conditions, this is 
> desired. However, when a cluster has a large percentage of nearly full nodes, 
> this can make it difficult to find good targets because the placement policy 
> wants to avoid the full nodes, but considerLoad wants to avoid the busy 
> less-full nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994587#comment-14994587
 ] 

Hudson commented on HDFS-9236:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #648 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/648/])
HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: 
rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java


> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Fix For: 2.8.0
>
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, 
> HDFS-9236.006.patch, HDFS-9236.007.patch
>
>
> Ran into an issue while running tests against faulty datanode code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List participatingList = new ArrayList();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible that none of the rStates (reported by DNs with copies 
> of the replica being recovered) match the bestState. This can be caused 
> either by faulty DN code or by stale/modified/corrupted files on the DN. 
> When this happens, the DN ends up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even one with the correct length) will cause the block to be 
> marked as corrupted. This block could be the last block of the file; if that 
> happens and the client goes away, the NN won't be able to recover the lease 
> and close the file because the last block is under-replicated.
> I believe we need a sanity check for block size on both the DN and the NN to 
> prevent such a case from happening.
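> For illustration only: the committed fix (per the file list above) landed on 
> the DN side, but an NN-side bounds check of the kind suggested here could 
> sit just before the stored block is mutated. A sketch, using 
> {{iFile.getPreferredBlockSize()}} as an assumed upper bound:
> {code:java}
> // Sketch: reject implausible lengths before touching storedBlock.
> if (newlength < 0 || newlength > iFile.getPreferredBlockSize()) {
>   throw new IOException("commitBlockSynchronization(oldBlock=" + oldBlock
>       + "): rejecting implausible newlength=" + newlength);
> }
> {code}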



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994588#comment-14994588
 ] 

Hudson commented on HDFS-6481:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #648 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/648/])
HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 
0b18e5e8c69b40c9a446fff448d38e0dd10cb45e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.2
>
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> 

[jira] [Commented] (HDFS-9258) NN should indicate which nodes are stale

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994749#comment-14994749
 ] 

Hadoop QA commented on HDFS-9258:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 16s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-hdfs-project/hadoop-hdfs (total was 451, now 452). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 36s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 52s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 22s 
{color} | {color:red} Patch generated 56 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 146m 33s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.TestFileCreationClient |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.TestFileCreation |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
| JDK v1.7.0_79 Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
|   | hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes 

[jira] [Updated] (HDFS-9398) Make ByteArraryManager log message in one-line format

2015-11-06 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9398:

Description: Per discussion in [HDFS-8971], the {{ByteArrayManager}} should 
use a one-line log message. It is certainly easier to read, especially in the 
multi-threaded case. The easy fix is to restore the old format used before 
[HDFS-8971].  
(was: Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use 
a one-line log message in ByteArrayManager. It is certainly easier to read, 
especially in the multi-threaded case. The easy fix is to restore the old 
format used before [HDFS-8971].)

> Make ByteArraryManager log message in one-line format
> -
>
> Key: HDFS-9398
> URL: https://issues.apache.org/jira/browse/HDFS-9398
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use a 
> one-line log message. It is certainly easier to read, especially in the 
> multi-threaded case. The easy fix is to restore the old format used before 
> [HDFS-8971].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9398) Make ByteArraryManager log message in one-line format

2015-11-06 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-9398:
---

 Summary: Make ByteArraryManager log message in one-line format
 Key: HDFS-9398
 URL: https://issues.apache.org/jira/browse/HDFS-9398
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Mingliang Liu
Assignee: Mingliang Liu


Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use a one-line 
log message in ByteArrayManager. It is certainly easier to read, especially 
in the multi-threaded case. The easy fix is to restore the old format used 
before [HDFS-8971].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9398) Make ByteArraryManager log message in one-line format

2015-11-06 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9398:

Issue Type: Improvement  (was: Bug)

> Make ByteArraryManager log message in one-line format
> -
>
> Key: HDFS-9398
> URL: https://issues.apache.org/jira/browse/HDFS-9398
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use a 
> one-line log message in ByteArrayManager. It is certainly easier to read, 
> especially in the multi-threaded case. The easy fix is to restore the old 
> format used before [HDFS-8971].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9399) Ability to disable HDFS browsing via browseDirectory.jsp, make it configurable

2015-11-06 Thread Raghu C Doppalapudi (JIRA)
Raghu C Doppalapudi created HDFS-9399:
-

 Summary: Ability to disable HDFS browsing via browseDirectory.jsp, 
make it configurable
 Key: HDFS-9399
 URL: https://issues.apache.org/jira/browse/HDFS-9399
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Raghu C Doppalapudi
Assignee: Raghu C Doppalapudi
Priority: Minor


Currently there is no config property available in HDFS to disable the file 
browsing capability. Make it configurable.
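For illustration, the proposed switch might look something like the following 
in hdfs-site.xml. The property name is purely hypothetical; this issue only 
proposes adding such a knob:

{code:xml}
<!-- Hypothetical property name, for illustration only. -->
<property>
  <name>dfs.namenode.browse-directory.enabled</name>
  <value>false</value>
  <description>If false, the NameNode web UI refuses to serve
    browseDirectory.jsp, disabling HDFS file browsing.</description>
</property>
{code}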



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994596#comment-14994596
 ] 

Hudson commented on HDFS-6481:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1371 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1371/])
HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 
0b18e5e8c69b40c9a446fff448d38e0dd10cb45e)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.2
>
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> 

[jira] [Updated] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-6481:

Fix Version/s: (was: 2.7.2)
   2.7.3

> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.3
>
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
> 2014-06-03 13:05:48,489 ERROR [RpcServer.handler=22,port=16020] wal.FSHLog: 
> syncer encountered error, will retry. txid=211
> 

[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994725#comment-14994725
 ] 

Hadoop QA commented on HDFS-7163:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 5s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 50s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 9s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 29s 
{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-hdfs-project (total was 58, now 59). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 47s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 48m 41s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 56s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 17s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-06 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771091/HDFS-7163.003.patch |
| JIRA Issue | HDFS-7163 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux 

[jira] [Commented] (HDFS-9395) getContentSummary is audit logged as success even if failed

2015-11-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994654#comment-14994654
 ] 

Kihwal Lee commented on HDFS-9395:
--

It's by design? HDFS-5163

> getContentSummary is audit logged as success even if failed
> ---
>
> Key: HDFS-9395
> URL: https://issues.apache.org/jira/browse/HDFS-9395
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kuhu Shukla
>
> Audit logging is in the finally block along with the lock unlocking, so it 
> is always logged as success, even in cases where a FileNotFoundException is 
> thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-06 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994706#comment-14994706
 ] 

Jing Zhao commented on HDFS-9129:
-

The latest patch looks good to me overall. Here're my comments:
# Let's still define {{BlockManagerSafeMode#status}} as a private field, and 
provide a getter/setter if necessary. In this way we can have better control 
of its value. Similarly for {{blockTotal}} and {{blockSafe}}.
# The following two initializations may be wrong: with the patch the safemode 
object is created when constructing BlockManager, before loading the fsimage 
and editlog from disk.
{code}
private final long startTime = monotonicNow();
{code}
{code}
private long lastStatusReport = monotonicNow();
{code}
# {{shouldIncrementallyTrackBlocks}} is actually determined by {{haEnabled}}, 
so it looks like it can be declared final and {{isSafeModeTrackingBlocks}} 
can be simplified.
# {{BlockManagerSafeMode#setBlockTotal}} currently does two things: 1) updating 
the threshold numbers, and 2) triggering the safemode check. We can separate 
#2 out of this method, and then {{activate}} does not need to do an 
unnecessary check.
# {{reached}} can be renamed to {{reachedTime}}.
# In the old safemode semantics, once it enters the extension state the NN 
never goes back to the normal safemode state, but keeps waiting in the 
extension state even if the threshold is no longer met. The current 
implementation changes these semantics. It's better to avoid that change here.
{code}
case EXTENSION:
  if (!areThresholdsMet()) {
// EXTENSION -> PENDING_THRESHOLD
status = BMSafeModeStatus.PENDING_THRESHOLD;
  } 
{code}
# The following code can be simplified.
{code}
if (status == BMSafeModeStatus.OFF) {
  return;
}
if (!shouldIncrementallyTrackBlocks) {
  return;
}
{code}
# In {{adjustBlockTotals}}, the {{setBlockTotal}} call should be out of the 
synchronized block.
{code}
synchronized (this) {
  ...
  blockSafe += deltaSafe;
  setBlockTotal(blockTotal + deltaTotal);
}
{code}
# Not caused by this patch, but since {{doConsistencyCheck}} is sometimes not 
protected by any lock (e.g., in {{computeDatanodeWork}}), the total number of 
blocks retrieved from blockManager and used by the consistency check can be 
inaccurate. So I think we can replace the AssertionError with a warning log 
message here.
# Let's still name the first parameter of {{incrementSafeBlockCount}} 
"storageNum".
# In {{decrementSafeBlockCount}}, {{checkSafeMode}} only needs to be called 
the first time the live replica count drops below the safe number. Thus 
{{checkSafeMode}} should be called within the if, as sketched after the code 
below.
{code}
  if (blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
this.blockSafe--;
  }
  assert blockSafe >= 0;
  checkSafeMode();
{code}
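For illustration, comment 11 amounts to something like the following sketch 
(not the patch itself):
{code:java}
// Only a decrement can newly drop the live replica count below the safe
// threshold, so only that path needs to re-check the safemode state.
if (blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
  this.blockSafe--;
  assert blockSafe >= 0;
  checkSafeMode();
}
{code}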

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2261) AOP unit tests are not getting compiled or run

2015-11-06 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994700#comment-14994700
 ] 

Haohui Mai commented on HDFS-2261:
--

Rebase on the latest trunk.

> AOP unit tests are not getting compiled or run 
> ---
>
> Key: HDFS-2261
> URL: https://issues.apache.org/jira/browse/HDFS-2261
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0-alpha, 2.0.4-alpha
> Environment: 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console
> -compile-fault-inject ant target 
>Reporter: Giridharan Kesavan
>Priority: Minor
> Attachments: HDFS-2261.000.patch, hdfs-2261.patch
>
>
> The tests in src/test/aop are not getting compiled or run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9398) Make ByteArraryManager log message in one-line format

2015-11-06 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9398:

Attachment: HDFS-9398.000.patch

> Make ByteArraryManager log message in one-line format
> -
>
> Key: HDFS-9398
> URL: https://issues.apache.org/jira/browse/HDFS-9398
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9398.000.patch
>
>
> Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use a 
> one-line log message. It is certainly easier to read, especially in the 
> multi-threaded case. The easy fix is to restore the old format used before 
> [HDFS-8971].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package

2015-11-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994784#comment-14994784
 ] 

Mingliang Liu commented on HDFS-8971:
-

Thanks for your suggestion [~szetszwo]. I filed [HDFS-9398] to track the effort 
of reverting changes in {{ByteArrayManager}} regarding the log message.

> Remove guards when calling LOG.debug() and LOG.trace() in client package
> 
>
> Key: HDFS-8971
> URL: https://issues.apache.org/jira/browse/HDFS-8971
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-8971.000.patch, HDFS-8971.001.patch
>
>
> We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to 
> {{hadoop-hdfs-client}} module in JIRA 
> [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and 
> [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and 
> {{BlockReader}} in 
> [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we 
> also replaced the _log4j_ logger with the _slf4j_ logger. There was existing 
> code in the client package guarding the log calls to {{LOG.debug()}} and 
> {{LOG.trace()}}; e.g. in {{ShortCircuitCache.java}}, we have code like this:
> {code:title=Trace with guards|borderStyle=solid}
> 724if (LOG.isTraceEnabled()) {
> 725  LOG.trace(this + ": found waitable for " + key);
> 726}
> {code}
> In _slf4j_, this kind of guard is not necessary. We should clean up the code 
> by removing the guards from the client package.
> {code:title=Trace without guards|borderStyle=solid}
> 724LOG.trace("{}: found waitable for {}", this, key);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9398) Make ByteArraryManager log message in one-line format

2015-11-06 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9398:

Status: Patch Available  (was: Open)

> Make ByteArraryManager log message in one-line format
> -
>
> Key: HDFS-9398
> URL: https://issues.apache.org/jira/browse/HDFS-9398
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9398.000.patch
>
>
> Per discussion in [HDFS-8971], the {{ByteArrayManager}} should use a 
> one-line log message. It is certainly easier to read, especially in the 
> multi-threaded case. The easy fix is to restore the old format used before 
> [HDFS-8971].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9364) Unnecessary DNS resolution attempts when creating NameNodeProxies

2015-11-06 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994859#comment-14994859
 ] 

Xiao Chen commented on HDFS-9364:
-

Thanks [~zhz], attached patch 4 with the fix.

> Unnecessary DNS resolution attempts when creating NameNodeProxies
> -
>
> Key: HDFS-9364
> URL: https://issues.apache.org/jira/browse/HDFS-9364
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, performance
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9364.001.patch, HDFS-9364.002.patch, 
> HDFS-9364.003.patch, HDFS-9364.004.patch
>
>
> When creating NameNodeProxies, we always try to DNS-resolve namenode URIs. 
> This is unnecessary if the URI is logical, and may be significantly slow if 
> the DNS is having problems. 
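> For illustration, the shape of the fix is roughly the following sketch. The 
> helpers {{HAUtil.isLogicalUri}} and {{NameNode.getAddress}} are assumptions 
> about what the proxy-creation path has available:
> {code:java}
> // Sketch: only pay the DNS cost for physical addresses. A logical
> // (HA nameservice) URI never resolves in DNS, so trying just burns time.
> InetSocketAddress nnAddr = null;
> if (!HAUtil.isLogicalUri(conf, nameNodeUri)) {
>   nnAddr = NameNode.getAddress(nameNodeUri);  // may hit DNS
> }
> {code}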



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9318) considerLoad factor can be improved

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994976#comment-14994976
 ] 

Hudson commented on HDFS-9318:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2518 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2518/])
HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: 
rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256)
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java


> considerLoad factor can be improved
> ---
>
> Key: HDFS-9318
> URL: https://issues.apache.org/jira/browse/HDFS-9318
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch
>
>
> Currently considerLoad avoids choosing nodes that are too active, so it helps 
> level the HDFS load across the cluster. Under normal conditions, this is 
> desired. However, when a cluster has a large percentage of nearly full nodes, 
> this can make it difficult to find good targets because the placement policy 
> wants to avoid the full nodes, but considerLoad wants to avoid the busy 
> less-full nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994974#comment-14994974
 ] 

Hudson commented on HDFS-9236:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2518 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2518/])
HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: 
rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java


> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Fix For: 2.8.0
>
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, 
> HDFS-9236.006.patch, HDFS-9236.007.patch
>
>
> Ran into an issue while running tests against faulty datanode code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List participatingList = new ArrayList();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible that none of the rStates (reported by DNs with copies 
> of the replica being recovered) match the bestState. This can be caused 
> either by faulty DN code or by stale/modified/corrupted files on the DN. 
> When this happens, the DN ends up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even one with the correct length) will cause the block to be 
> marked as corrupted. This block could be the last block of the file; if that 
> happens and the client goes away, the NN won't be able to recover the lease 
> and close the file because the last block is under-replicated.
> I believe we need a sanity check for block size on both the DN and the NN to 
> prevent such a case from happening.
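> For illustration, the DN-side guard (per the BlockRecoveryWorker.java diff 
> above) boils down to roughly this shape; a sketch, not the literal patch:
> {code:java}
> // If no replica matched bestState, minLength is still Long.MAX_VALUE
> // and must not be committed to the NN.
> if (minLength == Long.MAX_VALUE) {
>   throw new IOException("No replica matched state " + bestState
>       + " for " + rBlock.getBlock() + "; aborting block recovery");
> }
> newBlock.setNumBytes(minLength);
> {code}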



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994975#comment-14994975
 ] 

Hudson commented on HDFS-6481:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2518 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2518/])
HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 
0b18e5e8c69b40c9a446fff448d38e0dd10cb45e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.3
>
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> 
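The {{ArrayIndexOutOfBoundsException: 0}} in the (truncated) trace above comes 
from indexing a zero-length {{storageIDs}} array. As a minimal sketch of the 
length check the summary calls for — with a simplified, illustrative signature 
rather than the actual {{DatanodeManager}} code:

{code:java}
import java.io.IOException;

public final class StorageIdCheck {
  static void checkStorageIDs(String[] datanodeIDs, String[] storageIDs)
      throws IOException {
    // Validate up front instead of letting the lookup loop index past
    // the end of a short (or empty) storageIDs array.
    if (storageIDs == null || storageIDs.length != datanodeIDs.length) {
      throw new IOException("Expected " + datanodeIDs.length
          + " storage IDs but got "
          + (storageIDs == null ? 0 : storageIDs.length));
    }
  }

  public static void main(String[] args) throws IOException {
    checkStorageIDs(new String[] {"dn1"}, new String[0]); // throws IOException
  }
}
{code}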

[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995032#comment-14995032
 ] 

Hudson commented on HDFS-9379:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #639 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/639/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 
2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java


> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
> --
>
> Key: HDFS-9379
> URL: https://issues.apache.org/jira/browse/HDFS-9379
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9379.000.patch
>
>
> Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on 
> the {{datanodes}} array being sorted in the lexicographical order of each 
> datanode's {{xferAddr}}.
> * There is an assertion of the datanodes' {{xferAddr}} lexicographical order 
> when filling the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152].
> * When searching for a datanode by {{DatanodeInfo}}, it uses binary search 
> against the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187].
> In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In 
> {{NNThroughputBenchmark}}, the port is simply _the index of the tiny 
> datanode_ plus one.
> The problem is that, when there are more than 9 tiny datanodes 
> ({{numThreads}}), the lexicographical order of the datanodes' {{xferAddr}} 
> values breaks down, because the string form of the datanode index is no 
> longer in lexicographical order. For example,
> {code}
> ...
> 192.168.54.40:8
> 192.168.54.40:9
> 192.168.54.40:10
> 192.168.54.40:11
> ...
> {code}
> {{192.168.54.40:9}} is lexicographically greater than {{192.168.54.40:10}}, 
> so the assertion will fail and the binary search won't work.
> The simple fix is to calculate the datanode index from the port directly, 
> instead of using binary search.
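For illustration, a self-contained sketch (not the benchmark's actual code) of 
both the ordering breakdown and the proposed index-from-port fix:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class XferAddrOrdering {
  public static void main(String[] args) {
    // Build xferAddr strings the way the benchmark does: port = index + 1.
    List<String> insertionOrder = new ArrayList<>();
    for (int i = 0; i < 12; i++) {
      insertionOrder.add("192.168.54.40:" + (i + 1));
    }
    List<String> lexicographical = new ArrayList<>(insertionOrder);
    Collections.sort(lexicographical);
    // With more than 9 datanodes the two orders diverge ("...:10" sorts
    // before "...:9"), so binary search over the insertion order breaks.
    System.out.println(insertionOrder.equals(lexicographical)); // false

    // The proposed fix: recover the datanode index from the port directly.
    String xferAddr = "192.168.54.40:11";
    int port = Integer.parseInt(xferAddr.substring(xferAddr.indexOf(':') + 1));
    System.out.println(port - 1); // 10, with no search at all
  }
}
{code}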



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994989#comment-14994989
 ] 

Hudson commented on HDFS-9379:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8770 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8770/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 
2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java


> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
> --
>
> Key: HDFS-9379
> URL: https://issues.apache.org/jira/browse/HDFS-9379
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9379.000.patch
>
>
> Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on 
> the {{datanodes}} array being sorted in the lexicographical order of each 
> datanode's {{xferAddr}}.
> * There is an assertion of the datanodes' {{xferAddr}} lexicographical order 
> when filling the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152].
> * When searching for a datanode by {{DatanodeInfo}}, it uses binary search 
> against the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187].
> In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In 
> {{NNThroughputBenchmark}}, the port is simply _the index of the tiny 
> datanode_ plus one.
> The problem is that, when there are more than 9 tiny datanodes 
> ({{numThreads}}), the lexicographical order of the datanodes' {{xferAddr}} 
> values breaks down, because the string form of the datanode index is no 
> longer in lexicographical order. For example,
> {code}
> ...
> 192.168.54.40:8
> 192.168.54.40:9
> 192.168.54.40:10
> 192.168.54.40:11
> ...
> {code}
> {{192.168.54.40:9}} is lexicographically greater than {{192.168.54.40:10}}, 
> so the assertion will fail and the binary search won't work.
> The simple fix is to calculate the datanode index from the port directly, 
> instead of using binary search.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9318) considerLoad factor can be improved

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995016#comment-14995016
 ] 

Hudson commented on HDFS-9318:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #579 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/579/])
HDFS-9318. considerLoad factor can be improved. Contributed by Kuhu (kihwal: 
rev bf6aa30a156b3c5cac5469014a5989e0dfdc7256)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml


> considerLoad factor can be improved
> ---
>
> Key: HDFS-9318
> URL: https://issues.apache.org/jira/browse/HDFS-9318
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch
>
>
> Currently considerLoad avoids choosing nodes that are too active, so it helps 
> level the HDFS load across the cluster. Under normal conditions, this is 
> desired. However, when a cluster has a large percentage of nearly full nodes, 
> this can make it difficult to find good targets because the placement policy 
> wants to avoid the full nodes, but considerLoad wants to avoid the busy 
> less-full nodes.
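The change adds a configurable load factor (the patch touches 
{{DFSConfigKeys}} and {{hdfs-default.xml}}, per the file list above). As a 
rough sketch of what such a check looks like — the method and parameter names 
below are illustrative, not the committed code:

{code:java}
public final class ConsiderLoadSketch {
  // Exclude a candidate when its xceiver count exceeds factor * cluster avg.
  static boolean excludeForLoad(double nodeXceivers, double avgXceivers,
      double considerLoadFactor) {
    return nodeXceivers > considerLoadFactor * avgXceivers;
  }

  public static void main(String[] args) {
    // Factor 2.0: a node with 9 xceivers vs. a cluster average of 4 is skipped.
    System.out.println(excludeForLoad(9, 4, 2.0)); // true
    // Raising the factor admits busier nodes, which helps when most
    // lightly-loaded nodes are also nearly full.
    System.out.println(excludeForLoad(9, 4, 3.0)); // false
  }
}
{code}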



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995014#comment-14995014
 ] 

Hudson commented on HDFS-9236:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #579 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/579/])
HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: 
rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java


> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Fix For: 2.8.0
>
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, 
> HDFS-9236.006.patch, HDFS-9236.007.patch
>
>
> Ran into an issue while running test against faulty data-node code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rStates (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can be caused 
> either by faulty DN code or by stale/modified/corrupted files on the DN. When 
> this happens, the DN will end up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even with the correct length) will cause the block to be marked 
> as corrupted. Since this block could be the last block of the file, if this 
> happens and the client goes away, the NN won't be able to recover the lease 
> and close the file because the last block is under-replicated.
> I believe we need a sanity check for block size on both the DN and the NN to 
> prevent such a case from happening.
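A hedged sketch of the kind of NN-side guard argued for above; the method name 
and message are illustrative, not the committed patch:

{code:java}
import java.io.IOException;

public final class RecoveredLengthCheck {
  // Reject a recovered length that can only arise from the failure mode
  // described above, i.e. no reported replica state matched bestState.
  static void checkRecoveredLength(long newLength) throws IOException {
    if (newLength < 0 || newLength == Long.MAX_VALUE) {
      throw new IOException(
          "Rejecting bogus recovered block length " + newLength);
    }
  }

  public static void main(String[] args) throws IOException {
    checkRecoveredLength(1024);           // fine
    checkRecoveredLength(Long.MAX_VALUE); // throws IOException
  }
}
{code}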



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995062#comment-14995062
 ] 

Hudson commented on HDFS-9379:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #649 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/649/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 
2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java


> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
> --
>
> Key: HDFS-9379
> URL: https://issues.apache.org/jira/browse/HDFS-9379
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9379.000.patch
>
>
> Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on 
> the {{datanodes}} array being sorted in the lexicographical order of each 
> datanode's {{xferAddr}}.
> * There is an assertion of the datanodes' {{xferAddr}} lexicographical order 
> when filling the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152].
> * When searching for a datanode by {{DatanodeInfo}}, it uses binary search 
> against the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187].
> In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In 
> {{NNThroughputBenchmark}}, the port is simply _the index of the tiny 
> datanode_ plus one.
> The problem is that, when there are more than 9 tiny datanodes 
> ({{numThreads}}), the lexicographical order of the datanodes' {{xferAddr}} 
> values breaks down, because the string form of the datanode index is no 
> longer in lexicographical order. For example,
> {code}
> ...
> 192.168.54.40:8
> 192.168.54.40:9
> 192.168.54.40:10
> 192.168.54.40:11
> ...
> {code}
> {{192.168.54.40:9}} is lexicographically greater than {{192.168.54.40:10}}, 
> so the assertion will fail and the binary search won't work.
> The simple fix is to calculate the datanode index from the port directly, 
> instead of using binary search.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2261) AOP unit tests are not getting compiled or run

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995038#comment-14995038
 ] 

Hadoop QA commented on HDFS-2261:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 26 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 16s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 28s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 29s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 58s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 13s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 20s 
{color} | {color:red} Patch generated 58 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 188m 28s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.fs.shell.TestCopyPreserveFlag |
|   | hadoop.ha.TestZKFailoverController |
|   | hadoop.metrics2.impl.TestGangliaMetrics |
|   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| JDK v1.7.0_79 Failed junit tests | hadoop.fs.shell.TestCopyPreserveFlag |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | 

[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995047#comment-14995047
 ] 

Hudson commented on HDFS-9379:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2579 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2579/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 
2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
> --
>
> Key: HDFS-9379
> URL: https://issues.apache.org/jira/browse/HDFS-9379
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9379.000.patch
>
>
> Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on 
> the {{datanodes}} array being sorted in the lexicographical order of each 
> datanode's {{xferAddr}}.
> * There is an assertion of the datanodes' {{xferAddr}} lexicographical order 
> when filling the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152].
> * When searching for a datanode by {{DatanodeInfo}}, it uses binary search 
> against the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187].
> In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In 
> {{NNThroughputBenchmark}}, the port is simply _the index of the tiny 
> datanode_ plus one.
> The problem is that, when there are more than 9 tiny datanodes 
> ({{numThreads}}), the lexicographical order of the datanodes' {{xferAddr}} 
> values breaks down, because the string form of the datanode index is no 
> longer in lexicographical order. For example,
> {code}
> ...
> 192.168.54.40:8
> 192.168.54.40:9
> 192.168.54.40:10
> 192.168.54.40:11
> ...
> {code}
> {{192.168.54.40:9}} is lexicographically greater than {{192.168.54.40:10}}, 
> so the assertion will fail and the binary search won't work.
> The simple fix is to calculate the datanode index from the port directly, 
> instead of using binary search.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995068#comment-14995068
 ] 

Hadoop QA commented on HDFS-9267:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 5s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 12s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs introduced 1 new FindBugs 
issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 129m 13s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_60. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 182m 48s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_79. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 2m 27s 
{color} | {color:red} Patch generated 57 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 337m 19s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice$BlockPoolSliceReplicaIterator$DirIterator.next()
 can't throw NoSuchElementException  At BlockPoolSlice.java:At 
BlockPoolSlice.java:[line 456] |
| JDK v1.8.0_60 Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.qjournal.TestSecureNNWithQJM |
|   | hadoop.hdfs.TestReplication |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.TestDFSStripedOutputStream |
|   | hadoop.hdfs.server.namenode.TestSecurityTokenEditLog |
|   | 

[jira] [Commented] (HDFS-9379) Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995073#comment-14995073
 ] 

Hudson commented on HDFS-9379:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1372 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1372/])
HDFS-9379. Make NNThroughputBenchmark support more than 10 datanodes. (arp: rev 
2801b42a7e178ad6a0e6b0f29f22f3571969c530)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Make NNThroughputBenchmark$BlockReportStats support more than 10 datanodes
> --
>
> Key: HDFS-9379
> URL: https://issues.apache.org/jira/browse/HDFS-9379
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9379.000.patch
>
>
> Currently, the {{NNThroughputBenchmark}} test {{BlockReportStats}} relies on 
> the {{datanodes}} array being sorted in the lexicographical order of each 
> datanode's {{xferAddr}}.
> * There is an assertion of the datanodes' {{xferAddr}} lexicographical order 
> when filling the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1152].
> * When searching for a datanode by {{DatanodeInfo}}, it uses binary search 
> against the {{datanodes}} array, see [the 
> code|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java#L1187].
> In {{DatanodeID}}, the {{xferAddr}} is defined as {{host:port}}. In 
> {{NNThroughputBenchmark}}, the port is simply _the index of the tiny 
> datanode_ plus one.
> The problem is that, when there are more than 9 tiny datanodes 
> ({{numThreads}}), the lexicographical order of the datanodes' {{xferAddr}} 
> values breaks down, because the string form of the datanode index is no 
> longer in lexicographical order. For example,
> {code}
> ...
> 192.168.54.40:8
> 192.168.54.40:9
> 192.168.54.40:10
> 192.168.54.40:11
> ...
> {code}
> {{192.168.54.40:9}} is lexicographically greater than {{192.168.54.40:10}}, 
> so the assertion will fail and the binary search won't work.
> The simple fix is to calculate the datanode index from the port directly, 
> instead of using binary search.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9381) When same block came for replication for Striped mode, we can move that block to PendingReplications

2015-11-06 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-9381:
--
Description: 
I noticed that we currently just return null if the block already exists in 
pendingReplications in the replication flow for striped blocks.

{code}
if (block.isStriped()) {
  if (pendingNum > 0) {
// Wait the previous recovery to finish.
return null;
  }
{code}

 Here, if we just return null and neededReplications contains only a few 
blocks (by default, fewer than numliveNodes*2), the same blocks can be picked 
again from neededReplications in the next loop, since we are not removing the 
element from neededReplications. Because this replication processing needs to 
take the FSNamesystem lock, we may spend time unnecessarily in every loop. 

So my suggestion/improvement is:
 Instead of just returning null, how about incrementing pendingReplications for 
this block and removing it from neededReplications? Another point to consider 
here is that, to add into pendingReplications, we generally need a target, 
which is simply the node to which we issued the replication command. Later, 
after the replication succeeds and the DN reports it, the block is removed 
from pendingReplications in the NN's addBlock. 

 Since this is a newly picked block from neededReplications, we would not have 
selected a target yet. So which target should be passed to pendingReplications 
if we add this block? One option I am thinking of is to just pass srcNode 
itself as the target for this special condition. If the block is really 
missed, srcNode will not report it, so the block will not be removed from 
pendingReplications; when it times out, it will be considered for replication 
again, and at that time the regular replication flow will find an actual 
target to replicate to.
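A toy, self-contained sketch of the proposed bookkeeping, using plain 
collections as stand-ins for the BlockManager structures named above 
({{neededReplications}}, {{pendingReplications}}, srcNode); this is not the 
actual HDFS code:

{code:java}
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class PendingMoveSketch {
  public static void main(String[] args) {
    Queue<String> neededReplications = new ArrayDeque<>();
    Map<String, String> pendingReplications = new HashMap<>(); // block -> target
    neededReplications.add("blk_striped_1");

    String block = neededReplications.peek();
    // Proposal: instead of leaving the block to be re-picked next loop,
    // move it out of neededReplications and record srcNode as a placeholder
    // target until the timeout path picks a real recovery target.
    neededReplications.remove(block);
    pendingReplications.put(block, "srcNode");
    System.out.println(pendingReplications); // {blk_striped_1=srcNode}
  }
}
{code}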


  was:
Currently I noticed that we are just returning null if block already exists in 
pendingReplications in replication flow for striped blocks.

{code}
if (block.isStriped()) {
  if (pendingNum > 0) {
// Wait the previous recovery to finish.
return null;
  }
{code}

 Here if neededReplications contains only fewer blocks(basically by default if 
less than numliveNodes*2), then same blocks can be picked again from 
neededReplications if we just return null as we are not removing element from 
neededReplications. Since this replication process need to take fsnamesystmem 
lock and do, we may spend some time unnecessarily in every loop. 

So my suggestion/improvement is:
 Instead of just returning null, how about incrementing pendingReplications for 
this block and remove from neededReplications? and also another point to 
consider here is, to add into pendingReplications, generally we need target and 
it is nothing to which node we issued replication command. Later when after 
replication success and DN reported it, block will be removed from 
pendingReplications from NN addBlock. 

 So since this is newly picked block from neededReplications, we would not have 
selected target yet. So which target to be passed to pendingReplications if we 
add this block.. One Option I am thinking is, how about just passing srcNode 
itself as target for this special condition? So, anyway if block is really 
missed, srcNode anyway will not report it. So this block will not be removed 
from pending replications, so that when it timeout, it will be considered for 
replication and that time it will find actual target to replicate.





 




> When same block came for replication for Striped mode, we can move that block 
> to PendingReplications
> 
>
> Key: HDFS-9381
> URL: https://issues.apache.org/jira/browse/HDFS-9381
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, namenode
>Affects Versions: 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> I noticed that we currently just return null if the block already exists 
> in pendingReplications in the replication flow for striped blocks.
> {code}
> if (block.isStriped()) {
>   if (pendingNum > 0) {
> // Wait the previous recovery to finish.
> return null;
>   }
> {code}
>  Here, if we just return null and neededReplications contains only a few 
> blocks (by default, fewer than numliveNodes*2), the same blocks can be picked 
> again from neededReplications in the next loop, since we are not removing the 
> element from neededReplications. Because this replication processing needs to 
> take the FSNamesystem lock, we may spend time unnecessarily in 
> every loop. 
> So my suggestion/improvement is:
>  Instead of just returning null, how about incrementing 

[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995015#comment-14995015
 ] 

Hudson commented on HDFS-6481:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #579 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/579/])
HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 
0b18e5e8c69b40c9a446fff448d38e0dd10cb45e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java


> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.3
>
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> 

[jira] [Commented] (HDFS-9267) TestDiskError should get stored replicas through FsDatasetTestUtils.

2015-11-06 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995112#comment-14995112
 ] 

Lei (Eddy) Xu commented on HDFS-9267:
-

Will fix these tests in the next patch.

> TestDiskError should get stored replicas through FsDatasetTestUtils.
> 
>
> Key: HDFS-9267
> URL: https://issues.apache.org/jira/browse/HDFS-9267
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-9267.00.patch, HDFS-9267.01.patch, 
> HDFS-9267.02.patch, HDFS-9267.03.patch
>
>
> {{TestDiskError#testReplicationError}} scans local directories to verify 
> blocks and metadata files, which leaks the details of {{FsDataset}} 
> implementation. 
> This JIRA will abstract the "scanning" operation to {{FsDatasetTestUtils}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9336) deleteSnapshot throws NPE when snapshotname is null

2015-11-06 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993340#comment-14993340
 ] 

Brahma Reddy Battula commented on HDFS-9336:


Test failures are unrelated. Kindly review!

> deleteSnapshot throws NPE when snapshotname is null
> ---
>
> Key: HDFS-9336
> URL: https://issues.apache.org/jira/browse/HDFS-9336
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9336-002.patch, HDFS-9336.patch
>
>
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$DeleteSnapshotRequestProto$Builder.setSnapshotName(ClientNamenodeProtocolProtos.java:17509)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.deleteSnapshot(ClientNamenodeProtocolTranslatorPB.java:1005)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy15.deleteSnapshot(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.deleteSnapshot(DFSClient.java:2106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1660)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.deleteSnapshot(DistributedFileSystem.java:1677)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testWebHdfsAllowandDisallowSnapshots(TestWebHDFS.java:380)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>   at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>   at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> {noformat}
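Since the NPE originates in the generated protobuf builder's 
{{setSnapshotName}}, the natural guard is to validate the argument before it 
reaches the translator. A sketch with a simplified, illustrative signature, 
not the actual patch:

{code:java}
public final class SnapshotArgCheck {
  static void checkSnapshotName(String snapshotRoot, String snapshotName) {
    // Fail fast with a meaningful message instead of an NPE deep inside
    // the generated protobuf code.
    if (snapshotName == null || snapshotName.isEmpty()) {
      throw new IllegalArgumentException(
          "Snapshot name cannot be null or empty for path " + snapshotRoot);
    }
  }

  public static void main(String[] args) {
    checkSnapshotName("/user", null); // throws IllegalArgumentException
  }
}
{code}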



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9384) TestWebHdfsContentLength intermittently hangs and fails due to TCP conversation mismatch between client and server.

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993341#comment-14993341
 ] 

Hudson commented on HDFS-9384:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #636 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/636/])
HDFS-9384. TestWebHdfsContentLength intermittently hangs and fails due 
(cnauroth: rev 66c096731052fb187dc49f5bcaec8432c4b92d0c)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsContentLength.java


> TestWebHdfsContentLength intermittently hangs and fails due to TCP 
> conversation mismatch between client and server.
> ---
>
> Key: HDFS-9384
> URL: https://issues.apache.org/jira/browse/HDFS-9384
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9384.001.patch, HDFS-9384.002.patch, 
> HDFS-9384.003.patch
>
>
> {{TestWebHdfsContentLength}} runs a simple hand-coded HTTP server in a 
> background thread to simulate some WebHDFS server responses.  In some 
> environments (notably Windows), I have observed that the test can hang and 
> fail intermittently.  The root cause is that the server fails to fully 
> consume the client's input.  This causes a mismatch in the TCP conversation 
> state, and ultimately the client side hangs, then aborts after the 60-second 
> socket timeout.
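For illustration, a minimal hand-rolled responder in the spirit of the test's 
stub server (class name and buffer size are assumptions, not the test's code), 
showing the remedy: drain whatever the client sent before writing the 
response, so the TCP conversation stays in sync.

{code:java}
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class DrainingHttpStub {
  public static void main(String[] args) throws Exception {
    try (ServerSocket server = new ServerSocket(0)) {
      System.out.println("listening on " + server.getLocalPort());
      try (Socket client = server.accept()) {
        InputStream in = client.getInputStream();
        byte[] buf = new byte[4096];
        // Consume all pending request bytes before replying; skipping this
        // is what leaves the client blocked until its socket timeout.
        while (in.available() > 0 && in.read(buf) > 0) {
          // discard request bytes
        }
        OutputStream out = client.getOutputStream();
        out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
            .getBytes(StandardCharsets.UTF_8));
        out.flush();
      }
    }
  }
}
{code}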



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity

2015-11-06 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki reassigned HDFS-8287:


Assignee: Kai Sasaki  (was: Kai Sasaki)

> DFSStripedOutputStream.writeChunk should not wait for writing parity 
> -
>
> Key: HDFS-8287
> URL: https://issues.apache.org/jira/browse/HDFS-8287
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Kai Sasaki
> Attachments: HDFS-8287-HDFS-7285.00.patch, 
> HDFS-8287-HDFS-7285.01.patch, HDFS-8287-HDFS-7285.02.patch, 
> HDFS-8287-HDFS-7285.03.patch, HDFS-8287-HDFS-7285.04.patch, 
> HDFS-8287-HDFS-7285.05.patch, HDFS-8287-HDFS-7285.06.patch, 
> HDFS-8287-HDFS-7285.07.patch, HDFS-8287-HDFS-7285.08.patch, 
> HDFS-8287-HDFS-7285.09.patch, HDFS-8287-HDFS-7285.10.patch, 
> HDFS-8287-HDFS-7285.11.patch, HDFS-8287-HDFS-7285.WIP.patch, 
> HDFS-8287-performance-report.pdf, HDFS-8287.12.patch, HDFS-8287.13.patch, 
> HDFS-8287.14.patch, HDFS-8287.15.patch, h8287_20150911.patch, jstack-dump.txt
>
>
> When a striping cell is full, writeChunk computes and generates parity 
> packets. It sequentially calls waitAndQueuePacket, so the user client cannot 
> continue to write data until it finishes.
> We should allow the user client to continue writing instead of blocking it 
> while parity is being written.
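A minimal sketch of the idea using a background executor (illustrative only, 
not the actual {{DFSStripedOutputStream}} change): hand the parity work for a 
finished cell to another thread so the write path can return immediately, and 
only the flush/close path waits.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncParitySketch {
  public static void main(String[] args) throws Exception {
    ExecutorService parityPool = Executors.newSingleThreadExecutor();
    byte[] fullCell = new byte[64 * 1024]; // pretend a striping cell just filled
    Future<?> parityDone = parityPool.submit(() -> {
      // compute and enqueue parity packets for the finished cell here
      System.out.println("parity generated for " + fullCell.length + " bytes");
    });
    // ... the writer keeps queueing data packets without waiting ...
    parityDone.get(); // only the flush/close path needs to wait for parity
    parityPool.shutdown();
  }
}
{code}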



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8197) [umbrella] System tests for EC feature

2015-11-06 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki reassigned HDFS-8197:


Assignee: Kai Sasaki  (was: Kai Sasaki)

> [umbrella] System tests for EC feature
> --
>
> Key: HDFS-8197
> URL: https://issues.apache.org/jira/browse/HDFS-8197
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>  Labels: system-tests, test
>
> This is the umbrella JIRA for system tests of the EC feature.
> All sub-tasks and test cases are listed under this ticket. The items to be 
> tested are:
> * Create/Delete EC File
> * Create/Delete ECZone
> * teragen against EC files
> * terasort against EC files
> * teravalidate against EC files



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7717) Erasure Coding: distribute replication to stripping erasure coding conversion work to DataNode

2015-11-06 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki reassigned HDFS-7717:


Assignee: Kai Sasaki  (was: Kai Sasaki)

> Erasure Coding: distribute replication to stripping erasure coding conversion 
> work to DataNode
> --
>
> Key: HDFS-7717
> URL: https://issues.apache.org/jira/browse/HDFS-7717
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Kai Sasaki
>
> In the *striping* erasure coding case, we need an approach to distribute the 
> replication-to-striping conversion work to DataNodes. The coordinator could 
> be the NameNode, a tool utilizing MR just like the current distcp, or a 
> standalone tool like the balancer/mover. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8613) Add option to list up allowed hosts that can do any operation to NameNode.

2015-11-06 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki reassigned HDFS-8613:


Assignee: Kai Sasaki  (was: Kai Sasaki)

> Add option to list up allowed hosts that can do any operation to NameNode.
> --
>
> Key: HDFS-8613
> URL: https://issues.apache.org/jira/browse/HDFS-8613
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Minor
>
> Currently the NameNode accepts all operations through the client protocol 
> from any host. 
> However, some critical operations such as {{format}} should be restricted not 
> only by Kerberos authentication but also by host name, in order to prevent us 
> from formatting the NameNode by mistake. It would be better to add an option 
> listing the allowed hosts that can perform any operation on the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8020) Erasure Coding: restore BlockGroup and schema info from stripping coding command

2015-11-06 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki reassigned HDFS-8020:


Assignee: Kai Sasaki  (was: Kai Sasaki)

> Erasure Coding: restore BlockGroup and schema info from stripping coding 
> command
> 
>
> Key: HDFS-8020
> URL: https://issues.apache.org/jira/browse/HDFS-8020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Kai Sasaki
>
> As a task of HDFS-7344, to process *striping* coding commands from the 
> NameNode or other scheduler services/tools, we first need to be able to 
> restore the BlockGroup and schema information in the DataNode, which will be 
> used to construct and perform coding work using the {{ErasureCoder}} API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8931) Erasure Coding: Notify exception to client side from ParityGenerator.

2015-11-06 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki reassigned HDFS-8931:


Assignee: Kai Sasaki  (was: Kai Sasaki)

> Erasure Coding: Notify exception to client side from ParityGenerator.
> -
>
> Key: HDFS-8931
> URL: https://issues.apache.org/jira/browse/HDFS-8931
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS
>Affects Versions: HDFS-7285
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>  Labels: EC
> Fix For: HDFS-7285
>
>
> Following HDFS-8287. 
> Currently the client thread catches the exception from {{ParityGenerator}}. 
> In order to handle it properly:
> 1. Consolidate the handling logic into an UncaughtExceptionHandler.
> 2. Notify the client side of the exception from the UncaughtExceptionHandler.
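A self-contained sketch of the mechanism using the standard 
{{Thread.UncaughtExceptionHandler}} API; the thread name and the handling here 
are illustrative, not the proposed patch:

{code:java}
import java.util.concurrent.atomic.AtomicReference;

public class NotifyClientSketch {
  public static void main(String[] args) throws InterruptedException {
    AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread parityGenerator = new Thread(() -> {
      throw new IllegalStateException("parity generation failed");
    }, "ParityGenerator");
    // 1. Consolidated handling logic lives in one handler.
    parityGenerator.setUncaughtExceptionHandler((t, e) -> failure.set(e));
    parityGenerator.start();
    parityGenerator.join();
    // 2. The client thread observes the failure and can rethrow or fail
    //    the stream.
    if (failure.get() != null) {
      System.out.println("client notified: " + failure.get().getMessage());
    }
  }
}
{code}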



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8196) Erasure Coding related information on NameNode UI

2015-11-06 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki reassigned HDFS-8196:


Assignee: Kai Sasaki  (was: Kai Sasaki)

> Erasure Coding related information on NameNode UI
> -
>
> Key: HDFS-8196
> URL: https://issues.apache.org/jira/browse/HDFS-8196
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-7285
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>  Labels: NameNode, WebUI
>
> The NameNode WebUI should show EC-related information and metrics. 
> This depends on [HDFS-7674|https://issues.apache.org/jira/browse/HDFS-7674].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-11-06 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993370#comment-14993370
 ] 

Yi Liu commented on HDFS-9276:
--

In the test, you get token1/token2 for the current login user; suppose it is 
UserA. When you use the token elsewhere, you should doAs that user "UserA", 
right? Instead you doAs a different user, "Test" (a remote login user), which 
means the service (the Spark executor) will use UserA's delegation token to 
access HDFS on behalf of the "Test" user. Is that right?


> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running 
> applications. The HDFS client will generate private tokens for each NameNode. 
> When we update the HDFS Delegation Token, these private tokens will not be 
> updated, which causes the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Commented] (HDFS-9384) TestWebHdfsContentLength intermittently hangs and fails due to TCP conversation mismatch between client and server.

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993391#comment-14993391
 ] 

Hudson commented on HDFS-9384:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2576 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2576/])
HDFS-9384. TestWebHdfsContentLength intermittently hangs and fails due 
(cnauroth: rev 66c096731052fb187dc49f5bcaec8432c4b92d0c)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsContentLength.java


> TestWebHdfsContentLength intermittently hangs and fails due to TCP 
> conversation mismatch between client and server.
> ---
>
> Key: HDFS-9384
> URL: https://issues.apache.org/jira/browse/HDFS-9384
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9384.001.patch, HDFS-9384.002.patch, 
> HDFS-9384.003.patch
>
>
> {{TestWebHdfsContentLength}} runs a simple hand-coded HTTP server in a 
> background thread to simulate some WebHDFS server responses.  In some 
> environments (notably Windows), I have observed that the test can hang and 
> fail intermittently.  The root cause is that the server fails to fully 
> consume the client's input.  This causes a mismatch in the TCP conversation 
> state, and ultimately the client side hangs, then aborts after the 60-second 
> socket timeout.
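
For illustration (this is not the committed fix), the failure mode implies the 
server must drain whatever the client sent before replying; a toy sketch, with 
invented names, of a server that keeps the conversation in sync:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class DrainingServerSketch {
  public static void main(String[] args) throws IOException {
    try (ServerSocket server = new ServerSocket(0);
         Socket client = server.accept()) {
      InputStream in = client.getInputStream();
      byte[] buf = new byte[4096];
      // Drain the buffered request bytes (a real server would parse the
      // headers and read exactly Content-Length body bytes instead).
      while (in.available() > 0 && in.read(buf) != -1) {
        // discard
      }
      OutputStream out = client.getOutputStream();
      out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
          .getBytes(StandardCharsets.UTF_8));
      out.flush();
    }
  }
}
{code}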



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-11-06 Thread Liangliang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993425#comment-14993425
 ] 

Liangliang Gu commented on HDFS-9276:
-

I think the current user's name, instead of "Test", should be used.

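A minimal sketch of that suggestion (hypothetical code, not from the attached 
patches): derive the name from the current login user rather than hard-coding 
"test":

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class RenewerNameSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Use the current user's name instead of the literal "test".
    String userName = UserGroupInformation.getCurrentUser().getShortUserName();
    Credentials creds = new Credentials();
    FileSystem.get(conf).addDelegationTokens(userName, creds);
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser(userName);
    ugi.addCredentials(creds);
  }
}
{code}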
> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> The HDFS client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at 

[jira] [Updated] (HDFS-8968) New benchmark throughput tool for striping erasure coding

2015-11-06 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HDFS-8968:
-
Attachment: HDFS-8968.4.patch

Addressed Rakesh's comments.
A new test is added for the benchmark tool; it runs the tool against a 
MiniDFSCluster (see the sketch below).

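For reference, the usual shape of such a test is to stand up an in-process 
cluster and point the tool at it; a minimal sketch, with {{BenchmarkTool}} as a 
hypothetical stand-in for the patch's actual entry point:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MiniClusterHarnessSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      cluster.waitActive();
      // Point the tool under test at the mini cluster's file system URI.
      String fsUri = cluster.getFileSystem().getUri().toString();
      // BenchmarkTool.run(fsUri) would go here in the real test.
      System.out.println("Mini cluster up at " + fsUri);
    } finally {
      cluster.shutdown();
    }
  }
}
{code}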
> New benchmark throughput tool for striping erasure coding
> -
>
> Key: HDFS-8968
> URL: https://issues.apache.org/jira/browse/HDFS-8968
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-8968-HDFS-7285.1.patch, 
> HDFS-8968-HDFS-7285.2.patch, HDFS-8968.3.patch, HDFS-8968.4.patch
>
>
> We need a new benchmark tool to measure client write and read throughput, 
> considering the following cases and factors:
> * 3-replica or striping;
> * write or read, stateful read or positional read;
> * which erasure coder;
> * striping cell size;
> * concurrent readers/writers using processes or threads.
> The tool should be easy to use and should avoid unnecessary local 
> environment impact, such as local disk I/O.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9384) TestWebHdfsContentLength intermittently hangs and fails due to TCP conversation mismatch between client and server.

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993394#comment-14993394
 ] 

Hudson commented on HDFS-9384:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1369 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1369/])
HDFS-9384. TestWebHdfsContentLength intermittently hangs and fails due 
(cnauroth: rev 66c096731052fb187dc49f5bcaec8432c4b92d0c)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsContentLength.java


> TestWebHdfsContentLength intermittently hangs and fails due to TCP 
> conversation mismatch between client and server.
> ---
>
> Key: HDFS-9384
> URL: https://issues.apache.org/jira/browse/HDFS-9384
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9384.001.patch, HDFS-9384.002.patch, 
> HDFS-9384.003.patch
>
>
> {{TestWebHdfsContentLength}} runs a simple hand-coded HTTP server in a 
> background thread to simulate some WebHDFS server responses.  In some 
> environments (notably Windows), I have observed that the test can hang and 
> fail intermittently.  The root cause is that the server fails to fully 
> consume the client's input.  This causes a mismatch in the TCP conversation 
> state, and ultimately the client side hangs, then aborts after the 60-second 
> socket timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-11-06 Thread Liangliang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liangliang Gu updated HDFS-9276:

Attachment: HDFS-9276.13.patch

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> The HDFS client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at 

[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-11-06 Thread Liangliang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993426#comment-14993426
 ] 

Liangliang Gu commented on HDFS-9276:
-

I think the current user's name, instead of "Test", should be used.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> The HDFS client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes the tokens to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at 

[jira] [Commented] (HDFS-9328) Formalize coding standards for libhdfs++ and put them in a README.txt

2015-11-06 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993686#comment-14993686
 ] 

Bob Hansen commented on HDFS-9328:
--

I'm sure you don't disagree with the principle in (4), that we shouldn't use 
compiler-specific statements unless necessary.  Including a concrete example of 
where we need to (short-circuit reads) even before we get there is, I think, a 
good thing.

I'll accept the clang-format.

> Formalize coding standards for libhdfs++ and put them in a README.txt
> -
>
> Key: HDFS-9328
> URL: https://issues.apache.org/jira/browse/HDFS-9328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
>Priority: Blocker
> Attachments: HDFS-9328.HDFS-8707.000.patch
>
>
> We have 2-3 people working on this project full time and hopefully more 
> people will start contributing.  In order to efficiently scale we need a 
> single, easy to find, place where developers can check to make sure they are 
> following the coding standards of this project to both save their time and 
> save the time of people doing code reviews.
> The most practical place to do this seems like a README file in libhdfspp/. 
> The foundation of the standards is google's C++ guide found here: 
> https://google-styleguide.googlecode.com/svn/trunk/cppguide.html
> Any exceptions to google's standards or additional restrictions need to be 
> explicitly enumerated so there is one single point of reference for all 
> libhdfs++ code standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9328) Formalize coding standards for libhdfs++ and put them in a README.txt

2015-11-06 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994193#comment-14994193
 ] 

James Clampffer commented on HDFS-9328:
---

Good idea on the markdown.

I'd really like this to be a complete set of rules, to avoid new-rule surprises 
down the road. I used short-circuit reads as an example because I happened to 
know that they'd be an exception.  There are plenty of other places where I 
could see adding that sort of thing if I were only concerned about x86-64.

I'd hate for someone to work really hard on a patch that does some really cool 
but platform-specific optimizations and then have the idea shot down during 
code review.



> Formalize coding standards for libhdfs++ and put them in a README.txt
> -
>
> Key: HDFS-9328
> URL: https://issues.apache.org/jira/browse/HDFS-9328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
>Priority: Blocker
> Attachments: HDFS-9328.HDFS-8707.000.patch
>
>
> We have 2-3 people working on this project full time and hopefully more 
> people will start contributing.  In order to efficiently scale we need a 
> single, easy to find, place where developers can check to make sure they are 
> following the coding standards of this project to both save their time and 
> save the time of people doing code reviews.
> The most practical place to do this seems like a README file in libhdfspp/. 
> The foundation of the standards is google's C++ guide found here: 
> https://google-styleguide.googlecode.com/svn/trunk/cppguide.html
> Any exceptions to google's standards or additional restrictions need to be 
> explicitly enumerated so there is one single point of reference for all 
> libhdfs++ code standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9328) Formalize coding standards for libhdfs++ and put them in a README.txt

2015-11-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994229#comment-14994229
 ] 

Steve Loughran commented on HDFS-9328:
--

I think it's Power. Nobody owns up to Itanium, as nobody has the power budget 
to build up a rack with enough nodes for 3x redundancy to work as a storage 
mechanism.

> Formalize coding standards for libhdfs++ and put them in a README.txt
> -
>
> Key: HDFS-9328
> URL: https://issues.apache.org/jira/browse/HDFS-9328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
>Priority: Blocker
> Attachments: HDFS-9328.HDFS-8707.000.patch
>
>
> We have 2-3 people working on this project full time and hopefully more 
> people will start contributing.  In order to efficiently scale we need a 
> single, easy to find, place where developers can check to make sure they are 
> following the coding standards of this project to both save their time and 
> save the time of people doing code reviews.
> The most practical place to do this seems like a README file in libhdfspp/. 
> The foundation of the standards is google's C++ guide found here: 
> https://google-styleguide.googlecode.com/svn/trunk/cppguide.html
> Any exceptions to google's standards or additional restrictions need to be 
> explicitly enumerated so there is one single point of reference for all 
> libhdfs++ code standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package

2015-11-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994284#comment-14994284
 ] 

Mingliang Liu commented on HDFS-8971:
-

Thanks for reporting this, [~szetszwo]. I totally agree that we should consider 
one-line messages in {{ByteArrayManager}}; they are certainly easier to read, 
especially with multiple threads. Perhaps we can simply revert the changes in 
this class? I revisited the patch, and the other classes should be fine.

> Remove guards when calling LOG.debug() and LOG.trace() in client package
> 
>
> Key: HDFS-8971
> URL: https://issues.apache.org/jira/browse/HDFS-8971
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-8971.000.patch, HDFS-8971.001.patch
>
>
> We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to 
> {{hadoop-hdfs-client}} module in JIRA 
> [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and 
> [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and 
> {{BlockReader}} in 
> [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we 
> also replaced the _log4j_ log with _slf4j_ logger. There were existing code 
> in the client package to guard the log when calling {{LOG.debug()}} and 
> {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}}, we have code like this:
> {code:title=Trace with guards|borderStyle=solid}
> if (LOG.isTraceEnabled()) {
>   LOG.trace(this + ": found waitable for " + key);
> }
> {code}
> In _slf4j_, this kind of guard is not necessary. We should clean the code by 
> removing the guard from the client package.
> {code:title=Trace without guards|borderStyle=solid}
> LOG.trace("{}: found waitable for {}", this, key);
> {code}
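
Worth noting alongside the cleanup (an illustrative sketch, not part of the 
patch): parameterized messages make the guard redundant for string formatting, 
but a guard still pays off when computing an argument is itself expensive:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardSketch {
  private static final Logger LOG = LoggerFactory.getLogger(GuardSketch.class);

  void example(Object key) {
    // No guard needed: the message is formatted only if TRACE is enabled.
    LOG.trace("{}: found waitable for {}", this, key);

    // Guard retained: expensiveSummary() would otherwise run unconditionally.
    if (LOG.isTraceEnabled()) {
      LOG.trace("cache state: {}", expensiveSummary());
    }
  }

  private String expensiveSummary() {
    return "...";  // placeholder for a costly computation
  }
}
{code}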



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163.003.patch

Fixed the checkstyle and findbugs warnings. None of the unit tests listed above 
failed in my own build environment.

> WebHdfsFileSystem should retry reads according to the configured retry policy.
> --
>
> Key: HDFS-7163
> URL: https://issues.apache.org/jira/browse/HDFS-7163
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7163.001.patch, HDFS-7163.002.patch, 
> HDFS-7163.003.patch, WebHDFS Read Retry.pdf
>
>
> In the current implementation of WebHdfsFileSystem, opens are retried 
> according to the configured retry policy, but reads are not. Therefore, if a 
> connection goes down while data is being read, the read will fail and will 
> have to be retried by the client code.
> Also, after a connection has been established, the next read (or seek/read) 
> will fail and the read will have to be restarted by the client code.
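
To make the retry idea concrete, a minimal client-side sketch (not the patch; 
{{MAX_RETRIES}} stands in for the configured retry policy): reopen the stream 
and seek back to the offset where the read failed:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryingReadSketch {
  private static final int MAX_RETRIES = 3;

  static int readWithRetry(FileSystem fs, Path path, long offset, byte[] buf)
      throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
      try (FSDataInputStream in = fs.open(path)) {
        in.seek(offset);
        return in.read(buf, 0, buf.length);
      } catch (IOException e) {
        last = e;  // connection dropped; retry from the same offset
      }
    }
    throw last;
  }
}
{code}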



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.

2015-11-06 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9394:

Status: Patch Available  (was: Open)

> branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader 
> initialization, because HftpFileSystem is missing.
> 
>
> Key: HDFS-9394
> URL: https://issues.apache.org/jira/browse/HDFS-9394
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Chris Nauroth
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-9394.000.branch-2.patch
>
>
> On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor 
> that lists {{HftpFileSystem}} and {{HsftpFileSystem}}.  These classes do not 
> reside in hadoop-hdfs-client.  Instead, they reside in hadoop-hdfs.  If the 
> application has hadoop-hdfs-client.jar on the classpath, but not 
> hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-9258) NN should indicate which nodes are stale

2015-11-06 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9258 started by Kuhu Shukla.
-
> NN should indicate which nodes are stale
> 
>
> Key: HDFS-9258
> URL: https://issues.apache.org/jira/browse/HDFS-9258
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Kuhu Shukla
>
> Determining why the NN is not coming out of safemode is difficult - is it a 
> bug or pending block reports?  If the number of nodes appears sufficient, but 
> there are missing blocks, it would be nice to know which nodes haven't block 
> reported (stale).  Instead of forcing the NN to leave safemode prematurely, 
> the SE can first force block reports from stale nodes.
> The datanode report and the web ui's node list should contain this 
> information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.

2015-11-06 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9394:

Attachment: HDFS-9394.000.branch-2.patch

Thank you [~cnauroth] for reporting this. As [~wheat9] said, when we separated 
the classes into {{hadoop-hdfs-client}}, we tried to address this in 
[HDFS-9166]. I think the original patch should have worked, but it was probably 
not fully committed.

Hopefully the fix is simple. Let's see if the v0 patch works.

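For context, the descriptor in question is the plain-text {{ServiceLoader}} 
file {{META-INF/services/org.apache.hadoop.fs.FileSystem}}; every class it 
lists must be loadable from the jar that ships it. An illustrative (not 
verbatim) hadoop-hdfs-client descriptor after the fix:

{code}
# META-INF/services/org.apache.hadoop.fs.FileSystem
# One fully-qualified implementation class per line; an entry that is not
# on the classpath triggers a ServiceConfigurationError at load time.
org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
org.apache.hadoop.hdfs.web.SWebHdfsFileSystem
{code}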
> branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader 
> initialization, because HftpFileSystem is missing.
> 
>
> Key: HDFS-9394
> URL: https://issues.apache.org/jira/browse/HDFS-9394
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Chris Nauroth
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-9394.000.branch-2.patch
>
>
> On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor 
> that lists {{HftpFileSystem}} and {{HsftpFileSystem}}.  These classes do not 
> reside in hadoop-hdfs-client.  Instead, they reside in hadoop-hdfs.  If the 
> application has hadoop-hdfs-client.jar on the classpath, but not 
> hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-11-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994235#comment-14994235
 ] 

Yongjun Zhang commented on HDFS-9249:
-

Thanks [~jojochuang] for the new rev, +1 pending jenkins.


> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-9249.001.patch, HDFS-9249.002.patch, 
> HDFS-9249.003.patch, HDFS-9249.004.patch, HDFS-9249.005.patch, 
> HDFS-9249.006.patch
>
>
> This issue was found when running test case 
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not 
> due to the test case.
> Looks like an IOException was thrown in
> {code}
> try {
>   initializeGenericKeys(conf, nsId, namenodeId);
>   initialize(conf);
>   try {
>     haContext.writeLock();
>     state.prepareToEnterState(haContext);
>     state.enterState(haContext);
>   } finally {
>     haContext.writeUnlock();
>   }
> {code}
> causing the namenode to stop before the namesystem was properly 
> instantiated, which caused the NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what 
> caused the IOException, I suggest make this a supportability JIRA to log the 
> exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:827)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.(BackupNode.java:89)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode 
> (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
> (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
> hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
> localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
> active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
> for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
> syncs: 4 SyncTimes(ms): 2 1 
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, 
> exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_001-003
> 

[jira] [Updated] (HDFS-9258) NN should indicate which nodes are stale

2015-11-06 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-9258:
--
Attachment: HDFS-9258-v1.patch

Added isStale to the JMX info. Added an {{isStale()}} call on DNInfo and 
replaced the old one wherever possible. Also, {{chooseDatanodesForCaching()}} 
was a static method called only once, from {{addNewPendingCached()}}, which is 
non-static; hence {{chooseDatanodesForCaching()}} was made non-static.

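As a rough sketch of the staleness notion being surfaced (field and method 
names here are invented, not the patch's exact code), a node is stale when its 
last heartbeat is older than the configured stale interval:

{code:java}
class DatanodeInfoSketch {
  private volatile long lastHeartbeatMs;  // updated on each heartbeat

  boolean isStale(long staleIntervalMs) {
    return monotonicNow() - lastHeartbeatMs > staleIntervalMs;
  }

  private static long monotonicNow() {
    return System.nanoTime() / 1_000_000L;  // monotonic clock, in ms
  }
}
{code}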
> NN should indicate which nodes are stale
> 
>
> Key: HDFS-9258
> URL: https://issues.apache.org/jira/browse/HDFS-9258
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Kuhu Shukla
> Attachments: HDFS-9258-v1.patch
>
>
> Determining why the NN is not coming out of safemode is difficult - is it a 
> bug or pending block reports?  If the number of nodes appears sufficient, but 
> there are missing blocks, it would be nice to know which nodes haven't block 
> reported (stale).  Instead of forcing the NN to leave safemode prematurely, 
> the SE can first force block reports from stale nodes.
> The datanode report and the web ui's node list should contain this 
> information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9236) Missing sanity check for block size during block recovery

2015-11-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9236:

Target Version/s: 2.8.0  (was: 2.7.3)

> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch, 
> HDFS-9236.006.patch, HDFS-9236.007.patch
>
>
> Ran into an issue while running a test against faulty DataNode code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>                  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<BlockRecord>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rState (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can either be 
> caused by faulty DN code or stale/modified/corrupted files on the DN. When 
> this happens, the DN ends up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even with the correct length) will cause the block to be marked 
> as corrupted. This block could be the last block of the file; if that happens 
> and the client goes away, the NN won't be able to recover the lease and close 
> the file, because the last block is under-replicated.
> I believe we need a sanity check for the block size on both the DN and the NN 
> to prevent such a case from happening.
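
A minimal sketch of the kind of check the report calls for (hypothetical 
placement and names, not a committed patch): refuse to commit a recovered 
length that no participating replica actually reported:

{code:java}
import java.io.IOException;

class BlockLengthCheckSketch {
  // minLength stays Long.MAX_VALUE when no replica matched bestState.
  static void validateRecoveredLength(long minLength) throws IOException {
    if (minLength < 0 || minLength == Long.MAX_VALUE) {
      throw new IOException("Refusing to commit block synchronization"
          + " with bogus recovered length " + minLength);
    }
  }
}
{code}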



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994263#comment-14994263
 ] 

Hudson commented on HDFS-6481:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8768 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8768/])
HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 
0b18e5e8c69b40c9a446fff448d38e0dd10cb45e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.2
>
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> 

[jira] [Updated] (HDFS-9258) NN should indicate which nodes are stale

2015-11-06 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-9258:
--
Status: Patch Available  (was: In Progress)

> NN should indicate which nodes are stale
> 
>
> Key: HDFS-9258
> URL: https://issues.apache.org/jira/browse/HDFS-9258
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Kuhu Shukla
> Attachments: HDFS-9258-v1.patch
>
>
> Determining why the NN is not coming out of safemode is difficult - is it a 
> bug or pending block reports?  If the number of nodes appears sufficient, but 
> there are missing blocks, it would be nice to know which nodes haven't block 
> reported (stale).  Instead of forcing the NN to leave safemode prematurely, 
> the SE can first force block reports from stale nodes.
> The datanode report and the web ui's node list should contain this 
> information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9318) considerLoad factor can be improved

2015-11-06 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994281#comment-14994281
 ] 

Kihwal Lee commented on HDFS-9318:
--

+1 lgtm

> considerLoad factor can be improved
> ---
>
> Key: HDFS-9318
> URL: https://issues.apache.org/jira/browse/HDFS-9318
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9318-v1.patch, HDFS-9318-v2.patch
>
>
> Currently considerLoad avoids choosing nodes that are too active, so it helps 
> level the HDFS load across the cluster. Under normal conditions, this is 
> desired. However, when a cluster has a large percentage of nearly full nodes, 
> this can make it difficult to find good targets because the placement policy 
> wants to avoid the full nodes, but considerLoad wants to avoid the busy 
> less-full nodes.
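
For reference, the heart of considerLoad is a threshold comparison of roughly 
this shape (a simplified sketch with illustrative names): a node is rejected as 
a target when its active transfer count exceeds some factor times the 
cluster-wide average:

{code:java}
class ConsiderLoadSketch {
  // Reject a candidate datanode whose xceiver (transfer) count exceeds
  // factor * the cluster-wide average; the factor has commonly been 2.0.
  static boolean isOverloaded(int nodeXceiverCount,
                              double clusterAvgXceivers,
                              double factor) {
    return nodeXceiverCount > factor * clusterAvgXceivers;
  }
}
{code}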



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994298#comment-14994298
 ] 

Hudson commented on HDFS-6481:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #637 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/637/])
HDFS-6481. DatanodeManager#getDatanodeStorageInfos() should check the (arp: rev 
0b18e5e8c69b40c9a446fff448d38e0dd10cb45e)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCommitBlockSynchronization.java


> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.2
>
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> 

[jira] [Updated] (HDFS-9383) TestByteArrayManager#testByteArrayManager fails

2015-11-06 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9383:
--
Assignee: Tsz Wo Nicholas Sze

Thanks for filing the issue.  Let me take a look.

> TestByteArrayManager#testByteArrayManager fails
> ---
>
> Key: HDFS-9383
> URL: https://issues.apache.org/jira/browse/HDFS-9383
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Tsz Wo Nicholas Sze
> Attachments: hdfs-9383.log
>
>
> This was seen in the trunk builds
> https://builds.apache.org/job/Hadoop-Hdfs-trunk
> {noformat}
> Running org.apache.hadoop.hdfs.util.TestByteArrayManager
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.539 sec <<< 
> FAILURE!
>  - in org.apache.hadoop.hdfs.util.TestByteArrayManager
> testByteArrayManager(org.apache.hadoop.hdfs.util.TestByteArrayManager)  Time 
> elapsed: 5.409 sec  <<< FAILURE!
> java.lang.AssertionError: expected null, but was:<[32: 2/64, free=5]>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.hdfs.util.TestByteArrayManager.testByteArrayManager(TestByteArrayManager.java:384)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9328) Formalize coding standards for libhdfs++ and put them in a README.txt

2015-11-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993777#comment-14993777
 ] 

Steve Loughran commented on HDFS-9328:
--

+1 for markdown; consider putting it into the Hadoop site docs proper.

Mention not to assume x86 architecture, little-endianness, or support for 
unaligned word access; these are the eternal trouble spots in native code 
patches.

> Formalize coding standards for libhdfs++ and put them in a README.txt
> -
>
> Key: HDFS-9328
> URL: https://issues.apache.org/jira/browse/HDFS-9328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
>Priority: Blocker
> Attachments: HDFS-9328.HDFS-8707.000.patch
>
>
> We have 2-3 people working on this project full time and hopefully more 
> people will start contributing.  In order to efficiently scale we need a 
> single, easy to find, place where developers can check to make sure they are 
> following the coding standards of this project to both save their time and 
> save the time of people doing code reviews.
> The most practical place to do this seems like a README file in libhdfspp/. 
> The foundation of the standards is google's C++ guide found here: 
> https://google-styleguide.googlecode.com/svn/trunk/cppguide.html
> Any exceptions to google's standards or additional restrictions need to be 
> explicitly enumerated so there is one single point of reference for all 
> libhdfs++ code standards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6481) DatanodeManager#getDatanodeStorageInfos() should check the length of storageIDs

2015-11-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993788#comment-14993788
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6481:
---

The findbugs warning and the failed tests are not related to the patch.
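
For reference, here is a minimal sketch of the length check the issue title 
calls for (hypothetical names, not the actual patch): validating storageIDs 
against the datanode list up front turns the ArrayIndexOutOfBoundsException in 
the quoted trace below into a descriptive error.

{code}
import java.io.IOException;

// Hedged sketch; names are simplified stand-ins for
// DatanodeManager#getDatanodeStorageInfos(), not the real implementation.
public class StorageIdCheckSketch {
  static String[] getDatanodeStorageInfos(String[] datanodeIDs,
      String[] storageIDs) throws IOException {
    // Guard first: an empty or short storageIDs array would otherwise hit
    // ArrayIndexOutOfBoundsException on storageIDs[i] in the loop below.
    if (storageIDs.length < datanodeIDs.length) {
      throw new IOException("Expected " + datanodeIDs.length
          + " storage IDs but got only " + storageIDs.length);
    }
    String[] infos = new String[datanodeIDs.length];
    for (int i = 0; i < datanodeIDs.length; i++) {
      infos[i] = datanodeIDs[i] + "@" + storageIDs[i];
    }
    return infos;
  }
}
{code}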

> DatanodeManager#getDatanodeStorageInfos() should check the length of 
> storageIDs
> ---
>
> Key: HDFS-6481
> URL: https://issues.apache.org/jira/browse/HDFS-6481
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Ted Yu
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: h6481_20151105.patch, hdfs-6481-v1.txt
>
>
> Ian Brooks reported the following stack trace:
> {code}
> 2014-06-03 13:05:03,915 WARN  [DataStreamer for file 
> /user/hbase/WALs/,16020,1401716790638/%2C16020%2C1401716790638.1401796562200
>  block BP-2121456822-10.143.38.149-1396953188241:blk_1074073683_332932] 
> hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException):
>  0
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:467)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2779)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:594)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:430)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy13.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy14.getAdditionalDatanode(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> at com.sun.proxy.$Proxy15.getAdditionalDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
> 2014-06-03 13:05:48,489 ERROR [RpcServer.handler=22,port=16020] wal.FSHLog: 
> syncer encountered error, will retry. txid=211
> 

[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package

2015-11-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993829#comment-14993829
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8971:
---

Hi, I appreciate using the new log API to make the code shorter in general, 
but it may not be possible to remove the isDebugEnabled guards and keep the 
same debug message format at the same time.

It is hard to read the debug messages printed by ByteArrayManager after the 
change here, since some one-line messages are now printed across multiple 
lines.  It is even worse when there are multiple threads.  See the output of 
TestByteArrayManager (or the attached log in HDFS-9383).
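
To make the trade-off concrete, here is a minimal sketch (a hypothetical 
class, not the actual ByteArrayManager code) of the two styles: the guarded 
version emits one atomic log line per event, while the unguarded version 
split across several parameterized calls emits three lines that interleave 
arbitrarily under concurrency.

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogSplitSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(LogSplitSketch.class);

  // Guarded style: the whole message is assembled and emitted as ONE line.
  static void allocateGuarded(int count, int length) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("allocate(" + length + "): count=" + count
          + ", belowThreshold, return byte[" + length + "]");
    }
  }

  // Unguarded style split across calls: THREE lines per allocation, which
  // interleave with other threads' fragments and become hard to read.
  static void allocateSplit(int count, int length) {
    LOG.debug("allocate({})", length);
    LOG.debug(": count={}, belowThreshold", count);
    LOG.debug(", return byte[{}]", length);
  }
}
{code}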

> Remove guards when calling LOG.debug() and LOG.trace() in client package
> 
>
> Key: HDFS-8971
> URL: https://issues.apache.org/jira/browse/HDFS-8971
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-8971.000.patch, HDFS-8971.001.patch
>
>
> We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to 
> {{hadoop-hdfs-client}} module in JIRA 
> [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and 
> [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and 
> {{BlockReader}} in 
> [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we 
> also replaced the _log4j_ log with _slf4j_ logger. There were existing code 
> in the client package to guard the log when calling {{LOG.debug()}} and 
> {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}}, we have code like this:
> {code:title=Trace with guards|borderStyle=solid}
> 724if (LOG.isTraceEnabled()) {
> 725  LOG.trace(this + ": found waitable for " + key);
> 726}
> {code}
> In _slf4j_, this kind of guard is not necessary. We should clean the code by 
> removing the guard from the client package.
> {code:title=Trace without guards|borderStyle=solid}
> 724LOG.trace("{}: found waitable for {}", this, key);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.

2015-11-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9249:
--
Attachment: HDFS-9249.006.patch

Removed an extra space to meet code style expectations.
The ASF license warnings are false positives.
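
As a hedged illustration of the failure mode in the quoted report below 
(simplified names, not the actual patch): when initialize() throws before the 
namesystem is assigned, an unguarded shutdown path dereferences null, and the 
original IOException is masked unless it is logged first.

{code}
import java.io.IOException;

// Minimal sketch only; Namesystem and the constructor below stand in for
// FSNamesystem and the NameNode/BackupNode code paths.
public class InitFailureSketch {
  static class Namesystem {
    void stopActiveServices() { /* shutdown work */ }
  }

  private Namesystem namesystem; // stays null if initialize() throws early

  public InitFailureSketch(boolean failInit) throws IOException {
    try {
      initialize(failInit);
    } catch (IOException e) {
      // Log the root cause before tearing down, so it is not masked by an NPE.
      System.err.println("Initialization failed: " + e);
      stop();
      throw e;
    }
  }

  private void initialize(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("simulated failure before namesystem creation");
    }
    namesystem = new Namesystem();
  }

  // Null-guarded shutdown: safe even when initialization never finished.
  public void stop() {
    if (namesystem != null) {
      namesystem.stopActiveServices();
    }
  }
}
{code}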

> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-9249.001.patch, HDFS-9249.002.patch, 
> HDFS-9249.003.patch, HDFS-9249.004.patch, HDFS-9249.005.patch, 
> HDFS-9249.006.patch
>
>
> This issue was found when running test case 
> TestBackupNode.testCheckpointNode, but upon closer look, the problem is not 
> due to the test case.
> Looks like an IOException was thrown in
> {code}
> try {
>   initializeGenericKeys(conf, nsId, namenodeId);
>   initialize(conf);
>   try {
>     haContext.writeLock();
>     state.prepareToEnterState(haContext);
>     state.enterState(haContext);
>   } finally {
>     haContext.writeUnlock();
>   }
> {code}
> causing the namenode to stop before the namesystem was properly 
> instantiated, which in turn causes the NPE.
> I tried to reproduce this locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what 
> caused the IOException, I suggest making this a supportability JIRA to log 
> the exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.&lt;init&gt;(NameNode.java:827)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.&lt;init&gt;(BackupNode.java:89)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode 
> (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
> (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
> hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
> localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
> (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
> active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
> (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
> for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
> syncs: 4 SyncTimes(ms): 2 1 
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
> (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, 
> exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001
>  -> 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
> (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001
>  -> 
> 

[jira] [Commented] (HDFS-9383) TestByteArrayManager#testByteArrayManager fails

2015-11-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993856#comment-14993856
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9383:
---

After the change in HDFS-8971, the debug log (i.e. the posted log) became very 
long and very hard to read.  We probably need to fix the log first.

> TestByteArrayManager#testByteArrayManager fails
> ---
>
> Key: HDFS-9383
> URL: https://issues.apache.org/jira/browse/HDFS-9383
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Tsz Wo Nicholas Sze
> Attachments: hdfs-9383.log
>
>
> This was seen in the trunk builds
> https://builds.apache.org/job/Hadoop-Hdfs-trunk
> {noformat}
> Running org.apache.hadoop.hdfs.util.TestByteArrayManager
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.539 sec <<< 
> FAILURE!
>  - in org.apache.hadoop.hdfs.util.TestByteArrayManager
> testByteArrayManager(org.apache.hadoop.hdfs.util.TestByteArrayManager)  Time 
> elapsed: 5.409 sec  <<< FAILURE!
> java.lang.AssertionError: expected null, but was:<[32: 2/64, free=5]>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.hdfs.util.TestByteArrayManager.testByteArrayManager(TestByteArrayManager.java:384)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9375) Set balancer bandwidth for specific node

2015-11-06 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9375:

Attachment: HDFS-9375.002.patch

Thanks [~brahmareddy] for the suggestions. I changed the usage of the existing 
setBalancerBandwidth command to {{[-setBalancerBandwidth &lt;bandwidth in bytes 
per second&gt; &lt;datanode_host&gt;]}}:
1. If the param datanode_host is null, the behavior is the same as before: the 
bandwidth is set for every node.
2. If datanode_host holds one or more hostnames separated by ",", the 
bandwidth is set only for those specific nodes.
I updated the patch.
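
A rough sketch of the argument handling described above (hypothetical method 
names, not the actual patch): a null host argument keeps the old cluster-wide 
behavior, while a comma-separated list targets only the named datanodes.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SetBalancerBandwidthSketch {
  // Empty list means "all datanodes", i.e. the unchanged old behavior.
  static List<String> parseTargets(String datanodeHostArg) {
    if (datanodeHostArg == null || datanodeHostArg.isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(datanodeHostArg.split(","));
  }

  static void setBalancerBandwidth(long bytesPerSecond, String datanodeHostArg) {
    List<String> targets = parseTargets(datanodeHostArg);
    if (targets.isEmpty()) {
      System.out.println("set " + bytesPerSecond + " B/s on all datanodes");
    } else {
      for (String host : targets) {
        System.out.println("set " + bytesPerSecond + " B/s on " + host);
      }
    }
  }

  public static void main(String[] args) {
    setBalancerBandwidth(10_485_760L, null);           // every node, as before
    setBalancerBandwidth(104_857_600L, "dn1,dn2,dn3"); // only the new nodes
  }
}
{code}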

> Set balancer bandwidth for specific node
> 
>
> Key: HDFS-9375
> URL: https://issues.apache.org/jira/browse/HDFS-9375
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9375.001.patch, HDFS-9375.002.patch
>
>
> Even though the balancer has improved a lot, in some cases it is still slow. 
> For example, when the cluster is extended, the new nodes all need to balance 
> data from the existing nodes. To improve the balancing speed, we generally 
> use the {{setBalancerBandwidth}} command of {{dfsadmin}}. But this sets the 
> bandwidth for every node; obviously, we could grant more bandwidth to the 
> new nodes because they lack data, and let them start serving once they have 
> balanced enough data. So we can define a new ClientDatanode interface to set 
> a specific node's bandwidth.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package

2015-11-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993864#comment-14993864
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8971:
---

- Before the change
{code}
2015-11-06 23:56:12,610 [pool-1-thread-3] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:logDebugMessage(48)) - allocate(225): count=2, 
belowThreshold, return byte[256]
2015-11-06 23:56:12,610 [pool-1-thread-6] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:logDebugMessage(48)) - allocate(208): count=3, 
belowThreshold, return byte[256]
2015-11-06 23:56:12,610 [pool-1-thread-5] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:logDebugMessage(48)) - allocate(7): count=1, 
belowThreshold, return byte[32]
2015-11-06 23:56:12,610 [pool-1-thread-8] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:logDebugMessage(48)) - recycle: array.length=128, 
freeQueueSize=-1
{code}
- After the change
{code}
2015-11-06 23:50:52,202 [pool-1-thread-2] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(228)
2015-11-06 23:50:52,202 [pool-1-thread-1] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(110)
2015-11-06 23:50:52,204 [pool-1-thread-2] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(342)) - : count=1, belowThreshold
2015-11-06 23:50:52,205 [pool-1-thread-1] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(342)) - : count=1, belowThreshold
2015-11-06 23:50:52,206 [pool-1-thread-2] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(347)) - , return byte[256]
2015-11-06 23:50:52,206 [pool-1-thread-1] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(347)) - , return byte[128]
2015-11-06 23:50:52,299 [pool-1-thread-4] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(38)
2015-11-06 23:50:52,300 [pool-1-thread-7] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(63)
2015-11-06 23:50:52,299 [pool-1-thread-5] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(183)
2015-11-06 23:50:52,300 [pool-1-thread-11] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(87)
2015-11-06 23:50:52,300 [pool-1-thread-10] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(136)
2015-11-06 23:50:52,300 [pool-1-thread-9] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(71)
2015-11-06 23:50:52,300 [pool-1-thread-7] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(342)) - : count=2, belowThreshold
2015-11-06 23:50:52,301 [pool-1-thread-13] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:release(362)) - recycle: array.length=128
2015-11-06 23:50:52,303 [pool-1-thread-13] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:release(372)) - , freeQueueSize=-1
2015-11-06 23:50:52,300 [pool-1-thread-8] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(211)
2015-11-06 23:50:52,303 [pool-1-thread-8] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(342)) - : count=4, belowThreshold
2015-11-06 23:50:52,300 [pool-1-thread-6] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(114)
2015-11-06 23:50:52,300 [pool-1-thread-4] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(342)) - : count=1, belowThreshold
2015-11-06 23:50:52,300 [pool-1-thread-3] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(14)
2015-11-06 23:50:52,303 [pool-1-thread-4] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(347)) - , return byte[64]
2015-11-06 23:50:52,303 [pool-1-thread-6] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(342)) - : count=4, belowThreshold
2015-11-06 23:50:52,303 [pool-1-thread-8] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(347)) - , return byte[256]
2015-11-06 23:50:52,301 [pool-1-thread-18] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(143)
2015-11-06 23:50:52,301 [pool-1-thread-17] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(180)
2015-11-06 23:50:52,301 [pool-1-thread-16] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(133)
2015-11-06 23:50:52,301 [pool-1-thread-15] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:newByteArray(328)) - allocate(121)
2015-11-06 23:50:52,301 [pool-1-thread-14] DEBUG util.ByteArrayManager 
(ByteArrayManager.java:release(362)) - recycle: array.length=256
{code}
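
One way to keep the one-line format without reintroducing guards (a sketch, 
not the committed fix) is to assemble each message in a single parameterized 
call, so slf4j still defers the string building but every event stays on one 
atomic line:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OneLineDebugSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(OneLineDebugSketch.class);

  // Combines the three fragments shown above into one call; the message is
  // built only when debug is enabled, and never interleaves mid-line.
  static void logAllocate(int requestedLength, int count, int arrayLength) {
    LOG.debug("allocate({}): count={}, belowThreshold, return byte[{}]",
        requestedLength, count, arrayLength);
  }
}
{code}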

> Remove guards when calling LOG.debug() and LOG.trace() in client package
> 
>
> Key: HDFS-8971
> URL: https://issues.apache.org/jira/browse/HDFS-8971
> Project: 

[jira] [Commented] (HDFS-6101) TestReplaceDatanodeOnFailure fails occasionally

2015-11-06 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993887#comment-14993887
 ] 

Wei-Chiu Chuang commented on HDFS-6101:
---

Hi [~walter.k.su], thanks for your comments!
I am sorry, I was swamped by other tasks and wasn't able to get to this. I am 
looking to make a new revision based on your suggestions.

Thanks.

> TestReplaceDatanodeOnFailure fails occasionally
> ---
>
> Key: HDFS-6101
> URL: https://issues.apache.org/jira/browse/HDFS-6101
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-6101.001.patch, HDFS-6101.002.patch, 
> HDFS-6101.003.patch, TestReplaceDatanodeOnFailure.log
>
>
> Exception details in a comment below.
> The failure repros on both OS X and Linux if I run the test ~10 times in a 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

