[jira] [Updated] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated HDFS-8828: --- Attachment: HDFS-8828.002.patch Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch Some users reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a copy list containing only the files/dirs that changed between two snapshots (or a snapshot and a normal dir). This speeds up the process in two ways: 1. less copy-list building time; 2. fewer file copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch puts only created and modified files into the copy list, based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
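The filtering idea in HDFS-8828 above can be sketched as follows. This is a minimal, self-contained illustration using hypothetical types ({{DiffType}}, {{DiffEntry}}), not the actual Hadoop SnapshotDiffReport API: only CREATE and MODIFY entries go into the copy list, while DELETE and RENAME are assumed to be synchronized separately (per HDFS-7535).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for snapshot diff entries (not the Hadoop API).
public class DiffCopyList {
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    static final class DiffEntry {
        final DiffType type;
        final String path;
        DiffEntry(DiffType type, String path) { this.type = type; this.path = path; }
    }

    // Keep only created/modified paths; deletes and renames are synchronized
    // separately and need no copy, so the copy list stays minimal.
    static List<String> buildCopyList(List<DiffEntry> diff) {
        List<String> copyList = new ArrayList<>();
        for (DiffEntry e : diff) {
            if (e.type == DiffType.CREATE || e.type == DiffType.MODIFY) {
                copyList.add(e.path);
            }
        }
        return copyList;
    }

    public static void main(String[] args) {
        List<DiffEntry> diff = List.of(
            new DiffEntry(DiffType.CREATE, "/src/a.txt"),
            new DiffEntry(DiffType.DELETE, "/src/old.txt"),
            new DiffEntry(DiffType.MODIFY, "/src/b.txt"));
        System.out.println(buildCopyList(diff)); // prints [/src/a.txt, /src/b.txt]
    }
}
```

The payoff is that the copy list scales with the size of the diff, not with the total number of files under the source dir.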
[jira] [Resolved] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved HDFS-8847. - Resolution: Fixed change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu reopened HDFS-8847: - change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650156#comment-14650156 ] Ming Ma commented on HDFS-8480: --- Makes sense. It is easier to check in an edit log with an older version. Thanks [~zhz] and [~cmccabe]. Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650152#comment-14650152 ] zhihai xu commented on HDFS-8847: - The patch from HADOOP-12268 (https://issues.apache.org/jira/secure/attachment/12748104/HADOOP-12268.001.patch) includes a change in the hdfs project, in TestHDFSContractAppend.java. I committed the change in TestHDFSContractAppend.java to trunk and branch-2. change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved HDFS-8847. - Resolution: Fixed Hadoop Flags: Reviewed change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
[ https://issues.apache.org/jira/browse/HDFS-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated HDFS-8847: Fix Version/s: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. - Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu Fix For: 2.8.0 change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8846) Create edit log files with old layout version for upgrade testing
[ https://issues.apache.org/jira/browse/HDFS-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650166#comment-14650166 ] Ming Ma commented on HDFS-8846: --- 2.5 should be fine. Although people still use 2.3 or 2.4, this test tries to verify that old edits can be replayed during upgrade, not necessarily compatibility of edit log formats. If you unpack the existing fsimage tgz files under {{hadoop-hdfs-project/hadoop-hdfs/src/test/resources}}, they have the proper namenode dir contents, including the VERSION file, etc. Just wondering if you are going to create something similar, except that the fsimage is empty. Create edit log files with old layout version for upgrade testing - Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8480, we should create some edit log files with an old layout version, to test whether they can be correctly handled in upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650070#comment-14650070 ] Yi Liu edited comment on HDFS-6682 at 8/1/15 12:32 AM: --- Thanks Allen, Andrew and Akira for the discussion. The original intention, solving this issue, is good; thank you for working on it. About the discussion itself, Andrew's suggestion is good, and another option is to record the latest time of {{UnderReplicatedBlocks#chooseUnderReplicatedBlocks}}. We already have metrics for {{underReplicatedBlocksCount/pendingReplicationBlocksCount/scheduledReplicationBlocksCount}}, so we can tell whether, and for how long, the under-replicated list has been handled since the last run, if we really want to see that. My point is that it is not worth recording the whole under-replicated list for this metric. On the other hand, we have {{UnderReplicatedBlocks}} and {{PendingReplicationBlocks}}, right? The replication monitor thread periodically picks up some under-replicated blocks; unless the NN stalls (e.g., full GC), computing replication work will always get some CPU time slice. Of course it could be slow, since the NN may have many things to handle (e.g., many requests), but if the NN is slow we have many other ways to know it. About Akira's comment that the metric is also about the entire HDFS cluster: we are talking about DataNodes here. If we want to gauge cluster health from the replication blocks' view, I think the more correct thing is to record the number of timed-out pending replication blocks ({{PendingReplicationBlocks}}), which grows when the network is very busy or target DNs are corrupted; {{UnderReplicatedBlocks}} can't stand for that. So if we want some metrics about replicated blocks in the NN, let's find a lightweight way as suggested, thanks. was (Author: hitliuyi): Thanks Allen, Andrew and Akira for the discussion. Our original intention is to solve issue which is good, thank you for working on it. 
About the discussion itself, Andrew's suggestion is good, and another option is to record latest time of {{UnderReplicatedBlocks#chooseUnderReplicatedBlocks}}, and we already have metrics about the {{underReplicatedBlocksCount/pendingReplicationBlocksCount/scheduledReplicationBlocksCount}}, so we can know whether/how long the under replica list is handled since last time if we really want to see. My point is not worth to record whole under replicated list for this metric. On way other hand, we have {{UnderReplicatedBlocks}} and {{PendingReplicationBlocks}}, right? Replication monitor thread will periodically pick up some under replicated blocks, unless the NN stops (e.g, full gc), compute replication work will always happen in some CPU time slice, of course it could be slow since there maybe many things need to be handled in NN (e.g. many requests). But actually if NN is slow, we have many ways to know it. About Akira's comment about the metric is also about the entire HDFS cluster, we talk DataNode here, I think more correctly thing it's to record the timeout number of pending replication blocks ({{PendingReplicationBlocks}}) if network is very busy or target DNs corrupted if we want to get the Cluster health from replication blocks' review, {{UnderReplicatedBlocks}} can't stand for that. So if we want to have some metrics about the replicated blocks in NN, let's find some lightweight way as suggested, thanks. Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, the data in the HDFS is lost and a client needs to put the same file again. 
# A Client puts a file to HDFS
# A DataNode crashes before replicating a block of the file to other DataNodes
I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way the client can know which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650080#comment-14650080 ] Hadoop QA commented on HDFS-8828: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 29s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 29s | The applied patch generated 19 new checkstyle issues (total was 120, now 139). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 52s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 45s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 7m 18s | Tests passed in hadoop-distcp. 
| | | | 48m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747636/HDFS-8828.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d311a38 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11881/artifact/patchprocess/diffcheckstylehadoop-distcp.txt | | hadoop-distcp test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11881/artifact/patchprocess/testrun_hadoop-distcp.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11881/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11881/console | This message was automatically generated. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch Some users reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a copy list containing only the files/dirs that changed between two snapshots (or a snapshot and a normal dir). This speeds up the process in two ways: 1. less copy-list building time; 2. fewer file copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch puts only created and modified files into the copy list, based on the snapshot diff report, minimizing the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8155) Support OAuth2 in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8155: -- Attachment: HDFS-8155-1.patch First patch for review. We've been testing a version of this code for a few months and it's working well. Two types of OAuth code grants (client credentials and refresh/access tokens provided by the conf) are supported by default, and other code grants are user-implementable. I had planned on using Apache Oltu for this, but that project doesn't seem very active and its main benefit - special-case support for OAuth2 providers like GitHub/Twitter/FB, etc. - is of marginal value for WebHDFS and could easily be implemented by the user if necessary. I didn't end up using the Authenticator client class because it's too closely tied to the spnego implementation, but after this goes in it will be a good idea to make that class more generic and use it for the OAuth code as well. Support OAuth2 in WebHDFS - Key: HDFS-8155 URL: https://issues.apache.org/jira/browse/HDFS-8155 Project: Hadoop HDFS Issue Type: New Feature Components: webhdfs Reporter: Jakob Homan Assignee: Jakob Homan Attachments: HDFS-8155-1.patch WebHDFS should be able to accept OAuth2 credentials. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650045#comment-14650045 ] Hudson commented on HDFS-6860: -- FAILURE: Integrated in Hadoop-trunk-Commit #8253 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8253/]) HDFS-6860. BlockStateChange logs are too noisy. Contributed by Chang Li and Xiaoyu Yao. (xyao: rev d311a38a6b32bbb210bd8748cfb65463e9c0740e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Fix For: 2.8.0 Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650070#comment-14650070 ] Yi Liu commented on HDFS-6682: -- Thanks Allen, Andrew and Akira for the discussion. The original intention, solving this issue, is good; thank you for working on it. About the discussion itself, Andrew's suggestion is good, and another option is to record the latest time of {{UnderReplicatedBlocks#chooseUnderReplicatedBlocks}}. We already have metrics for {{underReplicatedBlocksCount/pendingReplicationBlocksCount/scheduledReplicationBlocksCount}}, so we can tell whether, and for how long, the under-replicated list has been handled since the last run, if we really want to see that. My point is that it is not worth recording the whole under-replicated list for this metric. On the other hand, we have {{UnderReplicatedBlocks}} and {{PendingReplicationBlocks}}, right? The replication monitor thread periodically picks up some under-replicated blocks; unless the NN stalls (e.g., full GC), computing replication work will always get some CPU time slice. Of course it could be slow, since the NN may have many things to handle (e.g., many requests), but if the NN is slow we have many other ways to know it. About Akira's comment that the metric is also about the entire HDFS cluster: we are talking about DataNodes here. If we want to gauge cluster health from the replication blocks' view, I think the more correct thing is to record the number of timed-out pending replication blocks ({{PendingReplicationBlocks}}), which grows when the network is very busy or target DNs are corrupted; {{UnderReplicatedBlocks}} can't stand for that. So if we want some metrics about replicated blocks in the NN, let's find a lightweight way as suggested, thanks. 
Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, the data in HDFS is lost and a client needs to put the same file again.
# A Client puts a file to HDFS
# A DataNode crashes before replicating a block of the file to other DataNodes
I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way the client can know which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650073#comment-14650073 ] Haohui Mai commented on HDFS-8823: -- This jira focuses on the replication factor. I suggest opening another jira if you want to discuss moving the storage policy. Although it looks like no consensus has been reached yet, I encourage you to submit a patch to demonstrate your idea. Comments at such a high level can be quite vague and speculative. Code talks. Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8823.000.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility on replicating blocks. Currently the replication factors of all blocks in a file have to be the same, equal to the highest replication factor across all snapshots. The changes will allow blocks in a file to have different replication factors, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650094#comment-14650094 ] Hadoop QA commented on HDFS-8220: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732230/HDFS-8220-HDFS-7285.008.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / ba90c02 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11883/console | This message was automatically generated. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more:
{code}
2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception
java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
	at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
	at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
java.io.IOException: DataStreamer Exception:
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
	at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
Caused by: java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
	at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
	... 1 more
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
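The NullPointerException in the trace above matches the BlockingQueue contract: LinkedBlockingQueue, like all standard BlockingQueue implementations, rejects null elements, so offering a null block (as can happen when the returned locations don't satisfy the BlockGroupSize) throws immediately. A tiny standalone demonstration:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class OfferNullDemo {
    public static void main(String[] args) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        boolean threwNpe = false;
        try {
            queue.offer(null); // BlockingQueue implementations forbid null elements
        } catch (NullPointerException expected) {
            threwNpe = true;
        }
        System.out.println(threwNpe); // prints true
    }
}
```

This suggests the streamer needs to validate the located block (or the datanode count against BlockGroupSize) before enqueueing, rather than relying on the queue to accept a null.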
[jira] [Assigned] (HDFS-8155) Support OAuth2 in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan reassigned HDFS-8155: - Assignee: Jakob Homan (was: Kai Zheng) Support OAuth2 in WebHDFS - Key: HDFS-8155 URL: https://issues.apache.org/jira/browse/HDFS-8155 Project: Hadoop HDFS Issue Type: New Feature Components: webhdfs Reporter: Jakob Homan Assignee: Jakob Homan WebHDFS should be able to accept OAuth2 credentials. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8840: --- Status: Patch Available (was: Open) Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1, 2.5.2, 2.5.1, 2.6.0 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In the method checkLogsAvailableForRead() of class hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java, the log level is not correct: after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e), while the code uses LOG.fatal(msg, e). The log level is inconsistent. The source code of this method is:
{code}
private boolean checkLogsAvailableForRead(FSImage image, long imageTxId,
    long curTxIdOnOtherNode) {
  ...
  } catch (IOException e) {
    ...
    if (LOG.isDebugEnabled()) {
      LOG.fatal(msg, e);
    } else {
      LOG.fatal(msg);
    }
    return false;
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
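The inconsistency reported above can be illustrated with a stub logger (purely hypothetical, not the commons-logging API): under the reporter's reading, the branch guarded by isDebugEnabled() should call debug(msg, e), so the guard and the call agree.

```java
// Stub logger for illustration only; it records the level of the last call.
public class LogLevelSketch {
    static final class StubLog {
        final boolean debugEnabled;
        String lastLevel;
        StubLog(boolean debugEnabled) { this.debugEnabled = debugEnabled; }
        boolean isDebugEnabled() { return debugEnabled; }
        void debug(String msg, Throwable t) { lastLevel = "DEBUG"; }
        void fatal(String msg) { lastLevel = "FATAL"; }
    }

    // The pattern HDFS-8840 suggests: include the stack trace at debug level
    // when debug is enabled, otherwise log only the message at fatal level.
    static void logFailure(StubLog log, String msg, Exception e) {
        if (log.isDebugEnabled()) {
            log.debug(msg, e); // the original code called fatal(msg, e) here
        } else {
            log.fatal(msg);
        }
    }

    public static void main(String[] args) {
        StubLog log = new StubLog(true);
        logFailure(log, "Unable to read log segments", new RuntimeException("x"));
        System.out.println(log.lastLevel); // prints DEBUG
    }
}
```

Whether the fix should downgrade the level or simply always log at one level is a judgment call for the patch; the sketch only shows the guard-and-call agreement the report asks for.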
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649222#comment-14649222 ] Hudson commented on HDFS-8821: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #262 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/262/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649229#comment-14649229 ] Hudson commented on HDFS-8821: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2219 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2219/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8792) Improve BlockManager#postponedMisreplicatedBlocks and BlockManager#excessReplicateMap
[ https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649183#comment-14649183 ] Yi Liu commented on HDFS-8792: -- The test failure is not related. Improve BlockManager#postponedMisreplicatedBlocks and BlockManager#excessReplicateMap - Key: HDFS-8792 URL: https://issues.apache.org/jira/browse/HDFS-8792 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8792.001.patch, HDFS-8792.002.patch {{LightWeightHashSet}} requires less memory than the Java HashSet. Furthermore, for {{excessReplicateMap}}, we can use a {{HashMap}} instead of a {{TreeMap}}, since there is no need to sort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
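The TreeMap-vs-HashMap point above is standard JDK behavior: TreeMap pays O(log n) per operation to keep keys sorted, while HashMap offers expected O(1) operations with no ordering. When a map is only used for membership and lookup, as described for {{excessReplicateMap}}, the ordering work is wasted. A quick sketch (datanode names are made up):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapChoice {
    public static void main(String[] args) {
        Map<String, Integer> sorted = new TreeMap<>();
        Map<String, Integer> hashed = new HashMap<>();
        for (String dn : new String[] {"dn3", "dn1", "dn2"}) {
            sorted.put(dn, 1); // keeps keys ordered: extra O(log n) work per put
            hashed.put(dn, 1); // no ordering maintained: expected O(1) per put
        }
        System.out.println(sorted.keySet()); // prints [dn1, dn2, dn3]
        // Lookup, the only operation needed here, works the same on both:
        System.out.println(hashed.containsKey("dn2")); // prints true
    }
}
```

If the sorted iteration order is never observed, the TreeMap's per-operation cost buys nothing, which is the rationale stated in the issue.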
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649274#comment-14649274 ] Hudson commented on HDFS-8821: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2200 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2200/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8844) TestHDFSCLI does not cleanup the test directory
Akira AJISAKA created HDFS-8844: --- Summary: TestHDFSCLI does not cleanup the test directory Key: HDFS-8844 URL: https://issues.apache.org/jira/browse/HDFS-8844 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Akira AJISAKA Priority: Minor If TestHDFSCLI is executed twice without {{mvn clean}}, the second try fails. Here are the failing test cases: {noformat} 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(231)) - Failing tests: 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(232)) - -- 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 226: get: getting non existent(absolute path) 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 227: get: getting non existent file(relative path) 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 228: get: Test for hdfs:// path - getting non existent 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 229: get: Test for Namenode's path - getting non existent 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 250: copyToLocal: non existent relative path 2015-07-31 21:35:17,654 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 251: copyToLocal: non existent absolute path 2015-07-31 21:35:17,655 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 252: copyToLocal: Test for hdfs:// path - non existent file/directory 2015-07-31 21:35:17,655 [main] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(238)) - 253: copyToLocal: Test for Namenode's path - non existent file/directory {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8840: --- Attachment: HDFS-8840-00.patch Uploaded the patch, please review. Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0, 2.5.1, 2.5.2, 2.7.1 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In method checkLogsAvailableForRead() of class: hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java The log level is not correct, after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, while now we use LOG.fatal(msg, e);. Log level is inconsistent. the source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8784: --- Attachment: HDFS-8784-00.patch Attached the patch, please review. BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8802) dfs.checksum.type is not described in hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649205#comment-14649205 ] Gururaj Shetty commented on HDFS-8802: -- The test failure {{org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSClientPeerWriteTimeout}} is handled in HDFS-8812, so it can be ignored. [~ozawa], kindly review the attached patch and let me know if any changes are needed. dfs.checksum.type is not described in hdfs-default.xml -- Key: HDFS-8802 URL: https://issues.apache.org/jira/browse/HDFS-8802 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.7.1 Reporter: Tsuyoshi Ozawa Assignee: Gururaj Shetty Attachments: HDFS-8802.patch, HDFS-8802_01.patch, HDFS-8802_02.patch It's a good time to check the other configurations in hdfs-default.xml here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649230#comment-14649230 ] Hudson commented on HDFS-7192: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2219 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2219/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
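The gating logic described in HDFS-7192 can be sketched as follows (a minimal sketch, not the actual DataXceiver code; the method and parameter names are hypothetical): the {{allowLazyPersist}} flag is only a hint, and the DataNode is free to drop it when the writer is not local.

```java
// Sketch: honor the lazyPersist hint only for local writers.
// The hint is advisory, so remote clients that set it simply get a
// normal disk write instead of a memory write.
public class LazyPersistGate {
    // requested: the allowLazyPersist hint from DataTransferProtocol#writeBlock
    // writerIsLocal: whether the writing client is on the same host as the DN
    static boolean effectiveLazyPersist(boolean requested, boolean writerIsLocal) {
        return requested && writerIsLocal; // hint from a remote writer is ignored
    }

    public static void main(String[] args) {
        System.out.println(effectiveLazyPersist(true, true));   // local writer keeps the hint
        System.out.println(effectiveLazyPersist(true, false));  // remote writer: hint dropped
    }
}
```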
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649223#comment-14649223 ] Hudson commented on HDFS-7192: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #262 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/262/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8784: --- Status: Patch Available (was: Open) BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649245#comment-14649245 ] songwanging commented on HDFS-8840: --- Great, the patch looks good to me. It should be accepted. Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0, 2.5.1, 2.5.2, 2.7.1 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In method checkLogsAvailableForRead() of class: hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java The log level is not correct, after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, while now we use LOG.fatal(msg, e);. Log level is inconsistent. the source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649275#comment-14649275 ] Hudson commented on HDFS-7192: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2200 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2200/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649380#comment-14649380 ] Hudson commented on HDFS-7192: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #270 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/270/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649362#comment-14649362 ] Hadoop QA commented on HDFS-8840: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 10s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 14s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 31s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 11s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 138m 14s | Tests failed in hadoop-hdfs. 
| | | | 184m 33s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748164/HDFS-8840-00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 93d50b7 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11877/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11877/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11877/console | This message was automatically generated. Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0, 2.5.1, 2.5.2, 2.7.1 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In method checkLogsAvailableForRead() of class: hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java The log level is not correct, after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, while now we use LOG.fatal(msg, e);. Log level is inconsistent. the source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649379#comment-14649379 ] Hudson commented on HDFS-8821: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #270 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/270/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8840) Inconsistent log level practice
[ https://issues.apache.org/jira/browse/HDFS-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8840: - Resolution: Not A Problem Status: Resolved (was: Patch Available) Thanks [~jagadesh.kiran] for working on this. The original logic looks correct to me. The goal is to log a fatal error when catching IOException. If debug is enabled, the fatal log will include additional exception information. I will resolve this as not a problem. Please reopen if you disagree. Inconsistent log level practice --- Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0, 2.5.1, 2.5.2, 2.7.1 Reporter: songwanging Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-8840-00.patch In method checkLogsAvailableForRead() of class: hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java The log level is not correct, after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, while now we use LOG.fatal(msg, e);. Log level is inconsistent. the source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
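The resolution above can be illustrated with a small sketch (the Log interface below is a hypothetical stand-in for commons-logging, not the real API surface): the message is logged at FATAL either way; {{LOG.isDebugEnabled()}} only decides whether the exception's stack trace is attached, which is why the original code is intentional rather than inconsistent.

```java
// Sketch of the pattern discussed in HDFS-8840: always log at FATAL,
// and use the debug setting only to control stack-trace verbosity.
public class FatalLogSketch {
    interface Log {
        boolean isDebugEnabled();
        void fatal(String msg);
        void fatal(String msg, Throwable t);
    }

    static String lastCall; // records which overload was invoked

    static Log LOG = new Log() {
        public boolean isDebugEnabled() { return false; }
        public void fatal(String msg) { lastCall = "fatal(msg)"; }
        public void fatal(String msg, Throwable t) { lastCall = "fatal(msg, e)"; }
    };

    static void report(String msg, Exception e) {
        if (LOG.isDebugEnabled()) {
            LOG.fatal(msg, e);   // FATAL either way; debug adds the stack trace
        } else {
            LOG.fatal(msg);
        }
    }

    public static void main(String[] args) {
        report("Unable to read transaction ids from the logs",
               new java.io.IOException("gap in edit logs"));
        System.out.println(lastCall); // prints "fatal(msg)" since debug is off
    }
}
```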
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649618#comment-14649618 ] Andrew Wang commented on HDFS-8833: --- Supporting reencode-on-rename is difficult, and IMO more difficult than what the Mover does for HSM, which is why it's not scoped for phase 1 and we're trying to avoid sticking strictly to current StoragePolicy semantics. However, if we later add support for reencode-on-rename, we can compatibly add an inherit mode by setting the behavior-on-create on the parent directory. e.g. if you set the dir to inherit-on-create, files would set their policy to inherit. Else if set to parent-on-create, they would explicitly set it to the parent's policy. I also think the APIs are not that dissimilar; as I said above, the proposal for EC is essentially SP without an inherit mode. Alternatively, you can think of it as files always having an explicit SP set rather than inherit. We could even completely integrate EC into SP if we add behavior-on-create to the SP framework. We could allow setting SP on dirs with either behavior (pretty easy change), and only allow creating dirs with a parent-on-create policy for now. Thoughts? Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. 
Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
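The inherit-on-create versus parent-on-create distinction in the discussion above can be sketched as follows (a toy model, not HDFS code; all names are hypothetical): an inheriting file stores nothing and resolves its policy by walking up to the nearest ancestor with one, so later changes to the directory's policy affect it, whereas a parent-on-create file copies the policy at create time and is pinned.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two create-time modes:
//  - INHERIT-on-create: file stores no policy; resolution walks up the tree.
//  - PARENT-on-create: file copies the parent's policy when created.
public class EcPolicySketch {
    static Map<String, String> dirPolicy = new HashMap<>();   // dir -> policy
    static Map<String, String> filePolicy = new HashMap<>();  // file -> explicit policy

    static String resolve(String path) {
        String p = path;
        while (true) {
            if (filePolicy.containsKey(p)) return filePolicy.get(p); // pinned
            if (dirPolicy.containsKey(p)) return dirPolicy.get(p);   // inherited
            int slash = p.lastIndexOf('/');
            if (slash <= 0) return "none";
            p = p.substring(0, slash);
        }
    }

    public static void main(String[] args) {
        dirPolicy.put("/ec", "RS-6-3");
        String inheriting = "/ec/a";            // INHERIT mode: nothing stored
        filePolicy.put("/ec/b", "RS-6-3");      // PARENT mode: copied at create
        dirPolicy.put("/ec", "RS-10-4");        // dir policy changed later
        System.out.println(resolve(inheriting)); // follows the dir: RS-10-4
        System.out.println(resolve("/ec/b"));    // pinned at create: RS-6-3
    }
}
```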
[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649514#comment-14649514 ] Zhe Zhang commented on HDFS-8784: - Thanks Jagadesh for working on this! Looks like a clean refactor. Could you also update the Javadoc of the method? BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649486#comment-14649486 ] Xiaoyu Yao commented on HDFS-8747: -- Thanks [~andrew.wang] for reviewing. bq. Have you thought about simply allowing rename between EZs with the same settings? This would be a much smaller and easier change with similar properties. Your proposal I think is still better in terms of ease-of-use and also ensuring security invariants around key rolling (if/when we implement that). Yes, we've discussed this simpler workaround, but it has many limitations, such as the security invariants you mentioned above. We don't want to force different EZs to share the same zone key just to support rename, as they may have different policies. An encryption zone, as a security concept, should be managed consistently as a single entity. Based on that, supporting additional roots per encryption zone is a natural enhancement and a better solution. bq. If we keep the APIs superuser-only, how does a normal user add their trash folder to an EZ? Same for scratch folders, e.g. if the Hive user is not a superuser. I think we should keep this API as superuser-only. It can still be useful even so: the trash folder/scratch folder can be pre-created and added to the encryption zone by the superuser as needed. This removes the limitation for the Hive scratch folder, which currently has to be configured under the single root of the encryption zone. We can discuss more on this in HDFS-8831. 
Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, HDFS-8747-07292015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow create encryption zone on top of a single HDFS directory. Files under the root directory of the encryption zone will be encrypted/decrypted transparently upon HDFS client write or read operations. Generally, it does not support rename(without data copying) across encryption zones or between encryption zone and non-encryption zone because different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA is to propose better support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash) with HDFS encryption zones. “Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support. Temporary files from MR jobs are usually stored in staging area outside encryption zone such as “/tmp” directory and then rename to targeted directories as specified once the data is ready to be further processed. Below is a summary of supported/unsupported cases from latest Hadoop: * Rename within the encryption zone is supported * Rename the entire encryption zone by moving the root directory of the zone is allowed. * Rename sub-directory/file from encryption zone to non-encryption zone is not allowed. * Rename sub-directory/file from encryption zone A to encryption zone B is not allowed. * Rename from non-encryption zone to encryption zone is not allowed. “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that helps prevent accidental deletion of files and directories. 
If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with original path being preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. Due to the limited rename support, delete sub-directory/file within encryption zone with trash feature is not allowed. Client has to use -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but without a complete solution to the problem. We propose to solve the problem by generalizing the mapping between encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapped directories such as scratch space or soft delete trash locations to be added/removed dynamically after creation. This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only
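The rename rules and the proposed 1:N generalization above can be sketched as a zone-membership check (a toy model with hypothetical names, not the NameNode's actual encryption-zone bookkeeping): a rename is permitted only when source and destination resolve to the same zone, so registering a trash directory as an additional root of an existing zone is exactly what makes trash renames legal.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: each zone may own several root directories (1:N).
// renameAllowed() permits a rename only within one zone (or entirely
// outside all zones), matching the supported/unsupported cases above.
public class EzRenameSketch {
    static Map<String, String> zoneOfRoot = new HashMap<>(); // root dir -> zone id

    static String zoneOf(String path) {
        String bestRoot = "";
        for (String root : zoneOfRoot.keySet()) {
            // longest-prefix match on path components
            if ((path + "/").startsWith(root + "/") && root.length() > bestRoot.length()) {
                bestRoot = root;
            }
        }
        return bestRoot.isEmpty() ? null : zoneOfRoot.get(bestRoot);
    }

    static boolean renameAllowed(String src, String dst) {
        return java.util.Objects.equals(zoneOf(src), zoneOf(dst));
    }

    public static void main(String[] args) {
        zoneOfRoot.put("/warehouse", "ezA");
        System.out.println(renameAllowed("/warehouse/t1/x", "/warehouse/t2/x")); // within zone: allowed
        System.out.println(renameAllowed("/warehouse/t1/x", "/tmp/x"));          // leaves zone: denied
        // 1:N generalization: add the user's trash dir as a second root of ezA
        zoneOfRoot.put("/user/alice/.Trash", "ezA");
        System.out.println(renameAllowed("/warehouse/t1/x",
                "/user/alice/.Trash/Current/warehouse/t1/x"));                   // now allowed
    }
}
```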
[jira] [Updated] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6860: - Attachment: HDFS-6860.00.patch The original patch no longer applies after the switch to slf4j in HDFS-7112. I rebased it and fixed some spots the original patch missed. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649490#comment-14649490 ] Akira AJISAKA commented on HDFS-7916: - If HDFS-7704 is backported to a branch, this issue should be backported to the same branch as well. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0 Reporter: Vinayakumar B Assignee: Rushabh S Shah Priority: Critical Fix For: 2.7.1 Attachments: HDFS-7916-01.patch, HDFS-7916-1.patch If any bad block is found, the BPServiceActor for the Standby NameNode will retry reporting it indefinitely. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8833: Summary: Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones (was: Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649566#comment-14649566 ] Tsz Wo Nicholas Sze commented on HDFS-8833: --- {code} ... Under the scope of this JIRA, the file's EC policy won't be changed. If it was created under EC zone A it will carry EC policy A with it when being moved. Could you explain a bit more why If yes, we could eliminate EC zones. Otherwise, we should keep EC zone.? {code} This is semantic different from StoragePolicy. We should use the same semantic as StoragePolicy. Let's keep EC zone for the moment. {code} As a follow-on we could enable an inherit mode similar as StoragePolicy. {code} No, we cannot change semantic over time. Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8653) Code cleanup for DatanodeManager, DatanodeDescriptor and DatanodeStorageInfo
[ https://issues.apache.org/jira/browse/HDFS-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649807#comment-14649807 ] Zhe Zhang commented on HDFS-8653: - [~szetszwo] The majority of this patch is just code cleanup, such as removing unnecessary type info when creating generic {{ArrayList}}s. The only logic change is the addition of a few {{null}} checks in {{DatanodeStorageInfo}}, and it came from HDFS-8323. Code cleanup for DatanodeManager, DatanodeDescriptor and DatanodeStorageInfo Key: HDFS-8653 URL: https://issues.apache.org/jira/browse/HDFS-8653 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: 2.8.0 Attachments: HDFS-8653.00.patch While updating the {{blockmanagement}} module to distribute erasure coding recovery work to Datanode, the HDFS-7285 branch also did some code cleanup that should be merged into trunk independently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
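The cleanup described in HDFS-8653 above is the standard Java 7 diamond-operator simplification plus defensive null checks; a generic illustration (not code from the patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Example of the kind of cleanup described: Java 7's diamond operator lets the
// compiler infer type arguments, so the redundant ones can be dropped.
public class DiamondCleanup {
    // Before: List<Map<String, Integer>> l = new ArrayList<Map<String, Integer>>();
    // After:
    static List<Map<String, Integer>> makeList() {
        return new ArrayList<>();
    }

    // A null check of the kind mentioned: guard before dereferencing.
    static int sizeOrZero(List<?> l) {
        return l == null ? 0 : l.size();
    }
}
```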
[jira] [Updated] (HDFS-8202) Improve end to end striping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-8202: --- Summary: Improve end to end striping file test to add erasure recovering test (was: Improve end to end stirpping file test to add erasure recovering test) Improve end to end striping file test to add erasure recovering test Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Fix For: HDFS-7285 Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, HDFS-8202-HDFS-7285.006.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This to follow on HDFS-8201 to add erasure recovering test in the end to end stripping file test: * After writing certain blocks to the test file, delete some block file; * Read the file content and compare, see if any recovering issue, or verify the erasure recovering works or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
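The recovery step this test exercises can be illustrated with the simplest possible erasure code. HDFS-7285 stripes with Reed-Solomon, but single-parity XOR shows the same idea the test steps describe — delete one block, rebuild it from the survivors:

```java
// Illustration only: single-parity XOR, the simplest erasure code. It
// demonstrates the test's recovery idea (drop one data cell, rebuild it from
// the remaining cells plus parity); the branch itself uses Reed-Solomon.
public class XorRecovery {
    /** Parity cell = XOR of all data cells. */
    static byte[] parity(byte[][] data) {
        byte[] p = new byte[data[0].length];
        for (byte[] cell : data)
            for (int i = 0; i < p.length; i++) p[i] ^= cell[i];
        return p;
    }

    /** Recover one lost data cell from the surviving cells plus parity. */
    static byte[] recover(byte[][] data, int lost, byte[] parity) {
        byte[] r = parity.clone();
        for (int c = 0; c < data.length; c++) {
            if (c == lost) continue;
            for (int i = 0; i < r.length; i++) r[i] ^= data[c][i];
        }
        return r;
    }
}
```

An end-to-end test then compares the recovered cell byte-for-byte against what was originally written, which is exactly the "read the file content and compare" step above.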
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649686#comment-14649686 ] Xiaoyu Yao commented on HDFS-6860: -- Thanks [~arpitagarwal] for the review. Do you mean keep the processReport related log at *INFO* level? BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-8845: --- Attachment: HDFS-8845.patch DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-8845: --- Status: Patch Available (was: Open) DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649747#comment-14649747 ] Arpit Agarwal commented on HDFS-6860: - +1 pending Jenkins thanks for taking over this [~xyao]. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649670#comment-14649670 ] Arpit Agarwal commented on HDFS-6860: - Thanks [~xyao], we should probably leave the Processing first storage report for and the processReport messages at DEBUG. Those are logged once per DN per block report and useful in practice. . BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-8202. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Target Version/s: HDFS-7285 +1 on the latest patch. I just committed to the branch. Thanks Xinwei for the contribution! Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Fix For: HDFS-7285 Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, HDFS-8202-HDFS-7285.006.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This to follow on HDFS-8201 to add erasure recovering test in the end to end stripping file test: * After writing certain blocks to the test file, delete some block file; * Read the file content and compare, see if any recovering issue, or verify the erasure recovering works or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-8845: --- Description: DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6860: - Attachment: HDFS-6860.01.patch Update the patch based on feedback. Delta from v00: Keep the block report processing related log at INFO level. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
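The level split this patch settles on — per-block state changes at a debug level, the once-per-DN-per-report summary at INFO — looks roughly like the sketch below. It uses {{java.util.logging}} purely for self-containment (HDFS itself uses a different logging facade), and all message strings are invented:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the trade-off: noisy per-block messages go behind a cheap debug
// level check, while the one-line-per-report summary stays at INFO.
public class BlockLogLevels {
    static final Logger LOG = Logger.getLogger("BlockStateChange");

    static String processReport(int blocks) {
        for (int b = 0; b < blocks; b++) {
            // Guarding avoids building the message string when debug is off.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("BLOCK* addStoredBlock: blk_" + b);
            }
        }
        // Logged once per DN per block report: cheap and useful at INFO.
        String summary = "processReport: processed " + blocks + " blocks";
        LOG.info(summary);
        return summary;
    }
}
```

On a busy cluster the per-block line fires millions of times, so demoting it (and guarding it) is where the NN performance win comes from.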
[jira] [Commented] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649748#comment-14649748 ] Lei (Eddy) Xu commented on HDFS-8845: - Hi, [~lichangleo], after HDFS-6482, finalizedDir has two-level of subdirs ({{finalized/subdir0/subdir23/blk_1234}}). Would this change lose the coverage of checkDir() on these subdirs? DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
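The trade-off under discussion — a shallow check versus walking the whole {{finalized/subdir0/subdir23}} tree — can be sketched as follows. This is illustrative only, not the patch; {{demo()}} builds the two-level layout in a temp directory:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Sketch: a full check visits every directory under the volume (heavy disk
// load); a shallow check probes only the root, trading coverage for load.
public class DiskCheckSketch {

    /** Recursive check: visits every directory in the tree. */
    static int fullCheck(File dir) {
        checkDir(dir);
        int visited = 1;
        File[] children = dir.listFiles(File::isDirectory);
        if (children != null)
            for (File c : children) visited += fullCheck(c);
        return visited;
    }

    /** Shallow check: a single directory, constant disk load. */
    static int shallowCheck(File dir) {
        checkDir(dir);
        return 1;
    }

    static void checkDir(File dir) {
        if (!dir.isDirectory() || !dir.canRead() || !dir.canWrite())
            throw new IllegalStateException("bad dir: " + dir);
    }

    /** Builds subdir0/subdir23 under a temp dir and runs both checks. */
    static String demo() {
        try {
            File root = Files.createTempDirectory("vol").toFile();
            new File(new File(root, "subdir0"), "subdir23").mkdirs();
            return fullCheck(root) + "," + shallowCheck(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Eddy's question is precisely the coverage column of this trade-off: the shallow variant never touches {{subdir0/subdir23}}, so a fault confined to a subdir goes undetected by this check.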
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649788#comment-14649788 ] Tsz Wo Nicholas Sze commented on HDFS-8838: --- Thanks Li for taking a look. 1. We don't retry connecting to a datanode for the same datanode. So, let's keep it for the moment. If necessary, we can change it later on. 2. The length is included in the path and printed out in the log. Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649805#comment-14649805 ] Andrew Wang commented on HDFS-8747: --- bq. Encryption zone as a security concept should be managed consistently with a single entity. Based on that, support adding additional roots to encryption zone is a natural enhancement and better solution. SGTM, definitely like the idea of the EZ as a management unit. bq. The trash folder/scratch folder can be per-created and added to encryption zone by super user as needed This is maybe viable for scratch, but not for trash. There can be many users on a cluster accessing a variety of EZs, such that it's unmanageable for the super-user to set up all the Trash folders beforehand. Another question, how would this work if a user's homedir is already an EZ? Do you plan to add support for nested encryption zones? Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, HDFS-8747-07292015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow create encryption zone on top of a single HDFS directory. Files under the root directory of the encryption zone will be encrypted/decrypted transparently upon HDFS client write or read operations. Generally, it does not support rename(without data copying) across encryption zones or between encryption zone and non-encryption zone because different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA is to propose better support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash) with HDFS encryption zones. 
“Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support. Temporary files from MR jobs are usually stored in staging area outside encryption zone such as “/tmp” directory and then rename to targeted directories as specified once the data is ready to be further processed. Below is a summary of supported/unsupported cases from latest Hadoop: * Rename within the encryption zone is supported * Rename the entire encryption zone by moving the root directory of the zone is allowed. * Rename sub-directory/file from encryption zone to non-encryption zone is not allowed. * Rename sub-directory/file from encryption zone A to encryption zone B is not allowed. * Rename from non-encryption zone to encryption zone is not allowed. “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with original path being preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. Due to the limited rename support, delete sub-directory/file within encryption zone with trash feature is not allowed. Client has to use -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but without a complete solution to the problem. We propose to solve the problem by generalizing the mapping between encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapped directories such as scratch space or soft delete trash locations to be added/removed dynamically after creation. 
This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only supported within the zone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
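The five rename rules listed in the description above amount to a same-zone check with an exception for moving a zone root. A hedged sketch with a simplified longest-prefix zone lookup (all names hypothetical, not the NameNode's actual zone resolution):

```java
import java.util.List;

// Sketch of the rename rules: allowed within one zone, allowed when moving an
// entire zone root, forbidden across zone boundaries in either direction.
public class EzRenameCheck {
    /** Returns the deepest zone root containing path, or null if unencrypted. */
    static String zoneOf(String path, List<String> zoneRoots) {
        String best = null;
        for (String z : zoneRoots)
            if ((path + "/").startsWith(z + "/")
                    && (best == null || z.length() > best.length()))
                best = z;
        return best;
    }

    static boolean renameAllowed(String src, String dst, List<String> zoneRoots) {
        String srcZone = zoneOf(src, zoneRoots);
        String dstZone = zoneOf(dst, zoneRoots);
        if (srcZone == null && dstZone == null) return true; // outside any zone
        if (src.equals(srcZone)) return true;                // moving the zone root
        return srcZone != null && srcZone.equals(dstZone);   // same zone only
    }
}
```

The 1:N proposal then reduces to making `zoneOf` return the same zone for the zone's main root and its attached scratch/trash roots, so renames between them pass the same-zone test unchanged.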
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649825#comment-14649825 ] Andrew Wang commented on HDFS-6682: --- Wondering if there's a lighterweight metric we could compute instead. [~aw] is this the entire queue being backed up, or a few super-old replicas that never get cleared? If it's the entire queue, maybe the rate of addition/removal from UnderReplicatedBlocks would be similarly useful, in addition to total size. Could provide sliding window metrics like NNTop. Doing this per-DN could also be interesting. Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, the data in the HDFS is lost and a client needs to put the same file again. # A Client puts a file to HDFS # A DataNode crashes before replicating a block of the file to other DataNodes I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way client can know what file to retain for the re-try. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
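The originally proposed metric — the timestamp of the oldest under-replicated block — is cheap to expose if blocks are kept in insertion order, since under-replicated blocks are discovered roughly oldest-first. A hypothetical sketch (not the NameNode's actual queue structure):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: insertion-ordered map from block ID to the time it became
// under-replicated; the oldest timestamp is the first entry, read in O(1).
public class UnderReplicatedTracker {
    private final Map<Long, Long> blockToTimestamp = new LinkedHashMap<>();

    public void add(long blockId, long nowMillis) {
        blockToTimestamp.putIfAbsent(blockId, nowMillis);
    }

    public void remove(long blockId) {
        blockToTimestamp.remove(blockId);
    }

    /** Timestamp of the oldest still-under-replicated block, or -1 if none. */
    public long oldestTimestamp() {
        for (long ts : blockToTimestamp.values()) return ts;
        return -1;
    }
}
```

Andrew's lighter-weight alternative would instead count `add`/`remove` calls per sliding window, trading the precise oldest-age answer for cheaper bookkeeping.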
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649697#comment-14649697 ] Arpit Agarwal commented on HDFS-6860: - Thanks I meant INFO. Apologize for not catching this in the earlier review. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated HDFS-8828: --- Status: Patch Available (was: Open) Submitted patch rev 001. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch Some users reported huge time cost to build file copy list in distcp. (30 hours for 1.6M files). We can leverage snapshot diff report to build file copy list including files/dirs which are changes only between two snapshots (or a snapshot and a normal dir). It speed up the process in two folds: 1. less copy list building time. 2. less file copy MR jobs. HDFS snapshot diff report provide information about file/directory creation, deletion, rename and modification between two snapshots or a snapshot and a normal directory. HDFS-7535 synchronize deletion and rename, then fallback to the default distcp. So it still relies on default distcp to building complete list of files under the source dir. This patch only puts creation and modification files into the copy list based on snapshot diff report. We can minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
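The core of the HDFS-8828 approach — put only creation and modification entries from the snapshot diff report onto the copy list, leaving deletion and rename to the HDFS-7535-style sync step — in a hypothetical sketch (invented types, not the distcp or snapshot-diff API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: filter a snapshot diff report down to the paths that actually need
// copying, instead of enumerating the whole source tree.
public class DiffCopyList {
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    static final class DiffEntry {
        final DiffType type;
        final String path;
        DiffEntry(DiffType type, String path) { this.type = type; this.path = path; }
    }

    static List<String> buildCopyList(List<DiffEntry> report) {
        List<String> copy = new ArrayList<>();
        for (DiffEntry e : report)
            if (e.type == DiffType.CREATE || e.type == DiffType.MODIFY)
                copy.add(e.path);
        return copy;
    }
}
```

Because the diff report is proportional to the number of changed paths rather than the full namespace, the copy-list build no longer scales with the 1.6M-file tree that took 30 hours to enumerate.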
[jira] [Commented] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649808#comment-14649808 ] Chang Li commented on HDFS-8845: Hi [~eddyxu], thanks for the comments. It's an intentional trade-off: we give up a little coverage for performance. DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7587: -- Attachment: HDFS-7587-branch-2.6.patch For the 2.6.1 release effort, the backport isn't straightforward due to difference between 2.6 and 2.7. It has the following differences compared to the original patch. * Include part of HDFS-7509 so that prepareFileForWrite has the expected function signature. * Use Quota.Counts instead of QuotaCounts which is introduced in HDFS-7584. * Skip the check for storage type specific quota introduced in HDFS-7584. * Add the necessary definitions for INodesPath#length and FSDirectory#shouldSkipQuotaChecks. Edit log corruption can happen if append fails with a quota violation - Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Kihwal Lee Assignee: Jing Zhao Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7587-branch-2.6.patch, HDFS-7587.001.patch, HDFS-7587.002.patch, HDFS-7587.003.patch, HDFS-7587.patch We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
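The underlying bug described above is an ordering one: the inode was converted and a lease added before the quota check could fail, leaving in-memory state that was never edit-logged. A generic validate-then-mutate sketch of the fix idea (all names hypothetical, not the actual patch):

```java
// Sketch: perform every check that can fail *before* any mutation, so a quota
// violation leaves no partial state and nothing needs to be undone.
public class AppendQuotaSketch {
    static final class FileState {
        boolean underConstruction;
        boolean hasLease;
    }

    static void prepareFileForAppend(FileState f, long remainingQuota, long needed) {
        if (needed > remainingQuota) {
            // Fail fast: nothing has been mutated yet, so namespace state and
            // the edit log remain consistent.
            throw new IllegalStateException("quota exceeded");
        }
        f.underConstruction = true; // mutate only after all checks pass
        f.hasLease = true;
        // ... OP_ADD would be edit-logged here ...
    }
}
```

With the original ordering, the quota exception fired between the mutation and the `OP_ADD` logging, which is exactly how a later `OP_CLOSE` ended up with no matching `OP_ADD` during replay.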
[jira] [Created] (HDFS-8845) DiskChecker should not traverse entire tree
Chang Li created HDFS-8845: -- Summary: DiskChecker should not traverse entire tree Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8838: -- Attachment: h8838_20150731.patch h8838_20150731.patch: adds more tests and prints out all lengths. Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch, h8838_20150731.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8846) Create edit log files with old layout version for upgrade testing
Zhe Zhang created HDFS-8846: --- Summary: Create edit log files with old layout version for upgrade testing Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8480, we should create some edit log files with old layout version, to test whether they can be correctly handled in upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649905#comment-14649905 ] Xiaoyu Yao commented on HDFS-8747: -- bq. This is maybe viable for scratch, but not for trash. There can be many users on a cluster accessing a variety of EZs, such that it's unmanageable for the super-user to set up all the Trash folders beforehand. Three solutions have been discussed in Design-Soft Delete section of the spec. My initial take is on Option 1: Per User Trash Namespace, which is mostly for compatibility and simplicity. If pre-create trash folder for many users is a concern, Option 2: Global Trash Namespace which is similar to the idea proposed in Hadoop-7310 can be used. It will not be compatible with current Trash behavior where users find their deleted files under /user/username/.Trash/Current/ These solutions can be implemented as pluggable trash policy for admin to choose with configurable keys when the default one may not be appropriate for their deployment. bq. Another question, how would this work if a user's homedir is already an EZ? Do you plan to add support for nested encryption zones? No we don't plan to support nested encryption zones. If we take Option 1, this will not be supported. But if we take Option 2, it will not be a problem as the trash namespace for encryption zone will be separated from user's homedir. Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, HDFS-8747-07292015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow create encryption zone on top of a single HDFS directory. 
Files under the root directory of the encryption zone will be encrypted/decrypted transparently upon HDFS client write or read operations. Generally, it does not support rename(without data copying) across encryption zones or between encryption zone and non-encryption zone because different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA is to propose better support of two such use cases “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash) with HDFS encryption zones. “Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support. Temporary files from MR jobs are usually stored in staging area outside encryption zone such as “/tmp” directory and then rename to targeted directories as specified once the data is ready to be further processed. Below is a summary of supported/unsupported cases from latest Hadoop: * Rename within the encryption zone is supported * Rename the entire encryption zone by moving the root directory of the zone is allowed. * Rename sub-directory/file from encryption zone to non-encryption zone is not allowed. * Rename sub-directory/file from encryption zone A to encryption zone B is not allowed. * Rename from non-encryption zone to encryption zone is not allowed. “Soft delete” (a.k.a. trash) is a client-side “soft delete” feature that helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with original path being preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. 
Due to the limited rename support, delete sub-directory/file within encryption zone with trash feature is not allowed. Client has to use -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but without a complete solution to the problem. We propose to solve the problem by generalizing the mapping between encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapped directories such as scratch space or soft delete trash locations to be added/removed dynamically after creation. This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only supported within the zone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8550) Erasure Coding: Fix FindBugs Multithreaded correctness Warning
[ https://issues.apache.org/jira/browse/HDFS-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649929#comment-14649929 ] Zhe Zhang commented on HDFS-8550: - [~rakeshr] I wonder if the issue is still valid with HDFS-8386? Erasure Coding: Fix FindBugs Multithreaded correctness Warning -- Key: HDFS-8550 URL: https://issues.apache.org/jira/browse/HDFS-8550 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Findbug warning:- Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.hdfs.DFSOutputStream Field org.apache.hadoop.hdfs.DFSOutputStream.streamer Synchronized 89% of the time Unsynchronized access at DFSOutputStream.java:[line 146] Unsynchronized access at DFSOutputStream.java:[line 859] Unsynchronized access at DFSOutputStream.java:[line 627] Unsynchronized access at DFSOutputStream.java:[line 630] Unsynchronized access at DFSOutputStream.java:[line 640] Unsynchronized access at DFSOutputStream.java:[line 342] Unsynchronized access at DFSOutputStream.java:[line 744] Unsynchronized access at DFSOutputStream.java:[line 903] Synchronized access at DFSOutputStream.java:[line 737] Synchronized access at DFSOutputStream.java:[line 913] Synchronized access at DFSOutputStream.java:[line 726] Synchronized access at DFSOutputStream.java:[line 756] Synchronized access at DFSOutputStream.java:[line 762] Synchronized access at DFSOutputStream.java:[line 757] Synchronized access at DFSOutputStream.java:[line 758] Synchronized access at DFSOutputStream.java:[line 762] Synchronized access at DFSOutputStream.java:[line 483] Synchronized access at DFSOutputStream.java:[line 486] Synchronized access at DFSOutputStream.java:[line 717] Synchronized access at DFSOutputStream.java:[line 719] Synchronized access at DFSOutputStream.java:[line 722] Synchronized access at 
DFSOutputStream.java:[line 408] Synchronized access at DFSOutputStream.java:[line 408] Synchronized access at DFSOutputStream.java:[line 423] Synchronized access at DFSOutputStream.java:[line 426] Synchronized access at DFSOutputStream.java:[line 411] Synchronized access at DFSOutputStream.java:[line 452] Synchronized access at DFSOutputStream.java:[line 452] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 439] Synchronized access at DFSOutputStream.java:[line 670] Synchronized access at DFSOutputStream.java:[line 580] Synchronized access at DFSOutputStream.java:[line 574] Synchronized access at DFSOutputStream.java:[line 592] Synchronized access at DFSOutputStream.java:[line 583] Synchronized access at DFSOutputStream.java:[line 581] Synchronized access at DFSOutputStream.java:[line 621] Synchronized access at DFSOutputStream.java:[line 609] Synchronized access at DFSOutputStream.java:[line 621] Synchronized access at DFSOutputStream.java:[line 597] Synchronized access at DFSOutputStream.java:[line 612] Synchronized access at DFSOutputStream.java:[line 597] Synchronized access at DFSOutputStream.java:[line 588] Synchronized access at DFSOutputStream.java:[line 624] Synchronized access at DFSOutputStream.java:[line 612] Synchronized access at DFSOutputStream.java:[line 588] Synchronized access at DFSOutputStream.java:[line 632] Synchronized access at DFSOutputStream.java:[line 632] Synchronized access at DFSOutputStream.java:[line 616] Synchronized access at DFSOutputStream.java:[line 633] Synchronized access at DFSOutputStream.java:[line 657] Synchronized access at DFSOutputStream.java:[line 658] Synchronized access at DFSOutputStream.java:[line 695] Synchronized access at DFSOutputStream.java:[line 698] Synchronized access at DFSOutputStream.java:[line 784] Synchronized access at DFSOutputStream.java:[line 795] Synchronized access at 
DFSOutputStream.java:[line 801] Synchronized access at DFSOutputStream.java:[line 155] Synchronized access at DFSOutputStream.java:[line 158] Synchronized access at DFSOutputStream.java:[line 433] Synchronized access at DFSOutputStream.java:[line 886] Synchronized access at DFSOutputStream.java:[line 463] Synchronized access at DFSOutputStream.java:[line 469] Synchronized access at DFSOutputStream.java:[line 463] Synchronized access at DFSOutputStream.java:[line 470] Synchronized access at DFSOutputStream.java:[line 465] Synchronized access at DFSOutputStream.java:[line 749] Synchronized access at DFSStripedOutputStream.java:[line 260] Synchronized access at DFSStripedOutputStream.java:[line 325] Synchronized access at DFSStripedOutputStream.java:[line 325]
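The IS2_INCONSISTENT_SYNC warning above flags a field that is guarded by a lock most of the time but accessed without it elsewhere. A minimal sketch of the pattern and the usual fix (a hypothetical class for illustration, not the actual DFSOutputStream code): guard every read and write of the shared field with the same monitor.

```java
// Hypothetical illustration of the IS2_INCONSISTENT_SYNC fix, not the
// actual DFSOutputStream code: every access to the shared field goes
// through the same lock, so FindBugs sees 100% synchronized access.
public class SyncExample {
    private final Object lock = new Object();
    private int streamerState = 0; // stands in for the 'streamer' field

    public void update(int v) {
        synchronized (lock) {   // write holds the monitor
            streamerState = v;
        }
    }

    public int read() {
        synchronized (lock) {   // read holds the same monitor
            return streamerState;
        }
    }
}
```

Alternatives, depending on the field's usage, are making it volatile (for simple reads/writes) or final (if it never changes after construction).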
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649952#comment-14649952 ] Haohui Mai commented on HDFS-8344: -- After looking through the code, a better approach might be to set a timeout during lease recovery instead of retrying n times. Multiple clients might try to recover the leases and quickly use up all of the retries, causing the file to be closed too quickly. With a timeout, the whole lease recovery process is bounded by time (in addition to the SOFT_LIMIT and HARD_LIMIT we have today), and the lease recovery process is guaranteed to always terminate. Thoughts? NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 2.8.0 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch I found another(?) instance in which the lease is not recovered. This is easily reproducible on a pseudo-distributed single-node cluster. # Before you start, it helps if you set the following. This is not necessary, but it reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (It could be less than 1 block, but it has hflushed, so some of the data has landed on the datanodes.) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 of the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter) # Shoot the datanode.
(Since I ran on a pseudo-distributed cluster, there was only 1.) I believe the lease should be recovered and the block should be marked missing. However, this is not happening: the lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly. Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode, even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
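Haohui Mai's suggestion above, recovery bounded by a wall-clock deadline rather than a retry count, can be sketched as follows. All names here are invented for illustration; this is not the actual NameNode code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of time-bounded lease recovery: the loop stops when
// the deadline passes rather than after a fixed number of retries, so the
// process is bounded by time and always terminates.
public class LeaseRecoverySketch {
    static boolean recoverWithDeadline(long timeoutMs, BooleanSupplier attempt) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (attempt.getAsBoolean()) {
                return true;                  // recovered within the window
            }
            try {
                Thread.sleep(10);             // back off before retrying
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;                          // time budget exhausted
    }
}
```

Because the bound is temporal, concurrent clients attempting recovery cannot "use up" each other's retry budget; they merely share the same window.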
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649499#comment-14649499 ] Haohui Mai commented on HDFS-6407: -- The v10 patch allows sorting based on the status and the name of the datanode. [~benoyantony], [~nroberts], does the patch look good to you? new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Haohui Mai Priority: Critical Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.4.patch, HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png, sorting 2.png, sorting table.png The old UI supported clicking on a column header to sort on that column. The new UI seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanode information, directory listings, and snapshots. When there are many items in the tables, it is useful to be able to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649510#comment-14649510 ] Hadoop QA commented on HDFS-8784: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 5s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 5s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 26s | The applied patch generated 1 new checkstyle issues (total was 310, now 310). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 163m 23s | Tests passed in hadoop-hdfs. 
| | | | 209m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748168/HDFS-8784-00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 93d50b7 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11878/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11878/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11878/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11878/console | This message was automatically generated. BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N Attachments: HDFS-8784-00.patch The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649876#comment-14649876 ] Allen Wittenauer commented on HDFS-6682: We have no insight into how long a given replication might have been hanging around, so there is no way to really answer that question. We know the queue gets backed up during cascading DN failure events (thanks to the very slow NM memory checker, fast-acting bad jobs, and the Linux OOM killer!), so I was always under the impression that the whole queue is just super busy rather than old entries never being cleared. A rate might be useful, at least to tell us if the queue is stuck and/or to project how long it will remain behind. Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, data in HDFS is lost and a client needs to put the same file again. # A client puts a file to HDFS # A DataNode crashes before replicating a block of the file to other DataNodes I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way a client can know which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
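The proposed metric can be sketched minimally as follows. All names here are invented for illustration and are not taken from the attached patches: record the time each block enters the under-replicated queue; the metric is the timestamp of the oldest entry still present.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an "oldest under-replicated block" metric:
// entries are kept in insertion order, so the first value is the oldest.
public class OldestBlockMetric {
    private final Map<Long, Long> enqueueTime = new LinkedHashMap<>();

    public synchronized void markUnderReplicated(long blockId, long nowMs) {
        enqueueTime.putIfAbsent(blockId, nowMs); // keep the original timestamp
    }

    public synchronized void markReplicated(long blockId) {
        enqueueTime.remove(blockId);
    }

    // Timestamp of the oldest under-replicated block, or -1 if none.
    public synchronized long oldestTimestamp() {
        for (long t : enqueueTime.values()) {
            return t; // LinkedHashMap preserves insertion order
        }
        return -1;
    }
}
```

A derived rate, as Allen suggests, could then be computed from how fast the head timestamp advances over time.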
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649948#comment-14649948 ] Zhe Zhang commented on HDFS-8220: - Quickly glanced through the current code; it doesn't seem we are handling the identified case. Shall we resume the work? Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 
Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-4882: -- Labels: 2.6.1-candidate (was: ) Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario: 1. cluster with 4 DNs 2. the size of the file to be written is a little more than one block 3. write the first block to 3 DNs, DN1-DN2-DN3 4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out 5. DN2 and DN3 go down 6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE 7. the client continues to write the last block, and tries to close the file after writing all the data 8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE 9. shut down the client 10. the file's lease exceeds the hard limit 11. the LeaseManager realizes that and begins lease recovery by calling fsnamesystem.internalReleaseLease() 12. but the last block's state is COMPLETE, and this triggers the lease manager's infinite loop, which prints massive logs like this: {noformat} 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.
Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM APREDUCE_-1252656407_1, pendingcreates: 1] {noformat} (the 3rd line log is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3443) Fix NPE when namenode transition to active during startup by adding checkNNStartup() in NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-3443: -- Labels: 2.6.1-candidate (was: ) Fix NPE when namenode transition to active during startup by adding checkNNStartup() in NameNodeRpcServer - Key: HDFS-3443 URL: https://issues.apache.org/jira/browse/HDFS-3443 Project: Hadoop HDFS Issue Type: Bug Components: auto-failover, ha Reporter: suja s Assignee: Vinayakumar B Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-3443-003.patch, HDFS-3443-004.patch, HDFS-3443-005.patch, HDFS-3443-006.patch, HDFS-3443-007.patch, HDFS-3443_1.patch, HDFS-3443_1.patch Start the NN and let the NN standby services be started. Before the editLogTailer is initialised, start ZKFC and allow the active services startup to proceed further. Here editLogTailer.catchupDuringFailover() will throw an NPE. void startActiveServices() throws IOException { LOG.info("Starting services required for active state"); writeLock(); try { FSEditLog editLog = dir.fsImage.getEditLog(); if (!editLog.isOpenForWrite()) { // During startup, we're already open for write during initialization.
editLog.initJournalsForWrite(); // May need to recover editLog.recoverUnclosedStreams(); LOG.info(Catching up to latest edits from old active before + taking over writer role in edits logs.); editLogTailer.catchupDuringFailover(); {noformat} 2012-05-18 16:51:27,585 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from XX.XX.XX.55:58003: output error 2012-05-18 16:51:27,586 WARN org.apache.hadoop.ipc.Server: IPC Server handler 8 on 8020, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from XX.XX.XX.55:58004: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:602) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686) 2012-05-18 16:51:27,586 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020 caught an exception java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2092) at org.apache.hadoop.ipc.Server.access$2000(Server.java:107) at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:930) at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:994) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1738) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface
[ https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649834#comment-14649834 ] Zhe Zhang commented on HDFS-8835: - Thanks for sharing the feedback [~szetszwo]. HDFS-8487 (as well as HDFS-8653, HDFS-8605) just try to divide-and-conquer the (inevitable) inconvenience for the community to understand and accept the EC change. I feel this way is easier than absorbing the huge change all at once. As shown below (copied from HDFS-8728 [discussion | https://issues.apache.org/jira/browse/HDFS-8728?focusedCommentId=14619043page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14619043]) the overall EC change becomes much smaller and less intrusive after pushing these changes to trunk first (I will do the rebase after HDFS-8499 revert). {code} Current HDFS-7285: 2532 insertions(+), 1156 deletions(-) in blockmanagement 1826 insertions(+), 444 deletions(-) in namenode Rebased: 1251 insertions(+), 201 deletions(-) in blockmanagement 1324 insertions(+), 168 deletions(-) in namenode {code} That said, I understand that git rebasing is a relatively new workflow and people have different preferences in absorbing changes. So more feedbacks are very welcome. Convert BlockInfoUnderConstruction as an interface -- Key: HDFS-8835 URL: https://issues.apache.org/jira/browse/HDFS-8835 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8499, this JIRA aims to convert {{BlockInfoUnderConstruction}} as an interface and {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 branch will add {{BlockInfoStripedUnderConstruction}} as another implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface
[ https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649840#comment-14649840 ] Tsz Wo Nicholas Sze commented on HDFS-8835: --- HDFS-8487 (as well as HDFS-8653, HDFS-8605) just try to divide-and-conquer the (inevitable) inconvenience for the community to understand and accept the EC change. I feel this way is easier than absorbing the huge change all at once. ... Please don't do it anymore. We probably should revert all these patches. We should not sneak branch code into trunk. The entire branch should be reviewed together. Convert BlockInfoUnderConstruction as an interface -- Key: HDFS-8835 URL: https://issues.apache.org/jira/browse/HDFS-8835 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8499, this JIRA aims to convert {{BlockInfoUnderConstruction}} as an interface and {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 branch will add {{BlockInfoStripedUnderConstruction}} as another implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8838: -- Attachment: (was: h8838_20150731.patch) Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface
[ https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649906#comment-14649906 ] Andrew Wang commented on HDFS-8835: --- There is a lot of past precedent for doing refactors in trunk. One of the first EC-related changes was HDFS-7743 which renamed BlockInfo to BlockInfoContiguous in trunk. [~szetszwo] you +1'd this change. The other BlockInfo refactors have happened over many weeks and been reviewed by a variety of different committers (Yi, Vinay, Jing, myself) so there has been no intent to sneak changes into trunk. Considering the number of positive reviews, I would say doing refactors in trunk has been met with general approval. bq. Patches got committed to trunk neither means that everyone already has understood the code Everyone understanding the code is not a prerequisite for getting code committed. Part of community over code is trusting the judgement of the other committers on the project. Here multiple committers have positively reviewed these refactors. bq. Quite a few people told me that the recent change of HDFS-8487 does make the code harder to understand. It makes the familiar code unfamiliar. Considering that many of us have positively reviewed these refactors, maybe harder to understand is a matter of opinion. Zhe posted about plans to further simplify the code through use of composition. Would this help with reviewing the sum of the changes? Maybe we should also continue to discuss the design of the hierarchy over on HDFS-8499. Convert BlockInfoUnderConstruction as an interface -- Key: HDFS-8835 URL: https://issues.apache.org/jira/browse/HDFS-8835 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Per discussion under HDFS-8499, this JIRA aims to convert {{BlockInfoUnderConstruction}} as an interface and {{BlockInfoContiguousUnderConstruction}} as its implementation. 
The HDFS-7285 branch will add {{BlockInfoStripedUnderConstruction}} as another implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649960#comment-14649960 ] Hadoop QA commented on HDFS-6860: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 58s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 16s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 31s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 29s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 43s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 6s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 157m 31s | Tests failed in hadoop-hdfs. 
| | | | 204m 56s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748221/HDFS-6860.00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d0e0ba8 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11879/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11879/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11879/console | This message was automatically generated. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648802#comment-14648802 ] Li Bo commented on HDFS-8838: - Hi Nicholas, I think you can commit your patch first and I will update mine after that. Some points: 1. {{DFSStripedOutputStream#getNumBlockWriteRetry}} returns 0, which allows connecting to a datanode only once. I think we should allow the connection to be retried several times. One way is to store the located block returned by {{locateFollowingBlock()}}; the following retries can then use the stored one, with no need to call {{locateFollowingBlock()}} again. 2. In {{TestDFSStripedOutputStreamWithFailure}}, you store the test lengths in {{LENGTHS}}. But when I read the code, I have to calculate the length myself to see what kind of test it is. So how about adding some comments, or showing the file length directly in the parameter, such as {{testDatanodeFailure(4 * cellSize + 123)}}? Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
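Li Bo's first point, caching the located block so that retries repeat only the connection step and not the NameNode lookup, can be sketched generically. Everything below is invented for illustration and is not the DFSStripedOutputStream code.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of retrying a connection against a cached location:
// locate() is called once, and only the connect step is retried.
public class RetryWithCachedBlock {
    static <T> T connectWithRetries(int maxRetries, Supplier<T> locate, Predicate<T> connect) {
        T located = locate.get();             // single "locateFollowingBlock" round trip
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (connect.test(located)) {      // retry only the connection
                return located;
            }
        }
        return null;                           // all retries exhausted
    }
}
```

The design point is simply that the expensive, side-effecting lookup happens once, while the cheap, idempotent connection attempt is what gets repeated.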
[jira] [Created] (HDFS-8840) Inconsistent log level practice
songwanging created HDFS-8840: - Summary: Inconsistent log level practice Key: HDFS-8840 URL: https://issues.apache.org/jira/browse/HDFS-8840 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1, 2.5.2, 2.5.1, 2.6.0 Reporter: songwanging Priority: Minor In method checkLogsAvailableForRead() of class hadoop-2.7.1-src\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\server\namenode\ha\BootstrapStandby.java the log level is inconsistent: after checking LOG.isDebugEnabled(), we should use LOG.debug(msg, e);, but the code uses LOG.fatal(msg, e);. The source code of this method is: private boolean checkLogsAvailableForRead(FSImage image, long imageTxId, long curTxIdOnOtherNode) { ... } catch (IOException e) { ... if (LOG.isDebugEnabled()) { LOG.fatal(msg, e); } else { LOG.fatal(msg); } return false; } } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
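A hedged sketch of the consistent pattern the report asks for, using java.util.logging stand-ins (Level.FINE for commons-logging debug(), Level.SEVERE for fatal()); the class and method names are illustrative, not the actual BootstrapStandby code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger("BootstrapStandby");

    // The level emitted inside the guard matches the guard itself:
    // debug guard -> debug-level record (with stack trace), otherwise severe without it.
    static Level levelUsed(String msg, Exception e, boolean debugEnabled) {
        if (debugEnabled) {
            LOG.log(Level.FINE, msg, e);   // consistent with the isDebugEnabled() check
            return Level.FINE;
        } else {
            LOG.log(Level.SEVERE, msg);    // fatal/severe path, no stack trace
            return Level.SEVERE;
        }
    }

    public static void main(String[] args) {
        Exception e = new java.io.IOException("gap in edit logs");
        System.out.println(levelUsed("no logs available for read", e, true));   // FINE
        System.out.println(levelUsed("no logs available for read", e, false));  // SEVERE
    }
}
```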
[jira] [Created] (HDFS-8842) Catch throwable
songwanging created HDFS-8842: - Summary: Catch throwable Key: HDFS-8842 URL: https://issues.apache.org/jira/browse/HDFS-8842 Project: Hadoop HDFS Issue Type: Bug Reporter: songwanging Priority: Critical We came across a few instances where the code catches Throwable but fails to rethrow anything. Throwable is the parent type of Exception and Error, so catching Throwable means catching both Exceptions and Errors. An Exception is something you can recover from (like IOException); an Error is something more serious that you usually can't recover from easily (like NoClassDefFoundError), so it doesn't make much sense to catch an Error. We should convert Throwable to Exception. For example: In method tryGetPid(Process p) of class hadoop-2.7.1-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\ha\ShellCommandFencer.java code: private static String tryGetPid(Process p) { try { ... } catch (Throwable t) { LOG.trace("Unable to determine pid for " + p, t); return null; } } In method uncaughtException(Thread t, Throwable e) of class hadoop-2.7.1-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-common\src\main\java\org\apache\hadoop\yarn\YarnUncaughtExceptionHandler.java code: public void uncaughtException(Thread t, Throwable e) { ... try { LOG.fatal("Thread " + t + " threw an Error. Shutting down now...", e); } catch (Throwable err) { // We don't want to not exit because of an issue with logging } ... try { System.err.println("Halting due to Out Of Memory Error..."); } catch (Throwable err) { // Again we don't want to exit because of logging issues. } ... } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
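A minimal, self-contained sketch of the proposed narrowing (the names are illustrative; runSafely() stands in for patterns like tryGetPid()): catching Exception keeps the recover-and-return-null behavior while letting Errors propagate.

```java
import java.util.concurrent.Callable;

public class CatchNarrowing {
    /** Runs a task; recoverable Exceptions yield null, but Errors still propagate. */
    static String runSafely(Callable<String> task) {
        try {
            return task.call();
        } catch (Exception e) {   // deliberately Exception, not Throwable: Errors surface
            return null;          // fallback, mirroring tryGetPid()'s return null
        }
    }

    public static void main(String[] args) {
        System.out.println(runSafely(() -> "pid-1234"));   // normal path
        System.out.println(runSafely(() -> { throw new java.io.IOException("no pid"); }));
        try {
            runSafely(() -> { throw new AssertionError("an Error, not an Exception"); });
        } catch (Error err) {
            System.out.println("Error propagated as intended");
        }
    }
}
```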
[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648840#comment-14648840 ] Akira AJISAKA commented on HDFS-6682: - bq. We have many ways to know about namenode health or in heavy load. This metric is to show the health not only of the NameNode but also of the entire HDFS cluster. Add a metric to expose the timestamp of the oldest under-replicated block - Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: metrics Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch In the following case, the data in HDFS is lost and a client needs to put the same file again. # A client puts a file to HDFS # A DataNode crashes before replicating a block of the file to other DataNodes I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way a client knows which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648858#comment-14648858 ] Zhe Zhang commented on HDFS-8833: - Thanks for the discussions, guys! [~walter.k.su] Good catch that we are still storing the EC policy at the directory level. However, a directory is no longer a zone, based on the expected properties of a _zone_, as Nicholas [summarized | https://issues.apache.org/jira/browse/HDFS-8833?focusedCommentId=14648073page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14648073]. I'll update the JIRA summary soon. I like the hybrid solution Andrew proposed. It looks like a good long-term solution. [~vinayrpet] Let me know if it addresses the memory overhead concern you commented on. bq. What is the semantic of moving a file under EC zone A to EC zone B? Would the file be changed from EC scheme A to EC schema B? If yes, we could eliminate EC zones. Otherwise, we should keep EC zone. Thanks for the example, Nicholas. Under the scope of this JIRA, the file's EC policy won't be changed. If it was created under EC zone A it will carry EC policy A with it when being moved. Could you explain a bit more why "If yes, we could eliminate EC zones. Otherwise, we should keep EC zone"? As a follow-on we could enable an inherit mode similar to StoragePolicy. Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. 
As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8841) Catch throwable return null
songwanging created HDFS-8841: - Summary: Catch throwable return null Key: HDFS-8841 URL: https://issues.apache.org/jira/browse/HDFS-8841 Project: Hadoop HDFS Issue Type: Bug Reporter: songwanging Priority: Minor In method map() of class \hadoop-2.7.1-src\hadoop-tools\hadoop-extras\src\main\java\org\apache\hadoop\tools\DistCpV1.java. This method has this code: public void map(LongWritable key, FilePair value, OutputCollector<WritableComparable<?>, Text> out, Reporter reporter) throws IOException { ... } catch (Throwable ex) { // ignore, we are just cleaning up LOG.debug("Ignoring cleanup exception", ex); } } } ... } Throwable is the parent type of Exception and Error, so catching Throwable means catching both Exceptions and Errors. An Exception is something you can recover from (like IOException); an Error is something more serious that you usually can't recover from easily (like NoClassDefFoundError), so it doesn't make much sense to catch an Error. We should catch Exception instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648825#comment-14648825 ] Jagadesh Kiran N commented on HDFS-8784: Hi [~kanaka], as discussed I am assigning this to myself. BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Jagadesh Kiran N The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local
[ https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649131#comment-14649131 ] Hudson commented on HDFS-7192: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1003 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1003/]) HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. (Contributed by Arpit Agarwal) (arp: rev 88d8736ddeff10a03acaa99a9a0ee99dcfabe590) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java DN should ignore lazyPersist hint if the writer is not local Key: HDFS-7192 URL: https://issues.apache.org/jira/browse/HDFS-7192 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch The DN should ignore {{allowLazyPersist}} hint to {{DataTransferProtocol#writeBlock}} if the writer is not local. Currently we don't restrict memory writes to local clients. For in-cluster clients this is not an issue as single replica writes default to the local DataNode. But clients outside the cluster can still send this hint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby
[ https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649130#comment-14649130 ] Hudson commented on HDFS-8821: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1003 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1003/]) HDFS-8821. Explain message Operation category X is not supported in state standby. Contributed by Gautam Gopalakrishnan. (harsh: rev c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Explain message Operation category X is not supported in state standby - Key: HDFS-8821 URL: https://issues.apache.org/jira/browse/HDFS-8821 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gautam Gopalakrishnan Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch There is one message specifically that causes many users to question the health of their HDFS cluster, namely Operation category READ/WRITE is not supported in state standby. HDFS-3447 is an attempt to lower the logging severity for StandbyException related messages but it is not resolved yet. So this jira is an attempt to explain this particular message so it appears less scary. The text is question 3.17 in the Hadoop Wiki FAQ ref: https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8847) change TestHDFSContractAppend to not override testRenameFileBeingAppended method.
zhihai xu created HDFS-8847: --- Summary: change TestHDFSContractAppend to not override testRenameFileBeingAppended method. Key: HDFS-8847 URL: https://issues.apache.org/jira/browse/HDFS-8847 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: zhihai xu Assignee: zhihai xu change TestHDFSContractAppend to not override testRenameFileBeingAppended method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8829) DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning
[ https://issues.apache.org/jira/browse/HDFS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650141#comment-14650141 ] He Tianyi commented on HDFS-8829: - Hi kanaka kumar avvaru, I've applied the improvement to my cluster, and should be able to produce a patch in the next few days. Can I work on this? DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning --- Key: HDFS-8829 URL: https://issues.apache.org/jira/browse/HDFS-8829 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.3.0, 2.6.0 Reporter: He Tianyi Assignee: kanaka kumar avvaru {code:java} private void initDataXceiver(Configuration conf) throws IOException { // find free port or use privileged port provided TcpPeerServer tcpPeerServer; if (secureResources != null) { tcpPeerServer = new TcpPeerServer(secureResources); } else { tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout, DataNode.getStreamingAddr(conf)); } tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE); {code} The last line sets SO_RCVBUF explicitly, thus disabling TCP auto-tuning on some systems. Shall we make this behavior configurable? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
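One way the proposal could look — a hedged sketch, not the actual patch; the config key name is invented for illustration. The idea is to treat a non-positive configured size as "leave the socket alone" so the kernel's receive-buffer auto-tuning stays in effect:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class ReceiveBufferConfig {
    // Hypothetical config key, modeled on the JIRA discussion; not an actual HDFS key.
    static final String RCVBUF_KEY = "dfs.datanode.socket.recv.buffer.size";

    /** Sets SO_RCVBUF only when a positive size is configured; 0 or less keeps OS auto-tuning. */
    static void maybeSetReceiveBuffer(ServerSocket ss, int configuredBytes) throws IOException {
        if (configuredBytes > 0) {
            ss.setReceiveBufferSize(configuredBytes);
        }
        // else: never call setReceiveBufferSize(), preserving TCP auto-tuning
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {
            maybeSetReceiveBuffer(ss, 0);          // auto-tuning preserved
            maybeSetReceiveBuffer(ss, 128 * 1024); // explicit size requested (OS may clamp it)
            System.out.println(ss.getReceiveBufferSize() > 0);
        }
    }
}
```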
[jira] [Updated] (HDFS-8245) Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log.
[ https://issues.apache.org/jira/browse/HDFS-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-8245: --- Labels: 2.6.1-candidate BB2015-05-TBR (was: BB2015-05-TBR) Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log. --- Key: HDFS-8245 URL: https://issues.apache.org/jira/browse/HDFS-8245 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Labels: 2.6.1-candidate, BB2015-05-TBR Fix For: 2.7.1 Attachments: HDFS-8245-1.patch, HDFS-8245.patch The following series of events happened on Standby namenode : 2015-04-09 07:47:21,735 \[Edit log tailer] INFO ha.EditLogTailer: Triggering log roll on remote NameNode Active Namenode (ANN) 2015-04-09 07:58:01,858 \[Edit log tailer] INFO ha.EditLogTailer: Triggering log roll on remote NameNode ANN The following series of events happened on Active Namenode:, 2015-04-09 07:47:21,747 \[IPC Server handler 99 on 8020] INFO namenode.FSNamesystem: Roll Edit Log from Standby NN (SNN) 2015-04-09 07:58:01,868 \[IPC Server handler 18 on 8020] INFO namenode.FSNamesystem: Roll Edit Log from SNN The following series of events happened on datanode ( {color:red} datanodeA {color}): 2015-04-09 07:52:15,817 \[DataXceiver for client DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1 at /:51078 \[Receiving block BP-595383232--1360869396230:blk_1570321882_1102029183867]] INFO datanode.DataNode: Receiving BP-595383232--1360869396230:blk_1570321882_1102029183867 src: /client:51078 dest: /{color:red}datanodeA:1004{color} 2015-04-09 07:52:15,969 \[PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE] INFO DataNode.clienttrace: src: /client:51078, dest: /{color:red}datanodeA:1004{color}, bytes: 20, op: HDFS_WRITE, cliID: DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1, offset: 0, srvID: 356a8a98-826f-446d-8f4c-ce288c1f0a75, blockid: 
BP-595383232--1360869396230:blk_1570321882_1102029183867, duration: 148948385 2015-04-09 07:52:15,969 \[PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE] INFO datanode.DataNode: PacketResponder: BP-595383232--1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE terminating 2015-04-09 07:52:25,970 \[DataXceiver for client /{color:red}datanodeB {color}:52827 \[Copying block BP-595383232--1360869396230:blk_1570321882_1102029183867]] INFO datanode.DataNode: Copied BP-595383232--1360869396230:blk_1570321882_1102029183867 to {color:red}datanodeB{color}:52827 2015-04-09 07:52:28,187 \[DataNode: heartbeating to ANN:8020] INFO impl.FsDatasetAsyncDiskService: Scheduling blk_1570321882_1102029183867 file path/blk_1570321882 for deletion 2015-04-09 07:52:28,188 \[Async disk worker #1482 for volume ] INFO impl.FsDatasetAsyncDiskService: Deleted BP-595383232--1360869396230 blk_1570321882_1102029183867 file path/blk_1570321882 Then we failover for upgrade and then the standby became active. When we did ls command on this file, we got the following exception: 15/04/09 22:07:39 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader. 
java.io.IOException: Got error for OP_READ_BLOCK, self=/client:32947, remote={color:red}datanodeA:1004{color}, for file filename, for pool BP-595383232--1360869396230 block 1570321882_1102029183867 at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:445) at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:410) at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:815) at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693) at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:351) at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112) at org.apache.hadoop.fs.shell.CopyCommands$Merge.processArguments(CopyCommands.java:97) at
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-7980: --- Labels: 2.6.1-candidate (was: ) Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch In the current implementation the datanode will call the reportReceivedDeletedBlocks() method, which is an incremental block report, before calling the bpNamenode.blockReport() method. So in a large (several thousand datanodes) and busy cluster it will slow down the startup of the namenode (by more than one hour). {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. reportReceivedDeletedBlocks(); lastDeletedReport = startTime; ... // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
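The timer gate at the top of blockReport() can be sketched in isolation (a simplified stand-in, not the real method, which also handles the split threshold and the IBR flush quoted above):

```java
public class BlockReportGate {
    /** Mirrors the guard in blockReport(): skip the full report until the interval elapses. */
    static boolean timerExpired(long nowMs, long lastReportMs, long intervalMs) {
        return nowMs - lastReportMs > intervalMs;
    }

    public static void main(String[] args) {
        long sixHours = 6L * 60 * 60 * 1000;   // default dfs.blockreport.intervalMsec
        System.out.println(timerExpired(10_000L, 0L, sixHours));      // too soon: false
        System.out.println(timerExpired(sixHours + 1, 0L, sixHours)); // interval elapsed: true
    }
}
```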
[jira] [Commented] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649970#comment-14649970 ] Xiaoyu Yao commented on HDFS-6860: -- Jenkins results: * No unit test added because this is a log-level-only change. * The test failure is unrelated and tracked by a known JIRA: HDFS-8772. Thanks [~lichangleo] for the initial patch and [~arpitagarwal], [~andrew.wang] for the review. I will commit the patch shortly. BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8404) Pending block replication can get stuck using older genstamp
[ https://issues.apache.org/jira/browse/HDFS-8404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-8404: --- Labels: 2.6.1-candidate (was: ) Pending block replication can get stuck using older genstamp Key: HDFS-8404 URL: https://issues.apache.org/jira/browse/HDFS-8404 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0, 2.7.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-8404-v0.patch, HDFS-8404-v1.patch If an under-replicated block gets into the pending-replication list, but later the genstamp of that block ends up being newer than the one originally submitted for replication, the block will fail replication until the NN is restarted. It will be safer if processPendingReplications() gets up-to-date blockinfo before resubmitting replication work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650013#comment-14650013 ] Zhe Zhang commented on HDFS-8823: - Thanks Haohui for the pointers; they are very helpful. I commented on {{storagePolicy}} just because if we plan to store it in BM too, the combined mem overhead ({{rep factor}} + {{storagePolicy}}) probably won't be (as easily) absorbed by alignment. And I don't think we'll end up having {{rep factor}} in BM but not {{storagePolicy}} (pls correct me if I'm wrong). Looks like BM needs both pieces of info to make correct placement decision. Given that the majority of blocks will have default {{rep factor}} and {{storagePolicy}}, maybe we can use some deduplication. For example, create a {{CustomizedBlockPolicies}} feature class and only add it to a {{BlockInfo}} when policies are customized. Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8823.000.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility on replicating blocks. Currently the replication factors of all blocks have to be the same. The replication factors of these blocks are equal to the highest replication factor across all snapshots. The changes will allow blocks in a file to have different replication factor, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
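The deduplication idea in the comment could look roughly like this — a hypothetical flyweight with all names invented for illustration; it is not HDFS code. Blocks carrying the default (replication factor, storage policy) pair all point at one shared instance, so only customized blocks pay for an extra object:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interned policy pair (illustrative stand-in for a BlockInfo feature class).
final class BlockPolicies {
    private static final Map<Integer, BlockPolicies> CACHE = new ConcurrentHashMap<>();
    static final BlockPolicies DEFAULT = of((short) 3, (byte) 0);

    final short replication;
    final byte storagePolicyId;

    private BlockPolicies(short replication, byte storagePolicyId) {
        this.replication = replication;
        this.storagePolicyId = storagePolicyId;
    }

    /** Interns the pair so equal policies share one instance. */
    static BlockPolicies of(short replication, byte storagePolicyId) {
        int key = (replication << 8) | (storagePolicyId & 0xff);
        return CACHE.computeIfAbsent(key, k -> new BlockPolicies(replication, storagePolicyId));
    }
}

public class BlockPoliciesDemo {
    public static void main(String[] args) {
        // Two blocks with the default policies share the same feature object.
        System.out.println(BlockPolicies.of((short) 3, (byte) 0) == BlockPolicies.DEFAULT);
        // A customized block gets its own (also interned) instance.
        System.out.println(BlockPolicies.of((short) 10, (byte) 0) == BlockPolicies.DEFAULT);
    }
}
```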
[jira] [Commented] (HDFS-8827) Erasure Coding: When namenode processes over replicated striped block, NPE will be occur in ReplicationMonitor
[ https://issues.apache.org/jira/browse/HDFS-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649980#comment-14649980 ] Zhe Zhang commented on HDFS-8827: - Thanks for identifying the problem, Fukudome-san! I took a quick look and the root cause doesn't seem straightforward. Do you mind creating a unit test reproducing the issue so we can all debug on the same basis? Thanks much! Erasure Coding: When namenode processes over replicated striped block, NPE will be occur in ReplicationMonitor -- Key: HDFS-8827 URL: https://issues.apache.org/jira/browse/HDFS-8827 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Takuya Fukudome Assignee: Takuya Fukudome Attachments: processing-over-replica-npe.log In our test cluster, when the namenode processed over-replicated striped blocks, a null pointer exception (NPE) occurred. This happened under the following situation: 1) some datanodes shut down. 2) the namenode recovers block groups which lost internal blocks. 3) the stopped datanodes are restarted. 4) the namenode processes over-replicated striped blocks. 5) the NPE occurs. I think BlockPlacementPolicyDefault#chooseReplicaToDelete will return null in this situation, which causes this NPE problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8845) DiskChecker should not traverse entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650001#comment-14650001 ] Hadoop QA commented on HDFS-8845: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 21s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 2s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 159m 42s | Tests failed in hadoop-hdfs. 
| | | | 203m 51s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12748230/HDFS-8845.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d0e0ba8 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11880/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11880/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11880/console | This message was automatically generated. DiskChecker should not traverse entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse entire tree because it's causing heavy disk load on checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8399) Erasure Coding: unit test the behaviour of BlockManager recovery work for the deleted blocks
[ https://issues.apache.org/jira/browse/HDFS-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649968#comment-14649968 ] Zhe Zhang commented on HDFS-8399: - Thanks for the work, Rakesh! The added test looks like a clean sanity check. Can we either add it to {{TestStripedINodeFile}} (preferably) or change it to a more intuitive name? Other than that LGTM. Erasure Coding: unit test the behaviour of BlockManager recovery work for the deleted blocks Key: HDFS-8399 URL: https://issues.apache.org/jira/browse/HDFS-8399 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Labels: Test Attachments: HDFS-8399-HDFS-7285-00.patch, HDFS-8399-HDFS-7285-01.patch The following exception occurred in the {{ReplicationMonitor}}. As per my initial analysis, the exception is coming from the blocks of a deleted file. {code} 2015-05-14 14:14:40,485 FATAL util.ExitUtil (ExitUtil.java:terminate(127)) - Terminate called org.apache.hadoop.util.ExitUtil$ExitException: java.lang.AssertionError: Absolute path required at org.apache.hadoop.hdfs.server.namenode.INode.getPathNames(INode.java:744) at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:723) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath(FSDirectory.java:1655) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getECSchemaForPath(FSNamesystem.java:8435) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeRecoveryWorkForBlocks(BlockManager.java:1572) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockRecoveryWork(BlockManager.java:1402) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3894) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3846) at java.lang.Thread.run(Thread.java:722) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126) at 
org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3865) at java.lang.Thread.run(Thread.java:722) Exception in thread org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@1255079 org.apache.hadoop.util.ExitUtil$ExitException: java.lang.AssertionError: Absolute path required at org.apache.hadoop.hdfs.server.namenode.INode.getPathNames(INode.java:744) at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:723) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getINodesInPath(FSDirectory.java:1655) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getECSchemaForPath(FSNamesystem.java:8435) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeRecoveryWorkForBlocks(BlockManager.java:1572) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockRecoveryWork(BlockManager.java:1402) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3894) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3846) at java.lang.Thread.run(Thread.java:722) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126) at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3865) at java.lang.Thread.run(Thread.java:722) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8486) DN startup may cause severe data loss
[ https://issues.apache.org/jira/browse/HDFS-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-8486: --- Labels: 2.6.1-candidate (was: ) DN startup may cause severe data loss - Key: HDFS-8486 URL: https://issues.apache.org/jira/browse/HDFS-8486 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 0.23.1, 2.0.0-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-8486.patch, HDFS-8486.patch A race condition between block pool initialization and the directory scanner may cause a mass deletion of blocks in multiple storages. If block pool initialization finds a block on disk that is already in the replica map, it deletes one of the blocks based on size, GS, etc. Unfortunately it _always_ deletes one of the blocks even if identical, thus the replica map _must_ be empty when the pool is initialized. The directory scanner starts at a random time within its periodic interval (default 6h). If the scanner starts very early it races to populate the replica map, causing the block pool init to erroneously delete blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
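The failure mode above can be sketched as follows. This is a hypothetical, much-simplified model of the dedup step, not the actual FsDatasetImpl code; the class and field names (ReplicaMapInit, deletions) are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model: block-pool init walks the block files on disk and
// inserts them into the replica map. If an entry for the block ID is
// already present, one copy is *always* scheduled for deletion, even
// when the two entries describe the same on-disk file.
public class ReplicaMapInit {
    final Map<Long, String> replicaMap = new HashMap<>();
    int deletions = 0;

    // Called for every block file found on disk during init.
    void addBlock(long blockId, String file) {
        String existing = replicaMap.putIfAbsent(blockId, file);
        if (existing != null) {
            // The flaw: unconditional deletion of a "duplicate". This is
            // only safe if the map was empty before init, which an
            // early-starting directory scanner can violate.
            deletions++;
        }
    }
}
```

Under this model, a directory scanner that pre-populates the map causes init to treat every block it re-discovers as a duplicate, hence the mass deletion.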
[jira] [Updated] (HDFS-6860) BlockStateChange logs are too noisy
[ https://issues.apache.org/jira/browse/HDFS-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6860: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) commit to 2.8.0 BlockStateChange logs are too noisy --- Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: Chang Li Labels: BB2015-05-TBR, newbie Fix For: 2.8.0 Attachments: HDFS-6860.00.patch, HDFS-6860.01.patch, HDFS6860.patch, HDFS6860.patch Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
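The usual shape of this fix is to demote per-block messages to a guarded debug level so no message string is built under the default configuration. The sketch below uses java.util.logging for self-containment (HDFS itself logs through commons-logging/slf4j), and the class and method names are invented for illustration:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: per-block state-change messages are only formatted and
// emitted when debug (FINE) logging is enabled, so a busy NameNode
// pays essentially nothing for them at the default INFO level.
public class BlockStateLog {
    static final Logger LOG = Logger.getLogger("BlockStateChange");

    // Returns true if the message was actually logged.
    static boolean logStateChange(String op, long blockId) {
        if (!LOG.isLoggable(Level.FINE)) {
            return false; // skipped: no string concatenation at INFO
        }
        LOG.fine("BLOCK* " + op + ": blk_" + blockId);
        return true;
    }
}
```

The guard matters as much as the level change: without it, the argument string is still built on every state change even though the logger discards it.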
[jira] [Commented] (HDFS-8804) Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation
[ https://issues.apache.org/jira/browse/HDFS-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650017#comment-14650017 ] Tsz Wo Nicholas Sze commented on HDFS-8804: --- Some comments on the patch: - Should getParityBuffer() be synchronized? It seems that some code path from pread is not synchronized. - close() should check whether curStripeBuf == null since close() can be called multiple times. Some other suggestions can be implemented later: * It is better to have multiple small data/parity buffers with size == cellSize so that it is more efficient for reusing the buffers. * Should DirectBufferPool be singleton? So that the pool can be shared. Erasure Coding: use DirectBufferPool in DFSStripedInputStream for buffer allocation --- Key: HDFS-8804 URL: https://issues.apache.org/jira/browse/HDFS-8804 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8804.000.patch Currently we directly allocate direct ByteBuffer in DFSStripedInputstream for the stripe buffer and the buffers holding parity data. It's better to get ByteBuffer from DirectBufferPool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
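The two review comments (synchronize buffer access; make close() safe to call twice) can be sketched as below. This is a self-contained stand-in, not Hadoop's org.apache.hadoop.util.DirectBufferPool or the real DFSStripedInputStream; the names MiniDirectBufferPool and StripeReader are invented for illustration:

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal direct-buffer pool: reuse returned buffers instead of
// allocating a fresh direct ByteBuffer per stream.
class MiniDirectBufferPool {
    private final Queue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
    private final int bufferSize;

    MiniDirectBufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    ByteBuffer getBuffer() {
        ByteBuffer b = pool.poll();
        return b != null ? b : ByteBuffer.allocateDirect(bufferSize);
    }

    void returnBuffer(ByteBuffer b) {
        b.clear();
        pool.offer(b);
    }
}

class StripeReader {
    private final MiniDirectBufferPool pool;
    private ByteBuffer curStripeBuf;

    StripeReader(MiniDirectBufferPool pool) { this.pool = pool; }

    // synchronized, so a pread path racing with a stateful read
    // cannot allocate two stripe buffers
    synchronized ByteBuffer getCurStripeBuf() {
        if (curStripeBuf == null) {
            curStripeBuf = pool.getBuffer();
        }
        return curStripeBuf;
    }

    // close() may be called multiple times: the null check makes the
    // second call a no-op instead of returning the buffer twice
    synchronized void close() {
        if (curStripeBuf != null) {
            pool.returnBuffer(curStripeBuf);
            curStripeBuf = null;
        }
    }
}
```

Making the pool a shared singleton, as suggested, would let all striped streams in the process draw from the same set of direct buffers.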
[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small
[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8838: -- Attachment: h8838_20150731.patch Tolerate datanode failures in DFSStripedOutputStream when the data length is small -- Key: HDFS-8838 URL: https://issues.apache.org/jira/browse/HDFS-8838 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8838_20150729.patch, h8838_20150731.patch Currently, DFSStripedOutputStream cannot tolerate datanode failures when the data length is small. We fix the bugs here and add more tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
[ https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649875#comment-14649875 ] Zhe Zhang commented on HDFS-8480: - Thanks for the discussion Ming and Colin! I created HDFS-8846 to add old-version edit logs for testing. Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them - Key: HDFS-8480 URL: https://issues.apache.org/jira/browse/HDFS-8480 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Critical Fix For: 2.7.1 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, HDFS-8480.02.patch, HDFS-8480.03.patch HDFS-7929 copies existing edit logs to the storage directory of the upgraded {{NameNode}}. This slows down the upgrade process. This JIRA aims to use hard-linking instead of per-op copying to achieve the same goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
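The hard-link approach can be sketched with java.nio.file.Files.createLink. This is an illustrative helper, not the actual NameNode upgrade code; the class and method names are invented:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EditLogPreserve {
    // Make an old edit log segment visible in the upgraded storage
    // directory. A hard link shares the underlying bytes, so this is
    // O(1) per segment regardless of log size, whereas copying is
    // proportional to the number of bytes (or ops) in the segment.
    // Both paths must be on the same filesystem for linking to work.
    static Path preserve(Path oldSegment, Path newDir) throws IOException {
        Path target = newDir.resolve(oldSegment.getFileName());
        return Files.createLink(target, oldSegment); // link, not copy
    }
}
```

Since a hard link is just a second directory entry for the same inode, the preserved segment also stays byte-identical to the original by construction.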