[jira] [Commented] (HDFS-8792) Improve BlockManager#postponedMisreplicatedBlocks and BlockManager#excessReplicateMap

2015-07-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647537#comment-14647537
 ] 

Hadoop QA commented on HDFS-8792:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 17s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 40s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 3  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  3s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 165m 14s | Tests failed in hadoop-hdfs. |
| | | 206m 25s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12747948/HDFS-8792.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ddc867ce |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11874/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11874/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11874/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11874/console |


This message was automatically generated.

 Improve BlockManager#postponedMisreplicatedBlocks and 
 BlockManager#excessReplicateMap
 -

 Key: HDFS-8792
 URL: https://issues.apache.org/jira/browse/HDFS-8792
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8792.001.patch, HDFS-8792.002.patch


 {{LightWeightHashSet}} requires less memory than the Java HashSet. 
 Furthermore, for {{excessReplicateMap}} we can use a {{HashMap}} instead of 
 a {{TreeMap}}, since there is no need to sort. 
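A minimal Java sketch of the {{TreeMap}} to {{HashMap}} part of the proposal (the map shape and method names here are illustrative assumptions, not the actual BlockManager code): nothing iterates the excess-replica map in key order, so a hash-based map with O(1) operations suffices.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simplification of excessReplicateMap: datanode UUID ->
// set of excess block IDs. No code path needs the keys sorted, so a
// HashMap (O(1) get/put) works as well as a TreeMap (O(log n)).
public class ExcessReplicaMapSketch {
    private final Map<String, Set<Long>> excessReplicateMap = new HashMap<>();

    void addExcessReplica(String datanodeUuid, long blockId) {
        excessReplicateMap
            .computeIfAbsent(datanodeUuid, k -> new HashSet<>())
            .add(blockId);
    }

    boolean isExcess(String datanodeUuid, long blockId) {
        Set<Long> blocks = excessReplicateMap.get(datanodeUuid);
        return blocks != null && blocks.contains(blockId);
    }

    public static void main(String[] args) {
        ExcessReplicaMapSketch sketch = new ExcessReplicaMapSketch();
        sketch.addExcessReplica("dn-1", 1001L);
        System.out.println(sketch.isExcess("dn-1", 1001L)); // true
        System.out.println(sketch.isExcess("dn-2", 1001L)); // false
    }
}
```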



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8202) Improve end to end striping file test to add erasure recovering test

2015-07-30 Thread Xinwei Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinwei Qin  updated HDFS-8202:
--
Attachment: HDFS-8202-HDFS-7285.006.patch

Thanks [~zhz] for the comments.
Upload 006 patch for review.
bq.TestWriteStripedFileWithFailure actually fails, could you debug the issue?
This issue is addressed by HDFS-8704; now all the tests pass with the 
latest 003 patch in HDFS-8704.

 Improve end to end striping file test to add erasure recovering test
 -

 Key: HDFS-8202
 URL: https://issues.apache.org/jira/browse/HDFS-8202
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Xinwei Qin 
 Attachments: HDFS-8202-HDFS-7285.003.patch, 
 HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, 
 HDFS-8202-HDFS-7285.006.patch, HDFS-8202.001.patch, HDFS-8202.002.patch


 This follows on HDFS-8201 to add an erasure recovery test to the end-to-end 
 striping file test:
 * After writing certain blocks to the test file, delete some block files;
 * Read the file content back and compare it, to check for any recovery 
 issues and verify that erasure recovery works.
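The delete-and-verify loop described above can be illustrated with a toy single-parity code (plain XOR, not the Reed-Solomon schema HDFS actually uses): compute parity over the data blocks, "lose" one block, reconstruct it from the remaining blocks plus parity, and compare with the original.

```java
import java.util.Arrays;

// Toy illustration of the recovery check: a single XOR parity block
// can reconstruct any one lost data block.
public class XorRecoveryDemo {
    static byte[] parity(byte[][] blocks) {
        byte[] p = new byte[blocks[0].length];
        for (byte[] b : blocks) {
            for (int i = 0; i < p.length; i++) {
                p[i] ^= b[i];
            }
        }
        return p;
    }

    // Rebuild the lost block by XOR-ing parity with all surviving blocks.
    static byte[] recover(byte[][] blocks, int lost, byte[] parity) {
        byte[] r = parity.clone();
        for (int j = 0; j < blocks.length; j++) {
            if (j == lost) continue;
            for (int i = 0; i < r.length; i++) {
                r[i] ^= blocks[j][i];
            }
        }
        return r;
    }

    public static void main(String[] args) {
        byte[][] blocks = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] p = parity(blocks);
        byte[] recovered = recover(blocks, 1, p); // pretend block 1 is lost
        System.out.println(Arrays.equals(recovered, blocks[1])); // true
    }
}
```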



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8816) Improve visualization for the Datanode tab in the NN UI

2015-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647579#comment-14647579
 ] 

Hudson commented on HDFS-8816:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #261 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/261/])
HDFS-8816. Improve visualization for the Datanode tab in the NN UI. Contributed 
by Haohui Mai. (wheat9: rev ddc867ceb9a76986e8379361753598cc48024376)
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/moment.min.js
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/hadoop.css


 Improve visualization for the Datanode tab in the NN UI
 ---

 Key: HDFS-8816
 URL: https://issues.apache.org/jira/browse/HDFS-8816
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.8.0

 Attachments: HDFS-8816.000.patch, HDFS-8816.001.patch, 
 HDFS-8816.002.patch, HDFS-8816.003.patch, HDFS-8816.004.patch, HDFS-8816.png, 
 HDFS-8816.png, Screen Shot 2015-07-23 at 10.24.24 AM.png


 The information in the datanode tab of the NN UI is cluttered. This jira 
 proposes to improve the visualization of the datanode tab in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block

2015-07-30 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647666#comment-14647666
 ] 

Yi Liu commented on HDFS-6682:
--

{quote}
There's a big advantage to having this metric: it's extremely useful to know 
how backlogged the replication queue is as a determinant of namenode health on 
extremely large clusters.
{quote}
We have many ways to know about NameNode health or heavy load. 
It's not worth it.

Thanks Akira for reverting.


 Add a metric to expose the timestamp of the oldest under-replicated block
 -

 Key: HDFS-6682
 URL: https://issues.apache.org/jira/browse/HDFS-6682
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: metrics
 Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, 
 HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch


 In the following case, data in HDFS is lost and a client needs to put 
 the same file again:
 # A client puts a file to HDFS
 # A DataNode crashes before replicating a block of the file to other DataNodes
 I propose a metric to expose the timestamp of the oldest 
 under-replicated/corrupt block. That way a client can know which file to 
 retain for the retry.
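A hedged sketch of what such a metric could track (the data structures and names are assumptions, not the attached patches): an insertion-ordered map from block ID to enqueue time, whose first entry is always the oldest still-pending block.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative metric source: LinkedHashMap preserves insertion order,
// so the first remaining entry is the oldest under-replicated block.
public class UnderReplicatedAgeMetric {
    private final Map<Long, Long> enqueueTimeMs = new LinkedHashMap<>();

    void blockUnderReplicated(long blockId, long nowMs) {
        enqueueTimeMs.putIfAbsent(blockId, nowMs);
    }

    void blockReplicated(long blockId) {
        enqueueTimeMs.remove(blockId);
    }

    // Metric value: enqueue timestamp of the oldest pending block,
    // or -1 when nothing is under-replicated.
    long oldestEnqueueTimeMs() {
        for (Long t : enqueueTimeMs.values()) {
            return t;
        }
        return -1L;
    }

    public static void main(String[] args) {
        UnderReplicatedAgeMetric m = new UnderReplicatedAgeMetric();
        m.blockUnderReplicated(10L, 100L);
        m.blockUnderReplicated(11L, 200L);
        System.out.println(m.oldestEnqueueTimeMs()); // 100
        m.blockReplicated(10L);
        System.out.println(m.oldestEnqueueTimeMs()); // 200
    }
}
```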



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8816) Improve visualization for the Datanode tab in the NN UI

2015-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647498#comment-14647498
 ] 

Hudson commented on HDFS-8816:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #272 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/272/])
HDFS-8816. Improve visualization for the Datanode tab in the NN UI. Contributed 
by Haohui Mai. (wheat9: rev ddc867ceb9a76986e8379361753598cc48024376)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/hadoop.css
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/moment.min.js


 Improve visualization for the Datanode tab in the NN UI
 ---

 Key: HDFS-8816
 URL: https://issues.apache.org/jira/browse/HDFS-8816
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.8.0

 Attachments: HDFS-8816.000.patch, HDFS-8816.001.patch, 
 HDFS-8816.002.patch, HDFS-8816.003.patch, HDFS-8816.004.patch, HDFS-8816.png, 
 HDFS-8816.png, Screen Shot 2015-07-23 at 10.24.24 AM.png


 The information in the datanode tab of the NN UI is cluttered. This jira 
 proposes to improve the visualization of the datanode tab in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8816) Improve visualization for the Datanode tab in the NN UI

2015-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647784#comment-14647784
 ] 

Hudson commented on HDFS-8816:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #269 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/269/])
HDFS-8816. Improve visualization for the Datanode tab in the NN UI. Contributed 
by Haohui Mai. (wheat9: rev ddc867ceb9a76986e8379361753598cc48024376)
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/moment.min.js
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/hadoop.css
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Improve visualization for the Datanode tab in the NN UI
 ---

 Key: HDFS-8816
 URL: https://issues.apache.org/jira/browse/HDFS-8816
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.8.0

 Attachments: HDFS-8816.000.patch, HDFS-8816.001.patch, 
 HDFS-8816.002.patch, HDFS-8816.003.patch, HDFS-8816.004.patch, HDFS-8816.png, 
 HDFS-8816.png, Screen Shot 2015-07-23 at 10.24.24 AM.png


 The information in the datanode tab of the NN UI is cluttered. This jira 
 proposes to improve the visualization of the datanode tab in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8816) Improve visualization for the Datanode tab in the NN UI

2015-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647825#comment-14647825
 ] 

Hudson commented on HDFS-8816:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2218 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2218/])
HDFS-8816. Improve visualization for the Datanode tab in the NN UI. Contributed 
by Haohui Mai. (wheat9: rev ddc867ceb9a76986e8379361753598cc48024376)
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/moment.min.js
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/hadoop.css
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Improve visualization for the Datanode tab in the NN UI
 ---

 Key: HDFS-8816
 URL: https://issues.apache.org/jira/browse/HDFS-8816
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.8.0

 Attachments: HDFS-8816.000.patch, HDFS-8816.001.patch, 
 HDFS-8816.002.patch, HDFS-8816.003.patch, HDFS-8816.004.patch, HDFS-8816.png, 
 HDFS-8816.png, Screen Shot 2015-07-23 at 10.24.24 AM.png


 The information in the datanode tab of the NN UI is cluttered. This jira 
 proposes to improve the visualization of the datanode tab in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block

2015-07-30 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647679#comment-14647679
 ] 

Allen Wittenauer commented on HDFS-6682:


bq. We have many ways to know about namenode health or in heavy load. It's not 
worth.

They don't work for this use case.  We see this *every day*: the NN is healthy 
*except* that the replication queue is backed up.

 Add a metric to expose the timestamp of the oldest under-replicated block
 -

 Key: HDFS-6682
 URL: https://issues.apache.org/jira/browse/HDFS-6682
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
  Labels: metrics
 Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, 
 HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch


 In the following case, data in HDFS is lost and a client needs to put 
 the same file again:
 # A client puts a file to HDFS
 # A DataNode crashes before replicating a block of the file to other DataNodes
 I propose a metric to expose the timestamp of the oldest 
 under-replicated/corrupt block. That way a client can know which file to 
 retain for the retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8816) Improve visualization for the Datanode tab in the NN UI

2015-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647693#comment-14647693
 ] 

Hudson commented on HDFS-8816:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2199 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2199/])
HDFS-8816. Improve visualization for the Datanode tab in the NN UI. Contributed 
by Haohui Mai. (wheat9: rev ddc867ceb9a76986e8379361753598cc48024376)
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/moment.min.js
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/hadoop.css


 Improve visualization for the Datanode tab in the NN UI
 ---

 Key: HDFS-8816
 URL: https://issues.apache.org/jira/browse/HDFS-8816
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.8.0

 Attachments: HDFS-8816.000.patch, HDFS-8816.001.patch, 
 HDFS-8816.002.patch, HDFS-8816.003.patch, HDFS-8816.004.patch, HDFS-8816.png, 
 HDFS-8816.png, Screen Shot 2015-07-23 at 10.24.24 AM.png


 The information in the datanode tab of the NN UI is cluttered. This jira 
 proposes to improve the visualization of the datanode tab in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8784) BlockInfo#numNodes should be numStorages

2015-07-30 Thread Jagadesh Kiran N (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jagadesh Kiran N reassigned HDFS-8784:
--

Assignee: Jagadesh Kiran N  (was: kanaka kumar avvaru)

 BlockInfo#numNodes should be numStorages
 

 Key: HDFS-8784
 URL: https://issues.apache.org/jira/browse/HDFS-8784
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Jagadesh Kiran N

 The method actually returns the number of storages holding a block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8653) Code cleanup for DatanodeManager, DatanodeDescriptor and DatanodeStorageInfo

2015-07-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648318#comment-14648318
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8653:
---

Which JIRA(s) in HDFS-7285 is this patch based on?

 Code cleanup for DatanodeManager, DatanodeDescriptor and DatanodeStorageInfo
 

 Key: HDFS-8653
 URL: https://issues.apache.org/jira/browse/HDFS-8653
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.8.0

 Attachments: HDFS-8653.00.patch


 While updating the {{blockmanagement}} module to distribute erasure coding 
 recovery work to Datanode, the HDFS-7285 branch also did some code cleanup 
 that should be merged into trunk independently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface

2015-07-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648157#comment-14648157
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8835:
---

 ..., which are not directly related to EC. Pushing those changes to trunk 
 separately will ease merge reviewing. ...

It probably won't help much, since people still need to review the patch 
committed to trunk.  That a patch got committed to trunk means neither that 
everyone has already understood the code, nor that the patch makes the code 
simpler and easier.  Quite a few people told me that the recent change in 
HDFS-8487 does make the code harder to understand.  It makes familiar code 
unfamiliar.

 Convert BlockInfoUnderConstruction as an interface
 --

 Key: HDFS-8835
 URL: https://issues.apache.org/jira/browse/HDFS-8835
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 Per discussion under HDFS-8499, this JIRA aims to convert 
 {{BlockInfoUnderConstruction}} as an interface and 
 {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 
 branch will add {{BlockInfoStripedUnderConstruction}} as another 
 implementation.
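The shape of the proposed refactoring might look like the following (the method and field are invented for illustration; only the type names come from the JIRA): the under-construction state becomes an interface, with a contiguous implementation in trunk and a striped one to follow on the HDFS-7285 branch.

```java
// Sketch only: the real classes carry replica/recovery state; a single
// assumed method stands in for that here.
interface BlockInfoUnderConstruction {
    long getBlockRecoveryId();
}

// Trunk implementation for contiguous (replicated) blocks. The branch
// would add BlockInfoStripedUnderConstruction alongside it.
class BlockInfoContiguousUnderConstruction implements BlockInfoUnderConstruction {
    private final long blockRecoveryId;

    BlockInfoContiguousUnderConstruction(long blockRecoveryId) {
        this.blockRecoveryId = blockRecoveryId;
    }

    @Override
    public long getBlockRecoveryId() {
        return blockRecoveryId;
    }
}

public class BlockInfoUcSketch {
    public static void main(String[] args) {
        // Callers depend only on the interface, not the concrete layout.
        BlockInfoUnderConstruction uc = new BlockInfoContiguousUnderConstruction(42L);
        System.out.println(uc.getBlockRecoveryId()); // 42
    }
}
```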



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local

2015-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648262#comment-14648262
 ] 

Hudson commented on HDFS-7192:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8246 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8246/])
HDFS-7192. DN should ignore lazyPersist hint if the writer is not local. 
(Contributed by Arpit Agarwal) (arp: rev 
88d8736ddeff10a03acaa99a9a0ee99dcfabe590)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataXceiverLazyPersistHint.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java


 DN should ignore lazyPersist hint if the writer is not local
 

 Key: HDFS-7192
 URL: https://issues.apache.org/jira/browse/HDFS-7192
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: 2.8.0

 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, 
 HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch


 The DN should ignore the {{allowLazyPersist}} hint to 
 {{DataTransferProtocol#writeBlock}} if the writer is not local.
 Currently we don't restrict memory writes to local clients. For in-cluster 
 clients this is not an issue, as single-replica writes default to the local 
 DataNode. But clients outside the cluster can still send this hint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local

2015-07-30 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-7192:

  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.8.0
Target Version/s:   (was: 2.8.0)
  Status: Resolved  (was: Patch Available)

Committed for 2.8.0.

 DN should ignore lazyPersist hint if the writer is not local
 

 Key: HDFS-7192
 URL: https://issues.apache.org/jira/browse/HDFS-7192
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: 2.8.0

 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, 
 HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch


 The DN should ignore {{allowLazyPersist}} hint to 
 {{DataTransferProtocol#writeBlock}} if the writer is not local.
 Currently we don't restrict memory writes to local clients. For in-cluster 
 clients this is not an issue as single replica writes default to the local 
 DataNode. But clients outside the cluster can still send this hint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7192) DN should ignore lazyPersist hint if the writer is not local

2015-07-30 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-7192:

Attachment: HDFS-7192.05.patch

Thanks for the review [~xyao].

The Jenkins test failure is unrelated. The v05 patch fixes the checkstyle issue 
by renaming the parameter. I will commit the v05 patch shortly.


Delta wrt v04 patch.
{code}
-  final DataNode datanode, DataChecksum requestedChecksum,
+  final DataNode dn, DataChecksum requestedChecksum,
   CachingStrategy cachingStrategy,
   final boolean allowLazyPersist,
   final boolean pinning) throws IOException {
 return new BlockReceiver(block, storageType, in,
 inAddr, myAddr, stage, newGs, minBytesRcvd, maxBytesRcvd,
-clientname, srcDataNode, datanode, requestedChecksum,
+clientname, srcDataNode, dn, requestedChecksum,
{code}

 DN should ignore lazyPersist hint if the writer is not local
 

 Key: HDFS-7192
 URL: https://issues.apache.org/jira/browse/HDFS-7192
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: HDFS-7192.01.patch, HDFS-7192.02.patch, 
 HDFS-7192.03.patch, HDFS-7192.04.patch, HDFS-7192.05.patch


 The DN should ignore {{allowLazyPersist}} hint to 
 {{DataTransferProtocol#writeBlock}} if the writer is not local.
 Currently we don't restrict memory writes to local clients. For in-cluster 
 clients this is not an issue as single replica writes default to the local 
 DataNode. But clients outside the cluster can still send this hint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-742) A down DataNode makes the Balancer hang by repeatedly asking the NameNode for its partial block list

2015-07-30 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-742:
-
Status: Patch Available  (was: Open)

 A down DataNode makes the Balancer hang by repeatedly asking the NameNode 
 for its partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Hairong Kuang
Assignee: Mit Desai
Priority: Minor
 Attachments: HDFS-742-trunk.patch, HDFS-742.patch


 We had a balancer that had made no progress for a long time. It turned 
 out it was repeatedly asking the NameNode for a partial block list of one 
 datanode, which had gone down while the balancer was running.
 The NameNode should notify the Balancer that the datanode is not available, 
 and the Balancer should stop asking for that datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-742) A down DataNode makes the Balancer hang by repeatedly asking the NameNode for its partial block list

2015-07-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648660#comment-14648660
 ] 

Hadoop QA commented on HDFS-742:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 21s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 55s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 31s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 34s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 161m  5s | Tests failed in hadoop-hdfs. |
| | | 204m 12s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.fs.TestSymlinkHdfsFileSystem |
|   | hadoop.fs.viewfs.TestViewFileSystemHdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12747558/HDFS-742-trunk.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 88d8736 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11875/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11875/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11875/console |


This message was automatically generated.

 A down DataNode makes the Balancer hang by repeatedly asking the NameNode 
 for its partial block list
 

 Key: HDFS-742
 URL: https://issues.apache.org/jira/browse/HDFS-742
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Hairong Kuang
Assignee: Mit Desai
Priority: Minor
 Attachments: HDFS-742-trunk.patch, HDFS-742.patch


 We had a balancer that had made no progress for a long time. It turned 
 out it was repeatedly asking the NameNode for a partial block list of one 
 datanode, which had gone down while the balancer was running.
 The NameNode should notify the Balancer that the datanode is not available, 
 and the Balancer should stop asking for that datanode's block list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8821) Explain message Operation category X is not supported in state standby

2015-07-30 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-8821:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks Gautam, test passed locally for me as well. Seems unrelated.

I've pushed this into trunk and branch-2; thank you for the continued 
contributions!

 Explain message "Operation category X is not supported in state standby" 
 -

 Key: HDFS-8821
 URL: https://issues.apache.org/jira/browse/HDFS-8821
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Gautam Gopalakrishnan
Assignee: Gautam Gopalakrishnan
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch


 There is one message specifically that causes many users to question the 
 health of their HDFS cluster, namely "Operation category READ/WRITE is not 
 supported in state standby".
 HDFS-3447 is an attempt to lower the logging severity of StandbyException 
 related messages, but it is not resolved yet. So this jira is an attempt to 
 explain this particular message so that it appears less scary.
 The text is question 3.17 in the Hadoop Wiki FAQ
 ref: 
 https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab

2015-07-30 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-6407:
-
Attachment: HDFS-6407.010.patch

 new namenode UI, lost ability to sort columns in datanode tab
 -

 Key: HDFS-6407
 URL: https://issues.apache.org/jira/browse/HDFS-6407
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Nathan Roberts
Assignee: Haohui Mai
Priority: Critical
  Labels: BB2015-05-TBR
 Attachments: 002-datanodes-sorted-capacityUsed.png, 
 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, 
 HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, 
 HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.4.patch, 
 HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, HDFS-6407.patch, 
 browse_directory.png, datanodes.png, snapshots.png, sorting 2.png, sorting 
 table.png


 The old UI supported clicking on a column header to sort on that column. The 
 new UI seems to have dropped this very useful feature.
 There are a few tables in the NameNode UI that display datanode information, 
 directory listings and snapshots.
 When there are many items in the tables, it is useful to be able to sort 
 on the different columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648646#comment-14648646
 ] 

Haohui Mai commented on HDFS-8833:
--

bq. Currently we are copying repl from file level to block level. (HDFS-8823)

I wouldn't say this is a copy. The main reason why the replication factor is 
still there is that the {{FileStatus}} struct needs to report the replication 
factor. The BM relies only on the information in {{BlockInfo}} to make 
replication decisions. I don't think the EC branch needs to have the same 
compatibility concern.
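The separation described above can be sketched as follows. This is a minimal illustration with hypothetical, simplified types (not the actual Hadoop classes): the namespace keeps a replication factor only so {{FileStatus}} can report it, while the block manager makes its decisions purely from per-block metadata.

```java
// Hypothetical, simplified types -- not the actual Hadoop classes.
class BlockInfo {
    private final long blockId;
    private final short replication; // the block layer's own copy

    BlockInfo(long blockId, short replication) {
        this.blockId = blockId;
        this.replication = replication;
    }

    long getBlockId() { return blockId; }
    short getReplication() { return replication; }
}

class SimpleBlockManager {
    // The BM decides purely from BlockInfo, never from the namespace.
    boolean needsReplication(BlockInfo block, int liveReplicas) {
        return liveReplicas < block.getReplication();
    }
}

public class ReplicationSeparationSketch {
    public static void main(String[] args) {
        BlockInfo blk = new BlockInfo(1L, (short) 3);
        SimpleBlockManager bm = new SimpleBlockManager();
        System.out.println(bm.needsReplication(blk, 2)); // prints true
        System.out.println(bm.needsReplication(blk, 3)); // prints false
    }
}
```

The point of the split is that the block layer never has to consult the namespace, which matters for the BM-separation efforts discussed in HDFS-8059.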

 Erasure coding: store EC schema and cell size with INodeFile and eliminate EC 
 zones
 ---

 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 We have [discussed | 
https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
  storing EC schema with files instead of EC zones and recently revisited the 
 discussion under HDFS-8059.
 As a recap, the _zone_ concept has severe limitations, including around 
 renaming and nested configuration. Those limitations are justified in 
 encryption for security reasons, but it doesn't make sense to carry them over 
 to EC.
 This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
 simplicity, we should first implement it as an xattr and consider memory 
 optimizations (such as moving it to file header) as a follow-on. We should 
 also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8821) Explain message Operation category X is not supported in state standby

2015-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648708#comment-14648708
 ] 

Hudson commented on HDFS-8821:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #8247 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8247/])
HDFS-8821. Explain message Operation category X is not supported in state 
standby. Contributed by Gautam Gopalakrishnan. (harsh: rev 
c5caa25b8f2953e2b7a9d2c9dcbdbf1fed95c10b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java


 Explain message Operation category X is not supported in state standby 
 -

 Key: HDFS-8821
 URL: https://issues.apache.org/jira/browse/HDFS-8821
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Gautam Gopalakrishnan
Assignee: Gautam Gopalakrishnan
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8821-1.patch, HDFS-8821-2.patch


 There is one message specifically that causes many users to question the 
 health of their HDFS cluster, namely Operation category READ/WRITE is not 
 supported in state standby.
 HDFS-3447 is an attempt to lower the logging severity for StandbyException 
 related messages but it is not resolved yet. So this jira is an attempt to 
 explain this particular message so it appears less scary.
 The text is question 3.17 in the Hadoop Wiki FAQ
 ref: 
 https://wiki.apache.org/hadoop/FAQ#What_does_the_message_.22Operation_category_READ.2FWRITE_is_not_supported_in_state_standby.22_mean.3F



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8059) Erasure coding: revisit how to store EC schema and cellSize in NameNode

2015-07-30 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648644#comment-14648644
 ] 

Haohui Mai commented on HDFS-8059:
--

Sorry for the late reply.

bq. Well, we keep replication in the namespace and use it in the block layer; 
how is this any different?

I'm surprised you think that way. There have been long-running efforts to 
separate the block manager out of the NN due to scalability concerns (dating 
back to HDFS-2106). While we're not there yet, it is harmful to make new 
design choices that contradict that basic principle.

bq. I assume you will solve the replication issue somehow, and the same 
solution should work for EC schema.

Quickly skimming through the patch in this jira, I think the current solution 
looks reasonable. For what needs to be done in trunk, I'm putting up a patch 
in HDFS-8823.

 Erasure coding: revisit how to store EC schema and cellSize in NameNode
 ---

 Key: HDFS-8059
 URL: https://issues.apache.org/jira/browse/HDFS-8059
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8059.001.patch


 Move {{dataBlockNum}} and {{parityBlockNum}} from BlockInfoStriped to 
 INodeFile, and store them in {{FileWithStripedBlocksFeature}}.
 Ideally these two numbers are the same for all striped blocks in a file, so 
 storing them in BlockInfoStriped would waste NN memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab

2015-07-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648789#comment-14648789
 ] 

Hadoop QA commented on HDFS-6407:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 50s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | native |   3m  3s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  66m 52s | Tests failed in hadoop-hdfs. |
| | | 105m 20s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
| Timed out tests | org.apache.hadoop.hdfs.TestHDFSFileSystemContract |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12748116/HDFS-6407.010.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / c5caa25 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11876/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11876/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11876/console |


This message was automatically generated.

 new namenode UI, lost ability to sort columns in datanode tab
 -

 Key: HDFS-6407
 URL: https://issues.apache.org/jira/browse/HDFS-6407
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Nathan Roberts
Assignee: Haohui Mai
Priority: Critical
  Labels: BB2015-05-TBR
 Attachments: 002-datanodes-sorted-capacityUsed.png, 
 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, 
 HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, 
 HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.4.patch, 
 HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, HDFS-6407.patch, 
 browse_directory.png, datanodes.png, snapshots.png, sorting 2.png, sorting 
 table.png


 The old UI supported clicking on a column header to sort on that column. The 
 new UI seems to have dropped this very useful feature.
 There are a few tables in the NameNode UI that display datanode information, 
 directory listings, and snapshots.
 When there are many items in the tables, it is useful to have the ability to 
 sort on the different columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp

2015-07-30 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated HDFS-8828:
---
Description: 
Some users reported huge time cost to build the file copy list in distcp (30 
hours for 1.6M files). We can leverage the snapshot diff report to build a 
file copy list that includes only the files/dirs which changed between two 
snapshots (or a snapshot and a normal dir). It speeds up the process in two 
ways: 1. less copy-list building time. 2. fewer file-copy MR jobs.

HDFS snapshot diff reports provide information about file/directory creation, 
deletion, rename, and modification between two snapshots, or between a 
snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, 
then falls back to the default distcp, so it still relies on the default 
distcp to build the complete list of files under the source dir. This patch 
puts only created and modified files into the copy list, based on the 
snapshot diff report, so we can minimize the number of files to copy.

  was:
Some users reported huge time cost to build file copy list in distcp. (30 hours 
with 1.6M files). We can leverage snapshot diff report to build file copy list 
including files/dirs which are changes only between two snapshots (or a 
snapshot and a normal dir). It speed up the process in two folds: 1. less copy 
list building time. 2. less file copy MR jobs.

HDFS snapshot diff report provide information about file/directory creation, 
deletion, rename and modification between two snapshots or a snapshot and a 
normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
the default distcp. So it still relies on default distcp to building copy list 
which will traverse all files under the source dir. This patch will build the 
copy list based on snapshot diff report. 
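The idea in the description can be sketched as follows, using hypothetical stand-in types for the entries of an HDFS snapshot diff report (the real patch works inside DistCp's copy-listing code, which is not shown here): only created and modified paths enter the copy list, while deletes and renames are synchronized directly on the target, as in HDFS-7535.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDiffCopyListSketch {
    // Hypothetical stand-in for the HDFS SnapshotDiffReport entry types.
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    static class DiffEntry {
        final DiffType type;
        final String path;
        DiffEntry(DiffType type, String path) {
            this.type = type;
            this.path = path;
        }
    }

    // Only created and modified paths need byte copying; deletes and
    // renames are applied directly on the target side (as in HDFS-7535).
    static List<String> buildCopyList(List<DiffEntry> diff) {
        List<String> copyList = new ArrayList<>();
        for (DiffEntry e : diff) {
            if (e.type == DiffType.CREATE || e.type == DiffType.MODIFY) {
                copyList.add(e.path);
            }
        }
        return copyList;
    }

    public static void main(String[] args) {
        List<DiffEntry> diff = List.of(
            new DiffEntry(DiffType.CREATE, "/data/a"),
            new DiffEntry(DiffType.DELETE, "/data/b"),
            new DiffEntry(DiffType.MODIFY, "/data/c"));
        System.out.println(buildCopyList(diff)); // prints [/data/a, /data/c]
    }
}
```

Because the copy list is proportional to the diff rather than to the full source tree, both the listing time and the number of copy tasks shrink.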


 Utilize Snapshot diff report to build copy list in distcp
 -

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Attachments: HDFS-8828.001.patch


 Some users reported huge time cost to build the file copy list in distcp (30 
 hours for 1.6M files). We can leverage the snapshot diff report to build a 
 file copy list that includes only the files/dirs which changed between two 
 snapshots (or a snapshot and a normal dir). It speeds up the process in two 
 ways: 1. less copy-list building time. 2. fewer file-copy MR jobs.
 HDFS snapshot diff reports provide information about file/directory creation, 
 deletion, rename, and modification between two snapshots, or between a 
 snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, 
 then falls back to the default distcp, so it still relies on the default 
 distcp to build the complete list of files under the source dir. This patch 
 puts only created and modified files into the copy list, based on the 
 snapshot diff report, so we can minimize the number of files to copy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8830) Support add/remove directories to an existing encryption zone

2015-07-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8830:
-
Attachment: HDFS-8830.05.patch

Delta from the v4 patch: fixes an issue around renaming added root dirs of an 
encryption zone, with additional unit tests.

 Support add/remove directories to an existing encryption zone
 -

 Key: HDFS-8830
 URL: https://issues.apache.org/jira/browse/HDFS-8830
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HDFS-8830.01.patch, HDFS-8830.02.patch, 
 HDFS-8830.03.patch, HDFS-8830.04.patch, HDFS-8830.05.patch


 This is the first step toward better Scratch Space and Soft Delete 
 support. We remove the assumption that an HDFS directory and an encryption 
 zone are mapped 1:1 and can't be changed once created.
 The encryption zone creation part is kept as-is from Hadoop 2.4. We 
 generalize the encryption zone and its directories from 1:1 to 1:many. This 
 way, other directories such as scratch space can be added to/removed from an 
 encryption zone as needed. Later on, files in these directories can be 
 renamed within the same encryption zone efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8830) Support add/remove directories to an existing encryption zone

2015-07-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647353#comment-14647353
 ] 

Hadoop QA commented on HDFS-8830:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 16s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 29s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m  9s | Post-patch findbugs 
hadoop-hdfs-project/hadoop-hdfs compilation is broken. |
| {color:red}-1{color} | findbugs |   4m 35s | Post-patch findbugs 
hadoop-hdfs-project/hadoop-hdfs-client compilation is broken. |
| {color:green}+1{color} | findbugs |   4m 35s | The patch does not introduce 
any new Findbugs (version ) warnings. |
| {color:red}-1{color} | common tests |   0m 25s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |   0m 27s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 26s | Tests passed in 
hadoop-hdfs-client. |
| | |  53m 41s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-common |
|   | hadoop-hdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12747938/HDFS-8830.05.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ddc867ce |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11873/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11873/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11873/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11873/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11873/console |


This message was automatically generated.

 Support add/remove directories to an existing encryption zone
 -

 Key: HDFS-8830
 URL: https://issues.apache.org/jira/browse/HDFS-8830
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HDFS-8830.01.patch, HDFS-8830.02.patch, 
 HDFS-8830.03.patch, HDFS-8830.04.patch, HDFS-8830.05.patch


 This is the first step toward better Scratch Space and Soft Delete 
 support. We remove the assumption that an HDFS directory and an encryption 
 zone are mapped 1:1 and can't be changed once created.
 The encryption zone creation part is kept as-is from Hadoop 2.4. We 
 generalize the encryption zone and its directories from 1:1 to 1:many. This 
 way, other directories such as scratch space can be added to/removed from an 
 encryption zone as needed. Later on, files in these directories can be 
 renamed within the same encryption zone efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8792) Use LightWeightHashSet for BlockManager#postponedMisreplicatedBlocks

2015-07-30 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8792:
-
Attachment: HDFS-8792.002.patch

Updated with a new patch. The new patch also improves {{excessReplicateMap}}: 
we don't need {{sort}}, so there is no need to use a {{TreeMap}}.

{code}
   public final Map<String, LightWeightLinkedSet<BlockInfo>> excessReplicateMap =
-      new TreeMap<String, LightWeightLinkedSet<BlockInfo>>();
+      new HashMap<String, LightWeightLinkedSet<BlockInfo>>();
{code}
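A small illustration of the trade-off behind this change, assuming (as the comment above states) that nothing iterates {{excessReplicateMap}} in key order: {{TreeMap}} pays O(log n) per operation to keep keys sorted, while {{HashMap}} gives expected O(1) lookups with no ordering guarantee. The datanode names below are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapChoiceSketch {
    public static void main(String[] args) {
        // TreeMap keeps keys sorted, paying O(log n) per operation;
        // HashMap is unordered with expected O(1) operations. If nothing
        // iterates the map in key order, HashMap is the cheaper choice.
        Map<String, Integer> sorted = new TreeMap<>();
        Map<String, Integer> unsorted = new HashMap<>();
        for (String dn : new String[] {"dn-b", "dn-a", "dn-c"}) {
            sorted.put(dn, 1);
            unsorted.put(dn, 1);
        }
        System.out.println(sorted.keySet()); // prints [dn-a, dn-b, dn-c]
        // Point lookups behave identically in both maps.
        System.out.println(unsorted.get("dn-b")); // prints 1
    }
}
```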

 Use LightWeightHashSet for BlockManager#postponedMisreplicatedBlocks
 

 Key: HDFS-8792
 URL: https://issues.apache.org/jira/browse/HDFS-8792
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8792.001.patch, HDFS-8792.002.patch


 {{LightWeightHashSet}} requires less memory than the Java HashSet. 
 Furthermore, for {{excessReplicateMap}}, we can use {{HashMap}} instead of 
 {{TreeMap}}, since there is no need to sort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647301#comment-14647301
 ] 

Walter Su commented on HDFS-8833:
-

I see. You want to store the ECPolicy at the {{file level}} and also save a 
copy at the {{zone/dir level}} (or store it at the zone and save a copy at 
the file).
Currently StoragePolicy does the same thing. It saves a copy at the {{zone/dir 
level}} to ease file creation, so we don't change the file-creation API.
To me, StoragePolicy also has a {{zone}} concept. It's just that a storage 
zone/dir has looser limitations (you can move a {{WARM}} file into a {{COLD}} 
directory).
ECPolicy is just like StoragePolicy. One difference is:
You can change the StoragePolicy of a directory/file any time you want.
You can't change the ECPolicy of a file any time you want (a directory is 
fine; it only affects file creation).

The title eliminate EC zones may not be correct.

As I said before, there are three places to store the schema and cellSize: 
{{zone level}}, {{file level}}, and {{block level}}.
StoragePolicy exists at the {{zone level}} and {{file level}}.
Currently we are copying repl from the {{file level}} to the {{block level}} 
(HDFS-8823).
Is it ok to store the ECPolicy at the {{zone level}}, {{file level}}, and 
{{block level}} all at once? I think it's ok. Because:
1. The {{zone level}} consumes little memory.
2. The {{file level}} doesn't depend on the {{zone level}}. If a zone is 
gone, the file still has a copy of the policy.
3. The namespace and the storage layer should both have a copy, so two copies 
at least.
4. We give the policy an ID, so three copies consume little memory.

I agree with [~zhz] that we store a copy of the ECPolicy at the {{file 
level}} like StoragePolicy does, for now. In the future we can figure out how 
to copy StoragePolicy, ECPolicy, and repl together from the namespace to the 
storage layer. (HDFS-8823 has some progress.)

 Erasure coding: store EC schema and cell size with INodeFile and eliminate EC 
 zones
 ---

 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 We have [discussed | 
 https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
  storing EC schema with files instead of EC zones and recently revisited the 
 discussion under HDFS-8059.
 As a recap, the _zone_ concept has severe limitations, including around 
 renaming and nested configuration. Those limitations are justified in 
 encryption for security reasons, but it doesn't make sense to carry them over 
 to EC.
 This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
 simplicity, we should first implement it as an xattr and consider memory 
 optimizations (such as moving it to file header) as a follow-on. We should 
 also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8830) Support add/remove directories to an existing encryption zone

2015-07-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8830:
-
Attachment: (was: HDFS-8830.05.patch)

 Support add/remove directories to an existing encryption zone
 -

 Key: HDFS-8830
 URL: https://issues.apache.org/jira/browse/HDFS-8830
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HDFS-8830.01.patch, HDFS-8830.02.patch, 
 HDFS-8830.03.patch, HDFS-8830.04.patch, HDFS-8830.05.patch


 This is the first step toward better Scratch Space and Soft Delete 
 support. We remove the assumption that an HDFS directory and an encryption 
 zone are mapped 1:1 and can't be changed once created.
 The encryption zone creation part is kept as-is from Hadoop 2.4. We 
 generalize the encryption zone and its directories from 1:1 to 1:many. This 
 way, other directories such as scratch space can be added to/removed from an 
 encryption zone as needed. Later on, files in these directories can be 
 renamed within the same encryption zone efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8830) Support add/remove directories to an existing encryption zone

2015-07-30 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8830:
-
Attachment: HDFS-8830.05.patch

 Support add/remove directories to an existing encryption zone
 -

 Key: HDFS-8830
 URL: https://issues.apache.org/jira/browse/HDFS-8830
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HDFS-8830.01.patch, HDFS-8830.02.patch, 
 HDFS-8830.03.patch, HDFS-8830.04.patch, HDFS-8830.05.patch


 This is the first step toward better Scratch Space and Soft Delete 
 support. We remove the assumption that an HDFS directory and an encryption 
 zone are mapped 1:1 and can't be changed once created.
 The encryption zone creation part is kept as-is from Hadoop 2.4. We 
 generalize the encryption zone and its directories from 1:1 to 1:many. This 
 way, other directories such as scratch space can be added to/removed from an 
 encryption zone as needed. Later on, files in these directories can be 
 renamed within the same encryption zone efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-488) Implement moveToLocal HDFS command

2015-07-30 Thread Steven Capo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Capo reassigned HDFS-488:


Assignee: Steven Capo  (was: Ravi Phulari)

 Implement moveToLocal  HDFS command
 ---

 Key: HDFS-488
 URL: https://issues.apache.org/jira/browse/HDFS-488
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ravi Phulari
Assignee: Steven Capo
  Labels: newbie
 Attachments: Screen Shot 2014-07-23 at 12.28.23 PM 1.png


 Surprisingly, executing the HDFS FsShell command -moveToLocal outputs Option 
 '-moveToLocal' is not implemented yet.
  
 {code}
 statepick-lm:Hadoop rphulari$ bin/hadoop fs -moveToLocal bt t
 Option '-moveToLocal' is not implemented yet.
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8792) Use LightWeightHashSet for BlockManager#postponedMisreplicatedBlocks

2015-07-30 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647349#comment-14647349
 ] 

Yi Liu commented on HDFS-8792:
--

[~arpitagarwal], sorry for the late response.

{quote}
do you have any estimates of the memory saved by using LightWeightHashSet?
{quote}
Yes, compared to the Java {{HashSet}}, there are two advantages from a memory 
point of view:
# Java {{HashSet}} internally uses a {{HashMap}}, so there is one more 
reference (4 bytes) for each entry compared to {{LightWeightHashSet}}; we can 
save {{4 * size}} bytes of memory.
# In {{LightWeightHashSet}}, when the number of elements decreases, the 
internal array is shrunk significantly.

So we can see {{LightWeightHashSet}} is better. The main issue is that 
{{LightWeightHashSet#LinkedSetIterator}} doesn't currently support {{remove}}; 
it's easy to support (similar to the Java HashSet). By the way, currently in 
Hadoop we use {{LightWeightHashSet}} for all big objects that require a hash 
set, except this one, which needs {{remove}}.
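A back-of-envelope sketch of the first point above, assuming compressed oops (4-byte references) and a hypothetical set size of one million postponed blocks; both numbers are illustrative, not measurements:

```java
public class HashSetOverheadSketch {
    public static void main(String[] args) {
        // java.util.HashSet wraps a HashMap, so each element carries an
        // extra object reference (4 bytes with compressed oops) that a
        // purpose-built set can avoid, per the comment above.
        long postponedBlocks = 1_000_000L; // assumed set size, for illustration
        long bytesSavedPerEntry = 4L;      // the extra per-entry reference
        long savedBytes = postponedBlocks * bytesSavedPerEntry;
        System.out.println(savedBytes / 1024 / 1024 + " MB"); // prints 3 MB
    }
}
```

The per-entry saving is modest, but postponedMisreplicatedBlocks can grow large after failovers, which is when the NN heap is under the most pressure.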

 Use LightWeightHashSet for BlockManager#postponedMisreplicatedBlocks
 

 Key: HDFS-8792
 URL: https://issues.apache.org/jira/browse/HDFS-8792
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8792.001.patch


 {{LightWeightHashSet}} requires less memory than the Java HashSet. 
 Furthermore, for {{excessReplicateMap}}, we can use {{HashMap}} instead of 
 {{TreeMap}}, since there is no need to sort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8792) Improve BlockManager#postponedMisreplicatedBlocks and BlockManager#excessReplicateMap

2015-07-30 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-8792:
-
Summary: Improve BlockManager#postponedMisreplicatedBlocks and 
BlockManager#excessReplicateMap  (was: Use LightWeightHashSet for 
BlockManager#postponedMisreplicatedBlocks)

 Improve BlockManager#postponedMisreplicatedBlocks and 
 BlockManager#excessReplicateMap
 -

 Key: HDFS-8792
 URL: https://issues.apache.org/jira/browse/HDFS-8792
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8792.001.patch, HDFS-8792.002.patch


 {{LightWeightHashSet}} requires less memory than the Java HashSet. 
 Furthermore, for {{excessReplicateMap}}, we can use {{HashMap}} instead of 
 {{TreeMap}}, since there is no need to sort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648107#comment-14648107
 ] 

Andrew Wang commented on HDFS-8833:
---

bq. StoragePolicy do not have such restriction since moving a WARM file into a 
COLD dir as well changes its underlining storage (mover will move the replicas 
in order to satisfy the policy.

IIUC this only happens if the file is set to inherit and picked up WARM from 
a parent directory. If the storage policy was set on the file itself, it would 
not change when renamed under a COLD directory.

The proposal is basically like storage policies, except there is no inherit 
mode.

 Erasure coding: store EC schema and cell size with INodeFile and eliminate EC 
 zones
 ---

 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 We have [discussed | 
 https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
  storing EC schema with files instead of EC zones and recently revisited the 
 discussion under HDFS-8059.
 As a recap, the _zone_ concept has severe limitations, including around 
 renaming and nested configuration. Those limitations are justified in 
 encryption for security reasons, but it doesn't make sense to carry them over 
 to EC.
 This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
 simplicity, we should first implement it as an xattr and consider memory 
 optimizations (such as moving it to file header) as a follow-on. We should 
 also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648073#comment-14648073
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8833:
---

 To me, StoragePolicy also has zone concept. It's just that storage zone/dir 
 has loose limitation.(You can move a WARM file into a COLD directory.) ...

When talking about a zone, we mean that files in one zone cannot be (easily) 
moved/changed to another zone. StoragePolicy does not have such a restriction, 
since moving a WARM file into a COLD dir also changes its underlying storage 
(the mover will move the replicas in order to satisfy the policy).

However, we currently cannot change the file encryption scheme or the erasure 
code scheme after a file is created, so we need zones for them.

 Erasure coding: store EC schema and cell size with INodeFile and eliminate EC 
 zones
 ---

 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 We have [discussed | 
 https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
  storing EC schema with files instead of EC zones and recently revisited the 
 discussion under HDFS-8059.
 As a recap, the _zone_ concept has severe limitations, including around 
 renaming and nested configuration. Those limitations are justified in 
 encryption for security reasons, but it doesn't make sense to carry them over 
 to EC.
 This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
 simplicity, we should first implement it as an xattr and consider memory 
 optimizations (such as moving it to file header) as a follow-on. We should 
 also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them

2015-07-30 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648076#comment-14648076
 ] 

Colin Patrick McCabe commented on HDFS-8480:


Good catch, [~mingma].  If we make a change later to the edit log format, this 
test will break because we'll be writing out an edit log with the new format, 
but the old version ID.

I think we should simply check in a small binary edit log in order to do this 
testing.  This is similar to what we do with other upgrade scenario tests.  
As [~zhz] mentioned, it would be a lot of effort to support writing out edit 
logs in older formats (we have literally dozens of versions).

 Fix performance and timeout issues in HDFS-7929 by using hard-links to 
 preserve old edit logs instead of copying them
 -

 Key: HDFS-8480
 URL: https://issues.apache.org/jira/browse/HDFS-8480
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, 
 HDFS-8480.02.patch, HDFS-8480.03.patch


 HDFS-7929 copies existing edit logs to the storage directory of the upgraded 
 {{NameNode}}. This slows down the upgrade process. This JIRA aims to use 
 hard-linking instead of per-op copying to achieve the same goal.





[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648084#comment-14648084
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8833:
---

What is the semantic of moving a file under EC zone A to EC zone B?  Would the 
file be changed from EC scheme A to EC schema B?  If yes, we could eliminate EC 
zones.  Otherwise, we should keep EC zone.






[jira] [Commented] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones

2015-07-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648086#comment-14648086
 ] 

Andrew Wang commented on HDFS-8747:
---

Hi [~xyao], I like this idea overall, nice work. A few questions:

* Have you thought about simply allowing rename between EZs with the same 
settings? This would be a much smaller and easier change with similar 
properties. I think your proposal is still better in terms of ease-of-use and 
also in ensuring security invariants around key rolling (if/when we implement 
that).
* If we keep the APIs superuser-only, how does a normal user add their trash 
folder to an EZ? Same for scratch folders, e.g. if the Hive user is not a 
superuser.

 Provide Better Scratch Space and Soft Delete Support for HDFS Encryption 
 Zones
 --

 Key: HDFS-8747
 URL: https://issues.apache.org/jira/browse/HDFS-8747
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 2.6.0
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, 
 HDFS-8747-07292015.pdf


 HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to 
 allow creating an encryption zone on top of a single HDFS directory. Files under 
 the root directory of the encryption zone are encrypted/decrypted 
 transparently upon HDFS client write or read operations. 
 Generally, it does not support rename (without data copying) across encryption 
 zones, or between an encryption zone and a non-encryption zone, because of the 
 different security settings of encryption zones. However, there are certain use 
 cases where efficient rename support is desired. This JIRA proposes better 
 support for two such use cases, “Scratch Space” (a.k.a. staging area) and “Soft 
 Delete” (a.k.a. trash), with HDFS encryption zones.
 “Scratch Space” is widely used in Hadoop jobs, which require efficient 
 rename support. Temporary files from MR jobs are usually stored in a staging 
 area outside the encryption zone, such as the “/tmp” directory, and then 
 renamed to the target directories as specified once the data is ready to be 
 further processed. 
 Below is a summary of supported/unsupported cases from latest Hadoop:
 * Rename within the encryption zone is supported
 * Rename the entire encryption zone by moving the root directory of the zone  
 is allowed.
 * Rename sub-directory/file from encryption zone to non-encryption zone is 
 not allowed.
 * Rename sub-directory/file from encryption zone A to encryption zone B is 
 not allowed.
 * Rename from non-encryption zone to encryption zone is not allowed.
 “Soft delete” (a.k.a. trash) is a client-side feature that 
 helps prevent accidental deletion of files and directories. If trash is 
 enabled and a file or directory is deleted using the Hadoop shell, the file 
 is moved to the .Trash directory of the user's home directory instead of 
 being deleted.  Deleted files are initially moved (renamed) to the Current 
 sub-directory of the .Trash directory with original path being preserved. 
 Files and directories in the trash can be restored simply by moving them to a 
 location outside the .Trash directory.
 Due to the limited rename support, deleting a sub-directory/file within an 
 encryption zone with the trash feature enabled is not allowed. The client has 
 to use the -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 
 improved the error message but without a complete solution to the problem. 
 We propose to solve the problem by generalizing the mapping between 
 encryption zone and its underlying HDFS directories from 1:1 today to 1:N. 
 The encryption zone should allow non-overlapped directories such as scratch 
 space or soft delete trash locations to be added/removed dynamically after 
 creation. This way, rename for scratch space and soft delete can be 
 better supported without breaking the assumption that rename is only 
 supported within the zone. 
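The 1:N zone-to-directory mapping described above can be sketched as a small registry. All names here ({{ZoneRegistry}}, {{addDir}}, {{sameZone}}) are hypothetical, purely illustrating the non-overlap invariant and the same-zone rename check; they are not the actual NameNode classes.

```java
import java.util.*;

// Hypothetical sketch of a 1:N encryption-zone registry: one zone id maps to a
// root directory plus extra non-overlapping directories (scratch, trash).
class ZoneRegistry {
    private final Map<String, Set<String>> zoneDirs = new HashMap<>();

    void createZone(String zoneId, String rootDir) {
        zoneDirs.computeIfAbsent(zoneId, k -> new LinkedHashSet<>()).add(rootDir);
    }

    // Add a directory to an existing zone; reject paths nested under (or equal
    // to) a directory already owned by any zone, keeping directories disjoint.
    boolean addDir(String zoneId, String dir) {
        for (Set<String> dirs : zoneDirs.values()) {
            for (String d : dirs) {
                if (d.equals(dir) || dir.startsWith(d + "/") || d.startsWith(dir + "/")) {
                    return false;
                }
            }
        }
        Set<String> dirs = zoneDirs.get(zoneId);
        return dirs != null && dirs.add(dir);
    }

    // A rename stays within a zone iff some single zone owns both paths.
    boolean sameZone(String src, String dst) {
        for (Set<String> dirs : zoneDirs.values()) {
            boolean ownsSrc = false, ownsDst = false;
            for (String d : dirs) {
                if (src.startsWith(d + "/")) ownsSrc = true;
                if (dst.startsWith(d + "/")) ownsDst = true;
            }
            if (ownsSrc && ownsDst) return true;
        }
        return false;
    }
}
```

Because the directories stay disjoint, "rename is only supported within the zone" remains a well-defined rule even after scratch/trash directories are attached.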





[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648088#comment-14648088
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8833:
---

Anyway, we don't seem to have a complete design yet for eliminating EC zones.  
If we really want to do it, we should discuss the design first before changing 
any code.






[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-07-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648095#comment-14648095
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8838:
---

Hi [~libo-intel], HDFS-8704 seems to fix large files according to the JIRA 
summary, but this one is to fix small files.  Am I missing anything?

 Tolerate datanode failures in DFSStripedOutputStream when the data length is 
 small
 --

 Key: HDFS-8838
 URL: https://issues.apache.org/jira/browse/HDFS-8838
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h8838_20150729.patch


 Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
 data length is small.  We fix the bugs here and add more tests.





[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails

2015-07-30 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648099#comment-14648099
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8704:
---

 +for (int dn = 0; dn < 1; dn++) {

Please try killing all the datanodes, not just one.  Otherwise, we cannot 
completely test the fix.

 Erasure Coding: client fails to write large file when one datanode fails
 

 Key: HDFS-8704
 URL: https://issues.apache.org/jira/browse/HDFS-8704
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo
 Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
 HDFS-8704-HDFS-7285-003.patch


 I tested the current code on a 5-node cluster using RS(3,2).  When a datanode is 
 corrupt, the client succeeds in writing a file smaller than a block group but 
 fails to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests 
 files smaller than a block group; this jira will add more test situations.





[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails

2015-07-30 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647239#comment-14647239
 ] 

Li Bo commented on HDFS-8704:
-

I have just uploaded a second patch for this problem. The changes in this patch 
include:
1.  {{DFSStripedOutputStream}} and the failed status of 
{{StripedDataStreamer}}: it’s not right to take different actions according to 
the current status of a streamer. When a streamer has failed but the packet 
to be queued belongs to the next block (the streamer can successfully write to 
that block because the new datanode may be healthy), the packet should be 
handled as usual. When {{DFSStripedOutputStream}} finds a streamer is working 
well, it queues the packet to the streamer, but the streamer may still fail 
before sending the packet. So I removed the logic of checking and setting the 
failed status of a streamer in {{DFSStripedOutputStream}}; when a streamer 
fails, it knows how to handle the failure itself.
2.  Extend the functionality of {{StripedDataStreamer}}: if an error occurs, 
{{StripedDataStreamer}} will first handle the remaining trivial packets of the 
current block, and then go back to waiting for a new block to be allocated to 
it. 
3.  Add a test to {{TestDFSStripedOutputStreamWithFailure}} which tests 
writing a file with two block groups.

The unit test occasionally fails because only 8 block locations are given by 
the namenode for the second block group. HDFS-8839 has been created to track 
this problem.


 Erasure Coding: client fails to write large file when one datanode fails
 

 Key: HDFS-8704
 URL: https://issues.apache.org/jira/browse/HDFS-8704
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo
 Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch


 I tested the current code on a 5-node cluster using RS(3,2).  When a datanode is 
 corrupt, the client succeeds in writing a file smaller than a block group but 
 fails to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests 
 files smaller than a block group; this jira will add more test situations.





[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647251#comment-14647251
 ] 

Zhe Zhang commented on HDFS-8833:
-

{code}
hdfs erasurecode [generic options]
[-setECPolicy [-s schemaName] [-c cellSize] path]
[-getECPolicy path]
 [-listSchemas]
[-usage [cmd ...]]
[-help [cmd ...]]

SetECPolicy command:
-setECPolicy [-s schemaName] [-c cellSize] path
SetECPolicy command is used to set an erasure coding policy, including codec 
schema and cell size, at the specified path. 
path: Refers to an empty, already created directory in HDFS. This is a 
mandatory parameter.
schemaName: This is an optional parameter, specified using the ‘-s’ flag. Refers 
to the name of the ECSchema to be used for erasure coding of direct children 
(files or subdirectories) of this directory. If not specified, the system 
default ECSchema will be used.
cellSize: This is also an optional parameter, specified using the ‘-c’ flag. 
Refers to the size of cells for striped blocks. If not specified, the default 
cell size (64k) will be used.
GetECPolicy command
[-getECPolicy path]
GetECPolicy command is used to get details of the erasure coding policy, 
including codec schema and cell size, for the specified path 
ListSchemas command: 
[-listSchemas]
Lists all ECSchemas supported for erasure coding. For the SetECPolicy command, 
the name of one of these ECSchemas should be provided.
{code}

Above is the proposed CLI. I took the current version and made a few 
modifications. ([~vinayrpet] could you take a look and see if I'm missing 
anything? Thanks..)

From the admin's perspective the only difference is nested policies (renaming is 
a consequence of it). Renaming is only an issue because of the existing 
expectations on _zones_ established by encryption zones. From an implementation 
perspective the main difference is storing a copy of the inherited EC policy in 
the XAttr of {{INodeFile}}. As Walter reiterated above about listing ECPolicies, 
I think it's very likely that we can limit the number of supported policies to 
16~256 so we can fit the policy ID (4~8 bits) in the file header.
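Fitting a policy ID into the file header is plain bit-packing. The layout below (8 bits at offset 56) is an assumption for illustration only, not the actual {{INodeFile}} header format:

```java
// Illustrative bit-packing of an EC policy ID into spare high bits of a
// 64-bit file header. Offsets and widths here are assumptions, not HDFS's
// real header layout.
class HeaderFormat {
    static final int POLICY_BITS = 8;     // up to 256 policies
    static final int POLICY_OFFSET = 56;  // assumed spare bits
    static final long POLICY_MASK = ((1L << POLICY_BITS) - 1) << POLICY_OFFSET;

    static long setPolicyId(long header, int policyId) {
        if (policyId < 0 || policyId >= (1 << POLICY_BITS)) {
            throw new IllegalArgumentException("policy id out of range: " + policyId);
        }
        return (header & ~POLICY_MASK) | ((long) policyId << POLICY_OFFSET);
    }

    static int getPolicyId(long header) {
        return (int) ((header & POLICY_MASK) >>> POLICY_OFFSET);
    }
}
```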






[jira] [Commented] (HDFS-6697) Make NN lease soft and hard limits configurable

2015-07-30 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647262#comment-14647262
 ] 

Vinayakumar B commented on HDFS-6697:
-

bq. Can you clarify why in LeaseRenewer it needs to compare the configured soft 
limit with the default soft limit? Instead it can just set the renewal variable 
with the configured value.
I think the idea is to keep the {{renewal}} interval to a max of 30 seconds (the 
default). And if the configured soft limit is positive and less than 60 seconds, 
it will take priority.

Am I right [~andreina]?
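The interval selection being discussed amounts to the following sketch; the constant and method names are illustrative, not the exact {{LeaseRenewer}} code:

```java
// Sketch of the renewal-interval choice: renew at half the soft limit, but
// never more often than half the 60s default (i.e. at most every 30s).
class RenewalInterval {
    static final long DEFAULT_SOFT_LIMIT_MS = 60_000L; // assumed default lease soft limit

    static long renewalMs(long configuredSoftLimitMs) {
        // A positive configured soft limit below the default takes priority;
        // otherwise fall back to the default.
        long effective = (configuredSoftLimitMs > 0
            && configuredSoftLimitMs < DEFAULT_SOFT_LIMIT_MS)
            ? configuredSoftLimitMs : DEFAULT_SOFT_LIMIT_MS;
        return effective / 2;
    }
}
```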

 Make NN lease soft and hard limits configurable
 ---

 Key: HDFS-6697
 URL: https://issues.apache.org/jira/browse/HDFS-6697
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: J.Andreina
 Attachments: HDFS-6697.1.patch, HDFS-6697.2.patch, HDFS-6697.3.patch


 For testing, NameNodeAdapter allows test code to specify lease soft and hard 
 limit via setLeasePeriod directly on LeaseManager. But NamenodeProxies.java 
 still use the default values.
  
 It is useful if we can make NN lease soft and hard limit configurable via 
 Configuration. That will allow NamenodeProxies.java to use the configured 
 values.





[jira] [Commented] (HDFS-8835) Convert BlockInfoUnderConstruction as an interface

2015-07-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647269#comment-14647269
 ] 

Zhe Zhang commented on HDFS-8835:
-

Thanks Nicholas for the suggestion. This JIRA shares the same purpose as 
HDFS-8487. In HDFS-7285 we made a lot of changes to generalize {{BlockInfo}} 
which are not directly related to EC. Pushing those changes to trunk separately 
will ease merge review (especially for reviewers not directly involved in 
branch dev).

So I'll hold off working on a patch, and leave this JIRA open to solicit 
feedback from merge review. LMK your thoughts. Thanks.

 Convert BlockInfoUnderConstruction as an interface
 --

 Key: HDFS-8835
 URL: https://issues.apache.org/jira/browse/HDFS-8835
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 Per discussion under HDFS-8499, this JIRA aims to convert 
 {{BlockInfoUnderConstruction}} as an interface and 
 {{BlockInfoContiguousUnderConstruction}} as its implementation. The HDFS-7285 
 branch will add {{BlockInfoStripedUnderConstruction}} as another 
 implementation.





[jira] [Updated] (HDFS-8698) Add -direct flag option for fs copy so that user can choose not to create ._COPYING_ file

2015-07-30 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-8698:

Issue Type: Improvement  (was: New Feature)

 Add -direct flag option for fs copy so that user can choose not to create 
 ._COPYING_ file
 -

 Key: HDFS-8698
 URL: https://issues.apache.org/jira/browse/HDFS-8698
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.7.0
Reporter: Chen He
Assignee: J.Andreina

 The CLI uses CommandWithDestination.java, which adds ._COPYING_ to 
 the tail of the file name when it does the copy. For blobstores like S3 and 
 Swift, creating a ._COPYING_ file and renaming it is expensive. A -direct flag 
 can allow the user to avoid the ._COPYING_ file.





[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647996#comment-14647996
 ] 

Andrew Wang commented on HDFS-8833:
---

Great discussion [~zhz] [~walter.k.su] and [~vinayrpet], agree with everything 
so far. In terms of memory overhead, as long as it only increases memory usage 
of EC files, I'm okay with it.

One hybrid approach we could pursue is providing a set of hardcoded policies 
and then allowing additional user-configured policies as an XAttr. Users that 
stick to the hardcoded policies will get improved memory usage. We can also 
dedupe xattr values in addition to names as an additional optimization.

Since IIUC we only want to provide hardcoded policies in phase 1, this hybrid 
approach can come later.
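The hybrid scheme could look roughly like the sketch below, where small IDs index a built-in table and anything else falls back to an xattr lookup. Class names, policy names, and the ID convention are all hypothetical:

```java
import java.util.*;

// Hypothetical hybrid policy lookup: small IDs index a hardcoded table (cheap,
// no xattr needed); user-defined policies fall back to a per-file xattr map.
class EcPolicyResolver {
    static final String[] BUILT_IN = {"RS-6-3-64k", "RS-3-2-64k", "XOR-2-1-64k"};
    private final Map<Long, String> xattrPolicies = new HashMap<>(); // inodeId -> policy

    void setUserPolicy(long inodeId, String policy) {
        xattrPolicies.put(inodeId, policy);
    }

    // An id inside the built-in range resolves from the table; anything else
    // is treated as "user-defined" and read from the xattr map.
    String resolve(long inodeId, int policyId) {
        if (policyId >= 0 && policyId < BUILT_IN.length) {
            return BUILT_IN[policyId];
        }
        return xattrPolicies.get(inodeId);
    }
}
```

Files using a built-in policy pay only for the small ID; only files with user-defined policies carry the extra xattr.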






[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks

2015-07-30 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648137#comment-14648137
 ] 

Haohui Mai commented on HDFS-8823:
--

bq. Thanks for the work Haohui. As Walter Su pointed out, the patch copies 
rather than moves the replication factor. So a minor ask is to update the JIRA 
description.

Yes, there is duplicate information, but I don't feel this is a copy, since the 
BM only relies on the information in the BlockInfo for replication. I plan to 
further clean things up in subsequent patches.

bq. To give my two cents: replication factor, storage policy, and EC policy 
(potentially) are the same class of control knobs for HDFS users and we should 
probably consolidate the discussions on how to manage them in the context of 
separating NN and BM layers.

There are long-standing efforts and consensus that the BM layer and the 
namespace need to be separated (e.g., HDFS-2106, HDFS-5477 and HDFS-7836). 
Though progress is slow, we should not make design and implementation choices 
that contradict the effort. You might want to take a look at the related jiras 
and the code to get more context.

This is a nontrivial effort and IMO it is more constructive to discuss in 
separate jiras with concrete contexts. Speaking of what you've mentioned, this 
particular jira proposes to decouple BM and NN w.r.t. replication factors in 
trunk. I've given my feedback in HDFS-8833 already. There is no jira 
w.r.t. the storage policy yet. However, you're more than welcome to take a 
crack at it and submit a patch if you want to investigate storage policy.

 Move replication factor into individual blocks
 --

 Key: HDFS-8823
 URL: https://issues.apache.org/jira/browse/HDFS-8823
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8823.000.patch


 This jira proposes to record the replication factor in the {{BlockInfo}} 
 class. The changes have two advantages:
 * Decoupling the namespace and the block management layer. It is a 
 prerequisite step to move block management off the heap or to a separate 
 process.
 * Increased flexibility in replicating blocks. Currently the replication 
 factors of all blocks in a file have to be the same, equal to the highest 
 replication factor across all snapshots. The changes will allow blocks in a 
 file to have different replication factors, potentially saving some space.
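A minimal sketch of the per-block replication idea; the field and class shapes are illustrative, not the actual {{BlockInfo}} layout:

```java
import java.util.List;

// Sketch: each block carries its own replication factor instead of reading a
// single file-wide value from the INodeFile.
class BlockInfoSketch {
    final long blockId;
    private short replication; // per-block, so blocks in one file may differ

    BlockInfoSketch(long blockId, short replication) {
        this.blockId = blockId;
        this.replication = replication;
    }

    short getReplication() { return replication; }
    void setReplication(short replication) { this.replication = replication; }

    // With per-block factors, expected replica counts are summed per block
    // rather than derived from one max-over-snapshots value for the file.
    static int totalExpectedReplicas(List<BlockInfoSketch> blocks) {
        int total = 0;
        for (BlockInfoSketch b : blocks) total += b.getReplication();
        return total;
    }
}
```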





[jira] [Commented] (HDFS-8816) Improve visualization for the Datanode tab in the NN UI

2015-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647448#comment-14647448
 ] 

Hudson commented on HDFS-8816:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1002 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1002/])
HDFS-8816. Improve visualization for the Datanode tab in the NN UI. Contributed 
by Haohui Mai. (wheat9: rev ddc867ceb9a76986e8379361753598cc48024376)
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/hadoop.css
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.js
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/moment.min.js


 Improve visualization for the Datanode tab in the NN UI
 ---

 Key: HDFS-8816
 URL: https://issues.apache.org/jira/browse/HDFS-8816
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.8.0

 Attachments: HDFS-8816.000.patch, HDFS-8816.001.patch, 
 HDFS-8816.002.patch, HDFS-8816.003.patch, HDFS-8816.004.patch, HDFS-8816.png, 
 HDFS-8816.png, Screen Shot 2015-07-23 at 10.24.24 AM.png


 The information of the datanode tab in the NN UI is clogged. This jira 
 proposes to improve the visualization of the datanode tab in the UI.





[jira] [Commented] (HDFS-8388) Time and Date format need to be in sync in Namenode UI page

2015-07-30 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647452#comment-14647452
 ] 

Surendra Singh Lilhore commented on HDFS-8388:
--

Thanks [~ajisakaa] for review..

bq. 2. Would you use the ddd MMM DD HH:mm:ss ZZ YYYY format instead of ddd MMM 
DD HH:mm:ss YYYY to be consistent with YARN Web UI?

If I use the ddd MMM DD HH:mm:ss ZZ YYYY format, then Block Deletion Start Time 
and Compiled time come out like this:
{noformat}
Compiled:   Mon Jul 06 12:20:00 +0530 2015 by sslilhore from trunk
Block Deletion Start Time:  Thu Jul 30 14:31:30 +0530 2015
{noformat}

and Started time is like this 
{noformat}
Started:Thu Jul 30 17:01:30 CST 2015
{noformat}

Here the Compiled time zone is +0530 but the Started time zone is CST.

Is this OK?
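For reference, the offset-style output above can be reproduced with java.time (the NN UI itself uses moment.js; this sketch only illustrates the pattern under discussion):

```java
import java.time.ZonedDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// "EEE MMM dd HH:mm:ss Z yyyy" is the java.time analogue of the moment.js
// pattern "ddd MMM DD HH:mm:ss ZZ YYYY". Both print the numeric offset
// (e.g. +0530); a name like "CST" comes from a different, name-based
// formatting path, which is why the two fields can disagree in style.
class DateFmt {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH);

    static String format(ZonedDateTime t) {
        return FMT.format(t);
    }
}
```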

 Time and Date format need to be in sync in Namenode UI page
 ---

 Key: HDFS-8388
 URL: https://issues.apache.org/jira/browse/HDFS-8388
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Archana T
Assignee: Surendra Singh Lilhore
Priority: Minor
 Attachments: HDFS-8388-002.patch, HDFS-8388.patch, HDFS-8388_1.patch, 
 ScreenShot-InvalidDate.png


 In the NameNode UI page, the date and time formats displayed on the page are 
 currently not in sync:
 Started:Wed May 13 12:28:02 IST 2015
 Compiled:23 Apr 2015 12:22:59 
 Block Deletion Start Time   13 May 2015 12:28:02
 We can keep a common format in all the above places.





[jira] [Commented] (HDFS-8830) Support add/remove directories to an existing encryption zone

2015-07-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647464#comment-14647464
 ] 

Hadoop QA commented on HDFS-8830:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 49s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 27s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 26s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   6m 17s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 14s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 162m  6s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 27s | Tests passed in 
hadoop-hdfs-client. |
| | | 235m 37s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12747933/HDFS-8830.05.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ddc867ce |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11872/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11872/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11872/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11872/artifact/patchprocess/testrun_hadoop-hdfs-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11872/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11872/console |


This message was automatically generated.

 Support add/remove directories to an existing encryption zone
 -

 Key: HDFS-8830
 URL: https://issues.apache.org/jira/browse/HDFS-8830
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HDFS-8830.01.patch, HDFS-8830.02.patch, 
 HDFS-8830.03.patch, HDFS-8830.04.patch, HDFS-8830.05.patch


 This is the first step toward better scratch space and soft delete support. 
 We remove the assumption that an HDFS directory and an encryption zone are 
 1:1 mapped and that the mapping can't be changed once created.
 The encryption zone creation part is kept as-is from Hadoop 2.4. We 
 generalize the relation between an encryption zone and its directories from 
 1:1 to 1:many. This way, other directories such as scratch space can be added 
 to or removed from an encryption zone as needed. Later on, files in these 
 directories can be renamed within the same encryption zone efficiently.
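
The 1:1 to 1:many generalization described above can be sketched as a small mapping model. This is purely a hypothetical illustration of the proposed relation, not the actual NameNode data structures; the names (EncryptionZoneRegistry, same_zone, etc.) are invented for this sketch:

```python
# Hypothetical sketch of the proposed 1:many zone-to-directory mapping.
# None of these names come from the HDFS code base; they only illustrate
# relaxing the old 1:1 assumption between a zone and its root directory.

class EncryptionZoneRegistry:
    def __init__(self):
        # zone name -> set of directory paths belonging to the zone
        self._zones = {}

    def create_zone(self, zone, root_dir):
        # Zone creation stays as-is: a zone starts with one root directory.
        self._zones[zone] = {root_dir}

    def add_dir(self, zone, path):
        # New capability: attach an extra directory (e.g. scratch space).
        self._zones[zone].add(path)

    def remove_dir(self, zone, path):
        self._zones[zone].discard(path)

    def same_zone(self, zone, src, dst):
        # A rename stays efficient only when both paths are in one zone.
        dirs = self._zones.get(zone, set())
        return src in dirs and dst in dirs

reg = EncryptionZoneRegistry()
reg.create_zone("ezA", "/secure")
reg.add_dir("ezA", "/scratch")
print(reg.same_zone("ezA", "/secure", "/scratch"))  # True
```

Under the old 1:1 model, the `/scratch` directory could only be encrypted by creating a second, independent zone, which would rule out an in-zone rename between the two paths.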



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-07-30 Thread Li Bo (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647366#comment-14647366 ]

Li Bo commented on HDFS-8838:
-

Thanks Nicholas for the effort on this problem. I have just uploaded a new patch 
to HDFS-8704 which also fixes this problem. I will review the patch of this 
jira and hope we can merge them together.

 Tolerate datanode failures in DFSStripedOutputStream when the data length is 
 small
 --

 Key: HDFS-8838
 URL: https://issues.apache.org/jira/browse/HDFS-8838
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h8838_20150729.patch


 Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
 data length is small.  We fix the bugs here and add more tests.





[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Vinayakumar B (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647360#comment-14647360 ]

Vinayakumar B commented on HDFS-8833:
-

{quote}path: Refer to an empty and already created directory in HDFS. This is a 
mandatory parameter.
schemaName: This is an optional parameter, specified using ‘-s’ flag. Refer to 
the name of ECSchema to be used for erasure coding of direct children (files or 
subdirectories) of this directory. If not specified the system default ECSchema 
will be used.{quote}
Here, the descriptions for path and schemaName should be updated. The path 
could be a file as well, and it need not be empty. I think the schema can also 
be changed after creating files; the only effect is that files created 
afterwards will inherit the different ECPolicy.


 Erasure coding: store EC schema and cell size with INodeFile and eliminate EC 
 zones
 ---

 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 We have [discussed|https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] 
 storing EC schema with files instead of EC zones and recently revisited the 
 discussion under HDFS-8059.
 As a recap, the _zone_ concept has severe limitations including renaming and 
 nested configuration. Those limitations are valid in encryption for security 
 reasons and it doesn't make sense to carry them over in EC.
 This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
 simplicity, we should first implement it as an xattr and consider memory 
 optimizations (such as moving it to file header) as a follow-on. We should 
 also disable changing EC policy on a non-empty file / dir in the first phase.





[jira] [Updated] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails

2015-07-30 Thread Li Bo (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Bo updated HDFS-8704:

Attachment: HDFS-8704-HDFS-7285-003.patch

 Erasure Coding: client fails to write large file when one datanode fails
 

 Key: HDFS-8704
 URL: https://issues.apache.org/jira/browse/HDFS-8704
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo
 Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
 HDFS-8704-HDFS-7285-003.patch


 I tested the current code on a 5-node cluster using RS(3,2). When a datanode 
 is corrupt, the client succeeds in writing a file smaller than a block group 
 but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only 
 tests files smaller than a block group; this jira will add more test 
 situations.





[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails

2015-07-30 Thread Li Bo (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647363#comment-14647363 ]

Li Bo commented on HDFS-8704:
-

Patch 003 fixes the problem of writing a file that is a little smaller than a 
block group (i.e. file length = blocksize * dataBlocks - 123). If the test 
hangs, that's mainly caused by HDFS-8839.
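
To make the "a little smaller than a block group" case concrete, the arithmetic works out as below. The block size and the RS(3,2) layout are assumed defaults for illustration, not values read out of the patch itself:

```python
# Illustrative arithmetic only: a block group in RS(3,2) striping spans
# dataBlocks full blocks of data. The 128 MB block size is an assumed
# default, not a value taken from the patch.
block_size = 128 * 1024 * 1024   # assumed HDFS default block size, bytes
data_blocks = 3                  # RS(3,2): 3 data blocks per block group

block_group_size = block_size * data_blocks
file_length = block_group_size - 123   # "a little smaller than a block group"

print(block_group_size)  # 402653184
print(file_length)       # 402653061
```

Such a file fills all but the tail of a single block group, which is exactly the boundary the earlier tests did not cover.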

 Erasure Coding: client fails to write large file when one datanode fails
 

 Key: HDFS-8704
 URL: https://issues.apache.org/jira/browse/HDFS-8704
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo
 Attachments: HDFS-8704-000.patch, HDFS-8704-HDFS-7285-002.patch, 
 HDFS-8704-HDFS-7285-003.patch







[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size with INodeFile and eliminate EC zones

2015-07-30 Thread Vinayakumar B (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647371#comment-14647371 ]

Vinayakumar B commented on HDFS-8833:
-

The idea of having an ECPolicy is good. But copying the same content at all 
levels (dir, file and block) will significantly increase the memory usage of 
the NameNode. I think this was one of the motivations for the erasure coding 
zone concept. I agree that trash and rename have some problems.
As [~walter.k.su] mentioned, we can have an ID to save the memory usage and 
use this ID everywhere, as done for BlockStoragePolicy.
But BlockStoragePolicy is limited to a maximum of 16 policies, and that limit 
is hardcoded. I don't think we can hardcode all possible schemas and limit 
them to only 16. If it's made configurable, the IDs of already existing 
schemas must not change at all.

A huge increase in NameNode memory usage is not acceptable by default.

[~zhz], do you have any estimate of how much this change will affect memory 
usage?
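
A rough back-of-envelope comparison frames the question above. Every number here is an assumption for illustration (per-file xattr cost, packed-ID cost, file count); none is measured from an actual NameNode heap:

```python
# Rough, assumed numbers: an xattr entry might cost on the order of tens
# of bytes per INode (name, value, object overhead), while a policy ID
# packed into the file header could cost about a byte. Neither figure is
# measured; this only illustrates why a per-file ID is attractive.
num_files = 100_000_000          # assumed number of files in the cluster
xattr_bytes_per_file = 64        # assumed per-file xattr cost, bytes
id_bytes_per_file = 1            # assumed packed-ID cost, bytes

xattr_total_gb = num_files * xattr_bytes_per_file / 1024**3
id_total_gb = num_files * id_bytes_per_file / 1024**3
print(round(xattr_total_gb, 2), "GB vs", round(id_total_gb, 2), "GB")
```

Under these assumed numbers the full-xattr approach costs gigabytes of heap where an ID costs tens of megabytes, which is the trade-off behind the BlockStoragePolicy-style ID suggestion.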

 Erasure coding: store EC schema and cell size with INodeFile and eliminate EC 
 zones
 ---

 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang



