[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-04-29 Thread Xinwei Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518976#comment-14518976
 ] 

Xinwei Qin  commented on HDFS-7836:
---

Hi [~cmccabe], [~clamb], this is a very meaningful improvement. Is there any 
update or next plan for this JIRA? Could you post a summary of the meeting 
held on March 11th?

 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: BlockManagerScalabilityImprovementsDesign.pdf


 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-04-29 Thread surendra singh lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518968#comment-14518968
 ] 

surendra singh lilhore commented on HDFS-8277:
--


In the {{DFSAdmin.java}} class, the {{setSafeMode()}} API iterates over the NN 
proxy list and puts each NN into safe mode one by one.
If the connection to the first namenode fails, it breaks out of the loop.
{code}
for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
  ClientProtocol haNn = proxy.getProxy();
  boolean inSafeMode = haNn.setSafeMode(action, false);
  if (waitExitSafe) {
    inSafeMode = waitExitSafeMode(haNn, inSafeMode);
  }
  System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF")
      + " in " + proxy.getAddress());
}
{code}

Here we should catch the connection exception and continue with the next NN.
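For illustration, a minimal sketch of that handling (hypothetical, not the 
attached patch; the exact exception type caught may differ):
{code}
for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
  try {
    ClientProtocol haNn = proxy.getProxy();
    boolean inSafeMode = haNn.setSafeMode(action, false);
    if (waitExitSafe) {
      inSafeMode = waitExitSafeMode(haNn, inSafeMode);
    }
    System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF")
        + " in " + proxy.getAddress());
  } catch (java.net.ConnectException ce) {
    // One unreachable NN should not abort the loop; report the failure
    // and continue with the remaining NameNodes.
    System.err.println("safemode: failed to reach " + proxy.getAddress()
        + ": " + ce.getMessage());
  }
}
{code}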

 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: surendra singh lilhore
Priority: Minor
 Attachments: HDFS-8277.patch


 HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it is not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1, which is down. I verified that normal hadoop fs 
 writes and reads via the CLI did work at this time, using nn2. I happened to 
 run this command as the hdfs user on nn2, which was the surviving Active 
 NameNode.
 After I re-bootstrapped the Standby NN to fix it, the command worked as 
 expected again.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518786#comment-14518786
 ] 

Hadoop QA commented on HDFS-8214:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 37s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 28s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   7m 26s | The applied patch generated 1 
additional checkstyle issue. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  5s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 164m 49s | Tests failed in hadoop-hdfs. |
| | | 212m 47s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12728499/HDFS-8214.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 439614b |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10446/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10446/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10446/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10446/console |


This message was automatically generated.

 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, 
 HDFS-8214.003.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally just after the epoch, to be 
 displayed.
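 For illustration, the distinction at issue (a minimal sketch; variable names 
 are assumed): {{Time.monotonicNow()}} is not epoch-based, so formatting it as 
 a wall-clock date yields times shortly after 1970.
 {code}
 // monotonicNow() counts from an arbitrary origin (it wraps System.nanoTime()),
 // so treating it as an epoch timestamp renders a date just after 1970:
 long lastCheckpointTime = Time.monotonicNow();   // e.g. ~3600000
 new Date(lastCheckpointTime);                    // "Thu Jan 01 01:00:00 UTC 1970"
 // The epoch-based wall-clock API is what the web UI needs:
 long lastCheckpointWallTime = Time.now();        // milliseconds since the epoch
 {code}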



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-04-29 Thread surendra singh lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

surendra singh lilhore updated HDFS-8277:
-
Attachment: HDFS-8277_1.patch

Attached patch, please review...

 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: surendra singh lilhore
Priority: Minor
 Attachments: HDFS-8277.patch, HDFS-8277_1.patch


 HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it is not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1, which is down. I verified that normal hadoop fs 
 writes and reads via the CLI did work at this time, using nn2. I happened to 
 run this command as the hdfs user on nn2, which was the surviving Active 
 NameNode.
 After I re-bootstrapped the Standby NN to fix it, the command worked as 
 expected again.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-04-29 Thread surendra singh lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

surendra singh lilhore updated HDFS-8277:
-
Attachment: HDFS-8277.patch

 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: surendra singh lilhore
Priority: Minor
 Attachments: HDFS-8277.patch


 HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it is not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1, which is down. I verified that normal hadoop fs 
 writes and reads via the CLI did work at this time, using nn2. I happened to 
 run this command as the hdfs user on nn2, which was the surviving Active 
 NameNode.
 After I re-bootstrapped the Standby NN to fix it, the command worked as 
 expected again.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8259) Erasure Coding: Test of reading EC file

2015-04-29 Thread Xinwei Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinwei Qin  reassigned HDFS-8259:
-

Assignee: Xinwei Qin 

 Erasure Coding: Test of reading EC file
 ---

 Key: HDFS-8259
 URL: https://issues.apache.org/jira/browse/HDFS-8259
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: HDFS-7285
Reporter: GAO Rui
Assignee: Xinwei Qin 

 1. Normal reading of an EC file (reading without datanode failure and no need 
 of recovery).
 2. Reading an EC file with datanode failure.
 3. Reading an EC file with data block recovery by decoding from parity blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8262) Erasure Coding: Test of datanode decommission where EC blocks are stored

2015-04-29 Thread Xinwei Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinwei Qin  reassigned HDFS-8262:
-

Assignee: Xinwei Qin 

 Erasure Coding: Test of datanode decommission where EC blocks are stored 
 --

 Key: HDFS-8262
 URL: https://issues.apache.org/jira/browse/HDFS-8262
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: GAO Rui
Assignee: Xinwei Qin 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8260) Erasure Coding: test of writing EC file

2015-04-29 Thread Xinwei Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinwei Qin  reassigned HDFS-8260:
-

Assignee: Xinwei Qin 

 Erasure Coding:  test of writing EC file
 

 Key: HDFS-8260
 URL: https://issues.apache.org/jira/browse/HDFS-8260
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: HDFS-7285
Reporter: GAO Rui
Assignee: Xinwei Qin 

 1. Normal writing of an EC file (writing without datanode failure).
 2. Writing an EC file with a tolerable number of datanodes failing.
 3. Writing an EC file with an intolerable number of datanodes failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8265) Erasure Coding: Test of Quota calculation for EC files

2015-04-29 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R reassigned HDFS-8265:
--

Assignee: Rakesh R

 Erasure Coding: Test of Quota calculation for EC files
 --

 Key: HDFS-8265
 URL: https://issues.apache.org/jira/browse/HDFS-8265
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: HDFS-7285
Reporter: GAO Rui
Assignee: Rakesh R





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream

2015-04-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518982#comment-14518982
 ] 

Yi Liu edited comment on HDFS-8272 at 4/29/15 10:15 AM:


Thanks Jing for the work and Zhe for the review!

{code}
private int fetchEncryptionKeyTimes = 2;
private int fetchTokenTimes = 2;
{code}
Should they be {{1}}?
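For illustration, the counter semantics in question (a generic sketch, not the 
patch itself; the helper names are made up):
{code}
int fetchEncryptionKeyTimes = 2;
while (true) {
  try {
    readChunk();
    break;
  } catch (InvalidEncryptionKeyException e) {
    // Decrement-then-test: an initial value of 2 tolerates two failed
    // attempts before rethrowing, an initial value of 1 tolerates one.
    if (fetchEncryptionKeyTimes-- <= 0) {
      throw e;
    }
    clearCachedEncryptionKey(); // hypothetical: force a refetch next time
  }
}
{code}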





was (Author: hitliuyi):
Thanks Jing for the work and Zhe for the review!

{quote}
-  if (pos > blockEnd || currentNodes == null) {
-    currentNodes = blockSeekTo(pos);
-  }
+  if (pos > blockEnd) {
+    blockSeekTo(pos);
+  }
{quote}
Should we keep {{currentNodes == null}}? Otherwise {{blockReaders}} is not 
initialized?

{code}
private int fetchEncryptionKeyTimes = 2;
private int fetchTokenTimes = 2;
{code}
Should they be {{1}}?




 Erasure Coding: simplify the retry logic in DFSStripedInputStream
 -

 Key: HDFS-8272
 URL: https://issues.apache.org/jira/browse/HDFS-8272
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch


 Currently in DFSStripedInputStream the retry logic is still the same as in 
 DFSInputStream. More specifically, every failed read will try to search for 
 another source node, and an exception is thrown when no new source node can 
 be identified. This logic is not appropriate for the EC input stream and can 
 be simplified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding

2015-04-29 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-8129:
--
Summary: Erasure Coding: Maintain consistent naming for Erasure Coding 
related classes - EC/ErasureCoding  (was: Maintain consistent naming for 
Erasure Coding related classes - EC/ErasureCode)

 Erasure Coding: Maintain consistent naming for Erasure Coding related classes 
 - EC/ErasureCoding
 

 Key: HDFS-8129
 URL: https://issues.apache.org/jira/browse/HDFS-8129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor

 Currently I see some classes named ErasureCode* and some named EC*.
 I feel we should maintain consistent naming across the project. This jira is 
 to correct the places where we named things differently, to make them 
 uniform.
 And also to discuss which naming we should follow from now on when we create 
 new classes.
 ErasureCoding* should be fine IMO. Let's discuss what others feel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle block locations that don't satisfy BlockGroupSize

2015-04-29 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519127#comment-14519127
 ] 

Rakesh R commented on HDFS-8220:


Thanks a lot [~walter.k.su] for the details.
IMHO we could do a validation at the StripedDataStreamer for now. Once the 
basic PlacementPolicyEC is implemented, we will revisit this case separately 
in the next phase. [~libo-intel], does this sound good to you?
I'm happy to volunteer for this task - supporting the return of multiple 
identical storages.
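For illustration, a minimal sketch of such a validation (hypothetical names 
and constants; the real streamer code differs):
{code}
// Fail fast with a clear message when the NN returns fewer locations
// than the block group needs, instead of hitting a downstream NPE.
LocatedBlock lb = super.locateFollowingBlock(excludedNodes);
int required = NUM_DATA_BLOCKS + NUM_PARITY_BLOCKS;
if (lb.getLocations().length < required) {
  throw new IOException("Allocated " + lb.getLocations().length
      + " storages for the block group, but " + required + " are required");
}
{code}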

 Erasure Coding: StripedDataStreamer fails to handle block locations that 
 don't satisfy BlockGroupSize
 ---

 Key: HDFS-8220
 URL: https://issues.apache.org/jira/browse/HDFS-8220
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
 HDFS-8220-003.patch


 During write operations, {{StripedDataStreamer#locateFollowingBlock}} fails to 
 validate the available datanodes against the {{BlockGroupSize}}. Please see 
 the exception below for more detail:
 {code}
 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
 DataStreamer Exception
 java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
 (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
 (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
 java.io.IOException: DataStreamer Exception: 
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   ... 1 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding

2015-04-29 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519125#comment-14519125
 ] 

Uma Maheswara Rao G commented on HDFS-8129:
---

I checked the class names. Right now I see the below names need to be modified:

{noformat}
org.apache.hadoop.hdfs.protocol.ECInfo
org.apache.hadoop.hdfs.protocol.ECZoneInfo
org.apache.hadoop.hdfs.server.namenode.ECSchemaManager
{noformat}

Please let me know if I missed any other references to change.

 Erasure Coding: Maintain consistent naming for Erasure Coding related classes 
 - EC/ErasureCoding
 

 Key: HDFS-8129
 URL: https://issues.apache.org/jira/browse/HDFS-8129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor

 Currently I see some classes named ErasureCode* and some named EC*.
 I feel we should maintain consistent naming across the project. This jira is 
 to correct the places where we named things differently, to make them 
 uniform.
 And also to discuss which naming we should follow from now on when we create 
 new classes.
 ErasureCoding* should be fine IMO. Let's discuss what others feel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding

2015-04-29 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519149#comment-14519149
 ] 

Rakesh R commented on HDFS-8129:


I'm adding a few more classes to the discussion, please see:
{code}
org.apache.hadoop.io.erasurecode.ECChunk
org.apache.hadoop.io.erasurecode.ECBlockGroup
org.apache.hadoop.io.erasurecode.ECBlock
org.apache.hadoop.io.erasurecode.ECSchema
{code}

 Erasure Coding: Maintain consistent naming for Erasure Coding related classes 
 - EC/ErasureCoding
 

 Key: HDFS-8129
 URL: https://issues.apache.org/jira/browse/HDFS-8129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor

 Currently I see some classes named ErasureCode* and some named EC*.
 I feel we should maintain consistent naming across the project. This jira is 
 to correct the places where we named things differently, to make them 
 uniform.
 And also to discuss which naming we should follow from now on when we create 
 new classes.
 ErasureCoding* should be fine IMO. Let's discuss what others feel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519225#comment-14519225
 ] 

Hudson commented on HDFS-8273:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #169 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/169/])
HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the 
lock. Contributed by Haohui Mai. (wheat9: rev 
c79e7f7d997596e0c38ae4cddff2bd0910581c16)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 FSNamesystem#Delete() should not call logSync() when holding the lock
 -

 Key: HDFS-8273
 URL: https://issues.apache.org/jira/browse/HDFS-8273
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Jing Zhao
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch


 HDFS-7573 moves the logSync call inside the write lock by accident. We 
 should move it out.
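 For context, a minimal sketch of the intended pattern (illustrative only, not 
 the committed patch):
 {code}
 writeLock();
 try {
   // mutate the namespace and append the edit-log record here
 } finally {
   writeUnlock();
 }
 // Sync the edit log *after* releasing the write lock, so other handlers
 // are not blocked behind the (slow) durable sync.
 getEditLog().logSync();
 {code}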



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519223#comment-14519223
 ] 

Hudson commented on HDFS-8280:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #169 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/169/])
HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: 
rev 439614b0c8a3df3d8b7967451c5331a0e034e13a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


 Code Cleanup in DFSInputStream
 --

 Key: HDFS-8280
 URL: https://issues.apache.org/jira/browse/HDFS-8280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8280.000.patch


 This is some code cleanup separate from HDFS-8272:
 # Avoid duplicated block reader creation code
 # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null 
 instead of throwing an Exception. Whether to throw an Exception or not should 
 be determined by {{getBestNodeDNAddrPair}}'s caller (see the sketch below).
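 For illustration, a hedged sketch of the caller-side pattern this enables 
 (simplified; the real {{DFSInputStream}} signatures differ):
 {code}
 DNAddrPair candidate = getBestNodeDNAddrPair(block, ignoredNodes);
 if (candidate == null) {
   // The caller now owns the policy: this one escalates, while another
   // caller (e.g. a retry loop) could refresh block locations instead.
   throw new IOException("No live datanodes contain block " + block);
 }
 {code}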



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519227#comment-14519227
 ] 

Hudson commented on HDFS-8204:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #169 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/169/])
HDFS-8204. Mover/Balancer should not schedule two replicas to the same 
datanode.  Contributed by Walter Su (szetszwo: rev 
5639bf02da716b3ecda785979b3d08cdca15972d)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java


 Mover/Balancer should not schedule two replicas to the same DN
 --

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor
 Fix For: 2.7.1

 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, 
 HDFS-8204.003.patch


 Balancer moves blocks between Datanodes (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
 new version (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 -is flawed, and may cause 2 replicas to end up in the same node after running 
 balance.-
 For example:
 We have 2 nodes. Each node has two storages.
 We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
 We have a block with ONE_SSD storage policy.
 The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
 Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer.
 Otherwise DN1 has 2 replicas.
 --
 UPDATE (Thanks [~szetszwo] for pointing it out):
 {color:red}
 This bug will *NOT* cause 2 replicas to end up in the same node after running 
 balance, thanks to the Datanode rejecting it.
 {color}
 We see a lot of ERRORs when running the test.
 {code}
 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - 
 host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  
 src: /127.0.0.1:52532 dst: /127.0.0.1:59537
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in 
 state FINALIZED and thus cannot be created.
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 The Balancer runs 5~20 iterations in the test before it exits.
 It's inefficient.
 Balancer should not *schedule* it in the first place, even though it'll 
 fail anyway. In the test, it should exit after 5 iterations.
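 For illustration, a hedged sketch of the kind of scheduling guard meant here 
 (hypothetical helper; the committed Dispatcher change may differ):
 {code}
 // Compare *datanodes*, not storage groups: two storages on the same node
 // must count as the same location, so a move that would place a second
 // replica on the target's datanode is never scheduled.
 boolean isOnSameDatanode(DBlock block, StorageGroup target) {
   for (StorageGroup loc : block.getLocations()) {
     if (loc.getDatanodeInfo().equals(target.getDatanodeInfo())) {
       return true; // skip scheduling this move
     }
   }
   return false;
 }
 {code}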



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding

2015-04-29 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519230#comment-14519230
 ] 

Uma Maheswara Rao G commented on HDFS-8129:
---

Actually Nicholas noted above that, for the classes already inside the 
erasurecode package, it may be good enough to keep them as is (ECXXX). We can 
discuss further if others do not agree on it. That was the reason I did not 
pick up those classes into my list before.


 Erasure Coding: Maintain consistent naming for Erasure Coding related classes 
 - EC/ErasureCoding
 

 Key: HDFS-8129
 URL: https://issues.apache.org/jira/browse/HDFS-8129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor

 Currently I see some classes named ErasureCode* and some named EC*.
 I feel we should maintain consistent naming across the project. This jira is 
 to correct the places where we named things differently, to make them 
 uniform.
 And also to discuss which naming we should follow from now on when we create 
 new classes.
 ErasureCoding* should be fine IMO. Let's discuss what others feel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519251#comment-14519251
 ] 

Hudson commented on HDFS-8273:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/178/])
HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the 
lock. Contributed by Haohui Mai. (wheat9: rev 
c79e7f7d997596e0c38ae4cddff2bd0910581c16)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java


 FSNamesystem#Delete() should not call logSync() when holding the lock
 -

 Key: HDFS-8273
 URL: https://issues.apache.org/jira/browse/HDFS-8273
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Jing Zhao
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch


 HDFS-7573 moves the logSync call inside the write lock by accident. We 
 should move it out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519253#comment-14519253
 ] 

Hudson commented on HDFS-8204:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/178/])
HDFS-8204. Mover/Balancer should not schedule two replicas to the same 
datanode.  Contributed by Walter Su (szetszwo: rev 
5639bf02da716b3ecda785979b3d08cdca15972d)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java


 Mover/Balancer should not schedule two replicas to the same DN
 --

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor
 Fix For: 2.7.1

 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, 
 HDFS-8204.003.patch


 Balancer moves blocks between Datanodes (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
 new version (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 -is flawed, and may cause 2 replicas to end up in the same node after running 
 balance.-
 For example:
 We have 2 nodes. Each node has two storages.
 We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
 We have a block with ONE_SSD storage policy.
 The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
 Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer.
 Otherwise DN1 has 2 replicas.
 --
 UPDATE (Thanks [~szetszwo] for pointing it out):
 {color:red}
 This bug will *NOT* cause 2 replicas to end up in the same node after running 
 balance, thanks to the Datanode rejecting it.
 {color}
 We see a lot of ERRORs when running the test.
 {code}
 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - 
 host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  
 src: /127.0.0.1:52532 dst: /127.0.0.1:59537
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in 
 state FINALIZED and thus cannot be created.
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 The Balancer runs 5~20 iterations in the test before it exits.
 It's inefficient.
 Balancer should not *schedule* it in the first place, even though it'll 
 fail anyway. In the test, it should exit after 5 iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding

2015-04-29 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519256#comment-14519256
 ] 

Rakesh R commented on HDFS-8129:


Oops, thanks Uma for pointing this out. Probably it can be used to reach a 
conclusion: (EcXxx) or (ECXxx).

 Erasure Coding: Maintain consistent naming for Erasure Coding related classes 
 - EC/ErasureCoding
 

 Key: HDFS-8129
 URL: https://issues.apache.org/jira/browse/HDFS-8129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor

 Currently I see some classes named ErasureCode* and some named EC*.
 I feel we should maintain consistent naming across the project. This jira is 
 to correct the places where we named things differently, to make them 
 uniform.
 And also to discuss which naming we should follow from now on when we create 
 new classes.
 ErasureCoding* should be fine IMO. Let's discuss what others feel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-04-29 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Status: Patch Available  (was: Reopened)

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not 
 in-process. A follow-on Jira will modify it further to allow quantifying 
 native and Java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519199#comment-14519199
 ] 

Hudson commented on HDFS-8273:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2110 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2110/])
HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the 
lock. Contributed by Haohui Mai. (wheat9: rev 
c79e7f7d997596e0c38ae4cddff2bd0910581c16)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 FSNamesystem#Delete() should not call logSync() when holding the lock
 -

 Key: HDFS-8273
 URL: https://issues.apache.org/jira/browse/HDFS-8273
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Jing Zhao
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch


 HDFS-7573 moves the logSync call inside the write lock by accident. We 
 should move it out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519197#comment-14519197
 ] 

Hudson commented on HDFS-8280:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2110 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2110/])
HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: 
rev 439614b0c8a3df3d8b7967451c5331a0e034e13a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Code Cleanup in DFSInputStream
 --

 Key: HDFS-8280
 URL: https://issues.apache.org/jira/browse/HDFS-8280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8280.000.patch


 This is some code cleanup separate from HDFS-8272:
 # Avoid duplicated block reader creation code
 # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null 
 instead of throwing an Exception. Whether to throw an Exception or not should 
 be determined by {{getBestNodeDNAddrPair}}'s caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519201#comment-14519201
 ] 

Hudson commented on HDFS-8204:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2110 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2110/])
HDFS-8204. Mover/Balancer should not schedule two replicas to the same 
datanode.  Contributed by Walter Su (szetszwo: rev 
5639bf02da716b3ecda785979b3d08cdca15972d)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java


 Mover/Balancer should not schedule two replicas to the same DN
 --

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor
 Fix For: 2.7.1

 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, 
 HDFS-8204.003.patch


 Balancer moves blocks between Datanodes (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
 new version (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 -is flawed, and may cause 2 replicas to end up in the same node after running 
 balance.-
 For example:
 We have 2 nodes. Each node has two storages.
 We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
 We have a block with ONE_SSD storage policy.
 The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
 Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer.
 Otherwise DN1 has 2 replicas.
 --
 UPDATE (Thanks [~szetszwo] for pointing it out):
 {color:red}
 This bug will *NOT* cause 2 replicas to end up in the same node after running 
 balance, thanks to the Datanode rejecting it.
 {color}
 We see a lot of ERRORs when running the test.
 {code}
 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - 
 host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  
 src: /127.0.0.1:52532 dst: /127.0.0.1:59537
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in 
 state FINALIZED and thus cannot be created.
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 The Balancer runs 5~20 iterations in the test before it exits.
 It's inefficient.
 Balancer should not *schedule* it in the first place, even though it'll 
 fail anyway. In the test, it should exit after 5 iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-04-29 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb reopened HDFS-7847:


Porting to trunk. .004 submitted.

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not 
 in-process. A follow-on Jira will modify it further to allow quantifying 
 native and Java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519248#comment-14519248
 ] 

Hudson commented on HDFS-8280:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/178/])
HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: 
rev 439614b0c8a3df3d8b7967451c5331a0e034e13a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


 Code Cleanup in DFSInputStream
 --

 Key: HDFS-8280
 URL: https://issues.apache.org/jira/browse/HDFS-8280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8280.000.patch


 This is some code cleanup separate from HDFS-8272:
 # Avoid duplicated block reader creation code
 # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null 
 instead of throwing an Exception. Whether to throw an Exception or not should 
 be determined by {{getBestNodeDNAddrPair}}'s caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding

2015-04-29 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519263#comment-14519263
 ] 

Kai Zheng commented on HDFS-8129:
-

Thanks for the discussion. It would be great if we could keep those names in 
the codec & coder framework.

 Erasure Coding: Maintain consistent naming for Erasure Coding related classes 
 - EC/ErasureCoding
 

 Key: HDFS-8129
 URL: https://issues.apache.org/jira/browse/HDFS-8129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor

 Currently I see some classes named ErasureCode* and some named EC*.
 I feel we should maintain consistent naming across the project. This jira is 
 to correct the places where we named things differently, to make them 
 uniform.
 And also to discuss which naming we should follow from now on when we create 
 new classes.
 ErasureCoding* should be fine IMO. Let's discuss what others feel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519283#comment-14519283
 ] 

Hudson commented on HDFS-8204:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #912 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/912/])
HDFS-8204. Mover/Balancer should not schedule two replicas to the same 
datanode.  Contributed by Walter Su (szetszwo: rev 
5639bf02da716b3ecda785979b3d08cdca15972d)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Mover/Balancer should not schedule two replicas to the same DN
 --

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor
 Fix For: 2.7.1

 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, 
 HDFS-8204.003.patch


 Balancer moves blocks between Datanodes (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
 new version (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 -is flawed, and may cause 2 replicas to end up in the same node after running 
 balance.-
 For example:
 We have 2 nodes. Each node has two storages.
 We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
 We have a block with ONE_SSD storage policy.
 The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
 Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer.
 Otherwise DN1 has 2 replicas.
 --
 UPDATE (Thanks [~szetszwo] for pointing it out):
 {color:red}
 This bug will *NOT* cause 2 replicas to end up in the same node after running 
 balance, thanks to the Datanode rejecting it.
 {color}
 We see a lot of ERRORs when running the test.
 {code}
 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - 
 host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  
 src: /127.0.0.1:52532 dst: /127.0.0.1:59537
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in 
 state FINALIZED and thus cannot be created.
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 The Balancer runs 5~20 iterations in the test before it exits.
 It's inefficient.
 Balancer should not *schedule* it in the first place, even though it'll 
 fail anyway. In the test, it should exit after 5 iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519281#comment-14519281
 ] 

Hudson commented on HDFS-8273:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #912 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/912/])
HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the 
lock. Contributed by Haohui Mai. (wheat9: rev 
c79e7f7d997596e0c38ae4cddff2bd0910581c16)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 FSNamesystem#Delete() should not call logSync() when holding the lock
 -

 Key: HDFS-8273
 URL: https://issues.apache.org/jira/browse/HDFS-8273
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Jing Zhao
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch


 HDFS-7573 moves the logSync call inside of the write lock by accident. We 
 should move it out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519279#comment-14519279
 ] 

Hudson commented on HDFS-8280:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #912 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/912/])
HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: 
rev 439614b0c8a3df3d8b7967451c5331a0e034e13a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Code Cleanup in DFSInputStream
 --

 Key: HDFS-8280
 URL: https://issues.apache.org/jira/browse/HDFS-8280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8280.000.patch


 This is some code cleanup separate from HDFS-8272:
 # Avoid duplicated block reader creation code
 # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null 
 instead of throwing an Exception. Whether to throw an Exception or not should 
 be determined by {{getBestNodeDNAddrPair}}'s caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-29 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519304#comment-14519304
 ] 

Charles Lamb commented on HDFS-8214:


The test failure is unrelated. The checkstyle issue has already been discussed 
above.


 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, 
 HDFS-8214.003.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times, generally just after the epoch, to be 
 displayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode

2015-04-29 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7847:
---
Target Version/s: 2.8.0  (was: HDFS-7836)
  Status: In Progress  (was: Patch Available)

 Modify NNThroughputBenchmark to be able to operate on a remote NameNode
 ---

 Key: HDFS-7847
 URL: https://issues.apache.org/jira/browse/HDFS-7847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7836
Reporter: Colin Patrick McCabe
Assignee: Charles Lamb
 Fix For: HDFS-7836

 Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, 
 HDFS-7847.002.patch, HDFS-7847.003.patch, HDFS-7847.004.patch, 
 make_blocks.tar.gz


 Modify NNThroughputBenchmark to be able to operate on a NN that is not 
 in-process. A follow-on Jira will modify it further to allow quantifying 
 native and Java heap sizes, and some latency numbers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8015) Erasure Coding: local and remote block writer for coding work in DataNode

2015-04-29 Thread Li Bo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Bo updated HDFS-8015:

Attachment: HDFS-8015-001.patch

 Erasure Coding: local and remote block writer for coding work in DataNode
 -

 Key: HDFS-8015
 URL: https://issues.apache.org/jira/browse/HDFS-8015
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Li Bo
 Attachments: HDFS-8015-000.patch, HDFS-8015-001.patch


 As a task of HDFS-7344 ECWorker, in either striping or non-striping erasure 
 coding, to perform encoding or decoding we need to be able to write data 
 blocks locally or remotely. This is to come up with a block writer facility 
 on the DataNode side. It is better to consider the similar work done on the 
 client side, so that in future it is possible to unify the two.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode

2015-04-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518754#comment-14518754
 ] 

Zhe Zhang commented on HDFS-7859:
-

[~aw] Thanks again for bringing in the feature-branch pre-commit Jenkins 
functionality! It's really helpful. We just saw another successful run under 
HDFS-7678.

 Erasure Coding: Persist EC schemas in NameNode
 --

 Key: HDFS-7859
 URL: https://issues.apache.org/jira/browse/HDFS-7859
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Xinwei Qin 
 Attachments: HDFS-7859-HDFS-7285.002.patch, 
 HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch


 In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we 
 persist EC schemas in NameNode centrally and reliably, so that EC zones can 
 reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8276) LazyPersistFileScrubber should be disabled if scrubber interval configured zero

2015-04-29 Thread surendra singh lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518975#comment-14518975
 ] 

surendra singh lilhore commented on HDFS-8276:
--

Attached patch, please review.

 LazyPersistFileScrubber should be disabled if scrubber interval configured 
 zero
 ---

 Key: HDFS-8276
 URL: https://issues.apache.org/jira/browse/HDFS-8276
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Attachments: HDFS-8276.patch


 bq. but I think it is simple enough to change the meaning of the value so 
 that zero means 'never scrub'. Let me post an updated patch.
 As discussed in [HDFS-6929|https://issues.apache.org/jira/browse/HDFS-6929], 
 scrubber should be disable if 
 *dfs.namenode.lazypersist.file.scrub.interval.sec* is zero.
 Currently namenode startup is failing if interval configured zero
 {code}
 2015-04-27 23:47:31,744 ERROR 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
 initialization failed.
 java.lang.IllegalArgumentException: 
 dfs.namenode.lazypersist.file.scrub.interval.sec must be non-zero.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:828)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block

2015-04-29 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518771#comment-14518771
 ] 

Yongjun Zhang commented on HDFS-7281:
-

Hi [~mingma],

Thanks for the new rev and the release notes! One minor thing: I found three 
lines touched by this patch that exceed 80 chars:
* Line 852 of BlockManager
* Line 575 and 703 of NamenodeFsck
Really sorry I did not catch them in earlier rounds. +1 after that and jenkins.




 Missing block is marked as corrupted block
 --

 Key: HDFS-7281
 URL: https://issues.apache.org/jira/browse/HDFS-7281
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
  Labels: supportability
 Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, 
 HDFS-7281-5.patch, HDFS-7281.patch


 In the situation where the block lost all its replicas, fsck shows the block 
 is missing as well as corrupted. Perhaps it is better not to mark the block 
 corrupted in this case. The reason it is marked as corrupted is 
 numCorruptNodes == numNodes == 0 in the following code.
 {noformat}
 BlockManager
 final boolean isCorrupt = numCorruptNodes == numNodes;
 {noformat}
 Would like to clarify if it is the intent to mark missing block as corrupted 
 or it is just a bug.
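
 One plausible shape of a fix, as a sketch only (whether the committed patch 
 does exactly this is not confirmed here): require at least one replica before 
 the equality can mark the block corrupt.
 {code}
 // A block with zero replicas is "missing", not "corrupt":
 // numCorruptNodes == numNodes == 0 must no longer satisfy the check.
 final boolean isCorrupt = numNodes > 0 && numCorruptNodes == numNodes;
 {code}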



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream

2015-04-29 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518982#comment-14518982
 ] 

Yi Liu commented on HDFS-8272:
--

Thanks Jing for the work and Zhe for the review !

{quote}
-    if (pos > blockEnd || currentNodes == null) {
-      currentNodes = blockSeekTo(pos);
-    }
+    if (pos > blockEnd) {
+      blockSeekTo(pos);
+    }
{quote}
We should keep {{currentNodes == null}} ?  Otherwise {{blockReaders}} is not 
initialized?

{code}
private int fetchEncryptionKeyTimes = 2;
private int fetchTokenTimes = 2;
{code}
Should they be {{1}}?




 Erasure Coding: simplify the retry logic in DFSStripedInputStream
 -

 Key: HDFS-8272
 URL: https://issues.apache.org/jira/browse/HDFS-8272
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch


 Currently in DFSStripedInputStream the retry logic is still the same as in 
 DFSInputStream. More specifically, every failed read will try to search for 
 another source node, and an exception is thrown when no new source node can 
 be identified. This logic is not appropriate for the EC input stream and can 
 be simplified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-04-29 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518992#comment-14518992
 ] 

Brahma Reddy Battula commented on HDFS-8277:


[~surendrasingh] Thanks for working on this. Patch LGTM, +1 (non-binding).

 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: surendra singh lilhore
Priority: Minor
 Attachments: HDFS-8277.patch, HDFS-8277_1.patch


 HDFS fails to enter safemode when the Standby NameNode is down (eg. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it's not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1 which is down. I verified normal hadoop fs writes 
 and reads via cli did work at this time, using nn2. I happened to run this 
 command as the hdfs user on nn2 which was the surviving Active NameNode.
 After I re-bootstrapped the Standby NN to fix it the command worked as 
 expected again.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-04-29 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518931#comment-14518931
 ] 

Walter Su commented on HDFS-8220:
-

...In your unit test, the cluster has 3 nodes and only 3 locations are 
returned by the NN for RS(6,3).
Currently, PlacementPolicy doesn't support returning two identical storages. 
It doesn't even support two identical DNs.
Do BlockInfo.addStorage()/removeStorage() function well when two identical 
storages exist?
Currently a normal block doesn't support this; how about the EC BlockGroup not 
supporting it temporarily either? We can discuss it in the future work when 
HDFS-7285 is done.
HDFS-7613 handles the situation of being short of racks; it doesn't handle 
being short of nodes.

 Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
 doesn't satisfy BlockGroupSize
 ---

 Key: HDFS-8220
 URL: https://issues.apache.org/jira/browse/HDFS-8220
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
 HDFS-8220-003.patch


 During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
 validate the available datanodes against the {{BlockGroupSize}}. Please see 
 the exception to understand more:
 {code}
 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
 DataStreamer Exception
 java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
 (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
 (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
 java.io.IOException: DataStreamer Exception: 
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   ... 1 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8150) Make getFileChecksum fail for blocks under construction

2015-04-29 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-8150:
-
Attachment: HDFS-8150.2.patch

Attaching the patch after fixing the testcase failure.
Please review.

 Make getFileChecksum fail for blocks under construction
 ---

 Key: HDFS-8150
 URL: https://issues.apache.org/jira/browse/HDFS-8150
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: J.Andreina
Priority: Critical
 Attachments: HDFS-8150.1.patch, HDFS-8150.2.patch


 We have seen cases where a data copy was validated using checksums and then 
 the content of the target changed. It turned out the target wasn't closed 
 successfully, so it was still under construction. One hour later, a lease 
 recovery kicked in and truncated the block.
 Although this can be prevented in many ways, if there is no valid use case 
 for getting the file checksum of under-construction blocks, can it be 
 disabled? E.g. the Datanode can throw an exception if the replica is not in 
 the finalized state.
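
 A sketch of the DataNode-side guard suggested above; the surrounding checksum 
 code path and the getReplica helper are assumptions, not the actual patch:
 {code}
 // Refuse to serve a checksum for a replica that is still mutable.
 Replica replica = getReplica(block);
 if (replica.getState() != ReplicaState.FINALIZED) {
   throw new IOException("Cannot compute checksum for non-finalized replica "
       + replica + "; the file may still be under construction.");
 }
 {code}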



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery

2015-04-29 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518935#comment-14518935
 ] 

Li Bo commented on HDFS-7348:
-

We can give two working modes to the recovery work: Fast or Slow. In slow 
mode, we read blocks (or cells) in sequence for a decode calculation, then 
write the result to disk; after finishing, we send the blocks one by one. In 
fast mode, we read blocks in parallel and directly send the decoded blocks to 
their destinations without storing them on local disk. The selection of mode 
is based on factors such as the network status, the load on the datanode that 
takes the recovery task, etc.
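
A rough sketch of the mode selection being proposed; every name and the 
thresholds below are illustrative assumptions, not anything from a patch:
{code}
enum RecoveryMode { FAST, SLOW }

class RecoveryModeSelector {
  // FAST: read sources in parallel, stream decoded blocks straight out.
  // SLOW: read sequentially, decode to local disk, then ship one by one.
  static RecoveryMode choose(double networkLoad, double datanodeLoad) {
    boolean idleEnough = networkLoad < 0.7 && datanodeLoad < 0.7;
    return idleEnough ? RecoveryMode.FAST : RecoveryMode.SLOW;
  }
}
{code}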

 Erasure Coding: striped block recovery
 --

 Key: HDFS-7348
 URL: https://issues.apache.org/jira/browse/HDFS-7348
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Kai Zheng
Assignee: Yi Liu
 Attachments: ECWorker.java, HDFS-7348.001.patch


 This JIRA is to recover one or more missed striped blocks in the striped 
 block group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8276) LazyPersistFileScrubber should be disabled if scrubber interval configured zero

2015-04-29 Thread surendra singh lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

surendra singh lilhore updated HDFS-8276:
-
Attachment: HDFS-8276.patch

 LazyPersistFileScrubber should be disabled if scrubber interval configured 
 zero
 ---

 Key: HDFS-8276
 URL: https://issues.apache.org/jira/browse/HDFS-8276
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Attachments: HDFS-8276.patch


 bq. but I think it is simple enough to change the meaning of the value so 
 that zero means 'never scrub'. Let me post an updated patch.
 As discussed in [HDFS-6929|https://issues.apache.org/jira/browse/HDFS-6929], 
 the scrubber should be disabled if 
 *dfs.namenode.lazypersist.file.scrub.interval.sec* is zero.
 Currently NameNode startup fails if the interval is configured as zero:
 {code}
 2015-04-27 23:47:31,744 ERROR 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
 initialization failed.
 java.lang.IllegalArgumentException: 
 dfs.namenode.lazypersist.file.scrub.interval.sec must be non-zero.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:828)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-2484) checkLease should throw FileNotFoundException when file does not exist

2015-04-29 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519015#comment-14519015
 ] 

Rakesh R commented on HDFS-2484:


Thanks [~shv] for reporting this. I can see {{FSNamesystem#checkLease}} has 
the following checks in branch-2 and trunk; does this validation satisfy your 
case?
[FSNamesystem#checkLease 
logic|https://github.com/apache/hadoop/blob/branch-2.7/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3417]
{code}
if (inode == null) {
  Lease lease = leaseManager.getLease(holder);
  throw new LeaseExpiredException(
      "No lease on " + ident + ": File does not exist. "
      + (lease != null ? lease.toString()
          : "Holder " + holder + " does not have any open files."));
}
{code}
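
A sketch of the change the reporter asks for (illustrative only, not a 
committed patch): when the inode is gone, surface the more informative 
FileNotFoundException instead of LeaseExpiredException.
{code}
if (inode == null) {
  // the file was deleted while being written; say so directly
  throw new FileNotFoundException("File does not exist: " + ident);
}
{code}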

 checkLease should throw FileNotFoundException when file does not exist
 --

 Key: HDFS-2484
 URL: https://issues.apache.org/jira/browse/HDFS-2484
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.22.0, 2.0.0-alpha
Reporter: Konstantin Shvachko

 When a file is deleted during its creation, {{FSNamesystem.checkLease(String 
 src, String holder)}} throws {{LeaseExpiredException}}. It would be more 
 informative if it threw {{FileNotFoundException}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-04-29 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-8277:
---
Issue Type: Bug  (was: Improvement)

 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: surendra singh lilhore
Priority: Minor
 Attachments: HDFS-8277.patch, HDFS-8277_1.patch


 HDFS fails to enter safemode when the Standby NameNode is down (eg. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it's not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1 which is down. I verified normal hadoop fs writes 
 and reads via cli did work at this time, using nn2. I happened to run this 
 command as the hdfs user on nn2 which was the surviving Active NameNode.
 After I re-bootstrapped the Standby NN to fix it the command worked as 
 expected again.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7613) Block placement policy for erasure coding groups

2015-04-29 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518818#comment-14518818
 ] 

Li Bo commented on HDFS-7613:
-

hi, Walter
HDFS-8220 depends upon this sub-task but the current patch can’t be applied. 
Could you update your patch against the current code so that HDFS-8220 can go 
on based on your new patch?
Thanks.


 Block placement policy for erasure coding groups
 

 Key: HDFS-7613
 URL: https://issues.apache.org/jira/browse/HDFS-7613
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Walter Su
 Attachments: HDFS-7613.001.patch


 Blocks in an erasure coding group should be placed in different failure 
 domains -- different DataNodes at the minimum, and different racks ideally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518817#comment-14518817
 ] 

Hadoop QA commented on HDFS-8269:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 36s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 26s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   7m 29s | The applied patch generated  1 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  3s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native |   3m 16s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 165m 26s | Tests failed in hadoop-hdfs. |
| | | 213m 25s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729005/HDFS-8269.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 439614b |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10447/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10447/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10447/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10447/console |


This message was automatically generated.

 getBlockLocations() does not resolve the .reserved path and generates 
 incorrect edit logs when updating the atime
 -

 Key: HDFS-8269
 URL: https://issues.apache.org/jira/browse/HDFS-8269
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Haohui Mai
Priority: Blocker
 Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, 
 HDFS-8269.002.patch, HDFS-8269.003.patch


 When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, 
 it uses the path passed from the client, which generates incorrect edit logs 
 entries:
 {noformat}
 <RECORD>
   <OPCODE>OP_TIMES</OPCODE>
   <DATA>
     <TXID>5085</TXID>
     <LENGTH>0</LENGTH>
     <PATH>/.reserved/.inodes/18230</PATH>
     <MTIME>-1</MTIME>
     <ATIME>1429908236392</ATIME>
   </DATA>
 </RECORD>
 {noformat}
 Note that the NN does not resolve the {{/.reserved}} path when processing the 
 edit log, therefore it eventually leads to a NPE when loading the edit logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery

2015-04-29 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518918#comment-14518918
 ] 

Li Bo commented on HDFS-7348:
-

Thanks Yi for the great work!
I think decreasing network I/O is very important for a cluster using EC. For 
the read part, we may read blocks in sequence; for the write part, we may 
first write decoded blocks to local disk and then send them to the remote 
datanodes. This may slow the recovery work, but it reduces the impact on 
network I/O, especially when the cluster is busy.
For the reader and writer, I think we may separate them out as independent 
classes so that other tasks can also use them.


 Erasure Coding: striped block recovery
 --

 Key: HDFS-7348
 URL: https://issues.apache.org/jira/browse/HDFS-7348
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Kai Zheng
Assignee: Yi Liu
 Attachments: ECWorker.java, HDFS-7348.001.patch


 This JIRA is to recover one or more missed striped blocks in the striped 
 block group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-04-29 Thread surendra singh lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

surendra singh lilhore updated HDFS-8277:
-
Status: Patch Available  (was: Open)

 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: surendra singh lilhore
Priority: Minor
 Attachments: HDFS-8277.patch, HDFS-8277_1.patch


 HDFS fails to enter safemode when the Standby NameNode is down (eg. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it's not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1 which is down. I verified normal hadoop fs writes 
 and reads via cli did work at this time, using nn2. I happened to run this 
 command as the hdfs user on nn2 which was the surviving Active NameNode.
 After I re-bootstrapped the Standby NN to fix it the command worked as 
 expected again.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8265) Erasure Coding: Test of Quota calculation for EC files

2015-04-29 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519108#comment-14519108
 ] 

Rakesh R commented on HDFS-8265:


Thanks [~demongaorui] for reporting this task. I'm happy to volunteer and make 
an attempt. Please feel free to reassign if you have started with this.

 Erasure Coding: Test of Quota calculation for EC files
 --

 Key: HDFS-8265
 URL: https://issues.apache.org/jira/browse/HDFS-8265
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: HDFS-7285
Reporter: GAO Rui
Assignee: Rakesh R





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519571#comment-14519571
 ] 

Hudson commented on HDFS-8273:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2128 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2128/])
HDFS-8273. FSNamesystem#Delete() should not call logSync() when holding the 
lock. Contributed by Haohui Mai. (wheat9: rev 
c79e7f7d997596e0c38ae4cddff2bd0910581c16)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirDeleteOp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 FSNamesystem#Delete() should not call logSync() when holding the lock
 -

 Key: HDFS-8273
 URL: https://issues.apache.org/jira/browse/HDFS-8273
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Jing Zhao
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch


 HDFS-7573 moves the logSync call inside of the write lock by accident. We 
 should move it out.
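
 The intended pattern, sketched with simplified signatures (the exact argument 
 lists are assumptions, not the committed diff):
 {code}
 BlocksMapUpdateInfo toRemovedBlocks;
 writeLock();
 try {
   // mutate the namespace while holding the write lock
   toRemovedBlocks = FSDirDeleteOp.delete(dir, src, recursive, logRetryCache);
 } finally {
   writeUnlock();
 }
 // sync the edit log outside the lock so other operations are not blocked
 getEditLog().logSync();
 {code}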



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519573#comment-14519573
 ] 

Hudson commented on HDFS-8204:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2128 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2128/])
HDFS-8204. Mover/Balancer should not schedule two replicas to the same 
datanode.  Contributed by Walter Su (szetszwo: rev 
5639bf02da716b3ecda785979b3d08cdca15972d)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Mover/Balancer should not schedule two replicas to the same DN
 --

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor
 Fix For: 2.7.1

 Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, 
 HDFS-8204.003.patch


 Balancer moves blocks between Datanodes (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
 new version (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 -is flawed, and may cause 2 replicas to end up on the same node after running 
 the balancer.-
 For example:
 We have 2 nodes. Each node has two storages.
 We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
 We have a block with ONE_SSD storage policy.
 The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
 Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer.
 Otherwise DN1 has 2 replicas.
 --
 UPDATE(Thanks [~szetszwo] for pointing it out):
 {color:red}
 This bug will *NOT* cause 2 replicas to end up on the same node after running 
 the balancer, thanks to the Datanode rejecting the transfer. 
 {color}
 We see a lot of ERROR when running test.
 {code}
 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - 
 host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  
 src: /127.0.0.1:52532 dst: /127.0.0.1:59537
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
 BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in 
 state FINALIZED and thus cannot be created.
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 The Balancer runs 5~20 iterations in the test before it exits.
 It's inefficient.
 The Balancer should not *schedule* such a move in the first place, even 
 though it would fail anyway. In the test, it should exit after 5 iterations.
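
 A sketch of the scheduling guard implied above (hypothetical helper; the real 
 Dispatcher API may differ): reject a move target when any existing replica 
 already lives on the same DataNode, whatever its storage type.
 {code}
 boolean isLocatedOnDatanode(DBlock block, StorageGroup target) {
   for (StorageGroup loc : block.getLocations()) {
     // compare datanodes, not storage groups: (DN1, DISK) vs (DN1, SSD)
     // is still the same node, so scheduling the move would be futile
     if (loc.getDatanodeInfo().equals(target.getDatanodeInfo())) {
       return true;
     }
   }
   return false;
 }
 {code}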



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8280) Code Cleanup in DFSInputStream

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519569#comment-14519569
 ] 

Hudson commented on HDFS-8280:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2128 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2128/])
HDFS-8280. Code Cleanup in DFSInputStream. Contributed by Jing Zhao. (wheat9: 
rev 439614b0c8a3df3d8b7967451c5331a0e034e13a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


 Code Cleanup in DFSInputStream
 --

 Key: HDFS-8280
 URL: https://issues.apache.org/jira/browse/HDFS-8280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8280.000.patch


 This is some code cleanup separate from HDFS-8272:
 # Avoid duplicated block reader creation code
 # If no new source DN can be found, {{getBestNodeDNAddrPair}} returns null 
 instead of throwing Exception. Whether to throw Exception or not should be 
 determined by {{getBestNodeDNAddrPair}}'s caller.
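
 A sketch of the caller-decides contract described in item 2 above 
 (simplified; the variable names are illustrative): the helper reports "no 
 candidate" with null, and each caller picks its own failure policy.
 {code}
 DNAddrPair candidate = getBestNodeDNAddrPair(targetBlock, ignoredNodes);
 if (candidate == null) {
   // a plain read must fail here; an EC read could fall back to decoding
   throw new IOException("No live nodes contain block " + targetBlock);
 }
 {code}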



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read)

2015-04-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8272:

Summary: Erasure Coding: simplify the retry logic in DFSStripedInputStream 
(stateful read)  (was: Erasure Coding: simplify the retry logic in 
DFSStripedInputStream)

 Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful 
 read)
 -

 Key: HDFS-8272
 URL: https://issues.apache.org/jira/browse/HDFS-8272
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: h8272-HDFS-7285.000.patch, h8272-HDFS-7285.001.patch


 Currently in DFSStripedInputStream the retry logic is still the same as in 
 DFSInputStream. More specifically, every failed read will try to search for 
 another source node, and an exception is thrown when no new source node can 
 be identified. This logic is not appropriate for the EC input stream and can 
 be simplified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.

2015-04-29 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin reassigned HDFS-1950:


Assignee: ramtin  (was: Uma Maheswara Rao G)

 Blocks that are under construction are not getting read if the blocks are 
 more than 10. Only complete blocks are read properly. 
 

 Key: HDFS-1950
 URL: https://issues.apache.org/jira/browse/HDFS-1950
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 0.20.205.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramtin
Priority: Blocker
 Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, 
 hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, 
 hdfs-1950-trunk-test.txt


 Before going to the root cause, let's see the read behavior for a file having 
 more than 10 blocks in the append case.
 Logic: 
 ====
 There is a prefetch size dfs.read.prefetch.size for the DFSInputStream, which 
 has a default value of 10.
 This prefetch size is the number of blocks that the client will fetch from 
 the namenode for reading a file.
 For example, let's assume that a file X having 22 blocks is residing in HDFS.
 The reader first fetches the first 10 blocks from the namenode and starts 
 reading.
 After the above step, the reader fetches the next 10 blocks from the NN and 
 continues reading.
 Then the reader fetches the remaining 2 blocks from the NN and completes the 
 read.
 Cause: 
 === 
 Let's see the cause for this issue now...
 The scenario that will fail is: the writer wrote 10+ blocks and a partial 
 block, then called sync. A reader trying to read the file will not get the 
 last partial block.
 The client first gets the 10 block locations from the NN. Now it checks 
 whether the file is under construction, and if so it gets the size of the 
 last partial block from the datanode and reads the full file.
 However, when the number of blocks is more than 10, the last block will not 
 be in the first fetch. It will be in the second or a later fetch (the last 
 block will be in the (num of blocks / 10)th fetch).
 The problem now is: in DFSClient there is no logic to get the size of the 
 last partial block (as in case of point 1) for fetches other than the first, 
 so the reader will not be able to read the complete data synced!
 Also, the InputStream.available api uses the first fetched block size to 
 iterate. Ideally this size has to be increased
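
 The fetch arithmetic above, as a tiny worked example (plain Java, standalone; 
 the class and variable names are illustrative):
 {code}
 public class PrefetchMath {
   public static void main(String[] args) {
     int prefetchSize = 10;  // dfs.read.prefetch.size default
     int numBlocks = 22;     // the example file X
     int batches = (numBlocks + prefetchSize - 1) / prefetchSize;  // = 3
     // The (partial) last block only shows up in the final batch, never the
     // first one, which is where the missing size-lookup logic bites.
     System.out.println("last block arrives in batch " + batches);
   }
 }
 {code}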



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7281) Missing block is marked as corrupted block

2015-04-29 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7281:
--
Attachment: HDFS-7281-6.patch

Thanks [~yzhangal]. Here is the updated patch.

 Missing block is marked as corrupted block
 --

 Key: HDFS-7281
 URL: https://issues.apache.org/jira/browse/HDFS-7281
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
  Labels: supportability
 Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, 
 HDFS-7281-5.patch, HDFS-7281-6.patch, HDFS-7281.patch


 In the situation where the block lost all its replicas, fsck shows the block 
 is missing as well as corrupted. Perhaps it is better not to mark the block 
 corrupted in this case. The reason it is marked as corrupted is 
 numCorruptNodes == numNodes == 0 in the following code.
 {noformat}
 BlockManager
 final boolean isCorrupt = numCorruptNodes == numNodes;
 {noformat}
 Would like to clarify if it is the intent to mark missing block as corrupted 
 or it is just a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read)

2015-04-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8272:

Attachment: HDFS-8272.002.patch

Thanks for the review, Zhe and Yi! I just merged HDFS-8280 into HDFS-7285 
branch. Upload the patch excluding the changes in DFSInputStream. Also fix the 
bug pointed out by Yi.

About the {{seekToBlockSource}}, I think it may be better to remove it for now:
# With decoding functionality we do not need to spend more time trying our luck 
on the same DN.
# Currently calling {{seekToBlockSource}} will cause all the current block 
readers to be closed (since {{blockSeekTo}} will call 
{{closeCurrentBlockReaders}}). To fix this we need to add extra complexity so 
as to make sure only one block reader is retried.

 Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful 
 read)
 -

 Key: HDFS-8272
 URL: https://issues.apache.org/jira/browse/HDFS-8272
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8272.002.patch, h8272-HDFS-7285.000.patch, 
 h8272-HDFS-7285.001.patch


 Currently in DFSStripedInputStream the retry logic is still the same as in 
 DFSInputStream. More specifically, every failed read will try to search for 
 another source node, and an exception is thrown when no new source node can 
 be identified. This logic is not appropriate for the EC input stream and can 
 be simplified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8283) DataStreamer cleanup and some minor improvement

2015-04-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519802#comment-14519802
 ] 

Jing Zhao commented on HDFS-8283:
-

Thanks for working on this, Nicholas! The patch looks pretty good to me. The 
test failure should be unrelated and it passed in my local run. +1

I will commit the patch shortly.

 DataStreamer cleanup and some minor improvement
 ---

 Key: HDFS-8283
 URL: https://issues.apache.org/jira/browse/HDFS-8283
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h8283_20150428.patch


 - When throwing an exception
 -* always set lastException 
 -* always create a new exception so that it has the new stack trace
 - Add LOG.
 - Add final to isAppend and favoredNodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8283) DataStreamer cleanup and some minor improvement

2015-04-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8283:

   Resolution: Fixed
Fix Version/s: 2.8.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2.

 DataStreamer cleanup and some minor improvement
 ---

 Key: HDFS-8283
 URL: https://issues.apache.org/jira/browse/HDFS-8283
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.8.0

 Attachments: h8283_20150428.patch


 - When throwing an exception
 -* always set lastException 
 -* always create a new exception so that it has the new stack trace
 - Add LOG.
 - Add final to isAppend and favoredNodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8288) Refactor DFSStripedOutputStream and StripedDataStreamer

2015-04-29 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8288:
-

 Summary: Refactor DFSStripedOutputStream and StripedDataStreamer
 Key: HDFS-8288
 URL: https://issues.apache.org/jira/browse/HDFS-8288
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


- DFSStripedOutputStream has a list of StripedDataStreamer(s).  The streamers 
share a data structure {{List<BlockingQueue<LocatedBlock>> stripeBlocks}} to 
communicate located block and end block information.
For example,
{code}
//StripedDataStreamer.endBlock()
  // before retrieving a new block, transfer the finished block to
  // leading streamer
  LocatedBlock finishedBlock = new LocatedBlock(
      new ExtendedBlock(block.getBlockPoolId(), block.getBlockId(),
          block.getNumBytes(), block.getGenerationStamp()), null);
  try {
    boolean offSuccess = stripedBlocks.get(0).offer(finishedBlock, 30,
        TimeUnit.SECONDS);
{code}
It is unnecessary to create a LocatedBlock object for an end block since the 
locations passed in are null.  Also, the return value is ignored (i.e. 
offSuccess is not used).

- DFSStripedOutputStream has another data structure cellBuffers for computing 
parity.  It should be refactored into a class; see the sketch below.
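
One possible shape for that refactoring, as a sketch (the fields and methods 
are assumptions, not the eventual patch):
{code}
// Nested inside DFSStripedOutputStream; wraps the parity-computation state
// that is currently spread across bare arrays.
private static class CellBuffers {
  private final ByteBuffer[] buffers;  // one cell buffer per streamer

  CellBuffers(int numStreamers, int cellSize) {
    buffers = new ByteBuffer[numStreamers];
    for (int i = 0; i < numStreamers; i++) {
      buffers[i] = ByteBuffer.allocate(cellSize);
    }
  }

  ByteBuffer get(int i) {
    return buffers[i];
  }

  void clear() {
    for (ByteBuffer b : buffers) {
      b.clear();  // reset all cells once a stripe's parity is flushed
    }
  }
}
{code}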



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520002#comment-14520002
 ] 

Colin Patrick McCabe commented on HDFS-8113:


+1 for HDFS-8113.02.patch.  I think it's a good robustness improvement to the 
code.

It would be nice to continue the investigation about why you hit this issue in 
another jira, as [~chengbing.liu] suggested.

 NullPointerException in BlockInfoContiguous causes block report failure
 ---

 Key: HDFS-8113
 URL: https://issues.apache.org/jira/browse/HDFS-8113
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: HDFS-8113.02.patch, HDFS-8113.patch


 The following copy constructor can throw NullPointerException if {{bc}} is 
 null.
 {code}
   protected BlockInfoContiguous(BlockInfoContiguous from) {
     this(from, from.bc.getBlockReplication());
     this.bc = from.bc;
   }
 {code}
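
 An illustrative hardening only (the committed fix may take another route): 
 fail with a descriptive message instead of an opaque NPE during block 
 reports. Guava's Preconditions is assumed available, as elsewhere in Hadoop.
 {code}
 protected BlockInfoContiguous(BlockInfoContiguous from) {
   // checkNotNull returns its argument, so it can run inside this(...),
   // which Java requires to be the first statement of the constructor
   this(from, Preconditions.checkNotNull(from.bc,
       "BlockInfoContiguous %s has no BlockCollection", from)
       .getBlockReplication());
   this.bc = from.bc;
 }
 {code}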
 We have observed that some DataNodes keeps failing doing block reports with 
 NameNode. The stacktrace is as follows. Though we are not using the latest 
 version, the problem still exists.
 {quote}
 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 RemoteException in offerService
 org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint

2015-04-29 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520056#comment-14520056
 ] 

Haohui Mai commented on HDFS-8214:
--

{code}
+  if (v < 0) {
+    return "unknown";
+  }
{code}

It might make more sense to move it to the template (i.e., {{status.html}}), as 
the function might later be superseded by moment.js.

 Secondary NN Web UI shows wrong date for Last Checkpoint
 

 Key: HDFS-8214
 URL: https://issues.apache.org/jira/browse/HDFS-8214
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS, namenode
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, 
 HDFS-8214.003.patch


 SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in 
 the web UI. This causes weird times (generally just after the epoch) to be 
 displayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8037) WebHDFS: CheckAccess silently accepts certain malformed FsActions

2015-04-29 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520064#comment-14520064
 ] 

Haohui Mai commented on HDFS-8037:
--

Good catch. Can you please add a unit test?

 WebHDFS: CheckAccess silently accepts certain malformed FsActions
 -

 Key: HDFS-8037
 URL: https://issues.apache.org/jira/browse/HDFS-8037
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Jake Low
Assignee: Walter Su
Priority: Minor
  Labels: easyfix, newbie
 Attachments: HDFS-8037.001.patch, HDFS-8037.002.patch


 WebHDFS's {{CHECKACCESS}} operation accepts a parameter called {{fsaction}}, 
 which represents the type(s) of access to check for.
 According to the documentation, and also the source code, the domain of 
 {{fsaction}} is the set of strings matched by the regex {{\[rwx-\]{3\}}}. 
 This domain is wider than the set of valid {{FsAction}} objects, because it 
 doesn't guarantee sensible ordering of access types. For example, the strings 
 {{rxw}} and {{--r}} are valid {{fsaction}} parameter values, but don't 
 correspond to valid {{FsAction}} instances.
 The result is that WebHDFS silently accepts {{fsaction}} parameter values 
 which don't match any valid {{FsAction}} instance, but doesn't actually 
 perform any permissions checking in this case.
 For example, here's a {{CHECKACCESS}} call where we request {{rw-}} access 
 on a file which we only have permission to read and execute. It raises an 
 exception, as it should.
 {code:none}
 curl -i -X GET 
 "http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=rw-"
 HTTP/1.1 403 Forbidden
 Content-Type: application/json
 {
   "RemoteException": {
     "exception": "AccessControlException",
     "javaClassName": "org.apache.hadoop.security.AccessControlException",
     "message": "Permission denied: user=nobody, access=READ_WRITE, 
 inode=\"\/myfile\":root:supergroup:drwxr-xr-x"
   }
 }
 {code}
 But if we instead request {{r-w}} access, the call appears to succeed:
 {code:none}
 curl -i -X GET 
 "http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=r-w"
 HTTP/1.1 200 OK
 Content-Length: 0
 {code}
 As I see it, the fix would be to change the regex pattern in 
 {{FsActionParam}} to something like {{\[r-\]\[w-\]\[x-\]}}.
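
 A quick, self-contained demonstration of the mismatch (plain Java; the two 
 pattern strings are taken from the description above):
 {code}
 public class FsActionRegexDemo {
   public static void main(String[] args) {
     String loose  = "[rwx-]{3}";     // current pattern: order-insensitive
     String strict = "[r-][w-][x-]";  // proposed pattern: position-sensitive
     System.out.println("r-w".matches(loose));   // true  -- silently accepted
     System.out.println("r-w".matches(strict));  // false -- correctly rejected
     System.out.println("rw-".matches(strict));  // true  -- valid FsAction
   }
 }
 {code}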



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead

2015-04-29 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520083#comment-14520083
 ] 

Lei (Eddy) Xu commented on HDFS-7758:
-

Seems that the checkstyle error is caused by HADOOP-11889.

 Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
 -

 Key: HDFS-7758
 URL: https://issues.apache.org/jira/browse/HDFS-7758
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, 
 HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, 
 HDFS-7758.005.patch, HDFS-7758.006.patch


 HDFS-7496 introduced reference counting on the volume instances being used, 
 to prevent race conditions when hot swapping a volume.
 However, {{FsDatasetSpi#getVolumes()}} can still leak a volume instance 
 without increasing its reference count. In this JIRA, we retire 
 {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}} 
 and similar methods to access {{FsVolume}}. This makes sure that the consumer 
 of {{FsVolume}} always has a correct reference count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read)

2015-04-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520095#comment-14520095
 ] 

Jing Zhao edited comment on HDFS-8272 at 4/29/15 8:03 PM:
--

Thanks for the review, Zhe and Yi! I just merged HDFS-8280 into HDFS-7285 
branch. Upload the patch excluding the changes in DFSInputStream. Also fix the 
bug pointed out by Yi.

About the {{seekToBlockSource}}, I think it may be better to remove it:
# With decoding functionality we do not need to spend more time trying our luck 
on the same DN.
# Currently calling {{seekToBlockSource}} will cause all the current block 
readers to be closed (since {{blockSeekTo}} will call 
{{closeCurrentBlockReaders}}). To fix this we need to add extra complexity so 
as to make sure only one block reader is retried.
How about removing it for now? We can add it back if necessary in the future.


was (Author: jingzhao):
Thanks for the review, Zhe and Yi! I just merged HDFS-8280 into HDFS-7285 
branch. Upload the patch excluding the changes in DFSInputStream. Also fix the 
bug pointed out by Yi.

About the {{seekToBlockSource}}, I think it may be better to remove it for now:
# With decoding functionality we do not need to spend more time trying our luck 
on the same DN.
# Currently calling {{seekToBlockSource}} will cause all the current block 
readers to be closed (since {{blockSeekTo}} will call 
{{closeCurrentBlockReaders}}). To fix this we need to add extra complexity so 
as to make sure only one block reader is retried.

 Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful 
 read)
 -

 Key: HDFS-8272
 URL: https://issues.apache.org/jira/browse/HDFS-8272
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8272.002.patch, h8272-HDFS-7285.000.patch, 
 h8272-HDFS-7285.001.patch


 Currently in DFSStripedInputStream the retry logic is still the same as in 
 DFSInputStream. More specifically, every failed read will try to search for 
 another source node, and an exception is thrown when no new source node can 
 be identified. This logic is not appropriate for the EC input stream and can 
 be simplified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8286) Scaling out the namespace using KV store

2015-04-29 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-8286:


 Summary: Scaling out the namespace using KV store
 Key: HDFS-8286
 URL: https://issues.apache.org/jira/browse/HDFS-8286
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


Currently the NN keeps the namespace in memory. To improve the scalability of 
the namespace, users can scale up by using more RAM or scale out using 
Federation (i.e., statically partitioning the namespace).

We would like to remove this limitation on scaling the global namespace. Our 
vision is that HDFS should adopt a scalable underlying architecture that 
allows the global namespace to scale linearly.

We propose to implement the HDFS namespace on top of a key-value (KV) store. 
Adopting the KV store interfaces allows HDFS to leverage the capability of 
modern KV stores and to become much easier to scale. Going forward, the 
architecture allows distributing the namespace across multiple machines, or 
storing only the working set in memory (HDFS-5389), both of which allow HDFS 
to manage billions of files using the commodity hardware available today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.

2015-04-29 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated HDFS-1950:
-
Assignee: (was: ramtin)

 Blocks that are under construction are not getting read if the blocks are 
 more than 10. Only complete blocks are read properly. 
 

 Key: HDFS-1950
 URL: https://issues.apache.org/jira/browse/HDFS-1950
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 0.20.205.0
Reporter: ramkrishna.s.vasudevan
Priority: Blocker
 Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, 
 hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, 
 hdfs-1950-trunk-test.txt


 Before going to the root cause, let's see the read behavior for a file having 
 more than 10 blocks in the append case.
 Logic: 
 ====
 There is a prefetch size dfs.read.prefetch.size for the DFSInputStream, which 
 has a default value of 10.
 This prefetch size is the number of blocks that the client will fetch from 
 the namenode for reading a file.
 For example, let's assume that a file X having 22 blocks is residing in HDFS.
 The reader first fetches the first 10 blocks from the namenode and starts 
 reading.
 After the above step, the reader fetches the next 10 blocks from the NN and 
 continues reading.
 Then the reader fetches the remaining 2 blocks from the NN and completes the 
 read.
 Cause: 
 === 
 Let's see the cause for this issue now...
 The scenario that will fail is: the writer wrote 10+ blocks and a partial 
 block, then called sync. A reader trying to read the file will not get the 
 last partial block.
 The client first gets the 10 block locations from the NN. Now it checks 
 whether the file is under construction, and if so it gets the size of the 
 last partial block from the datanode and reads the full file.
 However, when the number of blocks is more than 10, the last block will not 
 be in the first fetch. It will be in the second or a later fetch (the last 
 block will be in the (num of blocks / 10)th fetch).
 The problem now is: in DFSClient there is no logic to get the size of the 
 last partial block (as in case of point 1) for fetches other than the first, 
 so the reader will not be able to read the complete data synced!
 Also, the InputStream.available api uses the first fetched block size to 
 iterate. Ideally this size has to be increased



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8249) Separate HdfsConstants into the client and the server side class

2015-04-29 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-8249:
-
Attachment: HDFS-8249.001.patch

 Separate HdfsConstants into the client and the server side class
 

 Key: HDFS-8249
 URL: https://issues.apache.org/jira/browse/HDFS-8249
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8249.000.patch, HDFS-8249.001.patch


 The constants in {{HdfsConstants}} are used by both the client side and the 
 server side. There are three types of constants in the class:
 1. Constants that are used internally by the servers and are not part of the 
 APIs. These constants are free to evolve without breaking compatibility. For 
 example, {{MAX_PATH_LENGTH}} is used by the NN to enforce that the length of 
 a path does not grow too long. Developers are free to change the names of 
 these constants and to move them around if necessary.
 1. Constants that are used by the clients, but are not part of the APIs. For 
 example, {{QUOTA_DONT_SET}} represents an unlimited quota. The value is part 
 of the wire protocol but the name is not. Developers are free to rename the 
 constants but are not allowed to change their values.
 1. Constants that are part of the APIs. For example, {{SafeModeAction}} is 
 used in {{DistributedFileSystem}}. Changing the name / value of the constant 
 will break binary compatibility, but not source code compatibility.
 This jira proposes to separate the above three types of constants into 
 different classes:
 * Creating a new class {{HdfsConstantsServer}} to hold the first type of 
 constants.
 * Move {{HdfsConstants}} into the {{hdfs-client}} package. The work of 
 separating the second and the third types of constants will be postponed in a 
 separate jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.

2015-04-29 Thread ramtin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramtin updated HDFS-1950:
-
Assignee: Uma Maheswara Rao G

 Blocks that are under construction are not getting read if the blocks are 
 more than 10. Only complete blocks are read properly. 
 

 Key: HDFS-1950
 URL: https://issues.apache.org/jira/browse/HDFS-1950
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 0.20.205.0
Reporter: ramkrishna.s.vasudevan
Assignee: Uma Maheswara Rao G
Priority: Blocker
 Attachments: HDFS-1950-2.patch, HDFS-1950.1.patch, 
 hdfs-1950-0.20-append-tests.txt, hdfs-1950-trunk-test.txt, 
 hdfs-1950-trunk-test.txt


 Before going to the root cause, let's look at the read behavior for a file
 with more than 10 blocks in the append case.
 Logic:
 
 The DFSInputStream has a prefetch size, dfs.read.prefetch.size, with a
 default value of 10.
 This prefetch size is the number of block locations the client fetches from
 the namenode at a time when reading a file.
 For example, assume a file X with 22 blocks resides in HDFS.
 The reader first fetches the first 10 blocks from the namenode and starts
 reading.
 After that, the reader fetches the next 10 blocks from the NN and continues
 reading.
 Then the reader fetches the remaining 2 blocks from the NN and completes the
 read.
 Cause:
 ===
 Let's look at the cause of this issue now.
 The failing scenario: a writer wrote 10+ blocks plus a partial block and
 called sync. A reader trying to read the file will not get the last partial
 block.
 The client first gets the first 10 block locations from the NN. It then
 checks whether the file is under construction and, if so, gets the size of
 the last partial block from a datanode and reads the full file.
 However, when the number of blocks is more than 10, the last block will not
 be in the first fetch; it will be in the second or a later fetch (the last
 block arrives in the (num of blocks / 10)th fetch).
 The problem is that DFSClient has no logic to get the size of the last
 partial block (as in the first-fetch case) for any fetch other than the
 first, so the reader cannot read all of the data that was synced.
 Also, the InputStream.available API uses the first fetched block size to
 iterate; ideally this size has to be increased.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists

2015-04-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519769#comment-14519769
 ] 

Jing Zhao commented on HDFS-8270:
-

I think we can use HDFS-6697 to make the lease soft and hard limits 
configurable, and make retry times configurable as well.

 create() always retried with hardcoded timeout when file already exists
 ---

 Key: HDFS-8270
 URL: https://issues.apache.org/jira/browse/HDFS-8270
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Andrey Stepachev
Assignee: J.Andreina

 In HBase we stumbled on unexpected behaviour which could
 break things.
 HDFS-6478 fixed a wrong exception
 translation, but that apparently led to unexpected behaviour:
 clients trying to create a file without overwrite=true are forced
 to retry for a hardcoded amount of time (60 seconds).
 That can break or slow down systems that use the filesystem
 for locks (like hbase fsck did, and we got it broken in HBASE-13574).
 We should make this behaviour configurable: does the client really need
 to wait for the lease timeout to be sure that the file doesn't exist, or
 should it be enough to fail fast?
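
 For concreteness, a sketch of the fail-fast usage this asks for (the
 lock-file pattern; today the exception only surfaces after the hardcoded
 60-second retry):
 {code}
 // Use the file purely as a lock: if it already exists, give up immediately
 // instead of retrying for the lease hard limit.
 try (FSDataOutputStream out = fs.create(lockPath, false /* overwrite */)) {
   out.writeUTF("lock owner: " + ownerId);
 } catch (FileAlreadyExistsException e) {
   // fail fast: another process holds the lock
 }
 {code}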



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7758) Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead

2015-04-29 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519763#comment-14519763
 ] 

Lei (Eddy) Xu commented on HDFS-7758:
-

working on fixing checkstyle reports.

 Retire FsDatasetSpi#getVolumes() and use FsDatasetSpi#getVolumeRefs() instead
 -

 Key: HDFS-7758
 URL: https://issues.apache.org/jira/browse/HDFS-7758
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-7758.000.patch, HDFS-7758.001.patch, 
 HDFS-7758.002.patch, HDFS-7758.003.patch, HDFS-7758.004.patch, 
 HDFS-7758.005.patch, HDFS-7758.006.patch


 HDFS-7496 introduced reference counting on the volume instances in use to
 prevent race conditions when hot swapping a volume.
 However, {{FsDatasetSpi#getVolumes()}} can still leak a volume instance
 without increasing its reference count. In this JIRA, we retire
 {{FsDatasetSpi#getVolumes()}} and propose {{FsDatasetSpi#getVolumeRefs()}}
 and related methods to access {{FsVolume}}. This ensures that consumers
 of {{FsVolume}} always hold a correct reference count.
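
 A sketch of the intended usage pattern ({{FsVolumeReference}} exists since
 HDFS-7496; the {{getVolumeRefs()}} loop shape is illustrative of the
 proposal, not the final API):
 {code}
 // Each reference pins the volume's reference count until close(), so a
 // concurrent hot-swap cannot tear down the volume while it is inspected.
 void dumpVolumes(FsDatasetSpi<?> dataset) throws IOException {
   for (FsVolumeReference ref : dataset.getVolumeRefs()) {  // proposed API
     try (FsVolumeReference r = ref) {  // released even on exceptions
       FsVolumeSpi vol = r.getVolume();
       LOG.info(vol.getBasePath() + " available=" + vol.getAvailable());
     }
   }
 }
 {code}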



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime

2015-04-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519834#comment-14519834
 ] 

Jing Zhao commented on HDFS-8269:
-

The 003 patch looks pretty good to me. +1. The failed test 
TestPipelinesFailover should be unrelated.

 getBlockLocations() does not resolve the .reserved path and generates 
 incorrect edit logs when updating the atime
 -

 Key: HDFS-8269
 URL: https://issues.apache.org/jira/browse/HDFS-8269
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Haohui Mai
Priority: Blocker
 Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, 
 HDFS-8269.002.patch, HDFS-8269.003.patch


 When {{FSNamesystem#getBlockLocations}} updates the access time of the
 INode, it uses the path passed from the client, which generates incorrect
 edit log entries:
 {noformat}
  <RECORD>
    <OPCODE>OP_TIMES</OPCODE>
    <DATA>
      <TXID>5085</TXID>
      <LENGTH>0</LENGTH>
      <PATH>/.reserved/.inodes/18230</PATH>
      <MTIME>-1</MTIME>
      <ATIME>1429908236392</ATIME>
    </DATA>
  </RECORD>
 {noformat}
 Note that the NN does not resolve the {{/.reserved}} path when processing the 
 edit log, therefore it eventually leads to a NPE when loading the edit logs.
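
 A hedged sketch of the fix direction ({{FSDirectory#isReservedName}} is an
 existing helper; the other calls are illustrative placeholders):
 {code}
 // Resolve /.reserved/.inodes/<id> back to the canonical path before the
 // OP_TIMES edit is logged, so replay never sees a reserved path.
 if (FSDirectory.isReservedName(src)) {
   src = resolveInodePath(src);     // placeholder: /.reserved/... -> /a/b/c
 }
 getEditLog().logTimes(src, -1, atime);  // log with the resolved path
 {code}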



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8283) DataStreamer cleanup and some minor improvement

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519827#comment-14519827
 ] 

Hudson commented on HDFS-8283:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7699 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7699/])
HDFS-8283. DataStreamer cleanup and some minor improvement. Contributed by Tsz 
Wo Nicholas Sze. (jing9: rev 7947e5b53b9ac9524b535b0384c1c355b74723ff)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/MultipleIOException.java


 DataStreamer cleanup and some minor improvement
 ---

 Key: HDFS-8283
 URL: https://issues.apache.org/jira/browse/HDFS-8283
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Fix For: 2.8.0

 Attachments: h8283_20150428.patch


 - When throwing an exception
 -* always set lastException
 -* always create a new exception so that it carries the new stack trace
 - Add LOG.
 - Add final to isAppend and favoredNodes
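
 A sketch of the second bullet (field and method shapes illustrative of
 {{DataStreamer}}, not copied from the patch):
 {code}
 // Re-wrap the recorded failure so the thrown exception carries the stack
 // trace of the place that throws it now, not the stale original trace.
 private void throwLastException(IOException cause) throws IOException {
   IOException fresh = new IOException("DataStreamer failed", cause);
   lastException.set(fresh);  // always set lastException
   throw fresh;               // always a new exception
 }
 {code}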



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8273) FSNamesystem#Delete() should not call logSync() when holding the lock

2015-04-29 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518497#comment-14518497
 ] 

Haohui Mai edited comment on HDFS-8273 at 4/29/15 6:14 PM:
---

I've committed the patch to trunk, branch-2 and branch-2.7. Thanks Jing for the 
reviews.


was (Author: wheat9):
I've committed the patch to trunk and branch-2. Thanks Jing for the reviews.

 FSNamesystem#Delete() should not call logSync() when holding the lock
 -

 Key: HDFS-8273
 URL: https://issues.apache.org/jira/browse/HDFS-8273
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Jing Zhao
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8273.000.patch, HDFS-8273.001.patch


 HDFS-7573 moved the logSync call inside the write lock by accident. We
 should move it out.
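
 The pattern this fix restores, sketched (the two internal calls are
 placeholders for the real delete and edit-log operations):
 {code}
 writeLock();
 try {
   collectedBlocks = deleteInternal(src, recursive);  // placeholder
   getEditLog().logDelete(src, mtime, toLogRpcIds);   // placeholder
 } finally {
   writeUnlock();
 }
 getEditLog().logSync();  // the expensive sync now runs outside the lock
 {code}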



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime

2015-04-29 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-8269:
-
   Resolution: Fixed
Fix Version/s: 2.7.1
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk, branch-2 and branch-2.7. Thanks Jing for the 
reviews.

 getBlockLocations() does not resolve the .reserved path and generates 
 incorrect edit logs when updating the atime
 -

 Key: HDFS-8269
 URL: https://issues.apache.org/jira/browse/HDFS-8269
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, 
 HDFS-8269.002.patch, HDFS-8269.003.patch


 When {{FSNamesystem#getBlockLocations}} updates the access time of the
 INode, it uses the path passed from the client, which generates incorrect
 edit log entries:
 {noformat}
  <RECORD>
    <OPCODE>OP_TIMES</OPCODE>
    <DATA>
      <TXID>5085</TXID>
      <LENGTH>0</LENGTH>
      <PATH>/.reserved/.inodes/18230</PATH>
      <MTIME>-1</MTIME>
      <ATIME>1429908236392</ATIME>
    </DATA>
  </RECORD>
 {noformat}
 Note that the NN does not resolve the {{/.reserved}} path when processing the 
 edit log, therefore it eventually leads to a NPE when loading the edit logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime

2015-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519880#comment-14519880
 ] 

Hudson commented on HDFS-8269:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7700 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7700/])
HDFS-8269. getBlockLocations() does not resolve the .reserved path and 
generates incorrect edit logs when updating the atime. Contributed by Haohui 
Mai. (wheat9: rev 3dd6395bb2448e5b178a51c864e3c9a3d12e8bc9)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGetBlockLocations.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 getBlockLocations() does not resolve the .reserved path and generates 
 incorrect edit logs when updating the atime
 -

 Key: HDFS-8269
 URL: https://issues.apache.org/jira/browse/HDFS-8269
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Haohui Mai
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch, 
 HDFS-8269.002.patch, HDFS-8269.003.patch


 When {{FSNamesystem#getBlockLocations}} updates the access time of the
 INode, it uses the path passed from the client, which generates incorrect
 edit log entries:
 {noformat}
  <RECORD>
    <OPCODE>OP_TIMES</OPCODE>
    <DATA>
      <TXID>5085</TXID>
      <LENGTH>0</LENGTH>
      <PATH>/.reserved/.inodes/18230</PATH>
      <MTIME>-1</MTIME>
      <ATIME>1429908236392</ATIME>
    </DATA>
  </RECORD>
 {noformat}
 Note that the NN does not resolve the {{/.reserved}} path when processing the 
 edit log, therefore it eventually leads to a NPE when loading the edit logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8286) Scaling out the namespace using KV store

2015-04-29 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-8286:
-
Attachment: hdfs-kv-design.pdf

The attachment outlines the architecture of the HDFS namespace over a KV
store. It describes how to encode the current namespace into a KV schema, and
how to implement existing features such as HA and snapshots under the
proposed architecture.

One thing worth noting is that in the proposed design HDFS still keeps the
namespace in memory to smooth the migration. In other words, the
implementation will be based on an in-memory KV store. Preliminary
evaluations of our prototype show that the architecture has memory usage and
performance comparable to HDFS today.

This jira can be seen as the Phase I implementation of HDFS-5389. In this
jira we plan to focus on faithfully implementing the features that are
available in HDFS today, and to migrate from this architecture toward
HDFS-5389 in a later phase of implementation.
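
As a toy illustration only (the attached design doc defines the actual
schema), one common way to key a namespace in a KV store is by parent inode
id plus child name:
{code}
// Hypothetical encoding: key = "<parentInodeId>/<childName>" -> inode bytes.
Map<String, byte[]> kv = new HashMap<>();
long ROOT = 1, USR = 2;
kv.put(ROOT + "/usr",  serialize(usrDir));    // /usr      -> inode 2
kv.put(USR  + "/data", serialize(dataFile));  // /usr/data -> inode 3
// Resolving /usr/data costs one KV lookup per path component, which is what
// later allows sharding the namespace or spilling it out of memory.
byte[] inodeBytes = kv.get(USR + "/data");
{code}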

 Scaling out the namespace using KV store
 

 Key: HDFS-8286
 URL: https://issues.apache.org/jira/browse/HDFS-8286
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: hdfs-kv-design.pdf


 Currently the NN keeps the namespace in memory. To improve the
 scalability of the namespace, users can scale up by using more RAM or scale
 out using Federation (i.e., statically partitioning the namespace).
 We would like to remove the limits on scaling the global namespace. Our
 vision is that HDFS should adopt a scalable underlying architecture that
 allows the global namespace to scale linearly.
 We propose to implement the HDFS namespace on top of a key-value (KV) store.
 Adopting KV store interfaces allows HDFS to leverage the capabilities of
 modern KV stores and to become much easier to scale. Going forward, the
 architecture allows distributing the namespace across multiple machines, or
 storing only the working set in memory (HDFS-5389), both of which allow
 HDFS to manage billions of files using the commodity hardware available today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity

2015-04-29 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8287:
-

 Summary: DFSStripedOutputStream.writeChunk should not wait for 
writing parity 
 Key: HDFS-8287
 URL: https://issues.apache.org/jira/browse/HDFS-8287
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


When a striping cell is full, writeChunk computes and generates parity
packets.  It sequentially calls waitAndQueuePacket, so the user client cannot
continue to write data until it finishes.

We should allow the user client to continue writing instead of blocking it
while parity is being written.
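
A sketch of the proposed decoupling (the executor and both helpers are
hypothetical names, not the current DFSStripedOutputStream code):
{code}
// writeChunk snapshots the full stripe and returns; only the background
// worker blocks in waitAndQueuePacket for the parity streamers.
final byte[][] cells = snapshotCellBuffers();   // hypothetical copy
parityExecutor.submit(() -> {
  byte[][] parity = encode(cells);              // hypothetical RS encode
  enqueueParityPackets(parity);                 // may block, off user thread
});
// the user client keeps writing data cells immediately
{code}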



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8249) Separate HdfsConstants into the client and the server side class

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520261#comment-14520261
 ] 

Hadoop QA commented on HDFS-8249:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 37 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 27s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 33s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   7m 51s | The applied patch generated  
22  additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m  2s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native |   3m 12s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 165m 51s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 16s | Tests passed in 
hadoop-hdfs-client. |
| {color:green}+1{color} | hdfs tests |   1m 41s | Tests passed in 
hadoop-hdfs-nfs. |
| {color:green}+1{color} | hdfs tests |   4m  2s | Tests passed in bkjournal. |
| | | 222m  2s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729230/HDFS-8249.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8f82970 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/checkstyle-result-diff.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| hadoop-hdfs-nfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/testrun_hadoop-hdfs-nfs.txt
 |
| bkjournal test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10453/artifact/patchprocess/testrun_bkjournal.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10453/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10453/console |


This message was automatically generated.

 Separate HdfsConstants into the client and the server side class
 

 Key: HDFS-8249
 URL: https://issues.apache.org/jira/browse/HDFS-8249
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8249.000.patch, HDFS-8249.001.patch


 The constants in {{HdfsConstants}} are used by both the client side and the
 server side. There are three types of constants in the class:
 1. Constants that are used internally by the servers and are not part of the
 APIs. These constants are free to evolve without breaking compatibility. For
 example, {{MAX_PATH_LENGTH}} is used by the NN to enforce that paths do not
 grow too long. Developers are free to rename these constants and to move
 them around if necessary.
 1. Constants that are used by the clients but are not part of the APIs. For
 example, {{QUOTA_DONT_SET}} represents an unlimited quota. The value is part
 of the wire protocol but the name is not. Developers are free to rename
 these constants but are not allowed to change their values.
 1. Constants that are part of the APIs. For example, {{SafeModeAction}} is
 used in {{DistributedFileSystem}}. Changing the name / value of such a
 constant will break binary compatibility, but not source code compatibility.
 This jira proposes to separate the above three types of constants into
 different classes:
 * 

[jira] [Created] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.

2015-04-29 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8290:
---

 Summary: WebHDFS calls before namesystem initialization can cause 
NullPointerException.
 Key: HDFS-8290
 URL: https://issues.apache.org/jira/browse/HDFS-8290
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


The NameNode has a brief window of time when the HTTP server has been 
initialized, but the namesystem has not been initialized.  During this window, 
a WebHDFS call can cause a {{NullPointerException}}.  We can catch this 
condition and return a more meaningful error.
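
A minimal sketch of the check (placement and message illustrative;
{{RetriableException}} is the existing {{org.apache.hadoop.ipc}} class):
{code}
private static FSNamesystem getNamesystemChecked(NameNode nn)
    throws RetriableException {
  FSNamesystem ns = nn.getNamesystem();
  if (ns == null) {
    // HTTP server is up but startup has not finished; ask the caller to retry
    throw new RetriableException("Namesystem has not been initialized yet.");
  }
  return ns;
}
{code}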



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6697) Make NN lease soft and hard limits configurable

2015-04-29 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520194#comment-14520194
 ] 

Haohui Mai commented on HDFS-6697:
--

What about changing the values through reflection? They should not be exposed
in the configuration, as that might lead to compatibility concerns.

 Make NN lease soft and hard limits configurable
 ---

 Key: HDFS-6697
 URL: https://issues.apache.org/jira/browse/HDFS-6697
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma

 For testing, NameNodeAdapter allows test code to specify the lease soft and
 hard limits via setLeasePeriod directly on LeaseManager. But
 NamenodeProxies.java still uses the default values.
  
 It would be useful to make the NN lease soft and hard limits configurable
 via Configuration. That would allow NamenodeProxies.java to use the
 configured values.
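
 A hedged sketch of what that could look like (the two key names are
 hypothetical, which is exactly the compatibility question raised in the
 comment above):
 {code}
 long soft = conf.getLong("dfs.namenode.lease.softlimit.ms",   // hypothetical
     HdfsConstants.LEASE_SOFTLIMIT_PERIOD);
 long hard = conf.getLong("dfs.namenode.lease.hardlimit.ms",   // hypothetical
     HdfsConstants.LEASE_HARDLIMIT_PERIOD);
 leaseManager.setLeasePeriod(soft, hard);  // same call the tests use today
 {code}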



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8289) DFSStripedOutputStream uses an additional rpc call to getErasureCodingInfo

2015-04-29 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-8289:
-

 Summary: DFSStripedOutputStream uses an additional rpc call to 
getErasureCodingInfo
 Key: HDFS-8289
 URL: https://issues.apache.org/jira/browse/HDFS-8289
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


{code}
// ECInfo is restored from NN just before writing striped files.
ecInfo = dfsClient.getErasureCodingInfo(src);
{code}
The rpc call above can be avoided by adding ECSchema to HdfsFileStatus.
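
Sketched, the proposed alternative (the accessor is hypothetical until a
follow-up patch adds the field):
{code}
// One RPC instead of two: the schema rides along on the file status the
// client already fetches when opening the striped file.
HdfsFileStatus stat = dfsClient.getFileInfo(src);
ECSchema schema = stat.getECSchema();  // hypothetical new HdfsFileStatus field
{code}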



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7559) Create unit test to automatically compare HDFS related classes and hdfs-default.xml

2015-04-29 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated HDFS-7559:
-
Labels: BB2015-05-TBR supportability  (was: supportability)

 Create unit test to automatically compare HDFS related classes and 
 hdfs-default.xml
 ---

 Key: HDFS-7559
 URL: https://issues.apache.org/jira/browse/HDFS-7559
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: BB2015-05-TBR, supportability
 Attachments: HDFS-7559.001.patch, HDFS-7559.002.patch, 
 HDFS-7559.003.patch, HDFS-7559.004.patch


 Create a unit test that will automatically compare the fields in the various 
 HDFS related classes and hdfs-default.xml. It should throw an error if a 
 property is missing in either the class or the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7559) Create unit test to automatically compare HDFS related classes and hdfs-default.xml

2015-04-29 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520301#comment-14520301
 ] 

Ray Chiang commented on HDFS-7559:
--

Both failed tests pass in my tree.

 Create unit test to automatically compare HDFS related classes and 
 hdfs-default.xml
 ---

 Key: HDFS-7559
 URL: https://issues.apache.org/jira/browse/HDFS-7559
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Attachments: HDFS-7559.001.patch, HDFS-7559.002.patch, 
 HDFS-7559.003.patch, HDFS-7559.004.patch


 Create a unit test that will automatically compare the fields in the various 
 HDFS related classes and hdfs-default.xml. It should throw an error if a 
 property is missing in either the class or the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.

2015-04-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-8290:

Attachment: HDFS-8290.001.patch

I'm attaching a patch that adds a null check and a test.

 WebHDFS calls before namesystem initialization can cause NullPointerException.
 --

 Key: HDFS-8290
 URL: https://issues.apache.org/jira/browse/HDFS-8290
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-8290.001.patch


 The NameNode has a brief window of time when the HTTP server has been 
 initialized, but the namesystem has not been initialized.  During this 
 window, a WebHDFS call can cause a {{NullPointerException}}.  We can catch 
 this condition and return a more meaningful error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.

2015-04-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-8290:

Status: Patch Available  (was: Open)

 WebHDFS calls before namesystem initialization can cause NullPointerException.
 --

 Key: HDFS-8290
 URL: https://issues.apache.org/jira/browse/HDFS-8290
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-8290.001.patch


 The NameNode has a brief window of time when the HTTP server has been 
 initialized, but the namesystem has not been initialized.  During this 
 window, a WebHDFS call can cause a {{NullPointerException}}.  We can catch 
 this condition and return a more meaningful error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.

2015-04-29 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520358#comment-14520358
 ] 

Jakob Homan commented on HDFS-8290:
---

+1.

 WebHDFS calls before namesystem initialization can cause NullPointerException.
 --

 Key: HDFS-8290
 URL: https://issues.apache.org/jira/browse/HDFS-8290
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-8290.001.patch


 The NameNode has a brief window of time when the HTTP server has been 
 initialized, but the namesystem has not been initialized.  During this 
 window, a WebHDFS call can cause a {{NullPointerException}}.  We can catch 
 this condition and return a more meaningful error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-8291) Modify NN WebUI to display correct unit

2015-04-29 Thread Zhongyi Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-8291 started by Zhongyi Xie.
-
 Modify NN WebUI to display correct unit 
 

 Key: HDFS-8291
 URL: https://issues.apache.org/jira/browse/HDFS-8291
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhongyi Xie
Assignee: Zhongyi Xie
Priority: Minor

 NN Web UI displays its capacity and usage in TB, but it is actually TiB. We 
 should either change the unit name or the calculation to ensure it follows 
 standards.
 http://en.wikipedia.org/wiki/Tebibyte
 http://en.wikipedia.org/wiki/Terabyte
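
 The mismatch in two lines (capacity value illustrative):
 {code}
 long capacityBytes = 150_994_944_000_000L;     // example raw capacity
 double tib = capacityBytes / Math.pow(2, 40);  // ~137.3 TiB (binary, 2^40)
 double tb  = capacityBytes / 1e12;             // ~151.0 TB  (SI, 10^12)
 // The UI computes the first number but labels it with the second unit.
 {code}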



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8174) Update replication count to live rep count in fsck report

2015-04-29 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520453#comment-14520453
 ] 

Ming Ma commented on HDFS-8174:
---

Thanks [~andreina]. LGTM.

 Update replication count to live rep count in fsck report
 -

 Key: HDFS-8174
 URL: https://issues.apache.org/jira/browse/HDFS-8174
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Minor
 Attachments: HDFS-8174.1.patch


 When one of the replicas is decommissioned, the fsck report shows a repl
 count that is one less than the number of replicas displayed.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 Update the description from rep to Live_rep



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality

2015-04-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7678:

Attachment: (was: HDFS-7678-HDFS-7285.004.patch)

 Erasure coding: DFSInputStream with decode functionality
 

 Key: HDFS-7678
 URL: https://issues.apache.org/jira/browse/HDFS-7678
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Li Bo
Assignee: Zhe Zhang
 Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, 
 HDFS-7678-HDFS-7285.003.patch, HDFS-7678.000.patch, HDFS-7678.001.patch


 A block group reader will read data from a BlockGroup whether it is in
 striping layout or contiguous layout. The corrupt blocks may be known before
 reading (told by the namenode), or only be found during reading. The block
 group reader needs to do decoding work when some blocks are found corrupt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality

2015-04-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7678:

Attachment: HDFS-7678-HDFS-7285.004.patch

 Erasure coding: DFSInputStream with decode functionality
 

 Key: HDFS-7678
 URL: https://issues.apache.org/jira/browse/HDFS-7678
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Li Bo
Assignee: Zhe Zhang
 Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, 
 HDFS-7678-HDFS-7285.003.patch, HDFS-7678-HDFS-7285.004.patch, 
 HDFS-7678.000.patch, HDFS-7678.001.patch


 A block group reader will read data from a BlockGroup whether it is in
 striping layout or contiguous layout. The corrupt blocks may be known before
 reading (told by the namenode), or only be found during reading. The block
 group reader needs to do decoding work when some blocks are found corrupt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8249) Separate HdfsConstants into the client and the server side class

2015-04-29 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-8249:
-
Attachment: HDFS-8249.002.patch

 Separate HdfsConstants into the client and the server side class
 

 Key: HDFS-8249
 URL: https://issues.apache.org/jira/browse/HDFS-8249
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8249.000.patch, HDFS-8249.001.patch, 
 HDFS-8249.002.patch


 The constants in {{HdfsConstants}} are used by both the client side and the
 server side. There are three types of constants in the class:
 1. Constants that are used internally by the servers and are not part of the
 APIs. These constants are free to evolve without breaking compatibility. For
 example, {{MAX_PATH_LENGTH}} is used by the NN to enforce that paths do not
 grow too long. Developers are free to rename these constants and to move
 them around if necessary.
 1. Constants that are used by the clients but are not part of the APIs. For
 example, {{QUOTA_DONT_SET}} represents an unlimited quota. The value is part
 of the wire protocol but the name is not. Developers are free to rename
 these constants but are not allowed to change their values.
 1. Constants that are part of the APIs. For example, {{SafeModeAction}} is
 used in {{DistributedFileSystem}}. Changing the name / value of such a
 constant will break binary compatibility, but not source code compatibility.
 This jira proposes to separate the above three types of constants into
 different classes:
 * Create a new class {{HdfsConstantsServer}} to hold the first type of
 constants.
 * Move {{HdfsConstants}} into the {{hdfs-client}} package. The work of
 separating the second and the third types of constants will be postponed to
 a separate jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7348) Erasure Coding: striped block recovery

2015-04-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520589#comment-14520589
 ] 

Zhe Zhang commented on HDFS-7348:
-

Thanks for the discussion, Yi and Bo.

On the write path:
# I wonder if we should have a fast track for the most common case, where the
DN receiving the EC command is the final destination? In this case, this DN
should just create a local block and write to it.
# If we decide to have such a fast track, then it seems natural to use that
code to store a copy of all reconstructed blocks first. Then we can use the
existing {{DataNode#DataTransfer}} to push them out. Yi mentioned several
drawbacks of storing a reconstructed block on disk before sending it out: i)
performance; ii) disk space; iii) management; iv) calculating the crc. The
performance and disk usage overheads are still valid concerns even with the
fast track code mentioned above. So how about splitting out the current logic
of transferring to remote targets (e.g., {{transferCells2Targets}}) into a
separate JIRA (recovering multiple missing blocks)? Of course that's assuming
we do want a fast track for recovering a single block locally.

On the read path:
# bq. (read entire blocks and then decode) It's big issue for memory, 
especially there may be multiple stripe block recovery at the same time.
Yes, I agree. Block size is too large as the sync-and-decode unit, and I
think cell size is too small for that purpose. It seems reasonable to use a
few hundred MB of memory for recovery, so how about setting the default to
32MB or 64MB? Assuming a 6+3 schema, that would be roughly 300~600MB of
memory usage, and we would only need to create a block reader 2~4 times per
source.
# Sequential vs. parallel reading is a hard decision. Since the current code
is in parallel mode we should probably keep it that way at this stage, and
add the other mode (like Bo suggested, Fast and Slow modes) later if needed.

 Erasure Coding: striped block recovery
 --

 Key: HDFS-7348
 URL: https://issues.apache.org/jira/browse/HDFS-7348
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Kai Zheng
Assignee: Yi Liu
 Attachments: ECWorker.java, HDFS-7348.001.patch


 This JIRA is to recover one or more missed striped block in the striped block 
 group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block

2015-04-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520594#comment-14520594
 ] 

Hadoop QA commented on HDFS-7281:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | javac |   7m 28s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   5m 26s | The applied patch generated  1 
 additional checkstyle issues. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m  7s | The patch appears to introduce 1 
new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 226m 22s | Tests failed in hadoop-hdfs. |
| | | 272m 36s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
|  |  Class org.apache.hadoop.hdfs.DataStreamer$LastException is not derived 
from an Exception, even though it is named as such  At DataStreamer.java:from 
an Exception, even though it is named as such  At DataStreamer.java:[lines 
177-201] |
| Failed unit tests | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.namenode.TestDeleteRace |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation |
|   | hadoop.hdfs.TestFileLengthOnClusterRestart |
|   | hadoop.cli.TestHDFSCLI |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.TestQuota |
|   | hadoop.hdfs.TestClose |
|   | hadoop.hdfs.TestMultiThreadedHflush |
|   | hadoop.hdfs.server.datanode.TestBlockRecovery |
|   | hadoop.hdfs.TestDFSOutputStream |
| Timed out tests | 
org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
|   | org.apache.hadoop.hdfs.TestDataTransferProtocol |
|   | org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12729274/HDFS-7281-6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3dd6395 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10454/artifact/patchprocess/checkstyle-result-diff.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10454/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10454/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10454/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10454/console |


This message was automatically generated.

 Missing block is marked as corrupted block
 --

 Key: HDFS-7281
 URL: https://issues.apache.org/jira/browse/HDFS-7281
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
  Labels: supportability
 Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, 
 HDFS-7281-5.patch, HDFS-7281-6.patch, HDFS-7281.patch


 In the situation where a block has lost all its replicas, fsck shows the
 block as missing as well as corrupted. Perhaps it is better not to mark the
 block corrupted in this case. The reason it is marked as corrupted is that
 numCorruptNodes == numNodes == 0 in the following code.
 {noformat}
 BlockManager
 final boolean isCorrupt = numCorruptNodes == numNodes;
 {noformat}
 We would like to clarify whether marking a missing block as corrupted is
 intentional or just a bug.
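
 One possible fix, sketched: guard the zero-replica case so a block with no
 replicas at all is reported as missing only.
 {code}
 // corrupt only if at least one replica exists and all replicas are corrupt
 final boolean isCorrupt = numNodes > 0 && numCorruptNodes == numNodes;
 {code}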



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times

2015-04-29 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520401#comment-14520401
 ] 

Ravi Prakash commented on HDFS-7342:


I found another(?) instance in which the lease is not recovered. This is
easily reproducible on a pseudo-distributed single-node cluster.
# Before you start, it helps if you set the following. This is not necessary,
but it reduces how long you have to wait:
{code}
  public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
  public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD;
{code}
# The client starts to write a file (it could be less than 1 block, but it
hflushed, so some of the data has landed on the datanodes). (I'm copying the
client code I am using; I generate a jar and run it using $ hadoop jar
TestHadoop.jar.)
# The client crashes. (I simulate this by kill -9 of the $(hadoop jar
TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
# Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was
only 1.)

I believe the lease should be recovered and the block should be marked
missing. However this is not happening; the lease is never recovered. I am
going to check what happens when only the primary datanode is shot.
{color:red}Please let me know if I shouldn't hijack this JIRA. By default I
will.{color}

{code:title=TestHadoop.java|borderStyle=solid}
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestHadoop {
  public static void main(String args[]) throws IOException,
      InterruptedException {
    Path path = new Path("/tmp/testHadoop");
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    System.out.println("DefaultFS: " + conf.get("fs.defaultFS"));
    System.out.flush();
    FSDataOutputStream hdfsout = fs.create(path, true);
    BufferedWriter br = new BufferedWriter(new OutputStreamWriter(hdfsout));
    System.out.println("Created the bufferedWriter");
    System.out.flush();
    br.write("Some string");
    br.flush();
    hdfsout.hflush();
    System.out.println("Wrote to the bufferedWriter");
    System.out.flush();

    Thread.sleep(120000); // KILL THE PROCESS DURING THIS SLEEP
    br.close();
    System.out.println("Closed the bufferedWriter");
    System.out.flush();
  }
}
{code}



 Lease Recovery doesn't happen some times
 

 Key: HDFS-7342
 URL: https://issues.apache.org/jira/browse/HDFS-7342
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Ravi Prakash
 Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch


 In some cases, LeaseManager tries to recover a lease, but is not able to. 
 HDFS-4882 describes a possibility of that. We should fix this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7342) Lease Recovery doesn't happen some times

2015-04-29 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-7342:
---
Target Version/s: 2.8.0  (was: 2.6.1)

 Lease Recovery doesn't happen some times
 

 Key: HDFS-7342
 URL: https://issues.apache.org/jira/browse/HDFS-7342
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Ravi Prakash
 Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch


 In some cases, LeaseManager tries to recover a lease, but is not able to. 
 HDFS-4882 describes a possibility of that. We should fix this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8291) Modify NN WebUI to display correct unit

2015-04-29 Thread Zhongyi Xie (JIRA)
Zhongyi Xie created HDFS-8291:
-

 Summary: Modify NN WebUI to display correct unit 
 Key: HDFS-8291
 URL: https://issues.apache.org/jira/browse/HDFS-8291
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhongyi Xie
Assignee: Zhongyi Xie
Priority: Minor


NN Web UI displays its capacity and usage in TB, but it is actually TiB. We 
should either change the unit name or the calculation to ensure it follows 
standards.
http://en.wikipedia.org/wiki/Tebibyte
http://en.wikipedia.org/wiki/Terabyte



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8291) Modify NN WebUI to display correct unit

2015-04-29 Thread Zhongyi Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongyi Xie updated HDFS-8291:
--
Status: Open  (was: Patch Available)

 Modify NN WebUI to display correct unit 
 

 Key: HDFS-8291
 URL: https://issues.apache.org/jira/browse/HDFS-8291
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhongyi Xie
Assignee: Zhongyi Xie
Priority: Minor

 NN Web UI displays its capacity and usage in TB, but it is actually TiB. We 
 should either change the unit name or the calculation to ensure it follows 
 standards.
 http://en.wikipedia.org/wiki/Tebibyte
 http://en.wikipedia.org/wiki/Terabyte



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality

2015-04-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7678:

Attachment: HDFS-7678-HDFS-7285.003.patch

Attaching a new patch based on Andrew's comments.

# An overall timeout is enforced
# All data fetching happens in a single loop, leveraging Yi's idea under 
HDFS-7348
# It also refactors shared striped reading logic (among client and DN) to the 
util class. [~andrew.wang] / [~hitliuyi] could you take a look at the changes 
in {{StripedBlockUtil}}? If that part looks OK I'll split it to HDFS-8282 and 
get it in first, so this client decode JIRA doesn't block HDFS-7348.

 Erasure coding: DFSInputStream with decode functionality
 

 Key: HDFS-7678
 URL: https://issues.apache.org/jira/browse/HDFS-7678
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Li Bo
Assignee: Zhe Zhang
 Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, 
 HDFS-7678-HDFS-7285.003.patch, HDFS-7678.000.patch, HDFS-7678.001.patch


 A block group reader will read data from a BlockGroup whether it is in
 striping layout or contiguous layout. The corrupt blocks may be known before
 reading (told by the namenode), or only be found during reading. The block
 group reader needs to do decoding work when some blocks are found corrupt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful read)

2015-04-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520458#comment-14520458
 ] 

Zhe Zhang commented on HDFS-8272:
-

The rest of the patch LGTM, +1 and thanks Jing for the contribution! I just 
committed it to the branch.

 Erasure Coding: simplify the retry logic in DFSStripedInputStream (stateful 
 read)
 -

 Key: HDFS-8272
 URL: https://issues.apache.org/jira/browse/HDFS-8272
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8272.002.patch, h8272-HDFS-7285.000.patch, 
 h8272-HDFS-7285.001.patch


 Currently in DFSStripedInputStream the retry logic is still the same with 
 DFSInputStream. More specifically, every failed read will try to search for 
 another source node. And an exception is thrown when no new source node can 
 be identified. This logic is not appropriate for EC inputstream and can be 
 simplified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

