[jira] [Commented] (HDFS-6974) MiniHDFScluster breaks if there is an out of date hadoop.lib on the lib path
[ https://issues.apache.org/jira/browse/HDFS-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564384#comment-14564384 ] Michael Schmeißer commented on HDFS-6974: - The existence of this ticket helped to track down the problem, but if it is feasible, a more meaningful error message would help here. MiniHDFScluster breaks if there is an out of date hadoop.lib on the lib path - Key: HDFS-6974 URL: https://issues.apache.org/jira/browse/HDFS-6974 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Environment: Windows with a version of Hadoop (HDP2.1) installed somewhere via an MSI Reporter: Steve Loughran Priority: Minor SLIDER-377 shows the trace of a MiniHDFSCluster test failing on native library calls ... the root cause appears to be that the 2.4.1 hadoop lib on the path doesn't have all the methods needed by branch-2. When this situation arises, MiniHDFSCluster fails to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
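A more meaningful error, as the commenter suggests, could come from failing fast when a library on the lib path lacks an expected method. A minimal, self-contained sketch of that idea (the class and method names here are illustrative, not the actual MiniHDFSCluster fix):

```java
import java.lang.reflect.Method;

/**
 * Sketch (not the actual Hadoop fix): fail fast with a meaningful message
 * when a class picked up from the lib path is missing an expected method,
 * instead of letting a later NoSuchMethodError/UnsatisfiedLinkError surface
 * deep inside a test run.
 */
public class LibVersionCheck {
  /** Returns true if clazz exposes a public method with the given name. */
  static boolean hasMethod(Class<?> clazz, String name) {
    for (Method m : clazz.getMethods()) {
      if (m.getName().equals(name)) {
        return true;
      }
    }
    return false;
  }

  /** Throws with an actionable message when the method is absent. */
  static void requireMethod(Class<?> clazz, String name) {
    if (!hasMethod(clazz, name)) {
      throw new IllegalStateException(
          "Class " + clazz.getName() + " is missing method '" + name
          + "'; an out-of-date library may be earlier on the lib path.");
    }
  }
}
```

A check like this, run once at cluster start-up, would turn the obscure native-call failure into a one-line diagnosis.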
[jira] [Commented] (HDFS-8328) Follow-on to update decode for DataNode striped blocks reconstruction
[ https://issues.apache.org/jira/browse/HDFS-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564392#comment-14564392 ] Yi Liu commented on HDFS-8328: -- Thanks Kai for the comments. {quote} {{minRequiredSources}} looks like a little confusing, because from coder's point of view {quote} This is not from the coder's point of view; it's unrelated to the coder. "sources" means the datanodes which contain a correct striped block. {quote} How about renaming nullInputBuffers to nullCellBuffers {quote} Currently in the DN, the decode buffer size is not the same as the cell size. I have the comment {{// striped block length is 0}}; maybe I can change it to {{// The buffers and indices for striped blocks whose length is 0}}, and change the names to {{ZeroStripeBuffers}} and {{ZeroStripeIndices}}. {quote} I guess the following utilities can be moved elsewhere and shared with client side. targetsStatus could have a better name. {quote} {{covertIndex4Decode}} can be shared; I will move it to {{StripedBlockUtil}}. {quote} I'm wondering if the following codes can be better organized, like all the codes can be split into two functions: newStrippedReader and newBlockReader. {quote} The {{newBlockReader}} is already a separate function. {quote} Is it easy to centralize all the input/output buffers allocation in a function, so in future it would be easier to enhance respecting the fact that Java coders like on-heap buffer, but native coders prefer direct buffer. {quote} Agreed, we can have a function for allocating buffers. Follow-on to update decode for DataNode striped blocks reconstruction - Key: HDFS-8328 URL: https://issues.apache.org/jira/browse/HDFS-8328 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8328-HDFS-7285.001.patch Currently the decode for DataNode striped blocks reconstruction is a workaround; we need to update it after the decode fix in HADOOP-11847. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
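The buffer-allocation helper agreed on above could look roughly like this. This is a hedged sketch with illustrative names (not the actual HDFS-7285 code), showing how a single allocation point makes it easy to switch between on-heap buffers (which Java coders prefer) and direct buffers (which native coders prefer):

```java
import java.nio.ByteBuffer;

/**
 * Sketch of a centralized buffer-allocation function for decode work.
 * Names are illustrative, not the actual HDFS API. Centralizing the
 * allocation means the on-heap vs. direct decision is made in one place.
 */
public class DecodeBuffers {
  /** Allocate {@code count} buffers of {@code bufSize} bytes each. */
  static ByteBuffer[] allocateBuffers(int count, int bufSize, boolean direct) {
    ByteBuffer[] buffers = new ByteBuffer[count];
    for (int i = 0; i < count; i++) {
      buffers[i] = direct
          ? ByteBuffer.allocateDirect(bufSize)  // native coders
          : ByteBuffer.allocate(bufSize);       // pure-Java coders
    }
    return buffers;
  }
}
```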
[jira] [Updated] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8496: --- Status: Patch Available (was: Open) Ran all hdfs unit tests without introducing new failures. Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao On a DN of an HDFS 2.6 cluster, we noticed that some DataXceiver threads and heartbeat threads were blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found that calling stopWriter() with the FSDatasetImpl lock held blocked everything. Following is the heartbeat stack, as an example, to show how threads are blocked by the FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock was just sleeping in stopWriter(), waiting for another thread to exit. 
The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) - locked 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} In this case, we had deployed quite a lot of other workloads on the DN, so the local file system and disk were quite busy. We guess this is why stopWriter took quite a long time. Anyway, it is not reasonable to call stopWriter with the FSDatasetImpl lock held. In HDFS-7999, createTemporary() was changed to call stopWriter without the FSDatasetImpl lock. We guess we should do the same in the other three methods: recoverClose()/recoverAppend()/recoverRbw(). I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
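The locking problem described above reduces to a small, self-contained pattern (plain Java, not the HDFS code): a blocking Thread.join() inside a synchronized block stalls every thread contending for that monitor, while moving the join() outside, as HDFS-7999 did for createTemporary(), keeps the lock hold time short.

```java
/**
 * Sketch of the locking pattern at issue. The "writer" thread stands in
 * for a ReplicaInPipeline writer; datasetLock stands in for the
 * FSDatasetImpl monitor. Illustrative only, not HDFS code.
 */
public class StopWriterPattern {
  private final Object datasetLock = new Object();

  // Problematic shape: the blocking join() runs while datasetLock is held,
  // so heartbeat/DataXceiver-style threads needing the lock all stall.
  void recoverWhileHoldingLock(Thread writer) {
    synchronized (datasetLock) {
      writer.interrupt();
      joinQuietly(writer);   // other threads block on datasetLock meanwhile
      // ... mutate replica state under the lock ...
    }
  }

  // Fixed shape: stop the writer first, then take the lock only for the
  // short state mutation (the approach HDFS-7999 used for createTemporary()).
  void recoverWithoutHoldingLock(Thread writer) {
    writer.interrupt();
    joinQuietly(writer);     // no shared lock held while waiting
    synchronized (datasetLock) {
      // ... re-check and mutate replica state under the lock ...
    }
  }

  private static void joinQuietly(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Note the fixed shape must re-check state after re-acquiring the lock, since another thread may have changed it while the writer was being stopped.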
[jira] [Commented] (HDFS-8490) Typo in trace enabled log in WebHDFS exception handler
[ https://issues.apache.org/jira/browse/HDFS-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564343#comment-14564343 ] Hadoop QA commented on HDFS-8490: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 10s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 18s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 162m 9s | Tests passed in hadoop-hdfs. 
| | | | 208m 7s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736041/HDFS-8490.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11159/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11159/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11159/console | This message was automatically generated. Typo in trace enabled log in WebHDFS exception handler -- Key: HDFS-8490 URL: https://issues.apache.org/jira/browse/HDFS-8490 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Reporter: Jakob Homan Assignee: Archana T Priority: Trivial Labels: newbie Attachments: HDFS-8490.patch /hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/web/webhdfs/ExceptionHandler.java: {code} static DefaultFullHttpResponse exceptionCaught(Throwable cause) { Exception e = cause instanceof Exception ? (Exception) cause : new Exception(cause); if (LOG.isTraceEnabled()) { LOG.trace("GOT EXCEPITION", e); } {code} EXCEPITION is a typo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
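For illustration, here is a self-contained mirror of that snippet with the typo corrected. The Logger interface below is a stand-in for the real logging API, and the corrected wording is an assumption, not the actual committed patch:

```java
/**
 * Stand-alone mirror of the ExceptionHandler snippet above. The Logger
 * interface is a stand-in (not Hadoop's or SLF4J's); the corrected
 * message text "GOT EXCEPTION" is assumed wording.
 */
public class ExceptionHandlerSketch {
  interface Logger {
    boolean isTraceEnabled();
    void trace(String msg, Throwable t);
  }

  /** Wraps any Throwable as an Exception, logging at trace if enabled. */
  static Exception wrap(Throwable cause, Logger log) {
    Exception e = cause instanceof Exception
        ? (Exception) cause
        : new Exception(cause);
    if (log.isTraceEnabled()) {
      log.trace("GOT EXCEPTION", e);  // was "GOT EXCEPITION"
    }
    return e;
  }
}
```

Guarding the trace call with isTraceEnabled() is the standard idiom to avoid building the log arguments when tracing is off.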
[jira] [Commented] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564381#comment-14564381 ] Hadoop QA commented on HDFS-8489: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 30s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 18s | The applied patch generated 2 new checkstyle issues (total was 687, now 685). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 22s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 163m 33s | Tests passed in hadoop-hdfs. 
| | | | 211m 38s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736043/HDFS-8489.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11160/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11160/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11160/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11160/console | This message was automatically generated. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8450: --- Attachment: HDFS-8450-HDFS-7285-03.patch Erasure Coding: Consolidate erasure coding zone related implementation into a single class -- Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8450-HDFS-7285-00.patch, HDFS-8450-HDFS-7285-01.patch, HDFS-8450-HDFS-7285-02.patch, HDFS-8450-HDFS-7285-03.patch The idea is to follow the same pattern suggested by HDFS-7416. It is good to consolidate all the erasure coding zone related implementations of {{FSNamesystem}}. Here, proposing {{FSDirErasureCodingZoneOp}} class to have functions to perform related erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564425#comment-14564425 ] Rakesh R commented on HDFS-8450: Attached another patch addressing [~drankye] comments. Erasure Coding: Consolidate erasure coding zone related implementation into a single class -- Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8450-HDFS-7285-00.patch, HDFS-8450-HDFS-7285-01.patch, HDFS-8450-HDFS-7285-02.patch, HDFS-8450-HDFS-7285-03.patch The idea is to follow the same pattern suggested by HDFS-7416. It is good to consolidate all the erasure coding zone related implementations of {{FSNamesystem}}. Here, proposing {{FSDirErasureCodingZoneOp}} class to have functions to perform related erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8497) ErasureCodingWorker fails to do decode work
[ https://issues.apache.org/jira/browse/HDFS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564334#comment-14564334 ] Yi Liu commented on HDFS-8497: -- Hi Bo, the decode workaround is removed in HDFS-8328. ErasureCodingWorker fails to do decode work --- Key: HDFS-8497 URL: https://issues.apache.org/jira/browse/HDFS-8497 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8497-HDFS-7285-01.patch When I run the unit test in HDFS-8449, it fails due to the decode error in ErasureCodingWorker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7609) startup used too much time to load edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564340#comment-14564340 ] Hadoop QA commented on HDFS-7609: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 51s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 2 new checkstyle issues (total was 321, now 321). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 162m 52s | Tests passed in hadoop-hdfs. 
| | | | 209m 13s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736039/HDFS-7609-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11158/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11158/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11158/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11158/console | This message was automatically generated. startup used too much time to load edits Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Labels: BB2015-05-RFC Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode, but the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
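As a sketch only, the reporter's workaround would be an hdfs-site.xml entry like the following. The property name is taken from the comment above; verify it against your Hadoop version before relying on it:

```xml
<!-- hdfs-site.xml: workaround from the report above; the property name is
     as given in the comment, not verified against every Hadoop release. -->
<property>
  <name>dfs.namenode.enable.retrycache</name>
  <value>false</value>
</property>
```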
[jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer
[ https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564433#comment-14564433 ] Walter Su commented on HDFS-8254: - This case passed. {code} @Test(timeout=12) public void testDatanodeFailure3() { final int length = NUM_DATA_BLOCKS*BLOCK_SIZE - 1; ... {code} This case failed. {code} @Test(timeout=12) public void testDatanodeFailure3() { final int length = NUM_DATA_BLOCKS*BLOCK_SIZE; ... {code} Fix: {code} private long getCurrentSumBytes() { long sum = 0; for (int i = 0; i < numDataBlocks; i++) { + if (streamers.get(i).isFailed()) { + continue; + } System.out.println(streamers.get(i).getBytesCurBlock()); sum += streamers.get(i).getBytesCurBlock(); } return sum; } {code} because {{BytesCurBlock}} of the failed streamer isn't 0. When the last stripe is full, we call {{writeParityCells()}} twice. To [~zhz]: bq. It also looks like we could run into a race condition if 2 streamers enter locateFollowingBlock around the same time? I think it won't be an issue, because MultipleBlockingQueue.poll(..) has {{synchronized(queues)}}. In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer --- Key: HDFS-8254 URL: https://issues.apache.org/jira/browse/HDFS-8254 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8254_20150526.patch, h8254_20150526b.patch StripedDataStreamer javadoc is shown below. {code} * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}. * There are two kinds of StripedDataStreamer, leading streamer and ordinary * stream. Leading streamer requests a block group from NameNode, unwraps * it to located blocks and transfers each located block to its corresponding * ordinary streamer via a blocking queue. {code} Leading streamer is the streamer with index 0. 
When the datanode of the leading streamer fails, the other streamers cannot continue since no one will request a block group from NameNode anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
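The javadoc's handoff design can be modeled in a few lines of plain Java (illustrative names, not HDFS code): the leading streamer feeds one blocking queue per ordinary streamer, so if the leader dies before feeding them, every consumer blocks forever on take(), which is exactly the failure mode this issue describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Toy model of the leading-streamer handoff described in the javadoc
 * above. Names are illustrative; blocks are plain Strings here.
 */
public class StreamerHandoff {
  /**
   * Leading streamer's side of the handoff: push one "located block"
   * into each ordinary streamer's queue. If this never runs (the leader
   * died), consumers blocked in take() on these queues wait forever.
   */
  static List<BlockingQueue<String>> distribute(String[] blockGroup) {
    List<BlockingQueue<String>> queues = new ArrayList<>();
    for (String block : blockGroup) {
      BlockingQueue<String> q = new ArrayBlockingQueue<>(1);
      q.offer(block);               // leading streamer feeds each queue
      queues.add(q);
    }
    return queues;
  }
}
```

The single-producer structure is what makes the leading streamer a single point of failure for the whole block group.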
[jira] [Updated] (HDFS-8453) Erasure coding: properly handle start offset for internal blocks in a block group
[ https://issues.apache.org/jira/browse/HDFS-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8453: Summary: Erasure coding: properly handle start offset for internal blocks in a block group (was: Erasure coding: properly assign start offset for internal blocks in a block group) Erasure coding: properly handle start offset for internal blocks in a block group - Key: HDFS-8453 URL: https://issues.apache.org/jira/browse/HDFS-8453 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8453-HDFS-7285.00.patch {code} void actualGetFromOneDataNode(final DNAddrPair datanode, ... LocatedBlock block = getBlockAt(blockStartOffset); ... fetchBlockAt(block.getStartOffset()); {code} The {{blockStartOffset}} here is from an inner block. For parity blocks, the offset will overlap with the next block group, and we may end up fetching the wrong block. So we have to assign a meaningful start offset to internal blocks in a block group, especially parity blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
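The overlap can be seen with simple arithmetic. The layout below is an assumed naive scheme for illustration (not the exact HDFS offset formula): if internal block i of a group starting at groupStart were assigned offset groupStart + i * blockSize, any parity index i >= dataBlocks lands past the group's logical span of dataBlocks * blockSize, i.e. inside the next group's offset range.

```java
/**
 * Illustrative arithmetic for the parity-offset overlap (assumed naive
 * layout, not the actual HDFS formula). With 6 data + 3 parity blocks,
 * parity indices 6..8 would get offsets past the group's logical end.
 */
public class ParityOffset {
  /** Naive per-index offset: groupStart + i * blockSize. */
  static long naiveStartOffset(long groupStart, int indexInGroup,
      long blockSize) {
    return groupStart + (long) indexInGroup * blockSize;
  }

  /**
   * Indices >= dataBlocks are parity blocks; their naive offsets fall
   * beyond the group's dataBlocks * blockSize logical span, i.e. into
   * the next block group's range.
   */
  static boolean overlapsNextGroup(int indexInGroup, int dataBlocks) {
    return indexInGroup >= dataBlocks;
  }
}
```

Hence the issue's conclusion: parity blocks need an explicitly assigned, meaningful start offset rather than one derived from their index.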
[jira] [Updated] (HDFS-8425) [umbrella] Performance tuning and bug fixing for system tests for EC feature
[ https://issues.apache.org/jira/browse/HDFS-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8425: Summary: [umbrella] Performance tuning and bug fixing for system tests for EC feature (was: [umbrella] Bug fixing for System tests for EC feature) [umbrella] Performance tuning and bug fixing for system tests for EC feature Key: HDFS-8425 URL: https://issues.apache.org/jira/browse/HDFS-8425 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: GAO Rui This jira is the {{umbrella}} jira for bug fixes to system tests for the EC feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8497) ErasureCodingWorker fails to do decode work
[ https://issues.apache.org/jira/browse/HDFS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8497: Attachment: HDFS-8497-HDFS-7285-01.patch The unit test is another test for the recovery work of the datanode; I think we can add it to the branch before HDFS-8497 ErasureCodingWorker fails to do decode work --- Key: HDFS-8497 URL: https://issues.apache.org/jira/browse/HDFS-8497 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8497-HDFS-7285-01.patch When I run the unit test in HDFS-8449, it fails due to the decode error in ErasureCodingWorker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8496: --- Attachment: HDFS-8496-001.patch Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8496-001.patch On a DN of an HDFS 2.6 cluster, we noticed that some DataXceiver threads and heartbeat threads were blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found that calling stopWriter() with the FSDatasetImpl lock held blocked everything. Following is the heartbeat stack, as an example, to show how threads are blocked by the FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock was just sleeping in stopWriter(), waiting for another thread to exit. 
The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) - locked 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} In this case, we had deployed quite a lot of other workloads on the DN, so the local file system and disk were quite busy. We guess this is why stopWriter took quite a long time. Anyway, it is not reasonable to call stopWriter with the FSDatasetImpl lock held. In HDFS-7999, createTemporary() was changed to call stopWriter without the FSDatasetImpl lock. We guess we should do the same in the other three methods: recoverClose()/recoverAppend()/recoverRbw(). I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8497) ErasureCodingWorker fails to do decode work
[ https://issues.apache.org/jira/browse/HDFS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564252#comment-14564252 ] Li Bo commented on HDFS-8497: - Correction: before HDFS-8449 ErasureCodingWorker fails to do decode work --- Key: HDFS-8497 URL: https://issues.apache.org/jira/browse/HDFS-8497 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8497-HDFS-7285-01.patch When I run the unit test in HDFS-8449, it fails due to the decode error in ErasureCodingWorker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8481: Attachment: HDFS-8481-HDFS-7285.03.patch Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564260#comment-14564260 ] Zhe Zhang commented on HDFS-8481: - Thanks Kai and Walter for the comments. The new patch moves the decoder to the {{DFSStripedInputStream}} level. bq. Assume we has a 768mb file (128mb * 6) which exactly contains 1 block group. We lost one block so we have to decode until 768mb data has been read. This is a good point. But to address this issue we need some nontrivial logic to call {{decode()}} multiple times. I suggest we do this optimization as a follow-on under HDFS-8031. Per Walter's suggestion above, we can also think of a better way to abstract {{decodeAndFillBuffer}} in that follow-on JIRA (it will be easier when both client and DN codes are stabilized). Let me know if the new patch looks good to you with respect to removing the decoding workaround. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564317#comment-14564317 ] Hadoop QA commented on HDFS-6440: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 24 new or modified test files. | | {color:green}+1{color} | javac | 8m 8s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 1s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 4m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 59s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 23m 25s | Tests passed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 168m 33s | Tests failed in hadoop-hdfs. | | {color:red}-1{color} | hdfs tests | 0m 18s | Tests failed in bkjournal. 
| | | | 247m 4s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestEncryptedTransfer | | Timed out tests | org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache | | Failed build | bkjournal | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736032/hdfs-6440-trunk-v7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/artifact/patchprocess/testrun_hadoop-hdfs.txt | | bkjournal test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/artifact/patchprocess/testrun_bkjournal.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/console | This message was automatically generated. Support more than 2 NameNodes - Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Affects Versions: 2.4.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 3.0.0 Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-multiple-snn-trunk-v0.patch Most of the work is already done to support more than 2 NameNodes (one active, one standby). 
This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
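Since most of the change described above is in configuration parsing, the user-visible shape of the feature is easy to sketch. A hypothetical hdfs-site.xml fragment follows; the nameservice ID "mycluster", the NameNode IDs "nn1".."nn3", and the hostnames are invented, and it assumes the existing HA keys simply accept a longer NameNode list, as the proposal suggests:

```xml
<!-- Hypothetical example: "mycluster", "nn1".."nn3" and the hosts are
     invented; the keys are the existing HA configuration keys. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>  <!-- three NameNode IDs instead of two -->
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>host2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>host3.example.com:8020</value>
</property>
```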
[jira] [Commented] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564597#comment-14564597 ] Hadoop QA commented on HDFS-8450: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 28s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 37s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 41s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 55s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 3s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 57s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 105m 47s | Tests failed in hadoop-hdfs. 
| | | | 153m 6s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestDFSStripedInputStream | | Timed out tests | org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736087/HDFS-8450-HDFS-7285-03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1299357 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/console | This message was automatically generated. Erasure Coding: Consolidate erasure coding zone related implementation into a single class -- Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8450-HDFS-7285-00.patch, HDFS-8450-HDFS-7285-01.patch, HDFS-8450-HDFS-7285-02.patch, HDFS-8450-HDFS-7285-03.patch The idea is to follow the same pattern suggested by HDFS-7416. It is good to consolidate all the erasure coding zone related implementations of {{FSNamesystem}}. 
Here, proposing {{FSDirErasureCodingZoneOp}} class to have functions to perform related erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564618#comment-14564618 ] kanaka kumar avvaru commented on HDFS-7240: --- Very interesting to follow, [~jnp]; we also have requirements to support trillion-scale small objects/files and would be interested in contributing to Ozone development. Can you please invite me to the WebEx meeting as well? For now I have a few comments on this project: Practically, partitioning may be difficult for the storage layer to control alone, as the distribution depends on how applications construct keys. So a bucket partitioner class could be an input when creating a bucket, so that applications can manage the partitions well. Object-level metadata such as tags/labels would be required, which computing jobs can use as additional info (similar to xattrs on a file). What is the plan for leveldbjni content persistence: is any concept like a WAL planned for reliability? And how will the leveldbjni content be replicated? As millions of buckets are expected, is partitioning of buckets also required, based on volume name? Swift and AWS S3 support object versions and replace; does Ozone also plan the same? Missing features like multipart upload and heavy object/storage space splits could also be pooled into the coming phases (maybe phase 2 or later). We could also add readable snapshots of a bucket to the feature queue (maybe at a later stage of the project). As part of transparent encryption, an encryption zone at the bucket level could be an expectation from applications. Object store in HDFS Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Ozone-architecture-v1.pdf This jira proposes to add object store capabilities into HDFS.
As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer i.e. datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8407) hdfsListDirectory must set errno to 0 on success
[ https://issues.apache.org/jira/browse/HDFS-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564632#comment-14564632 ] Hudson commented on HDFS-8407: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #212 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/212/]) HDFS-8407. libhdfs hdfsListDirectory must set errno to 0 on success (Masatake Iwasaki via Colin P. McCabe) (cmccabe: rev d2d95bfe886a7fdf9d58fd5c47ec7c0158393afb) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_libhdfs_ops.c hdfsListDirectory must set errno to 0 on success Key: HDFS-8407 URL: https://issues.apache.org/jira/browse/HDFS-8407 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Juan Yu Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: HDFS-8407.001.patch, HDFS-8407.002.patch, HDFS-8407.003.patch The documentation says it returns NULL on error, but it could also return NULL when the directory is empty. /** * hdfsListDirectory - Get list of files/directories for a given * directory-path. hdfsFreeFileInfo should be called to deallocate memory. * @param fs The configured filesystem handle. * @param path The path of the directory. * @param numEntries Set to the number of files/directories in path. * @return Returns a dynamically-allocated array of hdfsFileInfo * objects; NULL on error. */ {code} hdfsFileInfo *pathList = NULL; ... //Figure out the number of entries in that directory jPathListSize = (*env)->GetArrayLength(env, jPathList); if (jPathListSize == 0) { ret = 0; goto done; } ...
if (ret) { hdfsFreeFileInfo(pathList, jPathListSize); errno = ret; return NULL; } *numEntries = jPathListSize; return pathList; {code} Either change the implementation to match the doc, or fix the doc to match the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
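The NULL-on-empty ambiguity described above is the same one POSIX readdir() has, and the fix is the same protocol the patch adopts for hdfsListDirectory: errno is zeroed on the success path, and the caller checks errno to tell "error" from "nothing there". A minimal, self-contained C sketch of that caller-side protocol (using readdir rather than libhdfs, so it runs without a Hadoop install; count_entries is a made-up helper, not libhdfs code):

```c
#include <stddef.h>
#include <dirent.h>
#include <errno.h>

/* Count directory entries, distinguishing "end of stream" from "error"
 * the way the patched hdfsListDirectory lets callers do: a NULL return
 * with errno == 0 is normal, NULL with nonzero errno is a real failure.
 * Returns the entry count, or -1 on error (with errno set). */
int count_entries(const char *path) {
    DIR *d = opendir(path);
    if (d == NULL) {
        return -1;               /* errno already set by opendir */
    }
    int n = 0;
    struct dirent *e;
    errno = 0;                   /* clear before calling, as the caller must */
    while ((e = readdir(d)) != NULL) {
        n++;
        errno = 0;               /* re-clear before each call */
    }
    if (errno != 0) {            /* NULL + nonzero errno: real error */
        int saved = errno;
        closedir(d);
        errno = saved;
        return -1;
    }
    closedir(d);
    return n;                    /* NULL + errno == 0: clean end of stream */
}
```

With hdfsListDirectory the shape is identical after this fix: a NULL return with errno == 0 means an empty directory, while NULL with nonzero errno means a genuine failure.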
[jira] [Commented] (HDFS-8429) Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564633#comment-14564633 ] Hudson commented on HDFS-8429: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #212 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/212/]) HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c) * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread - Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.8.0 Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. 
Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads then block in stacks like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that users know that something went wrong and can fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
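The root cause in the report above is a native malloc that fails under memory pressure and surfaces only later as a Java-side NullPointerException. A hypothetical C sketch (get_readable_fds is invented for illustration; it is not Hadoop's actual JNI code) of the defensive pattern this implies: check the allocation at the point of failure and report it to the caller explicitly, instead of letting a NULL propagate:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Allocate and zero an array of `count` fds (count > 0 assumed).
 * On success, stores the array in *out and returns 0; on allocation
 * failure, returns ENOMEM immediately rather than handing the caller
 * a NULL that blows up somewhere far from the real cause. */
int get_readable_fds(int count, int **out) {
    int *fds = malloc((size_t)count * sizeof(int));
    if (fds == NULL) {
        return ENOMEM;           /* report the failure where it happened */
    }
    memset(fds, 0, (size_t)count * sizeof(int));
    *out = fds;
    return 0;
}
```

In the JNI setting, the equivalent of the ENOMEM return would be throwing OutOfMemoryError from the native method, so the watcher thread fails with a diagnosable error instead of an NPE.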
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564630#comment-14564630 ] Hudson commented on HDFS-8443: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #212 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/212/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8256) -storagepolicies , -blockId ,-replicaDetails options are missed out in usage and from documentation
[ https://issues.apache.org/jira/browse/HDFS-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564646#comment-14564646 ] Vinayakumar B commented on HDFS-8256: - Seems the patch is not applying on latest trunk. Needs a rebase. -storagepolicies , -blockId ,-replicaDetails options are missed out in usage and from documentation -- Key: HDFS-8256 URL: https://issues.apache.org/jira/browse/HDFS-8256 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: J.Andreina Assignee: J.Andreina Labels: BB2015-05-TBR Attachments: HDFS-8256.2.patch, HDFS-8256.3.patch, HDFS-8256_Trunk.1.patch -storagepolicies , -blockId ,-replicaDetails options are missed out in usage and from documentation. {noformat} Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots] [-showprogress] {noformat} Found as part of HDFS-8108. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564653#comment-14564653 ] Hudson commented on HDFS-8443: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #942 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/942/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8407) hdfsListDirectory must set errno to 0 on success
[ https://issues.apache.org/jira/browse/HDFS-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564655#comment-14564655 ] Hudson commented on HDFS-8407: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #942 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/942/]) HDFS-8407. libhdfs hdfsListDirectory must set errno to 0 on success (Masatake Iwasaki via Colin P. McCabe) (cmccabe: rev d2d95bfe886a7fdf9d58fd5c47ec7c0158393afb) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h hdfsListDirectory must set errno to 0 on success Key: HDFS-8407 URL: https://issues.apache.org/jira/browse/HDFS-8407 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Juan Yu Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: HDFS-8407.001.patch, HDFS-8407.002.patch, HDFS-8407.003.patch The documentation says it returns NULL on error, but it could also return NULL when the directory is empty. /** * hdfsListDirectory - Get list of files/directories for a given * directory-path. hdfsFreeFileInfo should be called to deallocate memory. * @param fs The configured filesystem handle. * @param path The path of the directory. * @param numEntries Set to the number of files/directories in path. * @return Returns a dynamically-allocated array of hdfsFileInfo * objects; NULL on error. */ {code} hdfsFileInfo *pathList = NULL; ... //Figure out the number of entries in that directory jPathListSize = (*env)->GetArrayLength(env, jPathList); if (jPathListSize == 0) { ret = 0; goto done; } ...
if (ret) { hdfsFreeFileInfo(pathList, jPathListSize); errno = ret; return NULL; } *numEntries = jPathListSize; return pathList; {code} Either change the implementation to match the doc, or fix the doc to match the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8429) Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564656#comment-14564656 ] Hudson commented on HDFS-8429: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #942 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/942/]) HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c) * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread - Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.8.0 Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. 
Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads then block in stacks like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that users know that something went wrong and can fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564474#comment-14564474 ] Hadoop QA commented on HDFS-8481: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 9s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 22s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 16s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 172m 4s | Tests failed in hadoop-hdfs. 
| | | | 214m 6s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS | | | hadoop.hdfs.TestRecoverStripedFile | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.blockmanagement.TestBlockInfo | | | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736057/HDFS-8481-HDFS-7285.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1299357 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/console | This message was automatically generated. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8251) Move the synthetic load generator into its own package
[ https://issues.apache.org/jira/browse/HDFS-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564552#comment-14564552 ] Vinayakumar B commented on HDFS-8251: - bq. hadoop-test-tool seems very generic to me. It might make sense if there was more than the HDFS load generator in it. Leaving this in RFC for bug bash for a second opinion. IMO, it's okay to keep it generic, as the current tools depend on the entire Hadoop stack (HDFS and MR) for execution, rather than only HDFS. In fact, this is the reason these were kept in the mapreduce project: to resolve the dependencies. I agree they are not MR tools, but they use MR infrastructure. In future, any such tools intended for other components that use the entire Hadoop stack can also be put in this project. Thoughts? Move the synthetic load generator into its own package -- Key: HDFS-8251 URL: https://issues.apache.org/jira/browse/HDFS-8251 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: J.Andreina Labels: BB2015-05-RFC Attachments: HDFS-8251.1.patch It doesn't really make sense for the HDFS load generator to be a part of the (extremely large) mapreduce jobclient package. It should be pulled out and put in its own package, probably in hadoop-tools. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3716) Purger should remove stale fsimage ckpt files
[ https://issues.apache.org/jira/browse/HDFS-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564582#comment-14564582 ] Vinayakumar B commented on HDFS-3716: - Change looks fine to me. I think you can add one test case for this. Purger should remove stale fsimage ckpt files - Key: HDFS-3716 URL: https://issues.apache.org/jira/browse/HDFS-3716 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: suja s Assignee: J.Andreina Priority: Minor Attachments: HDFS-3716.1.patch NN got killed while checkpointing was in progress, before renaming the ckpt file to the actual file. Since the checkpointing process did not complete, on the next NN startup it will load the previous fsimage and apply the rest of the edits. Functionally there's no harm, but this ckpt file will be retained as is. The purger will not remove the ckpt file, though other old fsimage files will be taken care of. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists
[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564534#comment-14564534 ] Vinayakumar B commented on HDFS-8270: - Seems like the default retries also got removed; the client is not retrying even for connect exceptions. Just the following changes will do, IMO: in NameNodeProxies#createNNProxyWithClientProtocol(..), inside the {{withRetries}} if block, make the changes below and leave everything else the same. {code} if (withRetries) { // create the proxy with retries - RetryPolicy createPolicy = RetryPolicies - .retryUpToMaximumCountWithFixedSleep(5, - HdfsServerConstants.LEASE_SOFTLIMIT_PERIOD, TimeUnit.MILLISECONDS); - - Map<Class<? extends Exception>, RetryPolicy> remoteExceptionToPolicyMap - = new HashMap<Class<? extends Exception>, RetryPolicy>(); - remoteExceptionToPolicyMap.put(AlreadyBeingCreatedException.class, - createPolicy); - - RetryPolicy methodPolicy = RetryPolicies.retryByRemoteException( - defaultPolicy, remoteExceptionToPolicyMap); Map<String, RetryPolicy> methodNameToPolicyMap = new HashMap<String, RetryPolicy>(); - - methodNameToPolicyMap.put("create", methodPolicy); ClientProtocol translatorProxy = new ClientNamenodeProtocolTranslatorPB(proxy); {code} create() always retried with hardcoded timeout when file already exists --- Key: HDFS-8270 URL: https://issues.apache.org/jira/browse/HDFS-8270 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Andrey Stepachev Assignee: J.Andreina Attachments: HDFS-8270.1.patch In HBase we stumbled on unexpected behaviour, which could break things. HDFS-6478 fixed a wrong exception translation, but that apparently led to unexpected behaviour: clients trying to create a file without overwrite=true will be forced to retry for a hardcoded amount of time (60 seconds). That could break or slow down systems that use the filesystem for locks (like HBase fsck did, and we got it broken: HBASE-13574).
We should make this behaviour configurable: does the client really need to wait for the lease timeout to be sure that the file doesn't exist, or should it be enough to fail fast? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
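The per-exception retry mapping discussed above can be sketched with a small self-contained model: look up a retry budget by exception class, so an already-exists error can fail fast while connect errors keep their retries. This is a simplified illustration, not Hadoop's actual RetryPolicies API; all class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Simplified model of the retryByRemoteException idea from HDFS-8270:
// choose a retry budget per exception class. Names are illustrative.
public class RetrySketch {
    static class AlreadyExistsException extends RuntimeException {}
    static class ConnectException extends RuntimeException {}

    static final Map<Class<? extends Exception>, Integer> RETRIES = new HashMap<>();
    static {
        RETRIES.put(AlreadyExistsException.class, 0); // fail fast
        RETRIES.put(ConnectException.class, 5);       // keep default-style retries
    }

    static int retriesFor(Exception e) {
        return RETRIES.getOrDefault(e.getClass(), 1); // fallback policy
    }

    // Invoke the call, retrying according to the per-exception budget.
    static <T> T invoke(Callable<T> call) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt++ >= retriesFor(e)) throw e; // budget exhausted
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        try {
            invoke(() -> { calls[0]++; throw new AlreadyExistsException(); });
        } catch (AlreadyExistsException expected) {
            // fail-fast policy: no retries for this exception class
        }
        System.out.println("calls with fail-fast policy: " + calls[0]); // 1
    }
}
```

The point of the mapping is that the fail-fast decision stays local to the exception type, so other operations keep the default retry behaviour.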
[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564543#comment-14564543 ] Hadoop QA commented on HDFS-8496: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 26s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 1 new checkstyle issues (total was 124, now 120). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 12s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 162m 57s | Tests passed in hadoop-hdfs. 
| | | | 208m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736065/HDFS-8496-001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/whitespace.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/console | This message was automatically generated. Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8496-001.patch On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found the calling of stopWriter() with FSDatasetImpl lock blocked everything. 
Following is the heartbeat stack, as an example, to show how threads are blocked by FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock is just sleeping to wait another thread to exit in stopWriter(). The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon)
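The blocking described above comes from calling Thread.join() while holding the FSDatasetImpl monitor. One direction for a fix is to read the shared state under the lock but interrupt and join the writer thread outside it, so heartbeat and DataXceiver threads can still acquire the lock. A minimal sketch of that shape, with illustrative names (not the real FsDatasetImpl code):

```java
// Sketch for HDFS-8496: join the writer thread *outside* the dataset lock.
// Class and field names are hypothetical.
public class StopWriterSketch {
    private final Object datasetLock = new Object();
    private Thread writer;

    void startWriter() {
        writer = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* stop */ }
        });
        writer.start();
    }

    // Bad shape: synchronized (datasetLock) { writer.interrupt(); writer.join(); }
    // Better: snapshot the reference under the lock, then join lock-free.
    void stopWriter() throws InterruptedException {
        Thread t;
        synchronized (datasetLock) {
            t = writer;      // read shared state under the lock
            writer = null;
        }
        if (t != null) {
            t.interrupt();   // no lock held while we wait for the thread to exit
            t.join();
        }
    }

    boolean isStopped() {
        synchronized (datasetLock) { return writer == null; }
    }

    public static void main(String[] args) throws InterruptedException {
        StopWriterSketch s = new StopWriterSketch();
        s.startWriter();
        s.stopWriter();
        System.out.println("writer stopped without holding the dataset lock");
    }
}
```

With this shape, a heartbeat thread calling isStopped() (or any other method that needs the dataset lock) only waits for the brief snapshot section, not for the whole join.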
[jira] [Commented] (HDFS-6775) Users may see TrashPolicy if hdfs dfs -rm is run
[ https://issues.apache.org/jira/browse/HDFS-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564594#comment-14564594 ] Vinayakumar B commented on HDFS-6775: - +1, LGTM. Committing soon Users may see TrashPolicy if hdfs dfs -rm is run Key: HDFS-6775 URL: https://issues.apache.org/jira/browse/HDFS-6775 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: J.Andreina Attachments: HDFS-6775.1.patch, HDFS-6775.2.patch Doing 'hdfs dfs -rm file' generates an extra log message on the console: {code} 14/07/29 15:18:56 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes. {code} This shouldn't be seen by users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564571#comment-14564571 ] Vinayakumar B commented on HDFS-7401: - +1 for the patch. Will commit soon Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn("Failed to connect to " + targetAddr + " for block" + ", add to deadNodes and continue. " + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer
[ https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564572#comment-14564572 ] Walter Su commented on HDFS-8254: - This case failed.
{code}
@Test(timeout=12)
public void testDatanodeFailure3() {
  final int length = NUM_DATA_BLOCKS*BLOCK_SIZE * 2;
  ...
{code}
Cause: the thread of streamer #3 has been shut down by {{handleBadDatanode()}}. When the output stream moves forward to write the next block group, streamer #3 has an error and doesn't have an endBlock in the Coordinator. In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer --- Key: HDFS-8254 URL: https://issues.apache.org/jira/browse/HDFS-8254 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8254_20150526.patch, h8254_20150526b.patch StripedDataStreamer javadoc is shown below.
{code}
 * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}.
 * There are two kinds of StripedDataStreamer, leading streamer and ordinary
 * stream. Leading streamer requests a block group from NameNode, unwraps
 * it to located blocks and transfers each located block to its corresponding
 * ordinary streamer via a blocking queue.
{code}
The leading streamer is the streamer with index 0. When the datanode of the leading streamer fails, the other streamers cannot continue since no one will request a block group from NameNode anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
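The leading/ordinary streamer hand-off described in the javadoc can be modeled with per-streamer blocking queues, which also makes the failure mode visible: if the leading streamer dies, the ordinary streamers block on take() forever. This is a hedged, self-contained model with illustrative names, not the real StripedDataStreamer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal model of the hand-off in HDFS-8254: streamer 0 ("leading")
// unwraps a block group and pushes one located block per ordinary
// streamer through a blocking queue. Names are hypothetical.
public class StripedHandoffSketch {
    static final int NUM_STREAMERS = 3;

    public static List<String> distribute(String blockGroup) throws InterruptedException {
        List<BlockingQueue<String>> queues = new ArrayList<>();
        for (int i = 0; i < NUM_STREAMERS; i++) queues.add(new ArrayBlockingQueue<>(1));

        // Leading streamer: unwrap the group into per-streamer located blocks.
        for (int i = 0; i < NUM_STREAMERS; i++) queues.get(i).put(blockGroup + "#blk" + i);

        // Ordinary streamers: each takes its block. If the leader never
        // produced one (e.g. its datanode failed), this take() blocks
        // forever -- which is why a fixed leader is hard to make fault
        // tolerant; a timed poll() would at least fail fast.
        List<String> got = new ArrayList<>();
        for (int i = 0; i < NUM_STREAMERS; i++) got.add(queues.get(i).take());
        return got;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(distribute("bg_0"));
    }
}
```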
[jira] [Updated] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7401: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks all. Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn("Failed to connect to " + targetAddr + " for block" + ", add to deadNodes and continue. " + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564590#comment-14564590 ] Hudson commented on HDFS-7401: -- FAILURE: Integrated in Hadoop-trunk-Commit #7924 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7924/]) HDFS-7401. Add block info to DFSInputStream' WARN message when it adds node to deadNodes (Contributed by Arshad Mohammad) (vinayakumarb: rev b75df697e0f101f86788ad23a338ab3545b8d702) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn(Failed to connect to + targetAddr + for block + , add to deadNodes and continue. + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564753#comment-14564753 ] Hudson commented on HDFS-8443: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564749#comment-14564749 ] Hudson commented on HDFS-7401: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/]) HDFS-7401. Add block info to DFSInputStream' WARN message when it adds node to deadNodes (Contributed by Arshad Mohammad) (vinayakumarb: rev b75df697e0f101f86788ad23a338ab3545b8d702) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn(Failed to connect to + targetAddr + for block + , add to deadNodes and continue. + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8429) Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564757#comment-14564757 ] Hudson commented on HDFS-8429: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/]) HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c) * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread - Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.8.0 Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application hung while doing a short-circuit read of a local HDFS block. By looking into the log, we found that the DataNode's DomainSocketWatcher.watcherThread had exited with the following log:
{code}
ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception
java.lang.NullPointerException
    at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463)
    at java.lang.Thread.run(Thread.java:662)
{code}
Line 463 is the following code snippet:
{code}
try {
  for (int fd : fdSet.getAndClearReadableFds()) {
    sendCallbackAndRemove("getAndClearReadableFds", entries, fdSet, fd);
  }
{code}
getAndClearReadableFds is a native method which mallocs an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads were then blocked in stacks like this:
{code}
DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000]
java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
    at java.lang.Thread.run(Thread.java:662)
{code}
IMO, we should exit the DN so that users can know that something went wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
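The stuck-thread pattern above, where callers park on a condition that only the watcher thread signals, can be sketched along with a fail-fast alternative: mark the watcher closed and wake every waiter in a finally block, so add() throws instead of waiting forever. This is a simplified illustration, not the real DomainSocketWatcher code; all names are hypothetical.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the HDFS-8429 failure mode and a fail-fast direction.
public class WatcherSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition processed = lock.newCondition();
    private boolean closed = false;
    private boolean handled = false; // would be set per-fd by the real watcher

    void watcherLoop(Runnable body) {
        try {
            body.run();          // may throw, e.g. an NPE after a failed malloc
        } catch (Throwable t) {
            // log the unexpected exception; do NOT just let the thread die
        } finally {
            lock.lock();
            try {
                closed = true;   // current and future waiters fail fast
                processed.signalAll();
            } finally { lock.unlock(); }
        }
    }

    void add() throws InterruptedException {
        lock.lock();
        try {
            while (!handled && !closed) processed.await();
            if (closed) throw new IllegalStateException("watcher thread terminated");
        } finally { lock.unlock(); }
    }

    public static void main(String[] args) throws InterruptedException {
        WatcherSketch w = new WatcherSketch();
        Thread t = new Thread(() ->
            w.watcherLoop(() -> { throw new NullPointerException("simulated"); }));
        t.start();
        t.join();
        try {
            w.add();
        } catch (IllegalStateException e) {
            System.out.println("add() failed fast: " + e.getMessage());
        }
    }
}
```

Without the closed flag and signalAll() in the finally block, the await() above is exactly the parked DataXceiver stack shown in the report.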
[jira] [Commented] (HDFS-8407) hdfsListDirectory must set errno to 0 on success
[ https://issues.apache.org/jira/browse/HDFS-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564756#comment-14564756 ] Hudson commented on HDFS-8407: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/]) HDFS-8407. libhdfs hdfsListDirectory must set errno to 0 on success (Masatake Iwasaki via Colin P. McCabe) (cmccabe: rev d2d95bfe886a7fdf9d58fd5c47ec7c0158393afb) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/expect.h hdfsListDirectory must set errno to 0 on success Key: HDFS-8407 URL: https://issues.apache.org/jira/browse/HDFS-8407 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Juan Yu Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: HDFS-8407.001.patch, HDFS-8407.002.patch, HDFS-8407.003.patch The documentation says it returns NULL on error, but it could also return NULL when the directory is empty. /** * hdfsListDirectory - Get list of files/directories for a given * directory-path. hdfsFreeFileInfo should be called to deallocate memory. * @param fs The configured filesystem handle. * @param path The path of the directory. * @param numEntries Set to the number of files/directories in path. * @return Returns a dynamically-allocated array of hdfsFileInfo * objects; NULL on error. */ {code} hdfsFileInfo *pathList = NULL; ... //Figure out the number of entries in that directory jPathListSize = (*env)->GetArrayLength(env, jPathList); if (jPathListSize == 0) { ret = 0; goto done; } ... 
if (ret) { hdfsFreeFileInfo(pathList, jPathListSize); errno = ret; return NULL; } *numEntries = jPathListSize; return pathList; {code} Either change the implementation to match the doc, or fix the doc to match the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564909#comment-14564909 ] Hudson commented on HDFS-8443: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/201/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8471) Implement read block over HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HDFS-8471: Attachment: HDFS-8471.1.patch Add checksum support. Introduce a ReadBlockHandler. Add a testcase to test block not exists error. Implement read block over HTTP/2 Key: HDFS-8471 URL: https://issues.apache.org/jira/browse/HDFS-8471 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Duo Zhang Assignee: Duo Zhang Attachments: HDFS-8471.1.patch, HDFS-8471.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8407) hdfsListDirectory must set errno to 0 on success
[ https://issues.apache.org/jira/browse/HDFS-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564956#comment-14564956 ] Hudson commented on HDFS-8407: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/]) HDFS-8407. libhdfs hdfsListDirectory must set errno to 0 on success (Masatake Iwasaki via Colin P. McCabe) (cmccabe: rev d2d95bfe886a7fdf9d58fd5c47ec7c0158393afb) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hdfsListDirectory must set errno to 0 on success Key: HDFS-8407 URL: https://issues.apache.org/jira/browse/HDFS-8407 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Juan Yu Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: HDFS-8407.001.patch, HDFS-8407.002.patch, HDFS-8407.003.patch The documentation says it returns NULL on error, but it could also return NULL when the directory is empty. /** * hdfsListDirectory - Get list of files/directories for a given * directory-path. hdfsFreeFileInfo should be called to deallocate memory. * @param fs The configured filesystem handle. * @param path The path of the directory. * @param numEntries Set to the number of files/directories in path. * @return Returns a dynamically-allocated array of hdfsFileInfo * objects; NULL on error. */ {code} hdfsFileInfo *pathList = NULL; ... //Figure out the number of entries in that directory jPathListSize = (*env)-GetArrayLength(env, jPathList); if (jPathListSize == 0) { ret = 0; goto done; } ... 
if (ret) { hdfsFreeFileInfo(pathList, jPathListSize); errno = ret; return NULL; } *numEntries = jPathListSize; return pathList; {code} Either change the implementation to match the doc, or fix the doc to match the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8498) Blocks can be committed with wrong size
Daryn Sharp created HDFS-8498: - Summary: Blocks can be committed with wrong size Key: HDFS-8498 URL: https://issues.apache.org/jira/browse/HDFS-8498 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical When an IBR for a UC block arrives, the NN updates the expected location's block and replica state _only_ if it's on an unexpected storage for an expected DN. If it's for an expected storage, only the genstamp is updated. When the block is committed and the expected locations are verified, only the genstamp is checked. The size is not checked, but it wasn't updated in the expected locations anyway. A faulty client may misreport the size when committing the block. The block is then effectively corrupted. If the NN issues replications, the received IBR is considered corrupt; the NN invalidates the block and immediately issues another replication. The NN eventually realizes all the original replicas are corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
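The missing check described above can be sketched as: record the length each DN reported for the under-construction block via IBRs, then refuse to commit when the client's claimed size disagrees with every replica report. All names here are hypothetical; the real logic lives in the NameNode's block management code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the size check HDFS-8498 argues for. Illustrative names only.
public class CommitSizeSketch {
    private final Map<String, Long> reportedLen = new HashMap<>(); // DN -> IBR length

    void onIncrementalBlockReport(String datanode, long length) {
        // Update the expected location's length, not just the genstamp.
        reportedLen.put(datanode, length);
    }

    boolean commit(long clientReportedSize) {
        // Every replica report must agree with the size the client commits.
        return reportedLen.values().stream().allMatch(l -> l == clientReportedSize);
    }

    public static void main(String[] args) {
        CommitSizeSketch b = new CommitSizeSketch();
        b.onIncrementalBlockReport("dn1", 1024);
        b.onIncrementalBlockReport("dn2", 1024);
        System.out.println(b.commit(512));   // false: faulty client misreported
        System.out.println(b.commit(1024));  // true: size matches all replicas
    }
}
```

Rejecting the commit at this point avoids the invalidate-and-replicate churn described in the report, since the mismatch is caught before the block is finalized.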
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564908#comment-14564908 ] Hudson commented on HDFS-7401: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/201/]) HDFS-7401. Add block info to DFSInputStream' WARN message when it adds node to deadNodes (Contributed by Arshad Mohammad) (vinayakumarb: rev b75df697e0f101f86788ad23a338ab3545b8d702) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn(Failed to connect to + targetAddr + for block + , add to deadNodes and continue. + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564954#comment-14564954 ] Hudson commented on HDFS-8443: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564953#comment-14564953 ] Hudson commented on HDFS-7401: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/]) HDFS-7401. Add block info to DFSInputStream' WARN message when it adds node to deadNodes (Contributed by Arshad Mohammad) (vinayakumarb: rev b75df697e0f101f86788ad23a338ab3545b8d702) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn(Failed to connect to + targetAddr + for block + , add to deadNodes and continue. + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8429) Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564957#comment-14564957 ] Hudson commented on HDFS-8429: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/]) HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread - Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.8.0 Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. 
Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads then blocked with stacks like this: {code}
DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323)
at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
at java.lang.Thread.run(Thread.java:662)
{code} IMO, we should exit the DN so that users can know that something went wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
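The failure mode above is easy to reproduce in miniature: once the watcher loop dies from an unexpected exception, every caller that hands it work parks forever. Below is a minimal, self-contained sketch of the direction the patch argues for — surfacing an unexpected watcher death through a fatal-error path instead of swallowing it. All names here (the class, the queue, the flag) are invented for illustration; this is not the actual DomainSocketWatcher code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class WatcherCrashSketch {
    /**
     * Simulates a watcher-style event loop that dies on an unexpected
     * exception (like the NPE from a failed native allocation).
     * Returns true if the fatal-error path ran.
     */
    public static boolean runWatcherWithPoisonTask() {
        BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
        AtomicBoolean fatalHandlerRan = new AtomicBoolean(false);
        Thread watcher = new Thread(() -> {
            try {
                while (true) {
                    tasks.take().run(); // may throw, like the failing native call
                }
            } catch (Throwable t) {
                // Without this branch the thread would die silently and
                // callers queuing work for it would park forever.
                fatalHandlerRan.set(true); // real code might terminate the DN here
            }
        });
        watcher.start();
        tasks.add(() -> { throw new NullPointerException("simulated malloc failure"); });
        try {
            watcher.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return fatalHandlerRan.get();
    }

    public static void main(String[] args) {
        System.out.println("fatal handler ran: " + runWatcherWithPoisonTask());
    }
}
```

In the real fix the fatal path would abort the DataNode (or fail the pending waiters) so operators see a crash instead of a silent hang.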
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564971#comment-14564971 ] Kai Zheng commented on HDFS-8481: - Thanks Walter for the good ideas and Zhe for the update! 1. How about having {{RawDecoder decoder}} instead, in all places? Later we could easily change to another decoder. {code} + private final RSRawDecoder rsRawDecoder; {code} 2. I guess it will only run into the decode path here when data blocks are erased. 1) Should we count the MISSING chunks and avoid a too-many-blocks-erased exception? 2) Do we need the {{else}} block? 3) Note the minor code formatting issue. {code}
+ } else if (chunk.state == StripingChunk.MISSING){
+decodeInputs[i] = null;
+ } else {
+decodeInputs[i] = null;
{code} 3. Around or in {{decodeAndFillBuffer}}, is it doable to use the source buffers as input buffers and the destination buffers as output buffers directly, to avoid a data copy? 4. I agree with Walter's concern; we should try to reuse the related buffers and structures around and across the decode calls. For that, we might need to move them to and prepare them in the main class ({{DFSStripedInputStream}}) along with the decoder, or have a higher-level construct like {{StrippedDecoder}}. As it's non-trivial, I agree we can do this separately, but maybe in HDFS-7285? Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
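On point 4 above, one way to make later buffer reuse easier — and to answer the related review question about Java coders preferring on-heap buffers while native coders prefer direct buffers — is to funnel all decode buffer allocation through a single helper, so the on-heap vs. direct choice lives in one place. This is a hedged sketch, not code from the patch; the class and method names are invented:

```java
import java.nio.ByteBuffer;

public class DecodeBufferAlloc {
    /**
     * Illustrative helper: centralize decode input/output buffer allocation.
     * Pure-Java coders tend to want on-heap buffers (cheap array access),
     * while native (JNI) coders want direct buffers (no copy across JNI).
     */
    public static ByteBuffer[] allocateBuffers(int count, int bufSize, boolean direct) {
        ByteBuffer[] buffers = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            buffers[i] = direct ? ByteBuffer.allocateDirect(bufSize)
                                : ByteBuffer.allocate(bufSize);
        }
        return buffers;
    }

    public static void main(String[] args) {
        ByteBuffer[] heap = allocateBuffers(9, 64 * 1024, false);
        System.out.println(heap.length + " buffers, direct=" + heap[0].isDirect());
    }
}
```

Callers then never hardcode the allocation style, so switching a coder from on-heap to direct buffers becomes a one-argument change.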
[jira] [Commented] (HDFS-7609) startup used too much time to load edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565161#comment-14565161 ] Jing Zhao commented on HDFS-7609: - The 03 patch looks good to me. +1. I will commit it shortly. startup used too much time to load edits Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Labels: BB2015-05-RFC Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode; the loading speed was no different. I looked into the stack trace and judged that the slowness was caused by the retry cache, so I set dfs.namenode.enable.retrycache to false and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
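The reporter's recovery-time workaround amounts to one property in hdfs-site.xml. The property name comes straight from the report; note that this disables the NameNode retry cache entirely, removing its protection for retried non-idempotent RPCs during normal operation, so it is only defensible as a temporary measure while replaying a huge edit log:

```xml
<!-- hdfs-site.xml: temporary recovery-time measure from the report,
     not a recommended default -->
<property>
  <name>dfs.namenode.enable.retrycache</name>
  <value>false</value>
</property>
```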
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Summary: Avoid retry cache collision when Standby NameNode loading edits (was: startup used too much time to load edits) Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Labels: BB2015-05-RFC Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8322) Display warning if hadoop fs -ls is showing the local filesystem
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8322: Attachment: HDFS-8322.004.patch Thanks a lot, [~andrew.wang]. It is a great suggestion. I have modified the patch to address your comments. Display warning if hadoop fs -ls is showing the local filesystem Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Issue Type: Bug (was: Improvement) Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Priority: Critical (was: Major) Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks [~mingma] for the fix and [~CarreyZhan] for the report! And thanks to all for the discussion! Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8460) Erasure Coding: stateful read result doesn't match data occasionally because of flawed test
[ https://issues.apache.org/jira/browse/HDFS-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565129#comment-14565129 ] Jing Zhao commented on HDFS-8460: - We can use {{DataNodeTestUtil#setHeartbeatsDisabledForTests}} to disable the heartbeat. Other than this looks good to me. Erasure Coding: stateful read result doesn't match data occasionally because of flawed test --- Key: HDFS-8460 URL: https://issues.apache.org/jira/browse/HDFS-8460 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Yi Liu Assignee: Walter Su Attachments: HDFS-8460-HDFS-7285.001.patch I found this issue in TestDFSStripedInputStream, {{testStatefulRead}} failed occasionally shows that read result doesn't match data written. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565137#comment-14565137 ] Jesse Yates commented on HDFS-6440: --- Failed tests pass locally. Missed a whitespace in TestPipelinesFailover :( Could fix on commit, unless there are other comments on the latest version, in which case I'll wrap that into a new revision. Otherwise, I'd say this is good to go, [~atm]? Support more than 2 NameNodes - Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Affects Versions: 2.4.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 3.0.0 Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-multiple-snn-trunk-v0.patch Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8499) Merge BlockInfoUnderConstruction into trunk
Zhe Zhang created HDFS-8499: --- Summary: Merge BlockInfoUnderConstruction into trunk Key: HDFS-8499 URL: https://issues.apache.org/jira/browse/HDFS-8499 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang In HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a common abstraction for striped and contiguous UC blocks. This JIRA aims to merge it to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565684#comment-14565684 ] Lei (Eddy) Xu commented on HDFS-8322: - [~andrew.wang] I think {{TestWebDelegationToken}} is not relevant; I ran this test locally and it succeeded. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch, HDFS-8322.006.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8420) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir
[ https://issues.apache.org/jira/browse/HDFS-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565716#comment-14565716 ] Zhe Zhang commented on HDFS-8420: - Seems the change is already included in HDFS-8408 Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir -- Key: HDFS-8420 URL: https://issues.apache.org/jira/browse/HDFS-8420 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8320-HDFS-7285-00.patch, HDFS-8320-HDFS-7285-01.patch Presently the resultant zone dir will come with {{.snapshot}} only when the zone dir itself is snapshottable dir. It will return the path including the snapshot name like, {{/zone/.snapshot/snap1}}. Instead could improve this by returning only path {{/zone}}. Thanks [~vinayrpet] for the helpful [discussion|https://issues.apache.org/jira/browse/HDFS-8266?focusedCommentId=14543821page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543821] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8489: Attachment: HDFS-8489.03.patch Thanks Jing for the comment! Yes the {{replaceBlock}} logic is also different with striping. Uploading new patch with the {{replaceBlock}} change. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch, HDFS-8489.03.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565691#comment-14565691 ] Kai Zheng commented on HDFS-8481: - Good discussion here, thanks! bq. In that case we cannot reuse the source buffers I guess? Then do we need to expose this information in the decoder? Good catch Jing! Yes in this case we can't reuse the source buffers here as they need to be passed to caller/applications without being changed. I'm planning to re-implement the Java coders in HADOOP-12041 and related, when done it's possible to ensure the input buffers not to be affected. Benefits of doing this in coder layer: 1) a more clear contract between coder and caller in more general sense for the inputs; 2) concrete coder may have specific tweak to optimize in the aspect, ideally no input data copying at all, worst, make the copy, but all transparent to callers; 3) allow new coders (LRC, HH) to be layered on other primitive coders (RS, XOR) more easily. So for now let's forget the source buffers reusing here and we can do it in future, but do it for output buffers now if easy? Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
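Until the re-implemented coders guarantee untouched inputs (the HADOOP-12041 plan above), the caller-visible source buffers can be protected with a defensive copy before decoding. An illustrative sketch — the class and method names are invented, and the real code works on striped-read buffers rather than plain byte arrays:

```java
public class DecodeInputCopy {
    /**
     * Copy the non-null source buffers so a decoder that scribbles on its
     * inputs cannot corrupt data still owed to the caller. Null slots
     * (erased/missing blocks) are preserved as null.
     */
    public static byte[][] copyForDecode(byte[][] sources) {
        byte[][] copy = new byte[sources.length][];
        for (int i = 0; i < sources.length; i++) {
            if (sources[i] != null) {
                copy[i] = sources[i].clone();
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        byte[][] src = { {1, 2, 3}, null, {4, 5} };
        byte[][] in = copyForDecode(src);
        in[0][0] = 99;                 // decoder scribbles on its input copy
        System.out.println(src[0][0]); // caller's original buffer is unchanged
    }
}
```

The cost is one extra copy per decode call, which is exactly the trade-off the comment defers to the coder layer to eliminate later.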
[jira] [Updated] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8489: Attachment: HDFS-8489.02.patch Updating the patch to remove redundant logic between {{BlockInfo}} and {{BlockInfoContiguous}}. The main difference between {{BlockInfoStriped}} and {{BlockInfoContiguous}} is that in {{BIStriped#triplets}}, the first {{dataBlockNum}} slots are ordered based on internal block indices. Therefore the first {{dataBlockNum}} slots could have null, and we need an indices array to interpret the slots after {{dataBlockNum}}. So only {{addStorage}}, {{removeStorage}}, and {{numNodes}} should stay abstract in {{BlockInfo}} and be separately implemented. [~jingzhao] We discussed similar ideas under HDFS-7285 JIRAs. Let me know if the above makes sense to you. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
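The division of labor discussed above — shared logic in an abstract {{BlockInfo}}, with only the storage bookkeeping ({{addStorage}}, {{removeStorage}}, {{numNodes}}) left to subclasses — looks schematically like the following. This is a deliberately simplified sketch: the real classes track {{DatanodeStorageInfo}} objects in a triplets array, which is replaced by a plain list here.

```java
import java.util.ArrayList;
import java.util.List;

// Schematic only: real HDFS code uses a triplets array and DatanodeStorageInfo.
abstract class BlockInfo {
    // Shared state and logic (block id, BlockCollection backref, ...) live here.
    abstract boolean addStorage(String storage);
    abstract boolean removeStorage(String storage);
    abstract int numNodes();
}

class BlockInfoContiguous extends BlockInfo {
    private final List<String> storages = new ArrayList<>();
    @Override boolean addStorage(String s) { return storages.add(s); }
    @Override boolean removeStorage(String s) { return storages.remove(s); }
    @Override int numNodes() { return storages.size(); }
}

public class BlockInfoSketch {
    public static void main(String[] args) {
        BlockInfo b = new BlockInfoContiguous();
        b.addStorage("dn1-disk0");
        b.addStorage("dn2-disk1");
        System.out.println(b.numNodes());
    }
}
```

A striped subclass would implement the same three methods over an index-ordered layout (possibly with null data slots), which is exactly why they must stay abstract rather than shared.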
[jira] [Updated] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8322: Attachment: HDFS-8322.006.patch Good findings, [~andrew.wang]. Updated accordingly. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch, HDFS-8322.006.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565561#comment-14565561 ] Colin Patrick McCabe commented on HDFS-7923: bq. Missing config key documentation in hdfs-defaults.xml added bq. requestBlockReportLeaseId: empty catch for unregistered node, we could add some more informative logging rather than relying on the warn below added bq. I discussed the NodeData structure with Colin offline, wondering why we didn't use a standard Collection. Colin brought up the reason of reducing garbage, which seems valid. I think we should consider implementing IntrusiveCollection though rather than writing another. yes, there will be quite a few of these requests coming in at any given point. IntrusiveCollection is an interface rather than an implementation, so I don't think it would help here (it's most useful when an element needs to be in multiple lists at once, and when you need fancy operations like finding the list from the element) bq. I also asked about putting NodeData into DatanodeDescriptor. Not sure what the conclusion was on this, it might reduce garbage since we don't need a separate NodeData object. The locking is easier to understand if all the lease data is inside {{BlockReportLeaseManager}}. bq. I prefer Precondition checks for invalid configuration values at startup, so there aren't any surprises for the user. Not everyone reads the messages on startup. ok bq. requestLease has a check for isTraceEnabled, then logs at debug level fixed bq. In offerService, we ignore the new leaseID if we already have one. On the NN though, a new request wipes out the old leaseID, and processReport checks based on leaseID rather than node. This kind of bug makes me wonder why we really need the leaseID at all, why not just attach a boolean to the node? Or if it's in the deferred vs. pending list? It's safer for the NameNode to wipe the old lease ID every time there is a new request. 
It avoids problems where the DN went down while holding a lease, and then came back up. We could potentially also avoid those problems by being very careful with node (un)registration, but why make things more complicated than they need to be? I do think that the DN should overwrite its old lease ID if the NN gives it a new one, for the same reason. Let me change it to do that... Of course this code path should never happen since the NN should never give a new lease ID when none was requested. So calling this a bug seems like a bit of a stretch. I prefer IDs to simply checking against the datanode UUID, because lease IDs allow us to match up the NN granting a lease with the DN accepting and using it, which is very useful for debugging or understanding what is happening in production. It also makes it very obvious whether a DN is cheating by sending a block report with leaseID = 0 to disable rate-limiting. This is a use-case we want to support but we also want to know when it is going on. bq. Can we fix the javadoc for scheduleBlockReport to mention randomness, and not send...at the next heartbeat? Incorrect right now. I looked pretty far back into the history of this code. It seems to go back to at least 2009. The underlying ideas seem to be: 1. the first full block report can have a configurable delay in seconds expressed by {{dfs.blockreport.initialDelay}} 2. the second full block report gets a random delay between 0 and {{dfs.blockreport.intervalMsec}} 3. all other block reports get an interval of {{dfs.blockreport.intervalMsec}} *unless* the previous block report had a longer interval than expected... if the previous one had a longer interval than expected, the next one gets a shorter interval. We can keep behavior #1... it's simple to implement and may be useful for testing (although I think this patch makes it no longer necessary). Behavior #2 seems like a workaround for the lack of congestion control in the past. 
In a world where the NN rate-limits full block reports, we don't need this behavior to prevent FBRs from clumping. They will just naturally not overly clump because we are rate-limiting them. Behavior #3 just seems incorrect, even without this patch. By definition, a full block report contains all the information the NN needs to understand the DN state. Just because block report interval N was longer than expected, seems no reason to shorten block report interval N+1. In fact, this behavior seems like it could lead to congestion collapse... if the NN gets overloaded and can't handle block reports for some time, a bunch of DNs will shorten the time in between the current block report and the next one, further increasing total NN load. Not good. Not good at all. I replaced this with a simple randomize first block report time within 0 and {{dfs.blockreport.initialDelay}}, then try to do all other
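The replacement schedule described above — one uniformly random offset within {{dfs.blockreport.initialDelay}} for the first full block report, then a plain fixed {{dfs.blockreport.intervalMsec}} with no catch-up shortening — can be sketched as follows. This is illustrative, not the actual patch; the method is a hypothetical pure function over the two configured values.

```java
import java.util.Random;

public class FbrScheduleSketch {
    /**
     * Next full-block-report time: randomized once at startup to spread DNs
     * out, then a plain fixed interval. Deliberately no interval-shortening
     * after a late report, since that positive feedback can push an already
     * overloaded NN toward congestion collapse.
     */
    public static long nextFbrTimeMs(long nowMs, boolean firstReport,
                                     long initialDelayMs, long intervalMs,
                                     Random rnd) {
        if (firstReport) {
            long jitter = initialDelayMs > 0
                ? (long) (rnd.nextDouble() * initialDelayMs) : 0;
            return nowMs + jitter;
        }
        return nowMs + intervalMs;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        long first = nextFbrTimeMs(0, true, 10_000, 21_600_000, rnd);
        long later = nextFbrTimeMs(first, false, 10_000, 21_600_000, rnd);
        System.out.println("first=" + first + " later=" + later);
    }
}
```

With the NN-side lease rate-limiting in place, this simple schedule is enough: clumped requests are smoothed by the lease manager rather than by per-DN interval tweaking.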
[jira] [Updated] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7923: --- Attachment: HDFS-7923.004.patch The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565556#comment-14565556 ] Jing Zhao commented on HDFS-8489: - Thanks for working on this, Zhe. Yes, addStorage, removeStorage, and numNodes should be abstract in BlockInfo. Besides, the block replacement logic can also be separated from {{BlocksMap#replaceBlock}} and becomes an abstract function in BlockInfo, as is done in the current EC feature branch. But this is optional. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8487) Merge BlockInfo-related code changes from HDFS-7285 into trunk
[ https://issues.apache.org/jira/browse/HDFS-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8487: Description: Per offline discussion with [~andrew.wang], for easier and cleaner reviewing, we should probably shrink the size of the consolidated HDFS-7285 patch by merging some mechanical changes that are unrelated to EC-specific logic to trunk first. Those include renaming, subclassing, interfaces, and so forth. This umbrella JIRA specifically aims to merge code changes around {{BlockInfo}} and {{BlockInfoContiguous}} back into trunk. The structure of the {{BlockInfo}}-related classes is shown below: {code}
             BlockInfo (abstract)
              /             \
   BlockInfoStriped     BlockInfoContiguous
      |       \            /       |
      |       BlockInfoUC          |
      |       (interface)          |
      |        /        \          |
   BlockInfoStripedUC    BlockInfoContiguousUC
{code} was:Per offline discussion with [~andrew.wang], for easier and cleaner reviewing, we should probably shrink the size of the consolidated HDFS-7285 patch by merging some mechanical changes that are unrelated to EC-specific logic to trunk first. Those include renaming, subclassing, interfaces, and so forth. This umbrella JIRA specifically aims to merge code changes around {{BlockInfo}} and {{BlockInfoContiguous}} back into trunk. Merge BlockInfo-related code changes from HDFS-7285 into trunk -- Key: HDFS-8487 URL: https://issues.apache.org/jira/browse/HDFS-8487 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Per offline discussion with [~andrew.wang], for easier and cleaner reviewing, we should probably shrink the size of the consolidated HDFS-7285 patch by merging some mechanical changes that are unrelated to EC-specific logic to trunk first. Those include renaming, subclassing, interfaces, and so forth. This umbrella JIRA specifically aims to merge code changes around {{BlockInfo}} and {{BlockInfoContiguous}} back into trunk. 
The structure of the {{BlockInfo}}-related classes is shown below:
{code}
           BlockInfo (abstract)
            /             \
   BlockInfoStriped    BlockInfoContiguous
        |    |              |
        |  BlockInfoUC      |
        |  (interface)      |
        |   /         \     |
BlockInfoStripedUC    BlockInfoContiguousUC
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
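The hierarchy in the diagram can be sketched as a Java skeleton (placeholder bodies only; the real NameNode classes carry block metadata and many methods that are not reproduced here, and {{isStriped()}} is used purely as an illustrative member):

```java
// Skeleton of the BlockInfo class hierarchy from the diagram above.
// Bodies are placeholders, not the actual HDFS fields/methods.
abstract class BlockInfo {
    // Illustrative member; the real class exposes block metadata instead.
    abstract boolean isStriped();
}

// Marker for blocks still under construction (UC).
interface BlockInfoUC {
}

class BlockInfoContiguous extends BlockInfo {
    @Override boolean isStriped() { return false; }
}

class BlockInfoStriped extends BlockInfo {
    @Override boolean isStriped() { return true; }
}

// The UC variants subclass their replication-scheme parent and
// additionally implement the shared UC interface.
class BlockInfoContiguousUC extends BlockInfoContiguous implements BlockInfoUC {
}

class BlockInfoStripedUC extends BlockInfoStriped implements BlockInfoUC {
}
```

The point of the shape is that code handling under-construction state can accept a {{BlockInfoUC}} regardless of whether the block is striped or contiguous, while code handling layout dispatches on the abstract {{BlockInfo}}.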
[jira] [Commented] (HDFS-8409) HDFS client RPC call throws java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/HDFS-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565649#comment-14565649 ] Juan Yu commented on HDFS-8409: --- It happens on retry, not on the initial call. For example, the initial call gets an exception after the Call object is created and sent, so it needs to retry; but during the retry it somehow gets an exception again, this time even before the Call object (which should have the same callId as the initial call) is created. In my patch, I added a test to simulate this. Does it make sense? HDFS client RPC call throws java.lang.IllegalStateException - Key: HDFS-8409 URL: https://issues.apache.org/jira/browse/HDFS-8409 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Juan Yu Assignee: Juan Yu Attachments: HDFS-8409.001.patch, HDFS-8409.002.patch, HDFS-8409.003.patch When HDFS client RPC calls need to retry, the client sometimes throws java.lang.IllegalStateException; the retry is aborted and the client call fails. {code} Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:116) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:99) at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1912) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) {code} Here is the check that throws the exception: {code} public static void setCallIdAndRetryCount(int cid, int rc) { ... 
Preconditions.checkState(callId.get() == null); } {code} The RetryInvocationHandler calls it with a non-null callId, which causes the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
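The failure mode Juan describes can be reproduced in miniature with a plain-Java stand-in for Guava's {{Preconditions.checkState}} (a hypothetical sketch, not the actual {{org.apache.hadoop.ipc.Client}} code): the thread-local callId is set before each attempt, and the Call constructor is what normally consumes it, so a retry that sets the ID again before any Call was created trips the precondition.

```java
// Minimal sketch (plain Java, no Guava) of the callId precondition.
class CallIdSketch {
    private static final ThreadLocal<Integer> callId = new ThreadLocal<>();

    // Stand-in for com.google.common.base.Preconditions.checkState.
    static void checkState(boolean expression) {
        if (!expression) throw new IllegalStateException();
    }

    // Stand-in for Client.setCallIdAndRetryCount: refuses to overwrite an
    // ID that was never consumed by a Call constructor.
    static void setCallIdAndRetryCount(int cid, int rc) {
        checkState(callId.get() == null); // throws if a stale ID remains
        callId.set(cid);
    }

    // Stand-in for the Call constructor consuming (and clearing) the ID.
    static void consumeCallId() {
        callId.remove();
    }

    public static void main(String[] args) {
        setCallIdAndRetryCount(1, 0); // initial attempt: ID slot is empty, OK
        consumeCallId();              // Call object created, ID consumed
        setCallIdAndRetryCount(1, 1); // retry after Call creation: OK
        // But if an attempt fails BEFORE a Call consumes the ID, the next
        // retry hits checkState with a non-null callId:
        boolean threw = false;
        try {
            setCallIdAndRetryCount(1, 2); // stale ID still set -> throws
        } catch (IllegalStateException e) {
            threw = true;
        }
        System.out.println(threw); // prints "true"
    }
}
```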
[jira] [Commented] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565664#comment-14565664 ] Hadoop QA commented on HDFS-8322: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 26s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 2m 7s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 0s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 24m 41s | Tests failed in hadoop-common. 
| | | | 70m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.security.token.delegation.web.TestWebDelegationToken | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736262/HDFS-8322.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ae2a62 | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11167/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11167/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11167/console | This message was automatically generated. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch, HDFS-8322.006.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565708#comment-14565708 ] Kai Zheng commented on HDFS-8481: - bq. it is beneficial to accumulate multiple of them before sending to decode. Kai Zheng Could probably suggest a threshold size. From a pure coder's point of view, yes, it's good to have a larger cell size. It's not clear yet in this case, because the bottleneck might not be in the computation but rather in network traffic and data copying. My suggestion would be: if the accumulation is already available, then we could have a default threshold value like 4MB while allowing it to be configurable in future; otherwise, leave the accumulation optimization for future consideration. I would prefer not to do the accumulation in the coder caller layer because it's hard. If it's good to have, then we may do it in the coder layer in one place, like having a {{BufferedRawErasureCoder}} layered on existing raw coders, transparent to callers. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch, HDFS-8481-HDFS-7285.04.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565546#comment-14565546 ] Andrew Wang commented on HDFS-8322: --- I noticed you changed the name of the config parameter, and it's different in the code vs. core-default.xml. +1 pending fixing that though. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8322: Attachment: HDFS-8322.005.patch Address checkstyle warning. The test failure is not relevant. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8409) HDFS client RPC call throws java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/HDFS-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565605#comment-14565605 ] Andrew Wang commented on HDFS-8409: --- Hey Juan, when would an exception before creation of a Call object not be a fatal error? HDFS client RPC call throws java.lang.IllegalStateException - Key: HDFS-8409 URL: https://issues.apache.org/jira/browse/HDFS-8409 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Juan Yu Assignee: Juan Yu Attachments: HDFS-8409.001.patch, HDFS-8409.002.patch, HDFS-8409.003.patch When HDFS client RPC calls need to retry, the client sometimes throws java.lang.IllegalStateException; the retry is aborted and the client call fails. {code} Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:116) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:99) at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1912) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) {code} Here is the check that throws the exception: {code} public static void setCallIdAndRetryCount(int cid, int rc) { ... Preconditions.checkState(callId.get() == null); } {code} The RetryInvocationHandler calls it with a non-null callId, which causes the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7923: --- Target Version/s: 2.8.0 Affects Version/s: 2.8.0 Status: Patch Available (was: In Progress) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
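The NN-side grant/deny decision described above could be sketched as a simple lease table (hypothetical names such as {{FbrRateLimiter}} and {{requestLease}}; the real patch wires this through protobuf optional fields on the heartbeat and a BlockReportLeaseManager):

```java
import java.util.HashSet;
import java.util.Set;

// NN-side sketch: grant at most maxConcurrent full-block-report "leases"
// at a time; a DN whose heartbeat request is denied simply retries on a
// later heartbeat.
class FbrRateLimiter {
    private final int maxConcurrent;
    private final Set<String> leases = new HashSet<>();

    FbrRateLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    // Called when a heartbeat arrives with the request-FBR bit set.
    // Returns true if the DN may send its FBR now.
    synchronized boolean requestLease(String datanodeId) {
        if (leases.contains(datanodeId)) {
            return true; // already holds a lease; re-grant idempotently
        }
        if (leases.size() >= maxConcurrent) {
            return false; // too many FBRs in flight; ask again later
        }
        leases.add(datanodeId);
        return true;
    }

    // Called once the FBR from this DN has been fully processed.
    synchronized void releaseLease(String datanodeId) {
        leases.remove(datanodeId);
    }
}
```

Because both the request and the grant are optional fields, old DNs and NNs that never set them keep the previous unthrottled behavior, which is what makes the change wire-compatible.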
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565672#comment-14565672 ] Andrew Wang commented on HDFS-7923: --- Nits: * Should the checkLease logs be done to the blockLog? We log the startup error log there in processReport * Update javadoc in BlockReportContext with what leaseID is for. * Add something to the log message about overwriting the old leaseID in offerService. Agree that this shouldn't really trigger, but good defensive coding practice :) * DatanodeManager, there's still a register/unregister in registerDatanode I think we could skip. This is the node restart case where it's registered previously. * BRLManager requestLease, we auto-register the node on requestLease. This shouldn't happen since DNs need to register before doing anything else. We can keep this here * Still need documentation of new config keys in hdfs-default.xml Block report scheduling: * We removed TestBPSAScheduler#testScheduleBlockReportImmediate, should this swap over to testing forceFullBlockReport? * Extra import in TestBPSAScheduler and BPSA * I'm worried about convoy effects if we don't stick to the stride system of the old code. I think of the old code as follows: # Choose a random time within the initialDelay interval to jitter # Attempt to block report at that same time every hour. This keeps the BRs from all the DNs spread out, even if the NN gets temporarily backed up. Once the NN catches up and flushes its backlog of FBRs, future BRs will still be nicely spread out. My understanding of your new scheme is that after a DN successfully BRs, it'll BR again an hour afterwards. So, if all the BRs piled up and then are processed in quick succession, all the DNs will BR at about the same time next hour. Since we want to spread the BRs out across the hour, this is not good. Other ideas are to round up to the next stride. Or, wait an interval plus a random delay. 
We might consider some congestion control too, where the DNs back off linearly or exponentially. All these schemes delay the FBRs, but maybe we trust IBRs enough now. If you want to pursue this logic change more, let's split it out into a follow-on JIRA. The rest LGTM, +1 pending above comments. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
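The stride-based scheduling Andrew describes (pick a random offset once at startup, then always report at that offset plus a multiple of the interval, rounding up to the next stride boundary after a backlog) can be sketched as follows. The class and method names are illustrative, not the actual BPServiceActor code:

```java
// Sketch of stride-based block report scheduling: each DN picks a random
// offset once, then always reports at offset + k * interval. Even if the
// NN gets backed up and processes a pile of FBRs at once, each DN's next
// report snaps back to its own stride, so reports stay spread out.
class BlockReportScheduler {
    private final long intervalMs; // e.g. one hour
    private final long offsetMs;   // random jitter chosen once at startup

    BlockReportScheduler(long intervalMs, long offsetMs) {
        this.intervalMs = intervalMs;
        this.offsetMs = offsetMs;
    }

    // Round up to the next stride boundary after "now", rather than
    // scheduling at now + interval (which would let delayed DNs convoy).
    long nextReportTime(long nowMs) {
        long k = Math.floorDiv(nowMs - offsetMs, intervalMs) + 1;
        return offsetMs + k * intervalMs;
    }
}
```

With a one-hour interval and an offset of 1 second, a DN that finishes a delayed report mid-interval still schedules its next report at its own boundary instead of drifting toward every other DN's schedule.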
[jira] [Updated] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8481: Attachment: HDFS-8481-HDFS-7285.04.patch Thanks Kai for verifying this. I'm attaching the 04 patch to address the minor issues above. To address the GC issue we should also avoid filling 0 bytes. Maybe the codec can support a special flag to mark an input slot as all-zero? I'm currently working on reusing the input/output buffers. It turns out to be tricky because 1) we need to change all byte arrays to {{ByteBuffer}} and 2) we need a better abstraction to divide the rounds of {{decode()}} aligned at cell boundaries. Perhaps something like a {{StripedDecoder}}. If the 04 patch looks OK for removing the decoding workaround, how about we commit it first while we work on the various tasks discussed above to reuse all input and output buffers? Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch, HDFS-8481-HDFS-7285.04.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer
[ https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565722#comment-14565722 ] Zhe Zhang commented on HDFS-8254: - bq. I think it won't be an issue. Cause MultipleBlockingQueue.poll(..) has synchronized(queues) Yes good point. I'm OK with leaving {{locateFollowingBlock}} as-is in this JIRA but we can think about moving it to the coordinator for cleaner flow. In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer --- Key: HDFS-8254 URL: https://issues.apache.org/jira/browse/HDFS-8254 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8254_20150526.patch, h8254_20150526b.patch StripedDataStreamer javadoc is shown below. {code} * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}. * There are two kinds of StripedDataStreamer, leading streamer and ordinary * stream. Leading streamer requests a block group from NameNode, unwraps * it to located blocks and transfers each located block to its corresponding * ordinary streamer via a blocking queue. {code} Leading streamer is the streamer with index 0. When the datanode of the leading streamer fails, the other steamers cannot continue since no one will request a block group from NameNode anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565721#comment-14565721 ] Kai Zheng commented on HDFS-8481: - bq. To address the GC issue we should also avoid filling 0 bytes. Maybe the codec can support a special flag to mark an input slot as all-zero? Good idea! It's easy to add such flag in {{ECChunk}} and we can use the following version API: {code} public void decode(ECChunk[] inputs, int[] erasedIndexes, ECChunk[] outputs); {code} bq. how about we commit it first while we work on the various tasks discussed above to reuse all input and output buffers? I'm OK with this approach. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch, HDFS-8481-HDFS-7285.04.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
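The all-zero-slot flag suggested above could look roughly like this (a hypothetical {{ZeroAwareChunk}}, not the actual {{ECChunk}} API): the chunk carries a marker instead of a materialized zero-filled buffer, so callers avoid allocating and filling bytes just to represent zero-length striped blocks.

```java
import java.nio.ByteBuffer;

// Hypothetical chunk wrapper with an all-zero marker, sketching Kai's
// suggestion. A decoder that sees isAllZero() can skip the slot's data
// entirely instead of reading a buffer that was filled with zeros.
class ZeroAwareChunk {
    private final ByteBuffer buffer; // null when allZero is set
    private final boolean allZero;

    static ZeroAwareChunk of(ByteBuffer buffer) {
        return new ZeroAwareChunk(buffer, false);
    }

    // Represents a cell whose content is all zeros (e.g. a striped block
    // of length 0) without materializing the bytes.
    static ZeroAwareChunk allZero() {
        return new ZeroAwareChunk(null, true);
    }

    private ZeroAwareChunk(ByteBuffer buffer, boolean allZero) {
        this.buffer = buffer;
        this.allZero = allZero;
    }

    boolean isAllZero() {
        return allZero;
    }

    // A consumer consults the flag before touching the buffer.
    byte byteAt(int i) {
        return allZero ? 0 : buffer.get(i);
    }
}
```

Beyond saving the allocation, the flag also opens the door to a coder-level fast path, since XOR/Reed-Solomon contributions from an all-zero input can be skipped outright.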
[jira] [Updated] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8489: Attachment: HDFS-8489.04.patch Both {{TestFileTruncate}} and {{TestAppendSnapshotTruncate}} pass locally. Uploading new patch with 2 changes to address check style issues: # Remove 2 unused imports from {{BlocksMap}} # Add a period to a Javadoc in {{BlockInfo}} Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch, HDFS-8489.03.patch, HDFS-8489.04.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565794#comment-14565794 ] Hadoop QA commented on HDFS-8489: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 12s | The applied patch generated 5 new checkstyle issues (total was 692, now 692). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 163m 2s | Tests failed in hadoop-hdfs. 
| | | | 209m 10s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestAppendSnapshotTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736270/HDFS-8489.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6aec13c | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11168/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11168/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11168/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11168/console | This message was automatically generated. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch, HDFS-8489.03.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565770#comment-14565770 ] Hadoop QA commented on HDFS-8489: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 2 new checkstyle issues (total was 687, now 684). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 162m 39s | Tests failed in hadoop-hdfs. 
| | | | 208m 42s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736254/HDFS-8489.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7673d4f | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11166/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11166/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11166/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11166/console | This message was automatically generated. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch, HDFS-8489.03.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8420) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir
[ https://issues.apache.org/jira/browse/HDFS-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565806#comment-14565806 ] Rakesh R commented on HDFS-8420: bq. Seems the change is already included in HDFS-8408 Thanks [~zhz] for your time and taking a look at this issue. Yes, I agree with you. Also, thank you [~vinayrpet] for incorporating this case in HDFS-8408 Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir -- Key: HDFS-8420 URL: https://issues.apache.org/jira/browse/HDFS-8420 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8320-HDFS-7285-00.patch, HDFS-8320-HDFS-7285-01.patch Presently the resultant zone dir will come with {{.snapshot}} only when the zone dir itself is snapshottable dir. It will return the path including the snapshot name like, {{/zone/.snapshot/snap1}}. Instead could improve this by returning only path {{/zone}}. Thanks [~vinayrpet] for the helpful [discussion|https://issues.apache.org/jira/browse/HDFS-8266?focusedCommentId=14543821page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543821] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8420) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir
[ https://issues.apache.org/jira/browse/HDFS-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8420: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir -- Key: HDFS-8420 URL: https://issues.apache.org/jira/browse/HDFS-8420 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8320-HDFS-7285-00.patch, HDFS-8320-HDFS-7285-01.patch Presently the resultant zone dir will come with {{.snapshot}} only when the zone dir itself is snapshottable dir. It will return the path including the snapshot name like, {{/zone/.snapshot/snap1}}. Instead could improve this by returning only path {{/zone}}. Thanks [~vinayrpet] for the helpful [discussion|https://issues.apache.org/jira/browse/HDFS-8266?focusedCommentId=14543821page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543821] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565784#comment-14565784 ] Hadoop QA commented on HDFS-7923: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 13s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 15 new or modified test files. | | {color:green}+1{color} | javac | 9m 5s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 19s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 36s | The applied patch generated 25 new checkstyle issues (total was 1365, now 1380). | | {color:red}-1{color} | whitespace | 0m 9s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 53s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 43s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 35s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 100m 55s | Tests failed in hadoop-hdfs. 
| | | | 152m 39s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.TestSetrepDecreasing | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736276/HDFS-7923.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6aec13c | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/whitespace.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/console | This message was automatically generated. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. 
This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
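The heartbeat-based rate limiting described in HDFS-7923 above can be sketched as a small admission counter on the NN side. This is an illustrative model only, not the actual patch: the class name {{FbrRateLimiter}}, the method names, and the idea of a fixed concurrent-report limit are all hypothetical stand-ins for whatever the real implementation does.

```java
// Hedged sketch of heartbeat-driven full-block-report (FBR) admission on the
// NameNode side. A DataNode sets an optional flag in its heartbeat to ask for
// permission; the NN grants it only while few FBRs are outstanding.
// All names here are hypothetical, not the HDFS-7923 patch itself.
import java.util.concurrent.atomic.AtomicInteger;

public class FbrRateLimiter {
    private final int maxConcurrentReports;
    private final AtomicInteger outstanding = new AtomicInteger();

    public FbrRateLimiter(int maxConcurrentReports) {
        this.maxConcurrentReports = maxConcurrentReports;
    }

    /** NN side: called when a DN heartbeat sets the optional "request FBR" flag. */
    public boolean requestFullBlockReport() {
        while (true) {
            int cur = outstanding.get();
            if (cur >= maxConcurrentReports) {
                return false; // DN waits and asks again on a later heartbeat
            }
            if (outstanding.compareAndSet(cur, cur + 1)) {
                return true;  // DN may send its full block report now
            }
        }
    }

    /** NN side: called after the DN's full block report has been processed. */
    public void completeFullBlockReport() {
        outstanding.decrementAndGet();
    }
}
```

Because both the request flag and the grant flag are optional protobuf fields, old DNs that never set the flag and old NNs that never answer it keep working unchanged, which is what "done compatibly with optional fields" refers to.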
[jira] [Commented] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565813#comment-14565813 ] Rakesh R commented on HDFS-8450: [~drankye] I hope I've addressed your comments. Could you please review the patch again when you get a chance? Thanks! Erasure Coding: Consolidate erasure coding zone related implementation into a single class -- Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8450-HDFS-7285-00.patch, HDFS-8450-HDFS-7285-01.patch, HDFS-8450-HDFS-7285-02.patch, HDFS-8450-HDFS-7285-03.patch The idea is to follow the same pattern suggested by HDFS-7416. It would be good to consolidate all the erasure coding zone related implementation of {{FSNamesystem}}. Here, we propose an {{FSDirErasureCodingZoneOp}} class with functions to perform the related erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565822#comment-14565822 ] Hadoop QA commented on HDFS-8481: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 44s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 14s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 28s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 20s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 172m 36s | Tests failed in hadoop-hdfs. 
| | | | 215m 59s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.TestRecoverStripedFile | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.blockmanagement.TestBlockInfo | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736292/HDFS-8481-HDFS-7285.04.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1299357 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/console | This message was automatically generated. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch, HDFS-8481-HDFS-7285.04.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565224#comment-14565224 ] Jing Zhao commented on HDFS-8481: - Thanks for working on this, Zhe! I agree that we should reuse the source buffers if possible. One question for [~drankye] is, in the javadoc of decoder, it is mentioned that some decoder may change the content of the input. In that case we cannot reuse the source buffers I guess? Then do we need to expose this information in the decoder? Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
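The concern raised above — a decoder that may modify its input buffers cannot safely be handed buffers the caller wants to reuse — has a straightforward defensive answer: copy the inputs first. The sketch below assumes a hypothetical boolean ({{decoderMutatesInput}}) standing in for whatever mechanism the decoder would use to expose that information; {{DecodeInputs}} and {{prepare}} are made-up names, not Hadoop API.

```java
// Hedged sketch: hand the decoder copies when it may mutate its inputs,
// otherwise reuse the source buffers directly. The boolean flag is a
// hypothetical stand-in for decoder metadata that HDFS-8481 discusses exposing.
import java.nio.ByteBuffer;

public class DecodeInputs {
    /** Returns buffers that are safe to pass to the decoder. */
    public static ByteBuffer[] prepare(ByteBuffer[] sources, boolean decoderMutatesInput) {
        if (!decoderMutatesInput) {
            return sources; // safe to reuse the source buffers as-is
        }
        ByteBuffer[] copies = new ByteBuffer[sources.length];
        for (int i = 0; i < sources.length; i++) {
            ByteBuffer src = sources[i].duplicate(); // shared content, private position
            ByteBuffer dst = ByteBuffer.allocate(src.remaining());
            dst.put(src);
            dst.flip(); // make the copied bytes readable from position 0
            copies[i] = dst;
        }
        return copies;
    }
}
```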
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565259#comment-14565259 ] Zhe Zhang commented on HDFS-8481: - Thanks for the comment, Jing. I guess we need some smart policy here because we don't want to feed the decoder with very small buffers either. For example, if the cell size is small, like 16KB, it is beneficial to accumulate multiple cells before sending them to the decoder. [~drankye] could probably suggest a threshold size. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
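The accumulate-before-decode policy discussed above can be sketched as a tiny batching helper. The class name, method names, and the idea of a single byte threshold are all hypothetical — the thread above explicitly leaves the actual threshold to [~drankye] — so treat this as a shape, not a proposal:

```java
// Hedged sketch: batch small cells until a byte threshold is reached, then
// release them as one decode batch. Names and the threshold policy are
// made up for illustration; not the HDFS-8481 patch.
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class CellAccumulator {
    private final int flushThresholdBytes;
    private final List<ByteBuffer> pending = new ArrayList<>();
    private int pendingBytes = 0;

    public CellAccumulator(int flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Returns a batch ready for the decoder once enough bytes accumulate, else null. */
    public List<ByteBuffer> add(ByteBuffer cell) {
        pending.add(cell);
        pendingBytes += cell.remaining();
        if (pendingBytes < flushThresholdBytes) {
            return null; // keep accumulating small cells
        }
        List<ByteBuffer> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        return batch;
    }
}
```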
[jira] [Commented] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565408#comment-14565408 ] Hadoop QA commented on HDFS-8322: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 13s | The applied patch generated 1 new checkstyle issues (total was 190, now 191). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 54s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 23m 11s | Tests failed in hadoop-common. 
| | | | 62m 48s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.security.token.delegation.web.TestWebDelegationToken | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736192/HDFS-8322.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7817674 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11165/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11165/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11165/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11165/console | This message was automatically generated. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
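The check HDFS-8322 proposes — warn when dfs commands silently operate on the local filesystem because {{fs.defaultFS}} was never set — amounts to inspecting the resolved default URI. The helper below is a simplified, self-contained stand-in, not the actual FsShell change; only the {{file:///}} default value mirrors Hadoop's shipped {{fs.defaultFS}} default.

```java
// Hedged sketch: emit a warning when the configured default filesystem is the
// local one (Hadoop's out-of-the-box default is "file:///"). This toy helper
// is not the HDFS-8322 patch; it only illustrates the decision.
public class DefaultFsCheck {
    /** Returns a warning line, or null when defaultFs points at a real cluster. */
    public static String warnIfLocal(String defaultFs) {
        if (defaultFs == null || defaultFs.startsWith("file:")) {
            return "WARN: fs.defaultFS is not set; operating on the local filesystem";
        }
        return null;
    }
}
```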
[jira] [Commented] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565335#comment-14565335 ] Hudson commented on HDFS-7609: -- FAILURE: Integrated in Hadoop-trunk-Commit #7926 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7926/]) HDFS-7609. Avoid retry cache collision when Standby NameNode loading edits. Contributed by Ming Ma. (jing9: rev 7817674a3a4d097b647dd77f1345787dd376d5ea) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode, but the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
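The workaround in the report above corresponds to a configuration override in {{hdfs-site.xml}}. The property name {{dfs.namenode.enable.retrycache}} is taken from the report itself; note the trade-off it implies — the retry cache is what protects non-idempotent RPCs from being replayed, so disabling it is only a recovery-time expedient, not a recommended steady-state setting.

```xml
<!-- hdfs-site.xml: the reporter's workaround. Disabling the retry cache
     trades retried-RPC protection for much faster edit-log replay. -->
<property>
  <name>dfs.namenode.enable.retrycache</name>
  <value>false</value>
</property>
```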
[jira] [Commented] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565353#comment-14565353 ] Ming Ma commented on HDFS-7609: --- Thanks Jing and all other folks. Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode, but the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8322: Summary: Display warning if defaultFs is not set when running dfs commands. (was: Display warning if hadoop fs -ls is showing the local filesystem) Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Labels: (was: BB2015-05-RFC) Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode, but the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565447#comment-14565447 ] Jing Zhao commented on HDFS-8481: - Yes, having this kind of accumulation would be great. But it looks to me like this will mainly be a performance optimization. Not reusing the user buffer may cause a more serious issue, as Walter described, and should be more critical to fix. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8463) Calling DFSInputStream.seekToNewSource just after stream creation causes NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565454#comment-14565454 ] Kihwal Lee commented on HDFS-8463: -- It might be better to simply call {{blockSeekTo(targetPos)}} and return true, if {{currentNode}} is null. Calling DFSInputStream.seekToNewSource just after stream creation causes NullPointerException -- Key: HDFS-8463 URL: https://issues.apache.org/jira/browse/HDFS-8463 Project: Hadoop HDFS Issue Type: Bug Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: HDFS-8463.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
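Kihwal's suggestion above — when {{currentNode}} is still null (the stream was just created and no DataNode has been chosen yet), {{seekToNewSource}} should go through {{blockSeekTo(targetPos)}} instead of dereferencing {{currentNode}} — can be illustrated with a toy model. The class below only mirrors the field and method names from {{DFSInputStream}}; it is not the real client code.

```java
// Hedged, simplified model of the NPE scenario in HDFS-8463. In the real
// DFSInputStream, currentNode stays null until the first read chooses a
// DataNode; calling seekToNewSource() before that dereferenced it.
public class SeekModel {
    Object currentNode; // null until a DataNode has been chosen

    boolean blockSeekTo(long targetPos) {
        currentNode = new Object(); // stand-in for choosing a DataNode to read from
        return true;
    }

    boolean seekToNewSource(long targetPos) {
        if (currentNode == null) {
            // Kihwal's suggested fix: pick a node first instead of NPE'ing below.
            return blockSeekTo(targetPos);
        }
        // ... the real code would exclude currentNode and try a different replica ...
        return false;
    }
}
```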
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565456#comment-14565456 ] Zhe Zhang commented on HDFS-8481: - I agree; the GC issue for wide stripes is more serious. I will revise the patch to directly use {{buf}} and we can do the accumulation optimization under HDFS-8031. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)