[jira] [Updated] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-14 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8838:

Attachment: HDFS-8838-HDFS-7285-20150809-test.patch

Attached a test patch, HDFS-8838-HDFS-7285-20150809-test.patch, to trigger 
Jenkins and test whether HDFS-8896 works.

 Tolerate datanode failures in DFSStripedOutputStream when the data length is 
 small
 --

 Key: HDFS-8838
 URL: https://issues.apache.org/jira/browse/HDFS-8838
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: HDFS-8838-HDFS-7285-000.patch, 
 HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, 
 h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, 
 h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch


 Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
 data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2015-08-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696880#comment-14696880
 ] 

Vinayakumar B commented on HDFS-7285:
-

I have tried to rebase the current {{HDFS-7285}} branch against the current 
trunk using {{git rebase trunk}}. It was not as smooth as expected. Since I did 
not want to push the rebase directly onto {{HDFS-7285}}, I created one more 
branch, {{HDFS-7285-REBASE}}. This branch is just for reference purposes.

The advantage of this is that it retained all the commits along with message, 
date and author details, even after resolving conflicts. I purposefully skipped 
one commit, HDFS-8787, to stay in sync with trunk; it was just a rename of 
files. Other than this, no commits were squashed.

There were 192 commits to be rebased against trunk, including the intermediate 
merge-conflict-resolution commits. Since I couldn't edit each and every commit 
to resolve compilation errors along the way, I resolved the remaining 
compilation errors at the end, with one more commit.

If anyone wants to verify, please check out the branch HDFS-7285-REBASE and 
compare it against the Consolidated patch.

Since this was only to check the feasibility of a rebase, I am not saying it 
should be considered the final branch.
If everyone feels good about this approach, I could do a more detailed rebase 
next week (maybe verifying the compilation after each commit? I am not sure 
whether it is possible to stop at each commit during the rebase; see the sketch 
below).
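For reference, a minimal sketch of the workflow described above, using the 
branch names from this comment; the {{--exec}} step is one standard git way to 
run a build per replayed commit and stop at the first failure:
{noformat}
git checkout -b HDFS-7285-REBASE HDFS-7285   # rebase a copy, not the real branch
git rebase trunk                             # replay the ~192 commits onto trunk

# To verify compilation after each commit, rebase can run a command after each
# replayed commit and stop as soon as one fails:
git rebase trunk --exec "mvn -q clean install -DskipTests"
{noformat}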

-Thanks

 Erasure Coding Support inside HDFS
 --

 Key: HDFS-7285
 URL: https://issues.apache.org/jira/browse/HDFS-7285
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Weihua Jiang
Assignee: Zhe Zhang
 Attachments: Consolidated-20150707.patch, 
 Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, 
 ECParser.py, HDFS-7285-initial-PoC.patch, 
 HDFS-7285-merge-consolidated-01.patch, 
 HDFS-7285-merge-consolidated-trunk-01.patch, 
 HDFS-7285-merge-consolidated.trunk.03.patch, 
 HDFS-7285-merge-consolidated.trunk.04.patch, 
 HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
 HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
 HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
 HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
 fsimage-analysis-20150105.pdf


 Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
 data reliability, compared to the existing HDFS 3-replica approach. For 
 example, if we use 10+4 Reed-Solomon coding, we can tolerate the loss of 4 
 blocks with a storage overhead of only 40%. This makes EC a quite attractive 
 alternative for big data storage, particularly for cold data. 
 Facebook had a related open source project called HDFS-RAID. It used to be 
 one of the contrib packages in HDFS but was removed as of Hadoop 2.0 for 
 maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends 
 on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
 cold files that are not intended to be appended anymore; 3) the pure Java EC 
 coding implementation is extremely slow in practical use. Due to these, it 
 might not be a good idea to just bring HDFS-RAID back.
 We (Intel and Cloudera) are working on a design to build EC into HDFS that 
 gets rid of any external dependencies, making it self-contained and 
 independently maintained. This design layers the EC feature on the storage 
 type support and aims to be compatible with existing HDFS features like 
 caching, snapshots, encryption, and high availability. This design will also 
 support different EC coding schemes, implementations, and policies for 
 different deployment scenarios. By utilizing advanced libraries (e.g. the 
 Intel ISA-L library), an implementation can greatly improve the performance 
 of EC encoding/decoding, making the EC solution even more attractive. We will 
 post the design document soon. 
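 To make the overhead arithmetic above concrete, a small illustrative 
 calculation (not from any patch; the numbers are the ones quoted in this 
 description):
 {code}
 // Worked arithmetic for the 10+4 Reed-Solomon example versus 3-way replication.
 public class EcOverhead {
     public static void main(String[] args) {
         int dataBlocks = 10, parityBlocks = 4;
         // EC: 4 parity blocks per 10 data blocks -> 40% extra storage,
         // and any 4 of the 14 blocks in a stripe may be lost.
         double ecOverhead = (double) parityBlocks / dataBlocks;
         // 3-replica: 2 extra copies per block -> 200% extra storage,
         // tolerating the loss of only 2 copies.
         double replicaOverhead = 3.0 - 1.0;
         System.out.printf("EC: %.0f%%, 3-replica: %.0f%%%n",
             ecOverhead * 100, replicaOverhead * 100);
     }
 }
 {code}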



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696547#comment-14696547
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8298 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8298/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled 
 with the increase in incremental block reports from receiving blocks, under 
 heavy load this log line noticeably degrades performance.
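 The guard the description refers to is the standard logging pattern below; a 
 minimal sketch with an illustrative class and message, not the actual 
 BlockManager code:
 {code}
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;

 public class GuardedDebugExample {
     private static final Log LOG = LogFactory.getLog(GuardedDebugExample.class);

     void processReport(String block) {
         // Without the guard, the argument string is concatenated on every
         // call, even when debug logging is off -- costly while the write
         // lock is held.
         if (LOG.isDebugEnabled()) {
             LOG.debug("BLOCK* processIncrementalBlockReport: " + block);
         }
     }
 }
 {code}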



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696548#comment-14696548
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8298 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8298/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 DataNode#transferBlock should report blocks that don't exist using 
 reportBadBlock
 -

 Key: HDFS-7235
 URL: https://issues.apache.org/jira/browse/HDFS-7235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
 HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
 HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch


 When decommissioning a DN, the process hangs. 
 What happens is, when the NN chooses a replica as a source to replicate data 
 on the to-be-decommissioned DN to other DNs, it favors choosing the 
 to-be-decommissioned DN itself as the source of the transfer (see 
 BlockManager.java).  However, because of the bad disk, the DN detects the 
 source block to be transferred as an invalid block, via the following logic 
 in FsDatasetImpl.java:
 {code}
 /** Does the block exist and have the given state? */
 private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
   final ReplicaInfo replicaInfo =
       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
   return replicaInfo != null
       && replicaInfo.getState() == state
       && replicaInfo.getBlockFile().exists();
 }
 {code}
 The reason this method returns false (detecting an invalid block) is that the 
 block file doesn't exist, due to the bad disk in this case. 
 The key issue we found is that after the DN detects an invalid block for the 
 above reason, it doesn't report the invalid block back to the NN, so the NN 
 doesn't know the block is corrupted and keeps sending the data transfer 
 request to the same to-be-decommissioned DN, again and again. This causes an 
 infinite loop, and the decommission process hangs.
 Thanks [~qwertymaniac] for reporting the issue and the initial analysis.
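 A hedged sketch of the idea in the title, not the committed patch: validate 
 the replica before the transfer and report a bad one to the NN 
 ({{reportBadBlocks}} and {{isValidBlock}} are existing DataNode/FsDataset 
 calls; the method shape here is illustrative):
 {code}
 // Hypothetical helper inside DataNode (sketch only): if the replica fails
 // validation, tell the NameNode instead of failing silently, so the NN stops
 // re-selecting this DN as the replication source.
 private void checkBlockBeforeTransfer(ExtendedBlock b) throws IOException {
     if (!data.isValidBlock(b)) {   // e.g. the block file was lost to a bad disk
         reportBadBlocks(b);        // NN can now mark the replica as corrupt
         throw new IOException("Can't send invalid block " + b);
     }
 }
 {code}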



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696558#comment-14696558
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8299 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8299/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Snapshot read can reveal future bytes for appended files.
 -

 Key: HDFS-7263
 URL: https://issues.apache.org/jira/browse/HDFS-7263
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
 TestSnapshotRead.java


 The following sequence of steps will produce extra bytes that should not be 
 visible, because they are not in the snapshot.
 * Create a file of size L, where {{L % blockSize != 0}}.
 * Create a snapshot
 * Append bytes to the file
 * Read the file in the snapshot (not the current file)
 * You will see that bytes are read beyond the original file size L (see the 
 sketch below)
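 A minimal sketch of those steps as HDFS client code, assuming a running 
 cluster where snapshots may be allowed on /d; the paths, snapshot name, and 
 sizes are illustrative:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.*;
 import org.apache.hadoop.hdfs.DistributedFileSystem;

 public class SnapshotReadSketch {
     public static void main(String[] args) throws Exception {
         DistributedFileSystem fs =
             (DistributedFileSystem) FileSystem.get(new Configuration());
         Path dir = new Path("/d");
         Path file = new Path(dir, "f");

         try (FSDataOutputStream out = fs.create(file)) {
             out.write(new byte[1000]);        // size L with L % blockSize != 0
         }
         fs.allowSnapshot(dir);                // superuser operation
         fs.createSnapshot(dir, "s1");
         try (FSDataOutputStream out = fs.append(file)) {
             out.write(new byte[500]);         // bytes written after the snapshot
         }

         // Read via the snapshot path: only the first 1000 bytes should be seen.
         Path snapFile = new Path(dir, ".snapshot/s1/f");
         byte[] buf = new byte[4096];
         int n, total = 0;
         try (FSDataInputStream in = fs.open(snapFile)) {
             while ((n = in.read(buf)) > 0) {
                 total += n;
             }
         }
         System.out.println("bytes visible in snapshot: " + total); // bug: > 1000
     }
 }
 {code}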



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696524#comment-14696524
 ] 

Vinayakumar B commented on HDFS-7213:
-

Cherry-picked to 2.6.1.

 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
  Labels: 2.6.1-candidate
 Fix For: 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled 
 with the increase in incremental block reports from receiving blocks, under 
 heavy load this log line noticeably degrades performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7213:

Fix Version/s: (was: 2.7.0)
   2.6.1

 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
  Labels: 2.6.1-candidate
 Fix For: 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled 
 with the increase in incremental block reports from receiving blocks, under 
 heavy load this log line noticeably degrades performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7213:

Fix Version/s: 2.7.0

 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled 
 with the increase in incremental block reports from receiving blocks, under 
 heavy load this log line noticeably degrades performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7235:

Labels:   (was: 2.6.1-candidate)

 DataNode#transferBlock should report blocks that don't exist using 
 reportBadBlock
 -

 Key: HDFS-7235
 URL: https://issues.apache.org/jira/browse/HDFS-7235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
 HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
 HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch


 When decommissioning a DN, the process hangs. 
 What happens is, when the NN chooses a replica as a source to replicate data 
 on the to-be-decommissioned DN to other DNs, it favors choosing the 
 to-be-decommissioned DN itself as the source of the transfer (see 
 BlockManager.java).  However, because of the bad disk, the DN detects the 
 source block to be transferred as an invalid block, via the following logic 
 in FsDatasetImpl.java:
 {code}
 /** Does the block exist and have the given state? */
 private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
   final ReplicaInfo replicaInfo =
       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
   return replicaInfo != null
       && replicaInfo.getState() == state
       && replicaInfo.getBlockFile().exists();
 }
 {code}
 The reason this method returns false (detecting an invalid block) is that the 
 block file doesn't exist, due to the bad disk in this case. 
 The key issue we found is that after the DN detects an invalid block for the 
 above reason, it doesn't report the invalid block back to the NN, so the NN 
 doesn't know the block is corrupted and keeps sending the data transfer 
 request to the same to-be-decommissioned DN, again and again. This causes an 
 infinite loop, and the decommission process hangs.
 Thanks [~qwertymaniac] for reporting the issue and the initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7235:

Fix Version/s: 2.6.1

 DataNode#transferBlock should report blocks that don't exist using 
 reportBadBlock
 -

 Key: HDFS-7235
 URL: https://issues.apache.org/jira/browse/HDFS-7235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
 HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
 HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch


 When decommissioning a DN, the process hangs. 
 What happens is, when the NN chooses a replica as a source to replicate data 
 on the to-be-decommissioned DN to other DNs, it favors choosing the 
 to-be-decommissioned DN itself as the source of the transfer (see 
 BlockManager.java).  However, because of the bad disk, the DN detects the 
 source block to be transferred as an invalid block, via the following logic 
 in FsDatasetImpl.java:
 {code}
 /** Does the block exist and have the given state? */
 private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
   final ReplicaInfo replicaInfo =
       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
   return replicaInfo != null
       && replicaInfo.getState() == state
       && replicaInfo.getBlockFile().exists();
 }
 {code}
 The reason this method returns false (detecting an invalid block) is that the 
 block file doesn't exist, due to the bad disk in this case. 
 The key issue we found is that after the DN detects an invalid block for the 
 above reason, it doesn't report the invalid block back to the NN, so the NN 
 doesn't know the block is corrupted and keeps sending the data transfer 
 request to the same to-be-decommissioned DN, again and again. This causes an 
 infinite loop, and the decommission process hangs.
 Thanks [~qwertymaniac] for reporting the issue and the initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696535#comment-14696535
 ] 

Vinayakumar B commented on HDFS-7235:
-

Cherry-picked to 2.6.1

 DataNode#transferBlock should report blocks that don't exist using 
 reportBadBlock
 -

 Key: HDFS-7235
 URL: https://issues.apache.org/jira/browse/HDFS-7235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
 HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
 HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch


 When decommissioning a DN, the process hangs. 
 What happens is, when the NN chooses a replica as a source to replicate data 
 on the to-be-decommissioned DN to other DNs, it favors choosing the 
 to-be-decommissioned DN itself as the source of the transfer (see 
 BlockManager.java).  However, because of the bad disk, the DN detects the 
 source block to be transferred as an invalid block, via the following logic 
 in FsDatasetImpl.java:
 {code}
 /** Does the block exist and have the given state? */
 private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
   final ReplicaInfo replicaInfo =
       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
   return replicaInfo != null
       && replicaInfo.getState() == state
       && replicaInfo.getBlockFile().exists();
 }
 {code}
 The reason this method returns false (detecting an invalid block) is that the 
 block file doesn't exist, due to the bad disk in this case. 
 The key issue we found is that after the DN detects an invalid block for the 
 above reason, it doesn't report the invalid block back to the NN, so the NN 
 doesn't know the block is corrupted and keeps sending the data transfer 
 request to the same to-be-decommissioned DN, again and again. This causes an 
 infinite loop, and the decommission process hangs.
 Thanks [~qwertymaniac] for reporting the issue and the initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7263:

Labels:   (was: 2.6.1-candidate)

 Snapshot read can reveal future bytes for appended files.
 -

 Key: HDFS-7263
 URL: https://issues.apache.org/jira/browse/HDFS-7263
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
 TestSnapshotRead.java


 The following sequence of steps will produce extra bytes that should not be 
 visible, because they are not in the snapshot.
 * Create a file of size L, where {{L % blockSize != 0}}.
 * Create a snapshot
 * Append bytes to the file
 * Read the file in the snapshot (not the current file)
 * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696550#comment-14696550
 ] 

Vinayakumar B commented on HDFS-7263:
-

Cherry-picked to 2.6.1

 Snapshot read can reveal future bytes for appended files.
 -

 Key: HDFS-7263
 URL: https://issues.apache.org/jira/browse/HDFS-7263
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
  Labels: 2.6.1-candidate
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
 TestSnapshotRead.java


 The following sequence of steps will produce extra bytes that should not be 
 visible, because they are not in the snapshot.
 * Create a file of size L, where {{L % blockSize != 0}}.
 * Create a snapshot
 * Append bytes to the file
 * Read the file in the snapshot (not the current file)
 * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7263:

Fix Version/s: 2.6.1

 Snapshot read can reveal future bytes for appended files.
 -

 Key: HDFS-7263
 URL: https://issues.apache.org/jira/browse/HDFS-7263
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
  Labels: 2.6.1-candidate
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
 TestSnapshotRead.java


 The following sequence of steps will produce extra bytes that should not be 
 visible, because they are not in the snapshot.
 * Create a file of size L, where {{L % blockSize != 0}}.
 * Create a snapshot
 * Append bytes to the file
 * Read the file in the snapshot (not the current file)
 * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7225:

Labels:   (was: 2.6.1-candidate)

 Remove stale block invalidation work when DN re-registers with different UUID
 -

 Key: HDFS-7225
 URL: https://issues.apache.org/jira/browse/HDFS-7225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
 HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch


 {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
 {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
 {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
 {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, 
 which will use it as a lookup key in a {{TreeMap}}. Since the key type is 
 {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
 will crash the NameNode with an NPE.
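 The crash mechanics are plain {{TreeMap}} behaviour, as the self-contained 
 illustration below shows (names are placeholders):
 {code}
 import java.util.TreeMap;

 public class NullKeyLookup {
     public static void main(String[] args) {
         TreeMap<String, String> invalidateBlocks = new TreeMap<>();
         invalidateBlocks.put("dn-uuid-1", "blocks to invalidate");
         // TreeMap compares keys on every lookup, so a null key throws
         // NullPointerException -- the same failure the description attributes
         // to passing a null DatanodeDescriptor into invalidateWork().
         invalidateBlocks.get(null);   // throws NPE
     }
 }
 {code}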



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7225:

Fix Version/s: 2.6.1

 Remove stale block invalidation work when DN re-registers with different UUID
 -

 Key: HDFS-7225
 URL: https://issues.apache.org/jira/browse/HDFS-7225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
 HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch


 {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
 {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
 {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
 {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, 
 which will use it as a lookup key in a {{TreeMap}}. Since the key type is 
 {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
 will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696620#comment-14696620
 ] 

Vinayakumar B commented on HDFS-7225:
-

Cherry-picked to 2.6.1

 Remove stale block invalidation work when DN re-registers with different UUID
 -

 Key: HDFS-7225
 URL: https://issues.apache.org/jira/browse/HDFS-7225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
 HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch


 {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
 {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
 {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
 {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, 
 which will use it as a lookup key in a {{TreeMap}}. Since the key type is 
 {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
 will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696624#comment-14696624
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8302 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8302/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Remove stale block invalidation work when DN re-registers with different UUID
 -

 Key: HDFS-7225
 URL: https://issues.apache.org/jira/browse/HDFS-7225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
 HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch


 {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
 {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
 {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
 {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, 
 which will use it as a lookup key in a {{TreeMap}}. Since the key type is 
 {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
 will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696642#comment-14696642
 ] 

Hadoop QA commented on HDFS-8891:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 42s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 33s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  3s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 177m  6s | Tests failed in hadoop-hdfs. |
| | | 219m 23s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestRbwSpaceReservation |
| Timed out tests | 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer |
|   | org.apache.hadoop.cli.TestHDFSCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750423/HDFS-8891.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0a03054 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11994/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11994/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11994/console |


This message was automatically generated.

 HDFS concat should keep srcs order
 --

 Key: HDFS-8891
 URL: https://issues.apache.org/jira/browse/HDFS-8891
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yong Zhang
Assignee: Yong Zhang
 Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch


 FSDirConcatOp.verifySrcFiles may change the order of the src files, but it 
 should keep their order as given in the input.
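 One common way to deduplicate sources while preserving the caller's order is 
 an insertion-ordered set; a hedged sketch of the desired property, not the 
 actual patch:
 {code}
 import java.util.Arrays;
 import java.util.HashSet;
 import java.util.LinkedHashSet;
 import java.util.Set;

 public class KeepSrcsOrder {
     public static void main(String[] args) {
         String[] srcs = {"/a/f3", "/a/f1", "/a/f2"};
         // A plain HashSet rejects duplicates but forgets the caller's order;
         // LinkedHashSet keeps insertion order, which is what concat needs.
         Set<String> unordered = new HashSet<>(Arrays.asList(srcs));
         Set<String> ordered = new LinkedHashSet<>(Arrays.asList(srcs));
         System.out.println("HashSet:       " + unordered); // order unspecified
         System.out.println("LinkedHashSet: " + ordered);   // [/a/f3, /a/f1, /a/f2]
     }
 }
 {code}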



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%

2015-08-14 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696639#comment-14696639
 ] 

Yi Liu commented on HDFS-8859:
--

The two test failures are not related.

 Improve DataNode ReplicaMap memory footprint to save about 45%
 --

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
 HDFS-8859.003.patch, HDFS-8859.004.patch


 By using the following approach we can save about *45%* of the memory 
 footprint for each block replica in DataNode memory (this JIRA only talks 
 about the *ReplicaMap* in the DataNode). The details are:
 In ReplicaMap, 
 {code}
 private final Map<String, Map<Long, ReplicaInfo>> map =
     new HashMap<String, Map<Long, ReplicaInfo>>();
 {code}
 Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas 
 in memory.  The key is the block id of the block replica, which is already 
 included in {{ReplicaInfo}}, so this memory can be saved.  Also, each HashMap 
 Entry has an object overhead.  We can implement a lightweight set which is 
 similar to {{LightWeightGSet}}, but not of a fixed size ({{LightWeightGSet}} 
 uses a fixed size for the entries array, usually a big value; an example is 
 {{BlocksMap}}. This avoids full GC since there is no need to resize). We 
 should also be able to get an element by its key.
 Following is a comparison of the memory footprint if we implement a 
 lightweight set as described.
 We can save:
 {noformat}
 SIZE (bytes)   ITEM
 20             The Key: Long (12 bytes object overhead + 8 bytes long)
 12             HashMap Entry object overhead
 4              reference to the key in Entry
 4              reference to the value in Entry
 4              hash in Entry
 {noformat}
 Total:  -44 bytes
 We need to add:
 {noformat}
 SIZE (bytes)   ITEM
 4              a reference to the next element in ReplicaInfo
 {noformat}
 Total:  +4 bytes
 So in total we can save 40 bytes for each block replica.
 And currently one finalized replica needs around 46 bytes (note: we ignore 
 memory alignment here).
 We can save 1 - (4 + 46) / (44 + 46) = *45%* of the memory for each block 
 replica in the DataNode.
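 Checking the arithmetic above in code (the byte counts are the ones given in 
 this description; the exact figure is 44.4%, which the description rounds to 
 45%):
 {code}
 public class ReplicaMapSavings {
     public static void main(String[] args) {
         int saved = 20 + 12 + 4 + 4 + 4; // Long key + Entry overhead + 3 fields
         int added = 4;                   // next-element reference in ReplicaInfo
         int replica = 46;                // one finalized replica today
         double saving = 1.0 - (double) (added + replica) / (saved + replica);
         System.out.printf("per-replica saving: %.1f%%%n", saving * 100); // 44.4%
     }
 }
 {code}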
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7213:

Labels:   (was: 2.6.1-candidate)

 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled 
 with the increase in incremental block reports from receiving blocks, under 
 heavy load this log line noticeably degrades performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696635#comment-14696635
 ] 

Hadoop QA commented on HDFS-8859:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 56s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 45s | The applied patch generated  6 
new checkstyle issues (total was 12, now 16). |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  22m 22s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 173m 14s | Tests failed in hadoop-hdfs. |
| | | 240m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.net.TestNetUtils |
|   | hadoop.ha.TestZKFailoverController |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750254/HDFS-8859.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0a03054 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11992/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11992/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11992/console |


This message was automatically generated.

 Improve DataNode ReplicaMap memory footprint to save about 45%
 --

 Key: HDFS-8859
 URL: https://issues.apache.org/jira/browse/HDFS-8859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
 HDFS-8859.003.patch, HDFS-8859.004.patch


 By using the following approach we can save about *45%* of the memory 
 footprint for each block replica in DataNode memory (this JIRA only talks 
 about the *ReplicaMap* in the DataNode). The details are:
 In ReplicaMap, 
 {code}
 private final Map<String, Map<Long, ReplicaInfo>> map =
     new HashMap<String, Map<Long, ReplicaInfo>>();
 {code}
 Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas 
 in memory.  The key is the block id of the block replica, which is already 
 included in {{ReplicaInfo}}, so this memory can be saved.  Also, each HashMap 
 Entry has an object overhead.  We can implement a lightweight set which is 
 similar to {{LightWeightGSet}}, but not of a fixed size ({{LightWeightGSet}} 
 uses a fixed size for the entries array, usually a big value; an example is 
 {{BlocksMap}}. This avoids full GC since there is no need to resize). We 
 should also be able to get an element by its key.
 Following is a comparison of the memory footprint if we implement a 
 lightweight set as described.
 We can save:
 {noformat}
 SIZE (bytes)   ITEM
 20             The Key: Long (12 bytes object overhead + 8 bytes long)
 12             HashMap Entry object overhead
 4              reference to the key in Entry
 4              reference to the value in Entry
 4              hash in Entry
 {noformat}
 Total:  -44 bytes
 We need to add:
 {noformat}
 SIZE (bytes)   ITEM
 4              a reference to the next element in ReplicaInfo
 {noformat}
 Total:  +4 bytes
 So in total we can save 40 bytes for each block replica.
 And currently one finalized replica needs around 46 bytes (note: we ignore 
 memory alignment here).
 We can save 1 - (4 + 46) / (44 + 46) = *45%* of the memory for each block 
 replica in the DataNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697031#comment-14697031
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 create() always retried with hardcoded timeout when file already exists with 
 open lease
 ---

 Key: HDFS-8270
 URL: https://issues.apache.org/jira/browse/HDFS-8270
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Andrey Stepachev
Assignee: J.Andreina
 Fix For: 2.6.1, 2.7.1

 Attachments: HDFS-8270-branch-2.6-v3.patch, 
 HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
 HDFS-8270.3.patch


 In HBase we stumbled on unexpected behaviour which could break things. 
 HDFS-6478 fixed a wrong exception translation, but that apparently led to 
 unexpected behaviour: clients trying to create a file without overwrite=true 
 will be forced to retry for a hardcoded amount of time (60 seconds).
 That could break or slow down systems that use the filesystem for locks (as 
 HBase's fsck did, and we got it broken: HBASE-13574).
 We should make this behaviour configurable: does the client really need to 
 wait for the lease timeout to be sure that the file doesn't exist, or should 
 it be enough to fail fast?
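 For context, the call pattern in question; a hedged sketch with a 
 hypothetical lock path:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.*;

 public class FailFastCreate {
     public static void main(String[] args) throws IOException {
         FileSystem fs = FileSystem.get(new Configuration());
         Path lock = new Path("/locks/region-42");   // hypothetical lock file
         // create() with overwrite=false on a path whose lease is still held:
         // before this fix the client retried for a hardcoded ~60 seconds
         // instead of failing fast, stalling lock-style callers like HBase fsck.
         try (FSDataOutputStream out = fs.create(lock, false /* overwrite */)) {
             System.out.println("acquired " + lock);
         } catch (IOException e) {
             System.out.println("lock busy: " + e.getMessage()); // fail fast
         }
     }
 }
 {code}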



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697033#comment-14697033
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 DataNode#transferBlock should report blocks that don't exist using 
 reportBadBlock
 -

 Key: HDFS-7235
 URL: https://issues.apache.org/jira/browse/HDFS-7235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
 HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
 HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch


 When decommissioning a DN, the process hangs. 
 What happens is, when the NN chooses a replica as a source to replicate data 
 on the to-be-decommissioned DN to other DNs, it favors choosing the 
 to-be-decommissioned DN itself as the source of the transfer (see 
 BlockManager.java).  However, because of the bad disk, the DN detects the 
 source block to be transferred as an invalid block, via the following logic 
 in FsDatasetImpl.java:
 {code}
 /** Does the block exist and have the given state? */
 private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
   final ReplicaInfo replicaInfo =
       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
   return replicaInfo != null
       && replicaInfo.getState() == state
       && replicaInfo.getBlockFile().exists();
 }
 {code}
 The reason this method returns false (detecting an invalid block) is that the 
 block file doesn't exist, due to the bad disk in this case. 
 The key issue we found is that after the DN detects an invalid block for the 
 above reason, it doesn't report the invalid block back to the NN, so the NN 
 doesn't know the block is corrupted and keeps sending the data transfer 
 request to the same to-be-decommissioned DN, again and again. This causes an 
 infinite loop, and the decommission process hangs.
 Thanks [~qwertymaniac] for reporting the issue and the initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697034#comment-14697034
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled 
 with the increase in incremental block reports from receiving blocks, under 
 heavy load this log line noticeably degrades performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697036#comment-14697036
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Remove stale block invalidation work when DN re-registers with different UUID
 -

 Key: HDFS-7225
 URL: https://issues.apache.org/jira/browse/HDFS-7225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
 HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch


 {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
 {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
 {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
 {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, 
 which will use it as a lookup key in a {{TreeMap}}. Since the key type is 
 {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
 will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697037#comment-14697037
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Snapshot read can reveal future bytes for appended files.
 -

 Key: HDFS-7263
 URL: https://issues.apache.org/jira/browse/HDFS-7263
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
 TestSnapshotRead.java


 The following sequence of steps will produce extra bytes that should not be 
 visible, because they are not in the snapshot.
 * Create a file of size L, where {{L % blockSize != 0}}.
 * Create a snapshot
 * Append bytes to the file
 * Read the file in the snapshot (not the current file)
 * You will see that bytes are read beyond the original file size L



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697057#comment-14697057
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is 
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled 
 with the increase in incremental block reports from receiving blocks, under 
 heavy load this log line noticeably degrades performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697059#comment-14697059
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Remove stale block invalidation work when DN re-registers with different UUID
 -

 Key: HDFS-7225
 URL: https://issues.apache.org/jira/browse/HDFS-7225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
 HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch


 {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
 {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
 {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated 
 {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, 
 which will use it as a lookup key in a {{TreeMap}}. Since the key type is 
 {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key 
 will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697065#comment-14697065
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md


 Multihoming docs should emphasize using hostnames in configurations
 ---

 Key: HDFS-7649
 URL: https://issues.apache.org/jira/browse/HDFS-7649
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: HDFS-7649.patch


 The docs should emphasize that master and slave configurations should use 
 hostnames wherever possible.
 Link to current docs: 
 https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697042#comment-14697042
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Multihoming docs should emphasize using hostnames in configurations
 ---

 Key: HDFS-7649
 URL: https://issues.apache.org/jira/browse/HDFS-7649
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: HDFS-7649.patch


 The docs should emphasize that master and slave configurations should use 
 hostnames wherever possible.
 Link to current docs: 
 https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697060#comment-14697060
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Snapshot read can reveal future bytes for appended files.
 -

 Key: HDFS-7263
 URL: https://issues.apache.org/jira/browse/HDFS-7263
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
 TestSnapshotRead.java


 The following sequence of steps will produce extra bytes that should not be
 visible, because they are not in the snapshot:
 * Create a file of size L, where {{L % blockSize != 0}}.
 * Create a snapshot.
 * Append bytes to the file.
 * Read the file in the snapshot (not the current file).
 * You will see that bytes are read beyond the original file size L.
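 A hypothetical reproduction sketch (helper names are from DFSTestUtil; the
 paths, sizes, and cluster setup are illustrative assumptions):
 {code}
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hdfs.DFSTestUtil;
 import org.apache.hadoop.hdfs.DistributedFileSystem;

 DistributedFileSystem fs = cluster.getFileSystem(); // assumes a MiniDFSCluster
 Path dir = new Path("/dir");
 Path file = new Path(dir, "file");
 long blockSize = fs.getDefaultBlockSize(file);
 long L = blockSize / 2 + 1;                  // L % blockSize != 0
 DFSTestUtil.createFile(fs, file, L, (short) 1, 0L);
 fs.allowSnapshot(dir);
 fs.createSnapshot(dir, "s1");
 DFSTestUtil.appendFile(fs, file, 1024);      // bytes not in the snapshot

 // Expected: exactly L bytes; observed before the fix: appended bytes too.
 long total = 0;
 try (FSDataInputStream in = fs.open(new Path("/dir/.snapshot/s1/file"))) {
   while (in.read() != -1) total++;
 }
 {code}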



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697054#comment-14697054
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 create() always retried with hardcoded timeout when file already exists with 
 open lease
 ---

 Key: HDFS-8270
 URL: https://issues.apache.org/jira/browse/HDFS-8270
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Andrey Stepachev
Assignee: J.Andreina
 Fix For: 2.6.1, 2.7.1

 Attachments: HDFS-8270-branch-2.6-v3.patch, 
 HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
 HDFS-8270.3.patch


 In HBase we stumbled on unexpected behaviour which could
 break things.
 HDFS-6478 fixed wrong exception
 translation, but that apparently led to unexpected behaviour:
 clients trying to create a file without overwrite=true will be forced
 to retry for a hardcoded amount of time (60 seconds).
 That could break or slow down systems that use the filesystem
 for locks (like hbase fsck did, and we got it broken: HBASE-13574).
 We should make this behaviour configurable: does the client really need
 to wait for the lease timeout to be sure that the file doesn't exist, or
 should it be enough to fail fast?
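 A minimal sketch of the lock-file pattern that hits this (the path and
 error handling are illustrative):
 {code}
 import java.io.IOException;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 FileSystem fs = FileSystem.get(conf);
 Path lock = new Path("/locks/app.lock");
 try {
   // overwrite=false: callers want this to fail fast if the file already
   // exists with an open lease; instead the client retries for 60 seconds.
   FSDataOutputStream out = fs.create(lock, false);
   out.close();
 } catch (IOException e) {
   // only surfaced after the hardcoded retry window expires
 }
 {code}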



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697056#comment-14697056
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 DataNode#transferBlock should report blocks that don't exist using 
 reportBadBlock
 -

 Key: HDFS-7235
 URL: https://issues.apache.org/jira/browse/HDFS-7235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
 HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
 HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch


 When decommissioning a DN, the process hangs.
 What happens is, when the NN chooses a replica as a source to replicate data
 on the to-be-decommissioned DN to other DNs, it favors choosing this
 to-be-decommissioned DN as the source of the transfer (see BlockManager.java).
 However, because of the bad disk, the DN would detect the source block to be
 transferred as an invalid block with the following logic in FsDatasetImpl.java:
 {code}
 /** Does the block exist and have the given state? */
 private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
   final ReplicaInfo replicaInfo =
       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
   return replicaInfo != null
       && replicaInfo.getState() == state
       && replicaInfo.getBlockFile().exists();
 }
 {code}
 The reason this method returns false (detecting an invalid block) is that
 the block file doesn't exist, due to the bad disk in this case.
 The key issue we found here is that after the DN detects an invalid block for
 the above reason, it doesn't report the invalid block back to the NN, so the
 NN doesn't know that the block is corrupted and keeps sending the data
 transfer request to the same to-be-decommissioned DN, again and again. This
 causes an infinite loop, so the decommission process hangs.
 Thanks [~qwertymaniac] for reporting the issue and initial analysis.
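 A hypothetical sketch of the direction of a fix (method names are
 assumptions, not the committed patch):
 {code}
 // In DataNode#transferBlock: if the replica is invalid, tell the NN
 // instead of silently dropping the work, so the NN stops picking this
 // DN as the replication source.
 if (!data.isValidBlock(block)) {
   reportBadBlocks(block); // assumed reporting path
   return;
 }
 // ... proceed with the normal transfer ...
 {code}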



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8093) BP does not exist or is not under Constructionnull

2015-08-14 Thread Felix Borchers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696757#comment-14696757
 ] 

Felix Borchers commented on HDFS-8093:
--

I have a very similar problem while running the balancer.
{{hdfs fsck /}} returned HEALTHY, and the block causing the balancer to throw
an exception is not in HDFS anymore.
{{hdfs fsck / -files -blocks | grep blk_1074256920_516292}} returned nothing.

Digging in the DataNode logs shows that the block was deleted on the
node (see below for a log file excerpt).
Digging in the NameNode logs shows messages like {{block ... does not
belong to any file}} (see below for a log file excerpt).

It seems there is a problem with removed/deleted blocks?!

DataNode Logs
=
only lines with: blk_1074256920_516292 displayed:
{code}
2015-08-14 00:30:03,893 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292 src: 
/10.13.53.16:37605 dest: /10.13.53.19:50010
2015-08-14 00:30:07,841 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Scheduling blk_1074256920_516292 file 
/data/is24/hadoop/1/dfs/dataNode/current/BP-322804774-10.13.54.1-1412684451669/current/rbw/blk_1074256920
 for deletion
2015-08-14 00:30:09,092 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
 Deleted BP-322804774-10.13.54.1-1412684451669 blk_1074256920_516292 file 
/data/is24/hadoop/1/dfs/dataNode/current/BP-322804774-10.13.54.1-1412684451669/current/rbw/blk_1074256920
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append 
to a non-existent replica 
BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292
2015-08-14 00:46:44,916 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292, 
type=LAST_IN_PIPELINE, downstreams=0:[]
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append 
to a non-existent replica 
BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292
2015-08-14 00:46:44,916 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder: BP-322804774-10.13.54.1-1412684451669:blk_1074256920_516292, 
type=LAST_IN_PIPELINE, downstreams=0:[] terminating
{code}

NameNode Logs
=

only lines with: blk_1074256920_516292 displayed:

{code}
2015-08-14 00:30:03,843 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
allocateBlock: /system/balancer.id. BP-322804774-10.13.54.1-1412684451669 
blk_1074256920_516292{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-4db312aa-bc23-47dc-b768-52a2d72b09d3:NORMAL:10.13.53.30:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-c7db1b58-8e25-435f-8af8-08b6754c021c:NORMAL:10.13.53.16:50010|RBW],
 
ReplicaUnderConstruction[[DISK]DS-4457ae11-7684-4187-b4ad-56466d79fba2:NORMAL:10.13.53.19:50010|RBW]]}
2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* addBlock: block 
blk_1074256920_516292 on node 10.13.53.16:50010 size 134217728 does not belong 
to any file
2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1074256920_516292 to 10.13.53.16:50010
2015-08-14 00:30:04,000 INFO BlockStateChange: BLOCK* BlockManager: ask 
10.13.53.16:50010 to delete [blk_1074256920_516292]
2015-08-14 00:30:04,840 INFO BlockStateChange: BLOCK* addBlock: block 
blk_1074256920_516292 on node 10.13.53.19:50010 size 134217728 does not belong 
to any file
2015-08-14 00:30:04,840 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1074256920_516292 to 10.13.53.19:50010
2015-08-14 00:30:05,925 INFO BlockStateChange: BLOCK* addBlock: block 
blk_1074256920_516292 on node 10.13.53.30:50010 size 134217728 does not belong 
to any file
2015-08-14 00:30:05,925 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_1074256920_516292 to 10.13.53.30:50010
2015-08-14 00:30:07,000 INFO BlockStateChange: BLOCK* BlockManager: ask 
10.13.53.19:50010 to delete [blk_1074208004_467362, blk_1074224392_483753, 
blk_1074093070_352362, blk_1074240530_499900, blk_1074256920_516292, 
blk_1074224154_483515, blk_1074240554_499924, blk_1074240556_499926, 
blk_1074240561_499931, blk_1074224178_483539, blk_1074240563_499933, 
blk_1074207795_467153, blk_1074093108_352429, blk_1074207797_467155, 
blk_1073798197_57374, blk_1074224182_483543, blk_1074240569_499939, 
blk_1074207802_467160, blk_1074224187_483548, blk_1074224188_483549, 
blk_1074207805_467163, blk_1074158653_418001, blk_1074207806_467164, 
blk_1074224191_483552, blk_1074207809_467167, blk_1074207817_467175, 
blk_1074207818_467176, blk_1074207820_467178, blk_1074207822_467180, 
blk_1074207830_467188, blk_1074224216_483577, blk_1074224217_483578, 
blk_1073798237_57414, blk_1073929310_188502, blk_1074207843_467201, 
blk_1073847400_106577, blk_1074207852_467210, 
{code}

[jira] [Commented] (HDFS-7116) Add a command to get the bandwidth of balancer

2015-08-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696698#comment-14696698
 ] 

Rakesh R commented on HDFS-7116:


Hi All,

In the proposed patch, the Datanode sends the {{balancerBandwidth}} value to
the Namenode through heartbeats. As we know, this is done for the consistency
reasons discussed earlier in this jira. On second look, I have another idea
which would have less overhead.

- We already have a set of datanode metrics exposed which can be used by
admins/monitoring tools. How about exposing the {{balancerBandwidth}} value as
a Datanode metric? Here, the admin/monitoring tool would have to collect the
metric from every Datanode individually; a hypothetical sketch follows below.
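
A hypothetical sketch of that idea using the metrics2 annotations (the class
name and the accessor on the DataNode are assumptions, not a committed patch):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;

@Metrics(about = "DataNode balancer metrics", context = "dfs")
class DataNodeBalancerMetrics {
  private final DataNode dn;

  DataNodeBalancerMetrics(DataNode dn) {
    this.dn = dn;
  }

  @Metric("Current balancer bandwidth in bytes per second")
  public long getBalancerBandwidth() {
    return dn.getBalancerBandwidthValue(); // assumed accessor on the DN
  }
}
{code}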

 Add a command to get the bandwidth of balancer
 --

 Key: HDFS-7116
 URL: https://issues.apache.org/jira/browse/HDFS-7116
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: balancer & mover
Reporter: Akira AJISAKA
Assignee: Rakesh R
 Attachments: HDFS-7116-00.patch, HDFS-7116-01.patch


 Now reading logs is the only way to check how the balancer bandwidth is set. 
 It would be useful for administrators if they can get the parameter via CLI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks

2015-08-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated HDFS-8850:
--
Labels:   (was: 2.6.1-candidate)

Removing the 2.6.1-candidate label, as this is an issue for a class that exists 
only in 2.7 and as such I don't think it applies to 2.6. Let me know if you 
think this issue exists in 2.6.

 VolumeScanner thread exits with exception if there is no block pool to be 
 scanned but there are suspicious blocks
 -

 Key: HDFS-8850
 URL: https://issues.apache.org/jira/browse/HDFS-8850
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.8.0

 Attachments: HDFS-8850.001.patch


 The VolumeScanner threads inside the BlockScanner exit with an exception if 
 there is no block pool to be scanned but there are suspicious blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Yong Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696691#comment-14696691
 ] 

Yong Zhang commented on HDFS-8891:
--

Failed test cases are not related to this patch.

 HDFS concat should keep srcs order
 --

 Key: HDFS-8891
 URL: https://issues.apache.org/jira/browse/HDFS-8891
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yong Zhang
Assignee: Yong Zhang
 Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch


 FSDirConcatOp.verifySrcFiles may change the src files' order, but it should
 keep their order as input.
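 For context, the caller-visible expectation (paths are illustrative):
 {code}
 // The target should receive the srcs' bytes in exactly the given order.
 fs.concat(new Path("/data/part-all"),
     new Path[] { new Path("/data/part-0"),
                  new Path("/data/part-1"),
                  new Path("/data/part-2") });
 {code}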



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7116) Add a command to get the bandwidth of balancer

2015-08-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696692#comment-14696692
 ] 

Rakesh R commented on HDFS-7116:


Hi All,


 Add a command to get the bandwidth of balancer
 --

 Key: HDFS-7116
 URL: https://issues.apache.org/jira/browse/HDFS-7116
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: balancer & mover
Reporter: Akira AJISAKA
Assignee: Rakesh R
 Attachments: HDFS-7116-00.patch, HDFS-7116-01.patch


 Now reading logs is the only way to check how the balancer bandwidth is set. 
 It would be useful for administrators if they can get the parameter via CLI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697117#comment-14697117
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Snapshot read can reveal future bytes for appended files.
 -

 Key: HDFS-7263
 URL: https://issues.apache.org/jira/browse/HDFS-7263
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
 TestSnapshotRead.java


 The following sequence of steps will produce extra bytes that should not be
 visible, because they are not in the snapshot:
 * Create a file of size L, where {{L % blockSize != 0}}.
 * Create a snapshot.
 * Append bytes to the file.
 * Read the file in the snapshot (not the current file).
 * You will see that bytes are read beyond the original file size L.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697113#comment-14697113
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 DataNode#transferBlock should report blocks that don't exist using 
 reportBadBlock
 -

 Key: HDFS-7235
 URL: https://issues.apache.org/jira/browse/HDFS-7235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
 HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
 HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch


 When decommissioning a DN, the process hangs.
 What happens is, when the NN chooses a replica as a source to replicate data
 on the to-be-decommissioned DN to other DNs, it favors choosing this
 to-be-decommissioned DN as the source of the transfer (see BlockManager.java).
 However, because of the bad disk, the DN would detect the source block to be
 transferred as an invalid block with the following logic in FsDatasetImpl.java:
 {code}
 /** Does the block exist and have the given state? */
 private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
   final ReplicaInfo replicaInfo =
       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
   return replicaInfo != null
       && replicaInfo.getState() == state
       && replicaInfo.getBlockFile().exists();
 }
 {code}
 The reason this method returns false (detecting an invalid block) is that
 the block file doesn't exist, due to the bad disk in this case.
 The key issue we found here is that after the DN detects an invalid block for
 the above reason, it doesn't report the invalid block back to the NN, so the
 NN doesn't know that the block is corrupted and keeps sending the data
 transfer request to the same to-be-decommissioned DN, again and again. This
 causes an infinite loop, so the decommission process hangs.
 Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697116#comment-14697116
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Remove stale block invalidation work when DN re-registers with different UUID
 -

 Key: HDFS-7225
 URL: https://issues.apache.org/jira/browse/HDFS-7225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
 HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch


 {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
 {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
 {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated
 {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}},
 which will use it to look up an entry in a {{TreeMap}}. Since the key type is
 {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key
 will crash the NameNode with an NPE.
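 A minimal, self-contained illustration of that failure mode:
 {code}
 import java.util.TreeMap;

 // A TreeMap compares keys on lookup, so a null key throws NPE.
 TreeMap<String, Integer> node2blocks = new TreeMap<>();
 node2blocks.put("dn-1", 1);
 node2blocks.get(null); // NullPointerException, as in the NN crash above
 {code}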



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697114#comment-14697114
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled
 with the increase in incremental block reports from receiving blocks, under
 heavy load this log line noticeably degrades performance.
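 A sketch of the guard (illustrative, not the committed diff):
 {code}
 if (blockLog.isDebugEnabled()) { // skip string building under the write lock
   blockLog.debug("BLOCK* block " + block + " is received from " + node);
 }
 {code}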



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697122#comment-14697122
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Multihoming docs should emphasize using hostnames in configurations
 ---

 Key: HDFS-7649
 URL: https://issues.apache.org/jira/browse/HDFS-7649
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: HDFS-7649.patch


 The docs should emphasize that master and slave configurations should use
 hostnames wherever possible.
 Link to current docs: 
 https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697111#comment-14697111
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 create() always retried with hardcoded timeout when file already exists with 
 open lease
 ---

 Key: HDFS-8270
 URL: https://issues.apache.org/jira/browse/HDFS-8270
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Andrey Stepachev
Assignee: J.Andreina
 Fix For: 2.6.1, 2.7.1

 Attachments: HDFS-8270-branch-2.6-v3.patch, 
 HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
 HDFS-8270.3.patch


 In HBase we stumbled on unexpected behaviour which could
 break things.
 HDFS-6478 fixed wrong exception
 translation, but that apparently led to unexpected behaviour:
 clients trying to create a file without overwrite=true will be forced
 to retry for a hardcoded amount of time (60 seconds).
 That could break or slow down systems that use the filesystem
 for locks (like hbase fsck did, and we got it broken: HBASE-13574).
 We should make this behaviour configurable: does the client really need
 to wait for the lease timeout to be sure that the file doesn't exist, or
 should it be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7225) Remove stale block invalidation work when DN re-registers with different UUID

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697138#comment-14697138
 ] 

Hudson commented on HDFS-7225:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7225. Remove stale block invalidation work when DN re-registers with 
different UUID. (Zhe Zhang and Andrew Wang) (vinayakumarb: rev 
08bd4edf4092901273da0d73a5cc760fdc11052b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Remove stale block invalidation work when DN re-registers with different UUID
 -

 Key: HDFS-7225
 URL: https://issues.apache.org/jira/browse/HDFS-7225
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7225-v1.patch, HDFS-7225-v2.patch, 
 HDFS-7225-v3.patch, HDFS-7225.004.patch, HDFS-7225.005.patch


 {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the 
 {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to 
 {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated
 {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}},
 which will use it to look up an entry in a {{TreeMap}}. Since the key type is
 {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key
 will crash the NameNode with an NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists with open lease

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697133#comment-14697133
 ] 

Hudson commented on HDFS-8270:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-8270. create() always retried with hardcoded timeout when file already 
exists with open lease (Contributed by J.Andreina) (vinayakumarb: rev 
84bf71295a5e52b2a7bb69440a885a25bc75f544)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 create() always retried with hardcoded timeout when file already exists with 
 open lease
 ---

 Key: HDFS-8270
 URL: https://issues.apache.org/jira/browse/HDFS-8270
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Andrey Stepachev
Assignee: J.Andreina
 Fix For: 2.6.1, 2.7.1

 Attachments: HDFS-8270-branch-2.6-v3.patch, 
 HDFS-8270-branch-2.7-03.patch, HDFS-8270.1.patch, HDFS-8270.2.patch, 
 HDFS-8270.3.patch


 In HBase we stumbled on unexpected behaviour which could
 break things.
 HDFS-6478 fixed wrong exception
 translation, but that apparently led to unexpected behaviour:
 clients trying to create a file without overwrite=true will be forced
 to retry for a hardcoded amount of time (60 seconds).
 That could break or slow down systems that use the filesystem
 for locks (like hbase fsck did, and we got it broken: HBASE-13574).
 We should make this behaviour configurable: does the client really need
 to wait for the lease timeout to be sure that the file doesn't exist, or
 should it be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697135#comment-14697135
 ] 

Hudson commented on HDFS-7235:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using 
reportBadBlock (yzhang via cmccabe) (vinayakumarb: rev 
f2b4bc9b6a1bd3f9dbfc4e85c1b9bde238da3627)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 DataNode#transferBlock should report blocks that don't exist using 
 reportBadBlock
 -

 Key: HDFS-7235
 URL: https://issues.apache.org/jira/browse/HDFS-7235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, 
 HDFS-7235.003.patch, HDFS-7235.004.patch, HDFS-7235.005.patch, 
 HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch


 When decommissioning a DN, the process hangs.
 What happens is, when the NN chooses a replica as a source to replicate data
 on the to-be-decommissioned DN to other DNs, it favors choosing this
 to-be-decommissioned DN as the source of the transfer (see BlockManager.java).
 However, because of the bad disk, the DN would detect the source block to be
 transferred as an invalid block with the following logic in FsDatasetImpl.java:
 {code}
 /** Does the block exist and have the given state? */
 private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
   final ReplicaInfo replicaInfo =
       volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
   return replicaInfo != null
       && replicaInfo.getState() == state
       && replicaInfo.getBlockFile().exists();
 }
 {code}
 The reason this method returns false (detecting an invalid block) is that
 the block file doesn't exist, due to the bad disk in this case.
 The key issue we found here is that after the DN detects an invalid block for
 the above reason, it doesn't report the invalid block back to the NN, so the
 NN doesn't know that the block is corrupted and keeps sending the data
 transfer request to the same to-be-decommissioned DN, again and again. This
 causes an infinite loop, so the decommission process hangs.
 Thanks [~qwertymaniac] for reporting the issue and initial analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7213) processIncrementalBlockReport performance degradation

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697136#comment-14697136
 ] 

Hudson commented on HDFS-7213:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7213. processIncrementalBlockReport performance degradation. Contributed 
by Eric Payne. (vinayakumarb: rev d25cb8fe12d00faf3e8f3bfd23fd1b01981a340f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 processIncrementalBlockReport performance degradation
 -

 Key: HDFS-7213
 URL: https://issues.apache.org/jira/browse/HDFS-7213
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Daryn Sharp
Assignee: Eric Payne
Priority: Critical
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7213.1412804753, HDFS-7213.1412806496.txt


 {{BlockManager#processIncrementalBlockReport}} has a debug line that is
 missing an {{isDebugEnabled}} check.  The write lock is being held.  Coupled
 with the increase in incremental block reports from receiving blocks, under
 heavy load this log line noticeably degrades performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7263) Snapshot read can reveal future bytes for appended files.

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697139#comment-14697139
 ] 

Hudson commented on HDFS-7263:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7263. Snapshot read can reveal future bytes for appended files. 
Contributed by Tao Luo. (vinayakumarb: rev 
fa2641143c0d74c4fef122d79f27791e15d3b43f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Snapshot read can reveal future bytes for appended files.
 -

 Key: HDFS-7263
 URL: https://issues.apache.org/jira/browse/HDFS-7263
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Tao Luo
 Fix For: 2.7.0, 2.6.1

 Attachments: HDFS-7263.patch, HDFS-7263.patch, HDFS-7263.patch, 
 TestSnapshotRead.java


 The following sequence of steps will produce extra bytes that should not be
 visible, because they are not in the snapshot:
 * Create a file of size L, where {{L % blockSize != 0}}.
 * Create a snapshot.
 * Append bytes to the file.
 * Read the file in the snapshot (not the current file).
 * You will see that bytes are read beyond the original file size L.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697144#comment-14697144
 ] 

Hudson commented on HDFS-7649:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
HDFS-7649. Multihoming docs should emphasize using hostnames in configurations. 
(Contributed by Brahma Reddy Battula) (arp: rev 
ae57d60d8239916312bca7149e2285b2ed3b123a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsMultihoming.md


 Multihoming docs should emphasize using hostnames in configurations
 ---

 Key: HDFS-7649
 URL: https://issues.apache.org/jira/browse/HDFS-7649
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: HDFS-7649.patch


 The docs should emphasize that master and slave configurations should use
 hostnames wherever possible.
 Link to current docs: 
 https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8897) Loadbalancer

2015-08-14 Thread LINTE (JIRA)
LINTE created HDFS-8897:
---

 Summary: Loadbalancer 
 Key: HDFS-8897
 URL: https://issues.apache.org/jira/browse/HDFS-8897
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 2.7.1
 Environment: Centos 6.6
Reporter: LINTE


When the balancer is launched, it should test whether there is already a 
/system/balancer.id file in HDFS.

Even when the file doesn't exist, the balancer refuses to run: 

15/08/14 16:35:12 INFO balancer.Balancer: namenodes  = [hdfs://sandbox/, 
hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = 
Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 
5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from 
NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 
30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from 
NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 
30mins, 0sec
java.io.IOException: Another Balancer is running..  Exiting ...
Aug 14, 2015 4:35:14 PM  Balancing took 2.408 seconds


Looking at the audit log file when trying to run the balancer, the balancer 
creates /system/balancer.id and then deletes it on exiting ... 

2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
src=/system/balancer.id dst=nullperm=null   proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=create  
src=/system/balancer.id dst=nullperm=hdfs:hadoop:rw-r-  
proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
src=/system/balancer.id dst=nullperm=null   proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
src=/system/balancer.id dst=nullperm=null   proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
src=/system/balancer.id dst=nullperm=null   proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true   
ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=delete  
src=/system/balancer.id dst=nullperm=null   proto=rpc

The error seems to be located in 
org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. 

The function checkAndMarkRunning returns null even if /system/balancer.id 
doesn't exist before entering this function; if it does exist, it is deleted 
and the balancer exits with the same error.




  private OutputStream checkAndMarkRunning() throws IOException {
try {
  if (fs.exists(idPath)) {
// try appending to it so that it will fail fast if another balancer is
// running.
IOUtils.closeStream(fs.append(idPath));
fs.delete(idPath, true);
  }
  final FSDataOutputStream fsout = fs.create(idPath, false);
  // mark balancer idPath to be deleted during filesystem closure
  fs.deleteOnExit(idPath);
  if (write2IdFile) {
fsout.writeBytes(InetAddress.getLocalHost().getHostName());
fsout.hflush();
  }
  return fsout;
} catch(RemoteException e) {
  if(AlreadyBeingCreatedException.class.getName().equals(e.getClassName())){
return null;
  } else {
throw e;
  }
}
  }



Regards




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...

2015-08-14 Thread LINTE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LINTE updated HDFS-8897:

Summary: Loadbalancer always exits with : java.io.IOException: Another 
Balancer is running..  Exiting ...  (was: Loadbalancer )

 Loadbalancer always exits with : java.io.IOException: Another Balancer is 
 running..  Exiting ...
 

 Key: HDFS-8897
 URL: https://issues.apache.org/jira/browse/HDFS-8897
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 2.7.1
 Environment: Centos 6.6
Reporter: LINTE

 When the balancer is launched, it should test whether there is already a
 /system/balancer.id file in HDFS.
 Even when the file doesn't exist, the balancer refuses to run:
 15/08/14 16:35:12 INFO balancer.Balancer: namenodes  = [hdfs://sandbox/, 
 hdfs://sandbox]
 15/08/14 16:35:12 INFO balancer.Balancer: parameters = 
 Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration 
 = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
 Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
 Bytes Being Moved
 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from 
 NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 
 30mins, 0sec
 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from 
 NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 
 30mins, 0sec
 java.io.IOException: Another Balancer is running..  Exiting ...
 Aug 14, 2015 4:35:14 PM  Balancing took 2.408 seconds
 Looking at the audit log file when trying to run the balancer, the balancer
 creates /system/balancer.id and then deletes it on exiting ...
 2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true   
 ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
 src=/system/balancer.id dst=nullperm=null   proto=rpc
 2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true   
 ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=create  
 src=/system/balancer.id dst=nullperm=hdfs:hadoop:rw-r-  
 proto=rpc
 2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true   
 ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
 src=/system/balancer.id dst=nullperm=null   proto=rpc
 2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true   
 ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
 src=/system/balancer.id dst=nullperm=null   proto=rpc
 2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true   
 ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=getfileinfo 
 src=/system/balancer.id dst=nullperm=null   proto=rpc
 2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true   
 ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x   cmd=delete  
 src=/system/balancer.id dst=nullperm=null   proto=rpc
 The error seems to be located in
 org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java.
 The function checkAndMarkRunning returns null even if /system/balancer.id
 doesn't exist before entering this function; if it does exist, it is deleted
 and the balancer exits with the same error.
 
   private OutputStream checkAndMarkRunning() throws IOException {
 try {
   if (fs.exists(idPath)) {
  // try appending to it so that it will fail fast if another balancer
  // is running.
 IOUtils.closeStream(fs.append(idPath));
 fs.delete(idPath, true);
   }
   final FSDataOutputStream fsout = fs.create(idPath, false);
   // mark balancer idPath to be deleted during filesystem closure
   fs.deleteOnExit(idPath);
   if (write2IdFile) {
 fsout.writeBytes(InetAddress.getLocalHost().getHostName());
 fsout.hflush();
   }
   return fsout;
  } catch(RemoteException e) {
    if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
 return null;
   } else {
 throw e;
   }
 }
   }
 
 Regards



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8896) DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697154#comment-14697154
 ] 

Hadoop QA commented on HDFS-8896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 53s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 40s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 12s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 23s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  22m 25s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 174m 49s | Tests failed in hadoop-hdfs. |
| | | 242m 23s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.ha.TestZKFailoverController |
|   | hadoop.net.TestNetUtils |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750491/HDFS-8896.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11995/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11995/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11995/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11995/console |


This message was automatically generated.

 DataNode object isn't GCed when shutdown, because it has GC root in 
 ShutdownHookManager
 ---

 Key: HDFS-8896
 URL: https://issues.apache.org/jira/browse/HDFS-8896
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor
 Attachments: HDFS-8896.01.patch, screenshot_1.PNG, screenshot_2.PNG


 The anonymous {{Thread}} object created in {{ShutdownHookManager}} is a GC
 root.
 screenshot_1 shows how the DN object can be traced to the GC root.
 It's not a problem in production.
 It's a problem in tests, especially when MiniDFSCluster starts/shuts down
 many DNs, which can cause {{OutOfMemoryError}}.
 screenshot_2 shows many DN objects that are not GCed when running the test
 of HDFS-8838.
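 A minimal sketch of the leak pattern (the hook body and priority are
 illustrative assumptions):
 {code}
 import org.apache.hadoop.util.ShutdownHookManager;

 // The anonymous Runnable captures 'datanode'; ShutdownHookManager keeps
 // the hook strongly reachable until JVM exit, so the DataNode stays
 // reachable from a GC root even after shutdown().
 ShutdownHookManager.get().addShutdownHook(new Runnable() {
   @Override
   public void run() {
     datanode.shutdown();
   }
 }, 10); // illustrative priority
 {code}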



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697240#comment-14697240
 ] 

Hadoop QA commented on HDFS-8838:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 56s | Findbugs (version ) appears to 
be broken on HDFS-7285. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 43s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 20s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 20s | The patch appears to introduce 5 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  21m 55s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 204m 28s | Tests failed in hadoop-hdfs. |
| | | 268m 52s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.net.TestNetUtils |
|   | hadoop.ha.TestZKFailoverController |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
|   | hadoop.hdfs.TestCrcCorruption |
|   | hadoop.hdfs.TestWriteStripedFileWithFailure |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750492/HDFS-8838-HDFS-7285-20150809-test.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / 1d37a88 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11996/console |


This message was automatically generated.

 Tolerate datanode failures in DFSStripedOutputStream when the data length is 
 small
 --

 Key: HDFS-8838
 URL: https://issues.apache.org/jira/browse/HDFS-8838
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: HDFS-8838-HDFS-7285-000.patch, 
 HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, 
 h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, 
 h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch


 Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
 data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8896) DataNode object isn't GCed when shutdown, because it has GC root in ShutdownHookManager

2015-08-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697316#comment-14697316
 ] 

Walter Su commented on HDFS-8896:
-

The failed tests were already failing before the 
patch ([link|https://builds.apache.org/job/PreCommit-HDFS-Build/11989/testReport/]), 
so they are not related.

 DataNode object isn't GCed when shutdown, because it has GC root in 
 ShutdownHookManager
 ---

 Key: HDFS-8896
 URL: https://issues.apache.org/jira/browse/HDFS-8896
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Walter Su
Assignee: Walter Su
Priority: Minor
 Attachments: HDFS-8896.01.patch, screenshot_1.PNG, screenshot_2.PNG


 The anonymous {{Thread}} object created in {{ShutdownHookManager}} is a GC
 root.
 screenshot_1 shows how the DN object can be traced to the GC root.
 It's not a problem in production.
 It's a problem in tests, especially when MiniDFSCluster starts/shuts down
 many DNs, which can cause {{OutOfMemoryError}}.
 screenshot_2 shows many DN objects that are not GCed when running the test
 of HDFS-8838.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697551#comment-14697551
 ] 

Rakesh R commented on HDFS-8220:


I've rebased the previous patch on the {{HDFS-7285-merge}} branch and attached 
it here.

 Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
 doesn't satisfy BlockGroupSize
 ---

 Key: HDFS-8220
 URL: https://issues.apache.org/jira/browse/HDFS-8220
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
 HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
 HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, 
 HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
 HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
 HDFS-8220-HDFS-7285.008.patch


 During write operations, {{StripedDataStreamer#locateFollowingBlock}} fails to
 validate the available datanodes against the {{BlockGroupSize}}. Please see
 the exception below to understand more:
 {code}
 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
 DataStreamer Exception
 java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
 (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
 (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
 java.io.IOException: DataStreamer Exception: 
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   ... 1 more
 {code}
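 A minimal, self-contained illustration of the NPE above:
 {code}
 import java.util.concurrent.LinkedBlockingQueue;

 // BlockingQueue implementations reject null elements, so offering a
 // null located block throws NullPointerException.
 LinkedBlockingQueue<Object> followingBlocks = new LinkedBlockingQueue<>();
 followingBlocks.offer(null); // NPE, as in the stack trace above
 {code}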



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts

2015-08-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697547#comment-14697547
 ] 

Jason Lowe commented on HDFS-8898:
--

This would solve a significant annoyance with computing quotas on a shared 
tree.  However, I think it has security implications.  If one can get the quota 
totals for the entire tree, then they can calculate what must be used by the 
parts they cannot access via quota_usage - usage_visible.  If what is being 
stored in the restricted area is sensitive (e.g. records related to 
financials), then knowing how many files there are or the size of the 
restricted data could leak sensitive information.
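As a concrete (hypothetical) illustration: if the quota usage reported for the 
tree is 10 TB while the sub-directories a user can read account for only 7 TB, 
that user can infer the restricted sub-directories hold roughly 3 TB.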

 Create API and command-line argument to get quota without need to get file 
 and directory counts
 ---

 Key: HDFS-8898
 URL: https://issues.apache.org/jira/browse/HDFS-8898
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Reporter: Joep Rottinghuis

 On large directory structures it takes significant time to iterate through 
 the file and directory counts recursively to get a complete ContentSummary.
 When you want to just check for the quota on a higher level directory it 
 would be good to have an option to skip the file and directory counts.
 Moreover, currently one can only check the quota if one has access to all
 the directories underneath. For example, if I have a large home directory
 under /user/joep and I host some files for another user in a sub-directory, 
 the moment they create an unreadable sub-directory under my home I can no 
 longer check what my quota is. Understood that I cannot check the current 
 file counts unless I can iterate through all the usage, but for 
 administrative purposes it is nice to be able to get the current quota 
 setting on a directory without the need to iterate through and run into 
 permission issues on sub-directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8565) Typo in dfshealth.html - Decomissioning

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697646#comment-14697646
 ] 

Hudson commented on HDFS-8565:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8307 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8307/])
HDFS-8565. Typo in dfshealth.html - Decomissioning. (nijel via xyao) (xyao: rev 
1569228ec9090823186f062257fdf1beb5ee1781)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html


 Typo in dfshealth.html - Decomissioning
 -

 Key: HDFS-8565
 URL: https://issues.apache.org/jira/browse/HDFS-8565
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: HDFS-8565.patch


 <div class="page-header"><h1><small>Decomissioning</small></h1></div>
 change to 
 <div class="page-header"><h1><small>Decommissioning</small></h1></div>
 in dfshealth.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697564#comment-14697564
 ] 

Hadoop QA commented on HDFS-6244:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 24s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:red}-1{color} | javac |   1m 37s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750579/HDFS-6244.v5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11999/console |


This message was automatically generated.

 Make Trash Interval configurable for each of the namespaces
 ---

 Key: HDFS-6244
 URL: https://issues.apache.org/jira/browse/HDFS-6244
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
 HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch


 Somehow we need to avoid the cluster filling up.
 One solution is to have a different trash policy per namespace. However, if 
 we can simply make the property configurable per namespace, then the same 
 config can be rolled everywhere and we'd be done. This seems simple enough.
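
 A minimal sketch of that idea, assuming a hypothetical per-nameservice key 
 suffix (the actual property name is up to the patch):
 {code}
 // Hedged sketch: look up a per-nameservice trash interval and fall back to
 // the global fs.trash.interval. The ".ns1" suffix is an illustrative
 // assumption, not necessarily what HDFS-6244 implements.
 Configuration conf = new Configuration();
 long globalInterval = conf.getLong("fs.trash.interval", 0L);
 long nsInterval = conf.getLong("fs.trash.interval.ns1", globalInterval);
 {code}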



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697693#comment-14697693
 ] 

Arpit Agarwal commented on HDFS-7649:
-

Thanks for catching and taking care of this, Nicholas.

 Multihoming docs should emphasize using hostnames in configurations
 ---

 Key: HDFS-7649
 URL: https://issues.apache.org/jira/browse/HDFS-7649
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: HDFS-7649.patch


 The docs should emphasize that master and slave configurations should use 
 hostnames wherever possible.
 Link to current docs: 
 https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Status: Open  (was: Patch Available)

 Make Trash Interval configurable for each of the namespaces
 ---

 Key: HDFS-6244
 URL: https://issues.apache.org/jira/browse/HDFS-6244
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
 HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch


 Somehow we need to avoid the cluster filling up.
 One solution is to have a different trash policy per namespace. However, if 
 we can simply make the property configurable per namespace, then the same 
 config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Status: Patch Available  (was: Open)

 Make Trash Interval configurable for each of the namespaces
 ---

 Key: HDFS-6244
 URL: https://issues.apache.org/jira/browse/HDFS-6244
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
 HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch


 Somehow we need to avoid the cluster filling up.
 One solution is to have a different trash policy per namespace. However, if 
 we can simply make the property configurable per namespace, then the same 
 config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Attachment: HDFS-6244.v5.patch

 Make Trash Interval configurable for each of the namespaces
 ---

 Key: HDFS-6244
 URL: https://issues.apache.org/jira/browse/HDFS-6244
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
 HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch


 Somehow we need to avoid the cluster filling up.
 One solution is to have a different trash policy per namespace. However, if 
 we can simply make the property configurable per namespace, then the same 
 config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Attachment: (was: HDFS-6244.v5.patch)

 Make Trash Interval configurable for each of the namespaces
 ---

 Key: HDFS-6244
 URL: https://issues.apache.org/jira/browse/HDFS-6244
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
 HDFS-6244.v3.patch, HDFS-6244.v4.patch


 Somehow we need to avoid the cluster filling up.
 One solution is to have a different trash policy per namespace. However, if 
 we can simply make the property configurable per namespace, then the same 
 config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8565) Typo in dfshealth.html - Decomissioning

2015-08-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8565:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks [~nijel] for the contribution. The patch has been committed to trunk and 
branch-2.

 Typo in dfshealth.html - Decomissioning
 -

 Key: HDFS-8565
 URL: https://issues.apache.org/jira/browse/HDFS-8565
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: HDFS-8565.patch


 <div class="page-header"><h1><small>Decomissioning</small></h1></div>
 change to 
 <div class="page-header"><h1><small>Decommissioning</small></h1></div>
 in dfshealth.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697669#comment-14697669
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7649:
---

Merged this to branch-2.

 Multihoming docs should emphasize using hostnames in configurations
 ---

 Key: HDFS-7649
 URL: https://issues.apache.org/jira/browse/HDFS-7649
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: HDFS-7649.patch


 The docs should emphasize that master and slave configurations should use 
 hostnames wherever possible.
 Link to current docs: 
 https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7649) Multihoming docs should emphasize using hostnames in configurations

2015-08-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697668#comment-14697668
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7649:
---

It seems that this was only committed to trunk but not yet merged to branch-2.

 Multihoming docs should emphasize using hostnames in configurations
 ---

 Key: HDFS-7649
 URL: https://issues.apache.org/jira/browse/HDFS-7649
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Arpit Agarwal
Assignee: Brahma Reddy Battula
 Fix For: 2.8.0

 Attachments: HDFS-7649.patch


 The docs should emphasize that master and slave configurations should use 
 hostnames wherever possible.
 Link to current docs: 
 https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8565) Typo in dfshealth.html - Decomissioning

2015-08-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8565:
-
Fix Version/s: 2.8.0

 Typo in dfshealth.html - Decomissioning
 -

 Key: HDFS-8565
 URL: https://issues.apache.org/jira/browse/HDFS-8565
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Fix For: 2.8.0

 Attachments: HDFS-8565.patch


 <div class="page-header"><h1><small>Decomissioning</small></h1></div>
 change to 
 <div class="page-header"><h1><small>Decommissioning</small></h1></div>
 in dfshealth.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697529#comment-14697529
 ] 

Hadoop QA commented on HDFS-6244:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  1s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750575/HDFS-6244.v5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11998/console |


This message was automatically generated.

 Make Trash Interval configurable for each of the namespaces
 ---

 Key: HDFS-6244
 URL: https://issues.apache.org/jira/browse/HDFS-6244
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
 HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch


 Somehow we need to avoid the cluster filling up.
 One solution is to have a different trash policy per namespace. However, if 
 we can simply make the property configurable per namespace, then the same 
 config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8220:
---
Attachment: HDFS-8220-HDFS-7285-merge-10.patch

 Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
 doesn't satisfy BlockGroupSize
 ---

 Key: HDFS-8220
 URL: https://issues.apache.org/jira/browse/HDFS-8220
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
 HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
 HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285-merge-10.patch, 
 HDFS-8220-HDFS-7285.005.patch, HDFS-8220-HDFS-7285.006.patch, 
 HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.007.patch, 
 HDFS-8220-HDFS-7285.008.patch


 During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
 validate the available datanodes against the {{BlockGroupSize}}. Please see 
 the exception to understand more:
 {code}
 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
 DataStreamer Exception
 java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
 (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
 (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
 java.io.IOException: DataStreamer Exception: 
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   ... 1 more
 {code}
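
 A hedged sketch of the missing validation (method and names are illustrative, 
 not the patch's code): confirm the namenode returned enough storages for a 
 full block group before the streamers consume the located block, rather than 
 letting a short location array surface later as the NPE above.
 {code}
 static void checkBlockGroupSize(LocatedBlock lb) throws IOException {
   int numDataBlocks = 6, numParityBlocks = 3;   // assuming the RS(6,3) default
   int blockGroupSize = numDataBlocks + numParityBlocks;
   if (lb.getLocations().length < blockGroupSize) {
     throw new IOException("Expected " + blockGroupSize
         + " locations for the block group but got " + lb.getLocations().length);
   }
 }
 {code}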



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697681#comment-14697681
 ] 

Ravi Prakash commented on HDFS-8344:


Hi Haohui! There are arguments on both sides (time-based vs. count-based); e.g., 
I may take down the cluster and bring it back up after enough time to expire 
the timeout, in which case we wouldn't have retried enough times. 
Please let me know if you feel strongly, though, and I can add one more 
configuration for the timeout (in addition to the number of retries). It feels 
like we are over-designing now; this is a rare enough event (the client dies, 
and before the lease expiration, so do the nodes it wrote to).

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 2.8.0

 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
 HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 easily reproducible on a pseudo-distributed single-node cluster:
 # Before you start, it helps if you set the following. This is not necessary, 
 but simply reduces how long you have to wait:
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (It could be less than 1 block, but it was 
 hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
 client code I am using; I generate a jar and run it using $ hadoop jar 
 TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 of the $(hadoop jar 
 TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8824) Do not use small blocks for balancing the cluster

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697679#comment-14697679
 ] 

Hudson commented on HDFS-8824:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8308 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8308/])
HDFS-8824. Do not use small blocks for balancing the cluster. (szetszwo: rev 
2bc0a4f299fbd8035e29f62ce9cd22e209a62805)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


 Do not use small blocks for balancing the cluster
 -

 Key: HDFS-8824
 URL: https://issues.apache.org/jira/browse/HDFS-8824
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer  mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h8824_20150727b.patch, h8824_20150811b.patch


 Balancer gets datanode block lists from the NN and then moves the blocks in 
 order to balance the cluster.  It should not use small blocks, since moving 
 small blocks generates a lot of overhead and they do not help balance the 
 cluster much.
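
 A minimal sketch of that filtering; the threshold constant here is an 
 illustrative assumption (the committed patch may use a different key and 
 default).
 {code}
 // Skip blocks below a size threshold: moving tiny blocks costs scheduling
 // and network overhead but barely shifts datanode utilization.
 static final long SMALL_BLOCK_THRESHOLD = 10L * 1024 * 1024;  // e.g. 10 MB

 static boolean isGoodBlockCandidate(Block block) {
   return block.getNumBytes() >= SMALL_BLOCK_THRESHOLD;
 }
 {code}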



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces

2015-08-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-6244:
--
Attachment: HDFS-6244.v5.patch

 Make Trash Interval configurable for each of the namespaces
 ---

 Key: HDFS-6244
 URL: https://issues.apache.org/jira/browse/HDFS-6244
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, 
 HDFS-6244.v3.patch, HDFS-6244.v4.patch, HDFS-6244.v5.patch


 Somehow we need to avoid the cluster filling up.
 One solution is to have a different trash policy per namespace. However, if 
 we can simply make the property configurable per namespace, then the same 
 config can be rolled everywhere and we'd be done. This seems simple enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8565) Typo in dfshealth.html - Decomissioning

2015-08-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697592#comment-14697592
 ] 

Xiaoyu Yao commented on HDFS-8565:
--

+1. Patch LGTM. I will commit it shortly.

 Typo in dfshealth.html - Decomissioning
 -

 Key: HDFS-8565
 URL: https://issues.apache.org/jira/browse/HDFS-8565
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: nijel
Assignee: nijel
Priority: Trivial
 Attachments: HDFS-8565.patch


 <div class="page-header"><h1><small>Decomissioning</small></h1></div>
 change to 
 <div class="page-header"><h1><small>Decommissioning</small></h1></div>
 in dfshealth.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8899) Erasure Coding: use threadpool for EC recovery tasks

2015-08-14 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8899:
--

 Summary: Erasure Coding: use threadpool for EC recovery tasks
 Key: HDFS-8899
 URL: https://issues.apache.org/jira/browse/HDFS-8899
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea is to use a thread pool for processing erasure coding recovery tasks 
at the datanode, replacing the current per-task daemon thread:

{code}
new Daemon(new ReconstructAndTransferBlock(recoveryInfo)).start();
{code}
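
A hedged sketch of the proposal; the pool size and the task-info type are 
illustrative assumptions, not the eventual patch:
{code}
// Submit recovery work to a shared, bounded pool instead of spawning one
// daemon thread per task as above.
private final ExecutorService ecRecoveryPool = Executors.newFixedThreadPool(8);

void submitRecoveryTask(BlockECRecoveryInfo recoveryInfo) {  // type assumed
  ecRecoveryPool.submit(new ReconstructAndTransferBlock(recoveryInfo));
}
{code}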



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697324#comment-14697324
 ] 

Rakesh R commented on HDFS-8220:


Any more comments on the attached patch? Hi [~zhz], I hope {{HDFS-7285-merge}} 
is the active branch; should I create another patch now?

 Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
 doesn't satisfy BlockGroupSize
 ---

 Key: HDFS-8220
 URL: https://issues.apache.org/jira/browse/HDFS-8220
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
 HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
 HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285.005.patch, 
 HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, 
 HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch


 During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
 validate the available datanodes against the {{BlockGroupSize}}. Please see 
 the exception to understand more:
 {code}
 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
 DataStreamer Exception
 java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
 (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
 (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
 java.io.IOException: DataStreamer Exception: 
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   ... 1 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697397#comment-14697397
 ] 

Hadoop QA commented on HDFS-6955:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 23s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 22s | The applied patch generated  3 
new checkstyle issues (total was 154, now 155). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  3s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 55s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 146m  6s | Tests failed in hadoop-hdfs. |
| | | 191m 58s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestNodeCount |
| Timed out tests | 
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureOfSharedDir |
|   | org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750505/HDFS-6955-01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 84bf712 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11997/console |


This message was automatically generated.

 DN should reserve disk space for a full block when creating tmp files
 -

 Key: HDFS-6955
 URL: https://issues.apache.org/jira/browse/HDFS-6955
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.5.0
Reporter: Arpit Agarwal
Assignee: kanaka kumar avvaru
 Attachments: HDFS-6955-01.patch


 HDFS-6898 is introducing disk space reservation for RBW files to avoid 
 running out of disk space midway through block creation.
 This Jira is to introduce similar reservation for tmp files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697330#comment-14697330
 ] 

Walter Su commented on HDFS-8838:
-

The failed tests are not related.
+1 for the last patch (20150809.patch).

 Tolerate datanode failures in DFSStripedOutputStream when the data length is 
 small
 --

 Key: HDFS-8838
 URL: https://issues.apache.org/jira/browse/HDFS-8838
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: HDFS-8838-HDFS-7285-000.patch, 
 HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, 
 h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, 
 h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch


 Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
 data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize

2015-08-14 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697377#comment-14697377
 ] 

Zhe Zhang commented on HDFS-8220:
-

[~rakeshr] Yes, it'd be great if you could create a patch for 
{{HDFS-7285-merge}}. I don't think there will be much conflict since this 
change is on the client side.

 Erasure Coding: StripedDataStreamer fails to handle the blocklocations which 
 doesn't satisfy BlockGroupSize
 ---

 Key: HDFS-8220
 URL: https://issues.apache.org/jira/browse/HDFS-8220
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, 
 HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285-09.patch, 
 HDFS-8220-HDFS-7285-10.patch, HDFS-8220-HDFS-7285.005.patch, 
 HDFS-8220-HDFS-7285.006.patch, HDFS-8220-HDFS-7285.007.patch, 
 HDFS-8220-HDFS-7285.007.patch, HDFS-8220-HDFS-7285.008.patch


 During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to 
 validate the available datanodes against the {{BlockGroupSize}}. Please see 
 the exception to understand more:
 {code}
 2015-04-22 14:56:11,313 WARN  hdfs.DFSClient (DataStreamer.java:run(538)) - 
 DataStreamer Exception
 java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 2015-04-22 14:56:11,313 INFO  hdfs.MiniDFSCluster 
 (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster
 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient 
 (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387
 java.io.IOException: DataStreamer Exception: 
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374)
   at 
 org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157)
   at 
 org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332)
   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424)
   ... 1 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts

2015-08-14 Thread Joep Rottinghuis (JIRA)
Joep Rottinghuis created HDFS-8898:
--

 Summary: Create API and command-line argument to get quota without 
need to get file and directory counts
 Key: HDFS-8898
 URL: https://issues.apache.org/jira/browse/HDFS-8898
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Reporter: Joep Rottinghuis


On large directory structures it takes significant time to iterate through the 
file and directory counts recursively to get a complete ContentSummary.
When you just want to check the quota on a higher-level directory, it would 
be good to have an option to skip the file and directory counts.

Moreover, currently one can only check the quota if one has access to all the 
directories underneath. For example, if I have a large home directory under 
/user/joep and I host some files for another user in a sub-directory, the 
moment they create an unreadable sub-directory under my home I can no longer 
check what my quota is. Understood that I cannot check the current file counts 
unless I can iterate through all the usage, but for administrative purposes it 
is nice to be able to get the current quota setting on a directory without the 
need to iterate through and run into permission issues on sub-directories.
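
A hypothetical sketch of what such an API could look like (illustrative only, 
not a committed Hadoop interface): answer quota and consumption from the 
directory's own inode, skipping the recursive ContentSummary walk so that 
permissions on sub-directories are never consulted.
{code}
DistributedFileSystem dfs =
    (DistributedFileSystem) FileSystem.get(URI.create("hdfs://nn1"), conf);
QuotaUsage usage = dfs.getQuotaUsage(new Path("/user/joep"));  // assumed method
System.out.println("name quota = " + usage.getQuota()
    + ", space quota = " + usage.getSpaceQuota());
{code}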



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8093) BP does not exist or is not under Constructionnull

2015-08-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697395#comment-14697395
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8093:
---

The file /system/balancer.id seems to have been deleted.  Could you grep for 
/system/balancer.id in the NN log?

Also, are there other log messages between 2015-08-14 00:30:03,843 and 
2015-08-14 00:30:04,000?

 BP does not exist or is not under Constructionnull
 --

 Key: HDFS-8093
 URL: https://issues.apache.org/jira/browse/HDFS-8093
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Affects Versions: 2.6.0
 Environment: Centos 6.5
Reporter: LINTE

 HDFS balancer ran for several hours balancing blocks between datanodes; it 
 ended by failing with the following error.
 The getStoredBlock function returned a null BlockInfo.
 java.io.IOException: Bad response ERROR for block 
 BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 from 
 datanode 192.168.0.18:1004
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:897)
 15/04/08 05:52:51 WARN hdfs.DFSClient: Error Recovery for block 
 BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 in pipeline 
 192.168.0.63:1004, 192.168.0.1:1004, 192.168.0.18:1004: bad datanode 
 192.168.0.18:1004
 15/04/08 05:52:51 WARN hdfs.DFSClient: DataStreamer Exception
 org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
 BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
 exist or is not under Constructionnull
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:717)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:931)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 at org.apache.hadoop.ipc.Client.call(Client.java:1468)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy11.updateBlockForPipeline(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:877)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy12.updateBlockForPipeline(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1266)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
 15/04/08 05:52:51 ERROR hdfs.DFSClient: Failed to close inode 19801755
 org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
 BP-970443206-192.168.0.208-1397583979378:blk_1086729930_13046030 does not 
 exist or is not under Constructionnull
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6913)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6980)
 at 
 

[jira] [Commented] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697787#comment-14697787
 ] 

Jing Zhao commented on HDFS-8801:
-

Actually, converting BlockInfoUnderConstruction into a feature can bring us some 
benefits. Currently, when processing block reports, if a finalized replica is 
reported, we may replace the corresponding UC blockInfo object with a newly 
created complete blockInfo object inside the INodeFile. This replacement mixes 
the states of block storage management and NameSystem management, and forces 
block report processing to take the Namesystem write lock.

Converting BlockInfoUC into a feature avoids the BlockInfo object replacement. 
It helps separate the storage level from the file system level, and allows us 
to make further block report processing improvements (e.g., separating the 
locks for the namesystem and the blockmanager).
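
A hedged sketch of the feature idea (names illustrative, mirroring 
{{FileUnderConstructionFeature}}): the same BlockInfo object lives through the 
block's whole lifetime, and completing it just drops the UC feature, so no 
object replacement happens inside the INodeFile.
{code}
class BlockInfo {
  private BlockUnderConstructionFeature ucFeature;  // null once complete

  boolean isComplete() {
    return ucFeature == null;
  }

  void convertToCompleteBlock() {
    ucFeature = null;  // drop the feature; this BlockInfo keeps its identity
  }
}
{code}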

 Convert BlockInfoUnderConstruction as a feature
 ---

 Key: HDFS-8801
 URL: https://issues.apache.org/jira/browse/HDFS-8801
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.1
Reporter: Zhe Zhang

 Per discussion under HDFS-8499, with the erasure coding feature, there will 
 be 4 types of {{BlockInfo}} forming a multi-inheritance: 
 {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
 {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
 was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
 aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8891:

Issue Type: Bug  (was: Improvement)

 HDFS concat should keep srcs order
 --

 Key: HDFS-8891
 URL: https://issues.apache.org/jira/browse/HDFS-8891
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yong Zhang
Assignee: Yong Zhang
 Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch


 FSDirConcatOp.verifySrcFiles may change the src files' order, but it should 
 keep their order as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697814#comment-14697814
 ] 

Jing Zhao commented on HDFS-8891:
-

+1. I will commit the patch shortly.

 HDFS concat should keep srcs order
 --

 Key: HDFS-8891
 URL: https://issues.apache.org/jira/browse/HDFS-8891
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yong Zhang
Assignee: Yong Zhang
 Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch


 FSDirConcatOp.verifySrcFiles may change the src files' order, but it should 
 keep their order as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697842#comment-14697842
 ] 

Hudson commented on HDFS-8891:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8309 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8309/])
HDFS-8891. HDFS concat should keep srcs order. Contributed by Yong Zhang. 
(jing9: rev dc7a061668a3f4d86fe1b07a40d46774b5386938)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestHDFSConcat.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 HDFS concat should keep srcs order
 --

 Key: HDFS-8891
 URL: https://issues.apache.org/jira/browse/HDFS-8891
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yong Zhang
Assignee: Yong Zhang
 Fix For: 2.8.0

 Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch


 FSDirConcatOp.verifySrcFiles may change the src files' order, but it should 
 keep their order as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones

2015-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697844#comment-14697844
 ] 

Hadoop QA commented on HDFS-8833:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12750606/HDFS-8833-HDFS-7285-merge.00.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / 1d37a88 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12002/console |


This message was automatically generated.

 Erasure coding: store EC schema and cell size in INodeFile and eliminate 
 notion of EC zones
 ---

 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8833-HDFS-7285-merge.00.patch


 We have [discussed | 
 https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
  storing EC schema with files instead of EC zones and recently revisited the 
 discussion under HDFS-8059.
 As a recap, the _zone_ concept has severe limitations including renaming and 
 nested configuration. Those limitations are valid in encryption for security 
 reasons, and it doesn't make sense to carry them over to EC.
 This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
 simplicity, we should first implement it as an xattr and consider memory 
 optimizations (such as moving it to file header) as a follow-on. We should 
 also disable changing EC policy on a non-empty file / dir in the first phase.
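
 A minimal sketch of the xattr idea; the xattr name and the serialization 
 helper are assumptions for illustration, not the branch's actual code.
 {code}
 DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
 byte[] policyBytes = serializeSchemaAndCellSize(schema, cellSize); // assumed helper
 dfs.setXAttr(new Path("/data/ec-file"),
     "system.hdfs.erasurecoding.policy", policyBytes);              // assumed name
 {code}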



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697981#comment-14697981
 ] 

Haohui Mai commented on HDFS-8344:
--

bq. Even if it's simpler, there's a chance that recovery is never attempted, and 
that is not acceptable IMHO.

Can you explain how the NN would never try to recover the lease? All leases are 
periodically checked in {{LeaseManager#checkLease()}}, where the recovery 
happens.

 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 2.8.0

 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
 HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch, HDFS-8344.09.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 easily reproducible on a pseudo-distributed single-node cluster:
 # Before you start, it helps if you set the following. This is not necessary, 
 but simply reduces how long you have to wait:
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (It could be less than 1 block, but it was 
 hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
 client code I am using; I generate a jar and run it using $ hadoop jar 
 TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 of the $(hadoop jar 
 TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8801:

Attachment: HDFS-8801.000.patch

Initial patch to demo the idea.

 Convert BlockInfoUnderConstruction as a feature
 ---

 Key: HDFS-8801
 URL: https://issues.apache.org/jira/browse/HDFS-8801
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Zhe Zhang
 Attachments: HDFS-8801.000.patch


 Per discussion under HDFS-8499, with the erasure coding feature, there will 
 be 4 types of {{BlockInfo}} forming a multi-inheritance: 
 {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
 {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
 was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
 aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to

2015-08-14 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697725#comment-14697725
 ] 

Ming Ma commented on HDFS-7446:
---

For the 2.6.1 effort, the backport is straightforward. But the API has changed 
compared to 2.6.0. This incompatibility only impacts folks who have been using 
the inotify functionality introduced in 2.6.0.

 HDFS inotify should have the ability to determine what txid it has read up to
 -

 Key: HDFS-7446
 URL: https://issues.apache.org/jira/browse/HDFS-7446
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
  Labels: 2.6.1-candidate
 Fix For: 2.7.0

 Attachments: HDFS-7446.001.patch, HDFS-7446.002.patch, 
 HDFS-7446.003.patch


 HDFS inotify should have the ability to determine what txid it has read up 
 to.  This will allow users who want to avoid missing any events to record 
 this txid and use it to resume reading events at the spot they left off.
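
 A hedged usage sketch (API names as introduced around 2.7; verify against 
 your release): persist the txid of each batch so a restarted reader can 
 resume where it left off.
 {code}
 DFSInotifyEventInputStream stream =
     new HdfsAdmin(URI.create("hdfs://nn1"), conf)
         .getInotifyEventStream(lastReadTxid);  // resume from a saved txid
 EventBatch batch;
 while ((batch = stream.poll()) != null) {
   process(batch.getEvents());
   lastReadTxid = batch.getTxid();              // record progress durably
 }
 {code}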



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8801:

Affects Version/s: (was: 2.7.1)
   Status: Patch Available  (was: Open)

 Convert BlockInfoUnderConstruction as a feature
 ---

 Key: HDFS-8801
 URL: https://issues.apache.org/jira/browse/HDFS-8801
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Zhe Zhang
Assignee: Jing Zhao
 Attachments: HDFS-8801.000.patch


 Per discussion under HDFS-8499, with the erasure coding feature, there will 
 be 4 types of {{BlockInfo}} forming a multi-inheritance: 
 {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
 {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
 was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
 aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks

2015-08-14 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697817#comment-14697817
 ] 

Ravi Prakash commented on HDFS-8344:


bq. If you take down the cluster and bring it back up. All writing pipeline 
will fail and should fail.
That is correct. This JIRA is for the case where data loss has already occurred, 
i.e. the client died and the DNs to which it wrote have also died. We are trying 
to recover the lease in this JIRA. My argument was that after the client and DNs 
have died, if I only have a timeout, I could take down the cluster; when I bring 
the cluster back up after the timeout value, the lease will be recovered without 
trying all the DNs.
bq. This is internal implementation details and I'm very reluctant to make it 
configurable 
Perhaps I should have said "internal hard-coded configuration"? Similar to 
{{recoveryAttemptsBeforeMarkingBlockMissing}} in version 8 of the patch.

bq.  Having only one concept for detecting failures (i.e., time out) is simpler 
than two (i.e., time out and number of retries).
Even if it's simpler, there's a chance that recovery is never attempted, and 
that is not acceptable IMHO.


 NameNode doesn't recover lease for files with missing blocks
 

 Key: HDFS-8344
 URL: https://issues.apache.org/jira/browse/HDFS-8344
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 2.8.0

 Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, 
 HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch, 
 HDFS-8344.06.patch, HDFS-8344.07.patch, HDFS-8344.08.patch


 I found another\(?) instance in which the lease is not recovered. This is 
 easily reproducible on a pseudo-distributed single-node cluster:
 # Before you start, it helps if you set the following. This is not necessary, 
 but simply reduces how long you have to wait:
 {code}
   public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000;
   public static final long LEASE_HARDLIMIT_PERIOD = 2 * 
 LEASE_SOFTLIMIT_PERIOD;
 {code}
 # Client starts to write a file. (It could be less than 1 block, but it was 
 hflushed, so some of the data has landed on the datanodes.) (I'm copying the 
 client code I am using; I generate a jar and run it using $ hadoop jar 
 TestHadoop.jar)
 # Client crashes. (I simulate this by kill -9 of the $(hadoop jar 
 TestHadoop.jar) process after it has printed "Wrote to the bufferedWriter".)
 # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was 
 only 1)
 I believe the lease should be recovered and the block should be marked 
 missing. However this is not happening. The lease is never recovered.
 The effect of this bug for us was that nodes could not be decommissioned 
 cleanly. Although we knew that the client had crashed, the Namenode never 
 released the leases (even after restarting the Namenode) (even months 
 afterwards). There are actually several other cases too where we don't 
 consider what happens if ALL the datanodes die while the file is being 
 written, but I am going to punt on that for another time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8891) HDFS concat should keep srcs order

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8891:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2. Thanks Yong for the contribution!

 HDFS concat should keep srcs order
 --

 Key: HDFS-8891
 URL: https://issues.apache.org/jira/browse/HDFS-8891
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yong Zhang
Assignee: Yong Zhang
 Fix For: 2.8.0

 Attachments: HDFS-8891.001.patch, HDFS-8891.002.patch


 FSDirConcatOp.verifySrcFiles may change the src files' order, but it should 
 keep their order as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8853) Erasure Coding: Provide ECSchema validation when creating ECZone

2015-08-14 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697873#comment-14697873
 ] 

Zhe Zhang commented on HDFS-8853:
-

Thanks [~andreina] for the patch. Do you mind rebasing it?

I was also thinking about this issue when creating the HDFS-8833 patch. In the 
long term, it might be better for the client to pass a {{String}} to the NN 
instead of the actual policy/schema object.
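
For example (a sketch of the suggestion only, not committed API; the method 
shape and the schema name are placeholders):

{code}
// Instead of serializing the whole ECSchema object to the NN, send only
// a name and let the NN resolve it against its own registry, so the
// server never receives a schema object it doesn't know about.
public void createErasureCodingZone(String src, String schemaName, int cellSize)
    throws IOException {
  namenode.createErasureCodingZone(src, schemaName, cellSize);
}

// e.g. dfs.createErasureCodingZone("/ec-data", "RS-10-4", 64 * 1024);
{code}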

 Erasure Coding: Provide ECSchema validation when creating ECZone
 

 Key: HDFS-8853
 URL: https://issues.apache.org/jira/browse/HDFS-8853
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: J.Andreina
 Attachments: HDFS-8853-HDFS-7285-01.patch


 Presently {{DFS#createErasureCodingZone(path, ecSchema, cellSize)}} doesn't 
 validate that the given {{ecSchema}} is available in the 
 {{ErasureCodingSchemaManager#activeSchemas}} list. If it doesn't exist, the 
 ECZone will be created with a {{null}} schema. IMHO we could improve this by 
 adding the necessary basic sanity checks.
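 A minimal sketch of the kind of check meant here (the method shape is an 
 assumption, written against a plain list so as not to guess at the manager's 
 accessors):
 {code}
 // Sketch only: reject an unknown schema instead of silently creating
 // an ECZone with a null schema.
 static ECSchema validate(ECSchema requested, List<ECSchema> activeSchemas)
     throws IOException {
   for (ECSchema active : activeSchemas) {
     if (active.equals(requested)) {
       return active;
     }
   }
   throw new IOException("ECSchema " + requested
       + " is not in the active schemas list");
 }
 {code}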



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697904#comment-14697904
 ] 

Zhe Zhang commented on HDFS-8801:
-

Thanks for initiating the work Jing! The overall structure in the patch looks 
good to me.

Should we take this chance to change {{replicas}} from a List to an array? This 
would offset some of the memory overhead from the feature pointer, and also 
help us reconcile trunk with the striped UC code later.
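
A rough sketch of what that amounts to (the names are illustrative, not the 
actual patch):

{code}
// Illustrative only: the UC state hangs off BlockInfo as a feature
// object, and replicas is a plain array rather than a List. The array
// avoids the ArrayList object header and spare capacity, offsetting
// some of the cost of the extra feature pointer.
class BlockUnderConstructionFeature {
  private ReplicaUnderConstruction[] replicas;
}

class BlockInfo {
  // null for complete blocks; non-null only while under construction
  private BlockUnderConstructionFeature ucFeature;
}
{code}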

 Convert BlockInfoUnderConstruction as a feature
 ---

 Key: HDFS-8801
 URL: https://issues.apache.org/jira/browse/HDFS-8801
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Zhe Zhang
Assignee: Jing Zhao
 Attachments: HDFS-8801.000.patch


 Per discussion under HDFS-8499, with the erasure coding feature, there will 
 be 4 types of {{BlockInfo}} forming a multi-inheritance: 
 {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
 {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
 was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
 aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8838) Tolerate datanode failures in DFSStripedOutputStream when the data length is small

2015-08-14 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697943#comment-14697943
 ] 

Walter Su commented on HDFS-8838:
-

I saw HDFS-8220 just got committed; would you mind rebasing this to resolve the 
conflicts?

 Tolerate datanode failures in DFSStripedOutputStream when the data length is 
 small
 --

 Key: HDFS-8838
 URL: https://issues.apache.org/jira/browse/HDFS-8838
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: HDFS-8838-HDFS-7285-000.patch, 
 HDFS-8838-HDFS-7285-20150809-test.patch, HDFS-8838-HDFS-7285-20150809.patch, 
 h8838_20150729.patch, h8838_20150731-HDFS-7285.patch, h8838_20150731.log, 
 h8838_20150731.patch, h8838_20150804-HDFS-7285.patch, h8838_20150809.patch


 Currently, DFSStripedOutputStream cannot tolerate datanode failures when the 
 data length is small.  We fix the bugs here and add more tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2015-08-14 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697956#comment-14697956
 ] 

Zhe Zhang commented on HDFS-7285:
-

Many thanks to Vinay for the great effort! I just cherry-picked the 2 new 
commits (HDFS-8854 and HDFS-8220) to the branch. I also created a Jenkins [job 
| https://builds.apache.org/job/Hadoop-HDFS-7285-nightly/].

I'll also compare {{HDFS-7285-REBASE}} with the consolidated patch. After that, 
and after verifying the Jenkins results, I'll push it as {{HDFS-7285}} so we 
can better proceed with pending subtasks. I'll also keep the current 
{{HDFS-7285}} branch as a backup, in case we want to reconcile differences in 
individual commits.

 Erasure Coding Support inside HDFS
 --

 Key: HDFS-7285
 URL: https://issues.apache.org/jira/browse/HDFS-7285
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Weihua Jiang
Assignee: Zhe Zhang
 Attachments: Consolidated-20150707.patch, 
 Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, 
 ECParser.py, HDFS-7285-initial-PoC.patch, 
 HDFS-7285-merge-consolidated-01.patch, 
 HDFS-7285-merge-consolidated-trunk-01.patch, 
 HDFS-7285-merge-consolidated.trunk.03.patch, 
 HDFS-7285-merge-consolidated.trunk.04.patch, 
 HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
 HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
 HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
 HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
 fsimage-analysis-20150105.pdf


 Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
 data reliability, compared to the existing HDFS 3-replica approach. For 
 example, if we use a 10+4 Reed-Solomon coding, we can tolerate the loss of 4 
 blocks with a storage overhead of only 40%. This makes EC quite an attractive 
 alternative for big data storage, particularly for cold data. 
 Facebook had a related open source project called HDFS-RAID. It used to be 
 one of the contributed packages in HDFS but was removed in Hadoop 2.0 for 
 maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends 
 on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
 cold files that are not intended to be appended anymore; 3) the pure Java EC 
 coding implementation is extremely slow in practical use. Due to these, it 
 might not be a good idea to just bring HDFS-RAID back.
 We (Intel and Cloudera) are working on a design to build EC into HDFS that 
 gets rid of any external dependencies, making it self-contained and 
 independently maintained. This design lays the EC feature on top of the 
 storage type support and is intended to be compatible with existing HDFS 
 features such as caching, snapshots, encryption, and high availability. This 
 design will also support different EC coding schemes, implementations and 
 policies for different deployment scenarios. By utilizing advanced libraries 
 (e.g. the Intel ISA-L library), an implementation can greatly improve the 
 performance of EC encoding/decoding and make the EC solution even more 
 attractive. We will post the design document soon. 
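 For the overhead figures above, with k data blocks and m parity blocks the 
 extra storage is m/k. A quick check of the arithmetic (plain illustration, 
 not project code):
 {code}
 public class EcOverhead {
   public static void main(String[] args) {
     int data = 10, parity = 4;
     double rsOverhead = (double) parity / data;  // 4/10 = 40%
     double threeReplica = 2.0;                   // 2 extra copies = 200%
     System.out.printf("RS(10,4): %.0f%%, 3-replica: %.0f%%%n",
         rsOverhead * 100, threeReplica * 100);
   }
 }
 {code}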



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature

2015-08-14 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-8801:
---

Assignee: Jing Zhao

 Convert BlockInfoUnderConstruction as a feature
 ---

 Key: HDFS-8801
 URL: https://issues.apache.org/jira/browse/HDFS-8801
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Zhe Zhang
Assignee: Jing Zhao
 Attachments: HDFS-8801.000.patch


 Per discussion under HDFS-8499, with the erasure coding feature, there will 
 be 4 types of {{BlockInfo}} forming a multi-inheritance: 
 {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, 
 {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution 
 was building feature classes like {{FileUnderConstructionFeature}}. This JIRA 
 aims to implement the same idea on {{BlockInfo}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones

2015-08-14 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8833:

Status: Patch Available  (was: Open)

 Erasure coding: store EC schema and cell size in INodeFile and eliminate 
 notion of EC zones
 ---

 Key: HDFS-8833
 URL: https://issues.apache.org/jira/browse/HDFS-8833
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-7285
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 We have [discussed | 
 https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
  storing EC schema with files instead of EC zones and recently revisited the 
 discussion under HDFS-8059.
 As a recap, the _zone_ concept has severe limitations, including restrictions 
 on renaming and nested configuration. Those limitations are valid in 
 encryption for security reasons, but it doesn't make sense to carry them over 
 to EC.
 This JIRA aims to store EC schema and cell size at the {{INodeFile}} level. 
 For simplicity, we should first implement it as an xattr and consider memory 
 optimizations (such as moving it to the file header) as a follow-on. We 
 should also disable changing the EC policy on a non-empty file / dir in the 
 first phase.
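 A rough sketch of the xattr-based representation (the xattr name and value 
 encoding here are placeholders, not the committed format):
 {code}
 import java.nio.charset.StandardCharsets;

 class EcPolicyXAttr {
   // the policy travels with the inode, so renames just work and no
   // zone resolution is needed on the read/write paths
   static final String NAME = "system.hdfs.erasurecoding.policy";

   static byte[] encode(String schemaName, int cellSize) {
     return (schemaName + "," + cellSize).getBytes(StandardCharsets.UTF_8);
   }

   static String[] decode(byte[] value) {
     return new String(value, StandardCharsets.UTF_8).split(",");
   }
 }
 {code}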



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8824) Do not use small blocks for balancing the cluster

2015-08-14 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8824:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks Jitendra for reviewing the patch.

I have committed this.

 Do not use small blocks for balancing the cluster
 -

 Key: HDFS-8824
 URL: https://issues.apache.org/jira/browse/HDFS-8824
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Fix For: 2.8.0

 Attachments: h8824_20150727b.patch, h8824_20150811b.patch


 The Balancer gets datanode block lists from the NN and then moves blocks in 
 order to balance the cluster. It should not use small blocks, since moving 
 them generates a lot of overhead while doing little to balance the cluster.
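 A minimal sketch of the intended filter (the threshold value and the plumbing 
 are assumptions; the real change wires a size check into candidate selection):
 {code}
 import java.util.ArrayList;
 import java.util.List;

 class SmallBlockFilter {
   // assumption: ignore blocks below some cutoff; the actual value is a
   // design choice in the patch, not this number
   static final long MIN_BLOCK_SIZE = 1024L * 1024;  // 1 MB

   static List<Long> selectMovable(List<Long> blockSizes) {
     List<Long> candidates = new ArrayList<>();
     for (long size : blockSizes) {
       if (size >= MIN_BLOCK_SIZE) {  // small blocks: overhead >> benefit
         candidates.add(size);
       }
     }
     return candidates;
   }
 }
 {code}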



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

