[jira] [Commented] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout

2015-04-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504428#comment-14504428
 ] 

Yi Liu commented on HDFS-8033:
--

Thanks [~zhz] for working on this.  The patch is good; my comments:
*1.* In DFSInputStream, a stateful read is not required to fill the output *buf*: 
{{readWithStrategy}} calls {{readBuffer}} and returns on success.  In 
{{DFSStripedInputStream}} we override {{readBuffer}}, but we only read within one 
striped block, so the returned result is something like (cell_0, cell_3, ...), 
i.e. only part of the expected data.  
This is not incorrect by itself; in the test you do exercise stateful read, but you 
read fully and the data size is *BLOCK_GROUP_SIZE*, so the result is only 
coincidentally correct. 
I suggest we try to read fully in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we reach the end of file; of course, the final 
read length can be less than the input buf length if we hit EOF. (A minimal 
sketch of the read-fully behavior follows after these comments.)

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. For other IOExceptions, we can just throw them.

*3.* 
For the test, do a stateful read: read once and also do a full read (please make 
the data size larger than groupSize * cellSize), as I said in #1.

*4.* 
{{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.* 
Why do you modify {{SimulatedFSDataset}}?
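
A minimal, self-contained sketch of the read-fully behavior suggested in *1.* 
(assuming only a plain java.io.InputStream here; the real change would live in 
{{DFSStripedInputStream#readBuffer}}, whose exact signature is not shown above):

{code}
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {
  /**
   * Keep reading until 'len' bytes are filled or EOF is reached, instead of
   * returning after the first partial read. Returns the number of bytes read,
   * or -1 if EOF was hit before anything was read.
   */
  static int readFully(InputStream in, byte[] buf, int off, int len)
      throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(buf, off + total, len - total);
      if (n < 0) {                        // EOF
        return total > 0 ? total : -1;
      }
      total += n;
    }
    return total;                         // equals len unless EOF was reached
  }
}
{code}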

 Erasure coding: stateful (non-positional) read from files in striped layout
 ---

 Key: HDFS-8033
 URL: https://issues.apache.org/jira/browse/HDFS-8033
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading

2015-04-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504468#comment-14504468
 ] 

Kai Zheng commented on HDFS-8201:
-

I'm not sure. This work would rather end with a unit test, focusing on stripping 
writing and reading. I thought HDFS-8197 is good for system integration tests.

 Add an end to end test for stripping file writing and reading
 -

 Key: HDFS-8201
 URL: https://issues.apache.org/jira/browse/HDFS-8201
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Xinwei Qin 

 According to an off-line discussion with [~zhz] and [~xinwei], we need to 
 implement an end-to-end test for stripping file support:
 * Create an EC zone;
 * Create a file in the zone;
 * Write various typical sizes of content to the file, each size may be a test 
 method;
 * Read the written content back;
 * Compare the written content and the read content to ensure they match.
 The test facility is subject to adding more steps for erasure encoding and 
 recovering. Will open a separate issue for it.
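
A rough sketch of how the write/read/compare core of such a test could look 
(hypothetical helper; EC zone creation and {{MiniDFSCluster}} setup are omitted, 
and nothing here is from an actual patch):

{code}
import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.junit.Assert;

public class StripedFileEndToEndSketch {
  // Write 'size' bytes to 'file' (assumed to be inside an already-created EC zone),
  // read them back, and compare. One size per test method, as suggested above.
  static void writeAndRead(FileSystem fs, Path file, int size) throws IOException {
    byte[] expected = new byte[size];
    new Random(size).nextBytes(expected);     // deterministic content per size

    try (FSDataOutputStream out = fs.create(file)) {
      out.write(expected);
    }

    byte[] actual = new byte[size];
    try (FSDataInputStream in = fs.open(file)) {
      IOUtils.readFully(in, actual, 0, size);
    }
    Assert.assertArrayEquals("read data differs from written data", expected, actual);
  }
}
{code}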



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout

2015-04-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504485#comment-14504485
 ] 

Yi Liu commented on HDFS-8033:
--

BTW, I find we also need to handle {{seek}} and zero-copy read for 
{{DFSStripedInputStream}}; I filed HDFS-8203 to handle them.

 Erasure coding: stateful (non-positional) read from files in striped layout
 ---

 Key: HDFS-8033
 URL: https://issues.apache.org/jira/browse/HDFS-8033
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8200) Refactor FSDirStatAndListingOp

2015-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504487#comment-14504487
 ] 

Hadoop QA commented on HDFS-8200:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726772/HDFS-8200.000.patch
  against trunk revision d52de61.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestLeaseRecovery2

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10327//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10327//console

This message is automatically generated.

 Refactor FSDirStatAndListingOp
 --

 Key: HDFS-8200
 URL: https://issues.apache.org/jira/browse/HDFS-8200
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8200.000.patch


 After HDFS-6826 several functions in {{FSDirStatAndListingOp}} are dead. This 
 jira proposes to clean them up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout

2015-04-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504428#comment-14504428
 ] 

Yi Liu edited comment on HDFS-8033 at 4/21/15 6:25 AM:
---

Thanks [~zhz] for working on this.  The patch is good, my comments:
*1.*  In DFSInputStream, the stateful read is not to read fully for the output 
*buf*,  {{readWithStrategy}} will call {{readBuffer}} and return on success.  
In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in 
one striped block, so the returned result should be something like (cell_0, 
cell_3, ).  
This is not incorrect,  in the test, you have tested stateful read, but you do 
fully read and the data size is *BLOCK_GROUP_SIZE*, so the result 
coincidentally is correct. 
I suggest we try to do fully read in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we find the end of file, of course, the final 
read length could be less than the input buf length if we get eof.

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. And for other IOException, we can throw it.

*3.*  For the test, do stateful read: read once and fully read (please make the 
data size large than groupSize * cellSize), as I said in #1,

*4.*  {{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.*  Why you modify {{SimulatedFSDataset}}?


was (Author: hitliuyi):
Thanks [~zhz] for working on this.  The patch is good, my comments:
*1.* 
In DFSInputStream, the stateful read is not to read fully for the output *buf*, 
 {{readWithStrategy}} will call {{readBuffer}} and return on success.  In 
{{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in one 
striped block, so the returned result should be something like (cell_0, cell_3, 
).  
This is not incorrect,  in the test, you have tested stateful read, but you do 
fully read and the data size is *BLOCK_GROUP_SIZE*, so the result 
coincidentally is correct. 
I suggest we try to do fully read in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we find the end of file, of course, the final 
read length could be less than the input buf length if we get eof.

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. And for other IOException, we can throw it.

*3.* 
For the test, do stateful read: read once and fully read (please make the data 
size large than groupSize * cellSize), as I said in #1,

*4.* 
{{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.* 
Why you modify {{SimulatedFSDataset}}?

 Erasure coding: stateful (non-positional) read from files in striped layout
 ---

 Key: HDFS-8033
 URL: https://issues.apache.org/jira/browse/HDFS-8033
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.

2015-04-21 Thread Yi Liu (JIRA)
Yi Liu created HDFS-8203:


 Summary: Erasure Coding: Seek and other Ops in 
DFSStripedInputStream.
 Key: HDFS-8203
 URL: https://issues.apache.org/jira/browse/HDFS-8203
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Yi Liu
Assignee: Yi Liu


In HDFS-7782 and HDFS-8033, we handle pread and stateful read for 
{{DFSStripedInputStream}}; we also need to handle other operations, such as 
{{seek}}, zero-copy read, ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading

2015-04-21 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504476#comment-14504476
 ] 

Kai Sasaki commented on HDFS-8201:
--

[~drankye] I see. If the purpose of this JIRA is like what you mentioned, 
please keep it. Thank you for clarifying!

 Add an end to end test for stripping file writing and reading
 -

 Key: HDFS-8201
 URL: https://issues.apache.org/jira/browse/HDFS-8201
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Xinwei Qin 

 According to an off-line discussion with [~zhz] and [~xinwei], we need to 
 implement an end-to-end test for stripping file support:
 * Create an EC zone;
 * Create a file in the zone;
 * Write various typical sizes of content to the file, each size may be a test 
 method;
 * Read the written content back;
 * Compare the written content and the read content to ensure they match.
 The test facility is subject to adding more steps for erasure encoding and 
 recovering. Will open a separate issue for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8191) Fix byte to integer casting in SimulatedFSDataset#simulatedByte

2015-04-21 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-8191:

Attachment: HDFS-8191.001.patch

Thanks Andrew for the review! Yes a unit test is a good idea.

It turns out I need to refactor {{TestSimulatedFSDataset}} quite a bit to 
inject simulated blocks with negative block IDs. But I think the added 
{{negativeBlkID}} will be useful in the future as well.

Both Jenkins failures pass locally.
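
For context, the pitfall this fixes is the usual byte-vs-int sign-extension trap 
that negative block IDs expose; a tiny self-contained sketch (not the actual patch):

{code}
public class ByteCastSketch {
  public static void main(String[] args) {
    long negativeBlockId = -4660L;

    // Narrowing a negative long keeps only the low 8 bits: 0xCC here, i.e. -52 as a byte.
    byte stored = (byte) negativeBlockId;

    // Widening the byte back to int without a mask sign-extends it to -52 ...
    int signExtended = stored;
    // ... while masking with 0xff recovers the unsigned value 204.
    int unsigned = stored & 0xff;

    System.out.println(signExtended == (int) (negativeBlockId & 0xff));  // false
    System.out.println(unsigned == (int) (negativeBlockId & 0xff));      // true
  }
}
{code}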

 Fix byte to integer casting in SimulatedFSDataset#simulatedByte
 ---

 Key: HDFS-8191
 URL: https://issues.apache.org/jira/browse/HDFS-8191
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-8191.000.patch, HDFS-8191.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned

2015-04-21 Thread J.Andreina (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504404#comment-14504404
 ] 

J.Andreina commented on HDFS-7993:
--

Thanks [~mingma] and [~vinayrpet] for reviewing and correcting me. 
I have updated the patch addressing all the comments.
Please review.

 Incorrect descriptions in fsck when nodes are decommissioned
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck, the repl count comes from the live replica 
 count, while the actual nodes come from LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement 
 verifies a LocatedBlock that includes decommissioned nodes. However, it seems 
 better to exclude the decommissioned nodes in the verification, just like how 
 fsck excludes decommissioned nodes when it checks for under-replicated blocks.
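
For illustration, one way the corrected output could look (illustrative only; the 
exact wording depends on the patch, but the idea is that repl counts only live 
replicas while decommissioned nodes are labeled instead of silently inflating the 
replica count):

 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4(decommissioned)]
 {noformat}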



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned

2015-04-21 Thread J.Andreina (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-7993:
-
Attachment: HDFS-7993.6.patch

 Incorrect descriptions in fsck when nodes are decommissioned
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck, the repl count comes from live replicas 
 count; while the actual nodes come from LocatedBlock which include 
 decommissioned nodes.
 Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement 
 verifies LocatedBlock that includes decommissioned nodes. However, it seems 
 better to exclude the decommissioned nodes in the verification; just like how 
 fsck excludes decommissioned nodes when it check for under replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned

2015-04-21 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504436#comment-14504436
 ] 

Vinayakumar B commented on HDFS-7993:
-

Thanks [~andreina] for the latest patch.
+1.
Waiting for jenkins

 Incorrect descriptions in fsck when nodes are decommissioned
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck, the repl count comes from live replicas 
 count; while the actual nodes come from LocatedBlock which include 
 decommissioned nodes.
 Another issue in NamenodeFsck is BlockPlacementPolicy's verifyBlockPlacement 
 verifies LocatedBlock that includes decommissioned nodes. However, it seems 
 better to exclude the decommissioned nodes in the verification; just like how 
 fsck excludes decommissioned nodes when it check for under replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8182) Implement topology-aware CDN-style caching

2015-04-21 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1450#comment-1450
 ] 

Gera Shegalov commented on HDFS-8182:
-

Hi Andrew,

I think the said block placement policy works fine for data whose usage we know 
a priori such as binaries in YARN-1492 Shared Cache (few terabytes in our 
case), MR/Spark staging directories, etc. For such cases we/frameworks already 
set a high replication factor. And the solution with rf=#racks is already good 
enough. Except for the replication speed vs YARN scheduling race, which would 
be eliminated with the approach proposed in this JIRA. 

In some cases we have no a priori knowledge. The most prominent ones are 
primary or temporary files used as the build input of a hash join in an 
ad-hoc manner. Having a solution that works transparently irrespective of the 
specified replication factor is a win.

Another drawback of a block-placement based solution (besides currently being 
global, not per file) is that it's not elastic, and is oblivious of the data 
temperature. I think this JIRA would cover both families of cases above well.

 Implement topology-aware CDN-style caching
 --

 Key: HDFS-8182
 URL: https://issues.apache.org/jira/browse/HDFS-8182
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, namenode
Affects Versions: 2.6.0
Reporter: Gera Shegalov

 To scale reads of hot blocks in large clusters, it would be beneficial if we 
 could read a block across the ToR switches only once. Example scenarios are 
 localization of binaries, MR distributed cache files for map-side joins and 
 similar. There are multiple layers where this could be implemented (YARN 
 service or individual apps such as MR) but I believe it is best done in HDFS 
 or even common FileSystem to support as many use cases as possible. 
 The life cycle could look like this e.g. for the YARN localization scenario:
 1. inputStream = fs.open(path, ..., CACHE_IN_RACK)
 2. instead of reading from a remote DN directly, NN tells the client to read 
 via the local DN1 and the DN1 creates a replica of each block.
 When the next localizer on DN2 in the same rack starts, it will learn from the NN 
 about the replica on DN1, and the client will read from DN1 using the 
 conventional path.
 When the application ends, the AM or NMs can instruct the NN in a fadvise 
 DONTNEED style, and it can start telling DNs to discard the extraneous replicas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8033) Erasure coding: stateful (non-positional) read from files in striped layout

2015-04-21 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504428#comment-14504428
 ] 

Yi Liu edited comment on HDFS-8033 at 4/21/15 6:33 AM:
---

Thanks [~zhz] for working on this.  The patch is good, my comments:
*1.*  In DFSInputStream, the stateful read is not to read fully for the output 
*buf*,  {{readWithStrategy}} will call {{readBuffer}} and return on success.  
In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in 
one striped block, so the returned result should be something like (cell_0, 
cell_3, ) and it only contains part of the expected data. 
This is not incorrect,  in the test, you have tested stateful read, but you do 
fully read and the data size is *BLOCK_GROUP_SIZE*, so the result 
coincidentally is correct. 
I suggest we try to do fully read in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we find the end of file, of course, the final 
read length could be less than the input buf length if we get eof.

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. And for other IOException, we can throw it.

*3.*  For the test, do stateful read: read once and fully read (please make the 
data size large than groupSize * cellSize), as I said in #1,

*4.*  {{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.*  Why you modify {{SimulatedFSDataset}}?


was (Author: hitliuyi):
Thanks [~zhz] for working on this.  The patch is good, my comments:
*1.*  In DFSInputStream, the stateful read is not to read fully for the output 
*buf*,  {{readWithStrategy}} will call {{readBuffer}} and return on success.  
In {{DFSStripedInputStream}} we override {{readBuffer}}, but we only read in 
one striped block, so the returned result should be something like (cell_0, 
cell_3, ).  
This is not incorrect,  in the test, you have tested stateful read, but you do 
fully read and the data size is *BLOCK_GROUP_SIZE*, so the result 
coincidentally is correct. 
I suggest we try to do fully read in {{readBuffer}} of 
{{DFSStripedInputStream}} unless we find the end of file, of course, the final 
read length could be less than the input buf length if we get eof.

*2.* In {{blockSeekTo}}, we need to handle refetchToken and 
refetchEncryptionKey. And for other IOException, we can throw it.

*3.*  For the test, do stateful read: read once and fully read (please make the 
data size large than groupSize * cellSize), as I said in #1,

*4.*  {{connectFailedOnce}} in {{blockSeekTo}} is not necessary.

*5.*  Why you modify {{SimulatedFSDataset}}?

 Erasure coding: stateful (non-positional) read from files in striped layout
 ---

 Key: HDFS-8033
 URL: https://issues.apache.org/jira/browse/HDFS-8033
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8033.000.patch, HDFS-8033.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading

2015-04-21 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504451#comment-14504451
 ] 

Kai Sasaki commented on HDFS-8201:
--

[~drankye] I wonder whether this JIRA might be a duplicate of 
[HDFS-8197|https://issues.apache.org/jira/browse/HDFS-8197]. Can I file this 
JIRA under HDFS-8197?

 Add an end to end test for stripping file writing and reading
 -

 Key: HDFS-8201
 URL: https://issues.apache.org/jira/browse/HDFS-8201
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Xinwei Qin 

 According to an off-line discussion with [~zhz] and [~xinwei], we need to 
 implement an end-to-end test for stripping file support:
 * Create an EC zone;
 * Create a file in the zone;
 * Write various typical sizes of content to the file, each size may be a test 
 method;
 * Read the written content back;
 * Compare the written content and the read content to ensure they match.
 The test facility is subject to adding more steps for erasure encoding and 
 recovering. Will open a separate issue for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7687) Change fsck to support EC files

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505467#comment-14505467
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7687:
---

For #1, see if you want to create a JIRA for trunk to do some refactoring first.

For #2, you may include the test here or in a separate JIRA.  Both are fine.

 Change fsck to support EC files
 ---

 Key: HDFS-7687
 URL: https://issues.apache.org/jira/browse/HDFS-7687
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma

 We need to change fsck so that it can detect under replicated and corrupted 
 EC files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8133) Improve readability of deleted block check

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505512#comment-14505512
 ] 

Hudson commented on HDFS-8133:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7626 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7626/])
HDFS-8133. Improve readability of deleted block check (Daryn Sharp via Colin P. 
McCabe) (cmccabe: rev 997408eaaceef20b053ee7344468e28cb9a1379b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlocksMap.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoContiguous.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java


 Improve readability of deleted block check
 --

 Key: HDFS-8133
 URL: https://issues.apache.org/jira/browse/HDFS-8133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-8133.patch


 The current means of checking if a block is deleted is checking if its block 
 collection is null.  A more readable approach is an isDeleted method.
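
A minimal sketch of the readability change being described (a sketch only, with a 
stand-in field; the real method belongs to the block-management classes touched by 
the patch):

{code}
class BlockSketch {
  private Object blockCollection;   // becomes null once the block is deleted

  // Before: callers wrote the intent-obscuring check inline:
  //   if (block.getBlockCollection() == null) { ... }
  // After: the same check reads as a question.
  boolean isDeleted() {
    return blockCollection == null;
  }
}
{code}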



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time

2015-04-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505364#comment-14505364
 ] 

Chris Nauroth commented on HDFS-8193:
-

Thank you for the response.  That clarifies it for me.

If possible, would you please see if there is a way to make the delay visible 
through metrics and the web UI?  Perhaps you could even just populate the same 
fields that were added in HDFS-5986 and HDFS-6385.

 Add the ability to delay replica deletion for a period of time
 --

 Key: HDFS-8193
 URL: https://issues.apache.org/jira/browse/HDFS-8193
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Zhe Zhang

 When doing maintenance on an HDFS cluster, users may be concerned about the 
 possibility of administrative mistakes or software bugs deleting replicas of 
 blocks that cannot easily be restored. It would be handy if HDFS could be 
 made to optionally not delete any replicas for a configurable period of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8163) Using monotonicNow for block report scheduling causes test failures on recently restarted systems

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505432#comment-14505432
 ] 

Hudson commented on HDFS-8163:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7624 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7624/])
HDFS-8163. Using monotonicNow for block report scheduling causes test failures 
on recently restarted systems. (Arpit Agarwal) (arp: rev 
dfc1c4c303cf15afc6c3361ed9d3238562f73cbd)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Time.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBpServiceActorScheduler.java


 Using monotonicNow for block report scheduling causes test failures on 
 recently restarted systems
 -

 Key: HDFS-8163
 URL: https://issues.apache.org/jira/browse/HDFS-8163
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.1
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8163.01.patch, HDFS-8163.02.patch, 
 HDFS-8163.03.patch


 {{BPServiceActor#blockReport}} has the following check:
 {code}
   List<DatanodeCommand> blockReport() throws IOException {
     // send block report if timer has expired.
     final long startTime = monotonicNow();
     if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
       return null;
     }
 {code}
 Many tests trigger an immediate block report via 
 {{BPServiceActor#triggerBlockReportForTests}} which sets {{lastBlockReport = 
 0}}. However if the machine was restarted recently then startTime may be less 
 than {{dnConf.blockReportInterval}} and the block report is not sent.
 {{Time#monotonicNow}} uses {{System#nanoTime}} which represents time elapsed 
 since an arbitrary origin. The time should be used only for comparison with 
 other values returned by {{System#nanoTime}}.
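
To make the failure mode concrete, a small hedged sketch of the scheduling pitfall 
(numbers are illustrative; {{monotonicNow}} stands in for {{Time.monotonicNow()}}, 
whose value is time since an arbitrary origin, effectively machine uptime here):

{code}
public class MonotonicSchedulingSketch {
  public static void main(String[] args) {
    long blockReportIntervalMs = 6L * 60 * 60 * 1000;  // e.g. the 6h default interval

    // On a recently restarted machine the monotonic clock is small, say 10 minutes:
    long startTime = 10L * 60 * 1000;

    // Tests force an immediate report by setting lastBlockReport = 0, but the
    // "timer not yet expired" check still fires because uptime < interval:
    long lastBlockReport = 0;
    boolean skipped = (startTime - lastBlockReport) <= blockReportIntervalMs;
    System.out.println("block report skipped = " + skipped);  // true -> the test failure
  }
}
{code}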



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode

2015-04-21 Thread Nate Edel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate Edel updated HDFS-8078:

Attachment: HDFS-8078.4.patch

 HDFS client gets errors trying to to connect to IPv6 DataNode
 -

 Key: HDFS-8078
 URL: https://issues.apache.org/jira/browse/HDFS-8078
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Nate Edel
Assignee: Nate Edel
  Labels: ipv6
 Attachments: HDFS-8078.4.patch


 1st exception, on put:
 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
 java.lang.IllegalArgumentException: Does not contain a valid host:port 
 authority: 2401:db00:1010:70ba:face:0:8:0:50010
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
 Appears to actually stem from code in DatanodeID which assumes it's safe to 
 append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for 
 IPv6.  NetUtils.createSocketAddr( ) assembles a Java URI object, which 
 requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010
 Currently using InetAddress.getByName() to validate IPv6 (guava 
 InetAddresses.forString has been flaky) but could also use our own parsing. 
 (From logging this, it seems like a low-enough frequency call that the extra 
 object creation shouldn't be problematic, and for me the slight risk of 
 passing in bad input that is not actually an IPv4 or IPv6 address and thus 
 calling an external DNS lookup is outweighed by getting the address 
 normalized and avoiding rewriting parsing.)
 Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress()
 ---
 2nd exception (on datanode)
 15/04/13 13:18:07 ERROR datanode.DataNode: 
 dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown 
 operation  src: /2401:db00:20:7013:face:0:7:0:54152 dst: 
 /2401:db00:11:d010:face:0:2f:0:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
 at java.lang.Thread.run(Thread.java:745)
 Which also comes as client error -get: 2401 is not an IP string literal.
 This one has existing parsing logic which needs to shift to the last colon 
 rather than the first.  Should also be a tiny bit faster by using lastIndexOf 
 rather than split.  Could alternatively use the techniques above.
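
A minimal, self-contained sketch of the last-colon parsing plus IPv6 bracketing the 
description suggests (a sketch only, not the actual patch; the real change would 
live around DatanodeID / NetUtils):

{code}
import java.net.InetSocketAddress;

public class HostPortSketch {
  // Split on the LAST colon so IPv6 literals, which contain many colons,
  // keep their full address part; then bracket the literal for URI-based code.
  static InetSocketAddress parse(String addr) {
    int idx = addr.lastIndexOf(':');
    if (idx < 0) {
      throw new IllegalArgumentException("no port in " + addr);
    }
    String host = addr.substring(0, idx);
    int port = Integer.parseInt(addr.substring(idx + 1));
    if (host.indexOf(':') >= 0 && !host.startsWith("[")) {
      host = "[" + host + "]";   // proto://[2401:db00:...]:50010 style
    }
    return InetSocketAddress.createUnresolved(host, port);
  }

  public static void main(String[] args) {
    System.out.println(parse("2401:db00:1010:70ba:face:0:8:0:50010"));
    System.out.println(parse("10.0.0.1:50010"));
  }
}
{code}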



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time

2015-04-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505386#comment-14505386
 ] 

Zhe Zhang commented on HDFS-8193:
-

bq. If possible, would you please see if there is a way to make the delay 
visible through metrics and the web UI?
That's a great point. I believe admins will want to monitor both the delay and 
number of pending deletions. Will either add in this JIRA or a follow-on.

bq. Perhaps you could even just populate the same fields that were added in 
HDFS-5986 and HDFS-6385.
Seems to me these metrics differ for each DN. Maybe we should add them to the 
DN web UI / metrics? We could sum up the number of pending-deletion replicas 
and show on NN. But the per-DN delays are hard to summarize.

 Add the ability to delay replica deletion for a period of time
 --

 Key: HDFS-8193
 URL: https://issues.apache.org/jira/browse/HDFS-8193
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Zhe Zhang

 When doing maintenance on an HDFS cluster, users may be concerned about the 
 possibility of administrative mistakes or software bugs deleting replicas of 
 blocks that cannot easily be restored. It would be handy if HDFS could be 
 made to optionally not delete any replicas for a configurable period of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time

2015-04-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505449#comment-14505449
 ] 

Zhe Zhang commented on HDFS-8193:
-

Thanks for the pointers Chris! A mock-up is a very good idea; HDFS-5986 and 
HDFS-6385 are good examples to follow.

 Add the ability to delay replica deletion for a period of time
 --

 Key: HDFS-8193
 URL: https://issues.apache.org/jira/browse/HDFS-8193
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Zhe Zhang

 When doing maintenance on an HDFS cluster, users may be concerned about the 
 possibility of administrative mistakes or software bugs deleting replicas of 
 blocks that cannot easily be restored. It would be handy if HDFS could be 
 made to optionally not delete any replicas for a configurable period of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7687) Change fsck to support EC files

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505463#comment-14505463
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7687:
---

The items look good.  Just a minor point: a corrupt EC block group could have 
>= 6 blocks but some of the blocks are corrupted.

 ... in (6,3)-Reed-Solomon, these groups have more than 9 blocks. (Are there 
 these cases?)

Yes, it is possible.  E.g. a datanode D0 dies and an EC block in D0 is 
reconstructed on another datanode D1. Later on, D0 comes back.  Then, both D0 
and D1 have the same EC block and the block group could have more than 9 blocks.

 Change fsck to support EC files
 ---

 Key: HDFS-7687
 URL: https://issues.apache.org/jira/browse/HDFS-7687
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma

 We need to change fsck so that it can detect under replicated and corrupted 
 EC files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time

2015-04-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505336#comment-14505336
 ] 

Zhe Zhang commented on HDFS-8193:
-

Thanks Chris for bringing up the questions. 

bq. HDFS-6186 only applies at NameNode startup.  Is the new feature something 
that could be triggered at any time on a running NameNode, such as right before 
a manual HA failover?
Short answer is yes. One can imagine it as a trash for block replicas, fully 
controlled by the DN hosting them. This should shelter block replicas from most 
admin mis-operations and NN bugs (more likely than DN bugs given the 
complexity) for a period of time. 

To answer the question from [~sureshms] under HDFS-6186:
bq. One problem with not deleting the blocks for a deleted file is, how does 
one restore it? Can we address in this jira pausing deletion after startup and 
address the suggestion you have made, along with other changes that might be 
necessary, in another jira.
First, NN bugs could cause block replicas to be deleted without deleting the 
file. Second, it's rather easy to back up NN metadata before performing 
maintenance, but extremely difficult to back up actual DN data. This JIRA aims 
to address that deficiency / discrepancy.

As future work, we plan to investigate an even more radical retention policy, 
where block replicas are never deleted before DN is actually running out of 
space. At that moment, victims are selected among pending-deletion replicas 
using a smart algorithm, and are overwritten by incoming replicas. We'll file a 
separate JIRA for that, after this JIRA builds the basic DN-side replica 
retention machinery.
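
As a purely hypothetical illustration of the retention-window idea (nothing here 
is from a patch; the names and the 24h default are made up), the DN-side check 
could be as simple as:

{code}
import java.util.concurrent.TimeUnit;

public class DelayedDeletionSketch {
  // Hypothetical policy: a replica whose deletion was requested at time t is
  // only purged once the configured retention window has elapsed.
  static boolean mayPurge(long deletionRequestedMs, long nowMs, long retentionMs) {
    return nowMs - deletionRequestedMs >= retentionMs;
  }

  public static void main(String[] args) {
    long retention = TimeUnit.HOURS.toMillis(24);          // made-up default
    long requested = 0L;
    System.out.println(mayPurge(requested, TimeUnit.HOURS.toMillis(1), retention));   // false: still sheltered
    System.out.println(mayPurge(requested, TimeUnit.HOURS.toMillis(25), retention));  // true: safe to purge
  }
}
{code}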

 Add the ability to delay replica deletion for a period of time
 --

 Key: HDFS-8193
 URL: https://issues.apache.org/jira/browse/HDFS-8193
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Zhe Zhang

 When doing maintenance on an HDFS cluster, users may be concerned about the 
 possibility of administrative mistakes or software bugs deleting replicas of 
 blocks that cannot easily be restored. It would be handy if HDFS could be 
 made to optionally not delete any replicas for a configurable period of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8193) Add the ability to delay replica deletion for a period of time

2015-04-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505402#comment-14505402
 ] 

Chris Nauroth commented on HDFS-8193:
-

bq. Seems to me these metrics differ for each DN.

Ah yes, I missed the point that you were aiming for per-DN granularity.  In 
that case, yes, DN metrics would make sense.  You also could potentially take 
the approach done in HDFS-7604 to publish the counters back to the NN in 
heartbeats, and that would enable the NameNode to display per-DN stats on the 
Datanodes tab.  It's probably worth doing a quick UI mock-up to check if that 
really makes sense though.  Those tables can get crowded quickly.  :-)

Thanks again.


 Add the ability to delay replica deletion for a period of time
 --

 Key: HDFS-8193
 URL: https://issues.apache.org/jira/browse/HDFS-8193
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Zhe Zhang

 When doing maintenance on an HDFS cluster, users may be concerned about the 
 possibility of administrative mistakes or software bugs deleting replicas of 
 blocks that cannot easily be restored. It would be handy if HDFS could be 
 made to optionally not delete any replicas for a configurable period of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505424#comment-14505424
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8204:
---

This seems a duplicate of HDFS-8147.

 Balancer: 2 replicas ends in same node after running balance.
 -

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Reporter: Walter Su
Assignee: Walter Su
 Attachments: HDFS-8204.001.patch


 Balancer moves blocks between Datanodes (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
 new versions (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 is flawed and may cause 2 replicas to end up on the same node after running the 
 balancer.
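
To make the flaw concrete, a self-contained sketch with stand-in types 
(illustrative only; the real classes live in the Balancer's Dispatcher): checking 
membership per storage group can miss a replica held by a different storage group 
on the same datanode, which is exactly how two replicas can end up on one node.

{code}
import java.util.Arrays;
import java.util.List;

public class BalancerLocationSketch {
  /** Stand-in for the Balancer's StorageGroup: a datanode plus a storage type. */
  static class StorageGroup {
    final String datanode;
    final String storageType;
    StorageGroup(String datanode, String storageType) {
      this.datanode = datanode;
      this.storageType = storageType;
    }
  }

  // The safer question is "is the block already on this DATANODE?",
  // not "is it on this storage group?".
  static boolean isLocatedOnDatanode(List<StorageGroup> locations, String datanode) {
    for (StorageGroup loc : locations) {
      if (loc.datanode.equals(datanode)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<StorageGroup> locations = Arrays.asList(new StorageGroup("dn1", "DISK"));
    StorageGroup target = new StorageGroup("dn1", "ARCHIVE");   // same node, different group

    System.out.println(locations.contains(target));                       // false: group check misses it
    System.out.println(isLocatedOnDatanode(locations, target.datanode));  // true: node check catches it
  }
}
{code}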



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HDFS-8209) ArrayIndexOutOfBoundsException in MiniDFSCluster.

2015-04-21 Thread surendra singh lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

surendra singh lilhore moved HADOOP-11856 to HDFS-8209:
---

  Component/s: (was: test)
   test
Affects Version/s: (was: 2.6.0)
   2.6.0
  Key: HDFS-8209  (was: HADOOP-11856)
  Project: Hadoop HDFS  (was: Hadoop Common)

 ArrayIndexOutOfBoundsException in MiniDFSCluster.
 -

 Key: HDFS-8209
 URL: https://issues.apache.org/jira/browse/HDFS-8209
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore

 I want to create a MiniDFSCluster with 2 datanodes and for each datanode I want 
 to set a different number of StorageTypes, but in this case I am getting an 
 ArrayIndexOutOfBoundsException.
 My cluster schema is like this:
 {code}
 final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
   .numDataNodes(2)
   .storageTypes(new StorageType[][] {{ 
 StorageType.DISK, StorageType.ARCHIVE },{ StorageType.DISK } })
   .build();
 {code}
 *Exception* :
 {code}
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.makeDataNodeDirs(MiniDFSCluster.java:1218)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1402)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:832)
 {code}
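
Until the builder copes with ragged arrays, a hedged workaround sketch is to give 
every datanode the same number of storage entries (and, if the builder exposes 
{{storagesPerDatanode}}, to set it to match), so {{makeDataNodeDirs}} never 
indexes past the shorter row:

{code}
// Workaround sketch only: pad the second datanode's array to the same length.
final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(2)
    .storageTypes(new StorageType[][] {
        { StorageType.DISK, StorageType.ARCHIVE },
        { StorageType.DISK, StorageType.DISK } })
    .build();
{code}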



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8200) Refactor FSDirStatAndListingOp

2015-04-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505416#comment-14505416
 ] 

Jing Zhao commented on HDFS-8200:
-

The patch looks good to me. One minor comment is that we can also pass the 
INodeAttributes into {{createFileStatus(..., needLocation, ...)}} to make the 
style more consistent. +1 after addressing the comments.

 Refactor FSDirStatAndListingOp
 --

 Key: HDFS-8200
 URL: https://issues.apache.org/jira/browse/HDFS-8200
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8200.000.patch


 After HDFS-6826 several functions in {{FSDirStatAndListingOp}} are dead. This 
 jira proposes to clean them up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8133) Improve readability of deleted block check

2015-04-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505479#comment-14505479
 ] 

Colin Patrick McCabe commented on HDFS-8133:


+1.

Thanks, Daryn.

Test failures are unrelated.  I ran the tests locally and they passed.

 Improve readability of deleted block check
 --

 Key: HDFS-8133
 URL: https://issues.apache.org/jira/browse/HDFS-8133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-8133.patch


 The current means of checking if a block is deleted is checking if its block 
 collection is null.  A more readable approach is an isDeleted method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8163) Using monotonicNow for block report scheduling causes test failures on recently restarted systems

2015-04-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505350#comment-14505350
 ] 

Jing Zhao commented on HDFS-8163:
-

Thanks for working on this, [~arpitagarwal]! The patch looks pretty good to me. 
The only nit is that the following code can be reformatted:
{code}
+@VisibleForTesting volatile long nextBlockReportTime = monotonicNow();
+@VisibleForTesting volatile long nextHeartbeatTime = monotonicNow();
+@VisibleForTesting boolean resetBlockReportTime = true;
{code}

I think you can address this while committing the patch. +1.

 Using monotonicNow for block report scheduling causes test failures on 
 recently restarted systems
 -

 Key: HDFS-8163
 URL: https://issues.apache.org/jira/browse/HDFS-8163
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.1
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Priority: Blocker
 Attachments: HDFS-8163.01.patch, HDFS-8163.02.patch, 
 HDFS-8163.03.patch


 {{BPServiceActor#blockReport}} has the following check:
 {code}
   List<DatanodeCommand> blockReport() throws IOException {
     // send block report if timer has expired.
     final long startTime = monotonicNow();
     if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
       return null;
     }
 {code}
 Many tests trigger an immediate block report via 
 {{BPServiceActor#triggerBlockReportForTests}} which sets {{lastBlockReport = 
 0}}. However if the machine was restarted recently then startTime may be less 
 than {{dnConf.blockReportInterval}} and the block report is not sent.
 {{Time#monotonicNow}} uses {{System#nanoTime}} which represents time elapsed 
 since an arbitrary origin. The time should be used only for comparison with 
 other values returned by {{System#nanoTime}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8163) Using monotonicNow for block report scheduling causes test failures on recently restarted systems

2015-04-21 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-8163:

  Resolution: Fixed
   Fix Version/s: 2.7.1
Target Version/s:   (was: 2.7.1)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the review Jing.

Fixed the formatting and committed to trunk, branch-2 and branch-2.7. Here is 
the delta:

{code:java}
-// assigned/read by the actor thread. Thus they should be declared as vol
-// to make sure the happens-before consistency.
-@VisibleForTesting volatile long nextBlockReportTime = monotonicNow();
-@VisibleForTesting volatile long nextHeartbeatTime = monotonicNow();
-@VisibleForTesting boolean resetBlockReportTime = true;
+// assigned/read by the actor thread.
+@VisibleForTesting
+volatile long nextBlockReportTime = monotonicNow();
+
+@VisibleForTesting
+volatile long nextHeartbeatTime = monotonicNow();
+
+@VisibleForTesting
+boolean resetBlockReportTime = true;
{code}


 Using monotonicNow for block report scheduling causes test failures on 
 recently restarted systems
 -

 Key: HDFS-8163
 URL: https://issues.apache.org/jira/browse/HDFS-8163
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.1
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8163.01.patch, HDFS-8163.02.patch, 
 HDFS-8163.03.patch


 {{BPServiceActor#blockReport}} has the following check:
 {code}
   List<DatanodeCommand> blockReport() throws IOException {
     // send block report if timer has expired.
     final long startTime = monotonicNow();
     if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
       return null;
     }
 {code}
 Many tests trigger an immediate block report via 
 {{BPServiceActor#triggerBlockReportForTests}} which sets {{lastBlockReport = 
 0}}. However if the machine was restarted recently then startTime may be less 
 than {{dnConf.blockReportInterval}} and the block report is not sent.
 {{Time#monotonicNow}} uses {{System#nanoTime}} which represents time elapsed 
 since an arbitrary origin. The time should be used only for comparison with 
 other values returned by {{System#nanoTime}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode

2015-04-21 Thread Nate Edel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate Edel updated HDFS-8078:

Attachment: (was: HDFS-8078.4.patch)

 HDFS client gets errors trying to to connect to IPv6 DataNode
 -

 Key: HDFS-8078
 URL: https://issues.apache.org/jira/browse/HDFS-8078
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Nate Edel
Assignee: Nate Edel
  Labels: ipv6

 1st exception, on put:
 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
 java.lang.IllegalArgumentException: Does not contain a valid host:port 
 authority: 2401:db00:1010:70ba:face:0:8:0:50010
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
 Appears to actually stem from code in DatanodeID which assumes it's safe to 
 append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for 
 IPv6.  NetUtils.createSocketAddr( ) assembles a Java URI object, which 
 requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010
 Currently using InetAddress.getByName() to validate IPv6 (guava 
 InetAddresses.forString has been flaky) but could also use our own parsing. 
 (From logging this, it seems like a low-enough frequency call that the extra 
 object creation shouldn't be problematic, and for me the slight risk of 
 passing in bad input that is not actually an IPv4 or IPv6 address and thus 
 calling an external DNS lookup is outweighed by getting the address 
 normalized and avoiding rewriting parsing.)
 Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress()
 ---
 2nd exception (on datanode)
 15/04/13 13:18:07 ERROR datanode.DataNode: 
 dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown 
 operation  src: /2401:db00:20:7013:face:0:7:0:54152 dst: 
 /2401:db00:11:d010:face:0:2f:0:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
 at java.lang.Thread.run(Thread.java:745)
 Which also comes as client error -get: 2401 is not an IP string literal.
 This one has existing parsing logic which needs to shift to the last colon 
 rather than the first.  Should also be a tiny bit faster by using lastIndexOf 
 rather than split.  Could alternatively use the techniques above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8133) Improve readability of deleted block check

2015-04-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-8133:
---
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

 Improve readability of deleted block check
 --

 Key: HDFS-8133
 URL: https://issues.apache.org/jira/browse/HDFS-8133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Fix For: 2.8.0

 Attachments: HDFS-8133.patch


 The current means of checking if a block is deleted is checking if its block 
 collection is null.  A more readable approach is an isDeleted method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-04-21 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504550#comment-14504550
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


Thanks for the feedback and comments. I will try to answer the questions over 
my next few comments. I will also update the document to reflect the discussion 
here.

  The stated limits in the document are more of the design goals, and 
parameters we have in mind while designing for the first phase of the project. 
These are not hard limits and most of these will be configurable. First I will 
state a few technical limits and then describe some back of the envelope 
calculations and heuristics I have used behind these numbers.
  The technical limitations are following.
  # The memory in the storage container manager limits the number of storage 
containers. From the namenode experience, I believe we can go up to a few 100 
million storage containers. In later phases of the project we can have a 
federated architecture with multiple storage container managers for further 
scale up.
  # The size of a storage container is limited by how quick we want to 
replicate the containers when a datanode goes down. The advantage of using a 
large container size is that it reduces the metadata needed to track container 
locations which is proportional to number of containers. However, a very large 
container will reduce the parallelization that cluster can achieve to replicate 
when a node fails. The container size will be configurable. A default size of 
10G seems like a good choice, which is much larger than hdfs block sizes, but 
still allows hundreds of containers on datanodes with a few terabytes of disk.

  The maximum size of an object is stated as 5G. In future we would like to 
even increase this limit when we can support multi-part writes similar to S3. 
However, it is expected that average size of the objects would be much smaller. 
The most common range is expected to be a few hundred KBs to a few hundred MBs.
  Assuming 100 million containers, a 1MB average object size, and 10G as the 
storage container size, it amounts to 10 trillion objects. I think 10 trillion 
is a lofty goal to have : ). The division of 10 trillion into 10 million 
buckets with a million objects in each bucket is kind of arbitrary, but we 
believe users will prefer smaller buckets for better organization. We will 
keep these configurable. 

  The storage volume settings give admins control over the usage of the 
storage. In a private cloud, a cluster shared by many tenants can have a 
storage volume dedicated to each tenant. A tenant can be a user, a project, or 
a group of users. Therefore, a limit of 1000 buckets, implying around 1PB of 
storage per tenant, seems reasonable. But I do agree that once we have a quota 
on storage volume size, an additional limit on the number of buckets is not 
really needed.

  We plan to carry out the project in several phases. I would like to propose 
the following phases:

  Phase 1
   # Basic API as covered in the document.
   # Storage container machinery, reliability, replication.

  Phase 2
   # High availability
   # Security
   # Secondary index for object listing with prefixes.

  Phase 3
   # Caching to improve latency.
   # Further scalability in terms of number of objects and object sizes.
   # Cross-geo replication.

I have created branch HDFS-7240 for this work. We will start filing jiras and 
posting patches. 

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Ozone-architecture-v1.pdf


 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8136) Client gets and uses EC schema when reads and writes a stripping file

2015-04-21 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504624#comment-14504624
 ] 

Li Bo commented on HDFS-8136:
-

The patch also looks good to me. When I apply the patch to branch 7285, it 
shows {{Reversed (or previously applied) patch detected}}. Your changes to 
{{TestDFSStripedOutputStream}} seem to have already been committed to the branch 
by another patch. Could you update your patch against the current branch code?

 Client gets and uses EC schema when reads and writes a stripping file
 -

 Key: HDFS-8136
 URL: https://issues.apache.org/jira/browse/HDFS-8136
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Kai Zheng
Assignee: Kai Sasaki
 Attachments: HDFS-8136.1.patch, HDFS-8136.2.patch, HDFS-8136.3.patch


 Discussed with [~umamaheswararao] and [~vinayrpet]: when reading or writing a 
 stripping file, the client can invoke a separate call to the NameNode to 
 request the EC schema associated with the EC zone the file is in. The schema 
 can then be used to guide the reading and writing. Currently hard-coded values 
 are used.
 Optionally, as an optimization, the client may cache schema info per file, per 
 zone, or per schema name. We could add the schema name to {{HdfsFileStatus}} 
 for that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8160) Long delays when calling hdfsOpenFile()

2015-04-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504623#comment-14504623
 ] 

Steve Loughran commented on HDFS-8160:
--

It ultimately worked because, after timing out, the DFS client tried a different 
host.

What may be happening is that the datanodes are reporting in as healthy, but the 
address they publish for clients to get the data isn't accessible. Wrong 
hostnames and firewalls are the common causes; network and routing problems are 
another.

Try a telnet to the hostname and port listed, from the machine that isn't able 
to connect, and see what happens.
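If telnet isn't available, a tiny Java probe does the same job (the host and 
port below are taken from the stack trace in the description; substitute your 
own):
{code}
// Quick reachability probe, equivalent to "telnet <host> <port>".
import java.net.InetSocketAddress;
import java.net.Socket;

public class DnProbe {
  public static void main(String[] args) throws Exception {
    String host = "10.40.8.10"; // datanode address from the log excerpt
    int port = 50010;           // datanode data-transfer port
    try (Socket s = new Socket()) {
      s.connect(new InetSocketAddress(host, port), 5_000); // 5 second timeout
      System.out.println("connected ok");
    } catch (Exception e) {
      System.out.println("cannot reach " + host + ":" + port + " -> " + e);
    }
  }
}
{code}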

 Long delays when calling hdfsOpenFile()
 ---

 Key: HDFS-8160
 URL: https://issues.apache.org/jira/browse/HDFS-8160
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.2
 Environment: 3-node Apache Hadoop 2.5.2 cluster running on Ubuntu 
 14.04 
 dfshealth overview:
 Security is off.
 Safemode is off.
 8 files and directories, 9 blocks = 17 total filesystem object(s).
 Heap Memory used 45.78 MB of 90.5 MB Heap Memory. Max Heap Memory is 889 MB.
 Non Heap Memory used 36.3 MB of 70.44 MB Commited Non Heap Memory. Max Non 
 Heap Memory is 130 MB.
 Configured Capacity:  118.02 GB
 DFS Used: 2.77 GB
 Non DFS Used: 12.19 GB
 DFS Remaining:103.06 GB
 DFS Used%:2.35%
 DFS Remaining%:   87.32%
 Block Pool Used:  2.77 GB
 Block Pool Used%: 2.35%
 DataNodes usages% (Min/Median/Max/stdDev):2.35% / 2.35% / 2.35% / 0.00%
 Live Nodes3 (Decommissioned: 0)
 Dead Nodes0 (Decommissioned: 0)
 Decommissioning Nodes 0
 Number of Under-Replicated Blocks 0
 Number of Blocks Pending Deletion 0
 Datanode Information
 In operation
 Node | Last contact | Admin State | Capacity | Used | Non DFS Used | Remaining | Blocks | Block pool used | Failed Volumes | Version
 hadoop252-3 (x.x.x.10:50010) | 1 | In Service | 39.34 GB | 944.85 MB | 3.63 GB | 34.79 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
 hadoop252-1 (x.x.x.8:50010) | 0 | In Service | 39.34 GB | 944.85 MB | 4.94 GB | 33.48 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
 hadoop252-2 (x.x.x.9:50010) | 1 | In Service | 39.34 GB | 944.85 MB | 3.63 GB | 34.79 GB | 9 | 944.85 MB (2.35%) | 0 | 2.5.2
 java version 1.7.0_76
 Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)
Reporter: Rod

 Calling hdfsOpenFile() on a file residing on the target 3-node Hadoop cluster 
 (described in detail in the Environment section) blocks for a long time 
 (several minutes). I've noticed that the delay is related to the size of the 
 target file. 
 For example, attempting hdfsOpenFile() on a file of size 852483361 bytes took 
 121 seconds, but a file of 15458 bytes took less than a second.
 Also, during the long delay, the following stacktrace is routed to standard 
 out:
 2015-04-16 10:32:13,943 WARN  [main] hdfs.BlockReaderFactory 
 (BlockReaderFactory.java:getRemoteBlockReaderFromTcp(693)) - I/O error 
 constructing remote block reader.
 org.apache.hadoop.net.ConnectTimeoutException: 6 millis timeout while 
 waiting for channel to be ready for connect. ch : 
 java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
   at 
 org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3101)
   at 
 org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
   at 
 org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
   at 
 org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
   at 
 org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
   at 
 org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:854)
   at 
 org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
 2015-04-16 10:32:13,946 WARN  [main] hdfs.DFSClient 
 (DFSInputStream.java:blockSeekTo(612)) - Failed to connect to 
 /10.40.8.10:50010 for block, add to deadNodes and continue. 
 org.apache.hadoop.net.ConnectTimeoutException: 6 millis timeout while 
 waiting for channel to be ready for connect. ch : 
 java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
 org.apache.hadoop.net.ConnectTimeoutException: 6 millis timeout while 
 waiting for channel to be ready for connect. ch : 
 java.nio.channels.SocketChannel[connection-pending remote=/10.40.8.10:50010]
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
   at 
 

[jira] [Commented] (HDFS-8179) DFSClient#getServerDefaults returns null within 1 hour of system start

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504811#comment-14504811
 ] 

Hudson commented on HDFS-8179:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #161 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/161/])
HDFS-8179. DFSClient#getServerDefaults returns null within 1 hour of system 
start. (Contributed by Xiaoyu Yao) (arp: rev 
c92f6f360515cc21ecb9b9f49b3e59537ef0cb05)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java


 DFSClient#getServerDefaults returns null within 1 hour of system start
 --

 Key: HDFS-8179
 URL: https://issues.apache.org/jira/browse/HDFS-8179
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8179.00.patch, HDFS-8179.01.patch


 We recently hit an NPE during the Ambari Oozie service check. The failed hdfs 
 command is below. It repros sometimes and then goes away after the cluster 
 runs for a while.
 {code}
 [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r 
 /user/ambari-qa/mapredsmokeoutput
 rm: Failed to get server trash configuration: null. Consider using -skipTrash 
 option
 {code}
 With additional tracing, the failure was located to the following stack.
 {code}
 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration
 java.lang.NullPointerException
   at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86)
   at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117)
   at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321)
   at 
 org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259)
   at 
 org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:166)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
 rm: Failed to get server trash configuration: null. Consider using -skipTrash 
 option
 {code}
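 A hedged sketch of the kind of cache-expiry bug this symptom points at: if the 
 cached value starts out null and the staleness check is the only trigger for a 
 refetch, callers inside the first validity window can get the null back (names 
 are illustrative, not the actual DFSClient fields):
 {code}
 // Illustrative cache-expiry sketch only; not the actual DFSClient code.
 class ServerDefaultsCache<T> {
   private static final long VALIDITY_MS = 60 * 60 * 1000; // 1 hour
   private T cached;            // starts out null
   private long lastFetchTime;  // starts out 0

   // A monotonic clock (e.g. based on System.nanoTime) can start near zero, so
   // for the first hour "now - lastFetchTime" can stay under VALIDITY_MS.
   private long monotonicNowMs() {
     return System.nanoTime() / 1_000_000;
   }

   synchronized T get(java.util.function.Supplier<T> fetch) {
     long now = monotonicNowMs();
     // Refetch when the value is missing OR stale. A staleness-only check can
     // hand the initial null back to every caller inside the first hour.
     if (cached == null || now - lastFetchTime > VALIDITY_MS) {
       cached = fetch.get();
       lastFetchTime = now;
     }
     return cached;
   }
 }
 {code}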



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504810#comment-14504810
 ] 

Hudson commented on HDFS-7916:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #161 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/161/])
HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes 
for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev 
ed4137cebf27717e9c79eae515b0b83ab6676465)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for 
 infinite loop
 --

 Key: HDFS-7916
 URL: https://issues.apache.org/jira/browse/HDFS-7916
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-7916-01.patch


 If any bad block is found, the BPSA for the StandbyNode will retry reporting 
 it indefinitely.
 {noformat}2015-03-11 19:43:41,528 WARN 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block 
 BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: 
 stobdtserver3/10.224.54.70:18010
 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed 
 to report bad block 
 BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
 at 
 org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
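 A simplified illustration of the unbounded re-queue pattern that can loop 
 forever when the target namenode is a standby (hypothetical code, not the 
 actual BPServiceActor):
 {code}
 // Hypothetical illustration of the unbounded-retry pattern; not the actual code.
 import java.util.ArrayDeque;
 import java.util.Queue;

 class BadBlockReporterSketch {
   interface Action { void reportTo() throws Exception; }

   private final Queue<Action> queue = new ArrayDeque<>();

   void processQueue() {
     int n = queue.size();
     for (int i = 0; i < n; i++) {
       Action a = queue.poll();
       try {
         a.reportTo();
       } catch (Exception e) {
         // Re-queueing unconditionally means a standby namenode that always
         // rejects the report keeps this action alive forever.
         queue.add(a);
       }
     }
   }
 }
 {code}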



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504812#comment-14504812
 ] 

Hudson commented on HDFS-7993:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #161 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/161/])
HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) 
(vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Provide each Replica details in fsck
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Fix For: 2.8.0

 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck the repl count comes from the live replica 
 count, while the actual nodes come from the LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's 
 verifyBlockPlacement verifies a LocatedBlock that includes decommissioned 
 nodes. However, it seems better to exclude the decommissioned nodes in the 
 verification, just like fsck excludes decommissioned nodes when it checks for 
 under-replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8205) fs -count -q -t -v -h displays wrong information

2015-04-21 Thread Peter Shi (JIRA)
Peter Shi created HDFS-8205:
---

 Summary: fs -count -q -t -v -h displays wrong information
 Key: HDFS-8205
 URL: https://issues.apache.org/jira/browse/HDFS-8205
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Peter Shi
Priority: Minor


{code}./hadoop fs -count -q -t -h -v /
   QUOTA   REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT   
FILE_COUNT   CONTENT_SIZE PATHNAME
15/04/21 15:20:19 INFO hdfs.DFSClient: Sets 
dfs.client.block.write.replace-datanode-on-failure.replication to 0
9223372036854775807 9223372036854775763none inf 
  31   13   1230 /{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block

2015-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504633#comment-14504633
 ] 

Hadoop QA commented on HDFS-7281:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726794/HDFS-7281-4.patch
  against trunk revision d52de61.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10329//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10329//console

This message is automatically generated.

 Missing block is marked as corrupted block
 --

 Key: HDFS-7281
 URL: https://issues.apache.org/jira/browse/HDFS-7281
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
  Labels: supportability
 Attachments: HDFS-7281-2.patch, HDFS-7281-3.patch, HDFS-7281-4.patch, 
 HDFS-7281.patch


 In the situation where a block has lost all its replicas, fsck shows the block 
 as missing as well as corrupted. Perhaps it is better not to mark the block 
 corrupted in this case. The reason it is marked as corrupted is that 
 numCorruptNodes == numNodes == 0 in the following code.
 {noformat}
 BlockManager
 final boolean isCorrupt = numCorruptNodes == numNodes;
 {noformat}
 Would like to clarify whether it is intended to mark a missing block as 
 corrupted, or whether it is just a bug.
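 One hedged way to express the distinction (illustrative only, not the agreed 
 resolution): only call a block corrupt when at least one replica is reported 
 and all of the reported replicas are corrupt.
 {code}
 // Illustrative only; not the agreed resolution.
 class CorruptCheckSketch {
   // Treat "no replicas reported at all" as missing rather than corrupt.
   static boolean isCorrupt(int numNodes, int numCorruptNodes) {
     return numNodes > 0 && numCorruptNodes == numNodes;
   }
 }
 {code}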



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7687) Change fsck to support EC files

2015-04-21 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504914#comment-14504914
 ] 

Takanobu Asanuma commented on HDFS-7687:


Sorry for my late work, [~szetszwo].
I'm mainly changing the code in {{NamenodeFsck.check}} to handle EC, and I'm 
going to add some metrics for EC, modeled on the replication metrics. Would you 
please check these metrics?

{{Total EC block groups}}:
The number of all EC block groups on the HDFS.

{{Minimally stored block groups}}:
The number of EC block groups which have enough blocks to recover. For example, 
in (6,3)-Reed-Solomon, these groups have at least 6 blocks.

{{Over EC block groups}}:
The number of EC block groups which have excess blocks for some reason. For 
example, in (6,3)-Reed-Solomon, these groups have more than 9 blocks. (Do such 
cases exist?)

{{Under EC block groups}}:
The number of EC block groups which have lost blocks.

{{Mis EC block groups}}:
The number of EC block groups whose rack locations are invalid.

{{Default EC schema}}:
This is usually SYS-DEFAULT-RS-6-3. I think this will be set by a 
configuration file later.

{{Corrupt EC block groups}}:
The number of EC block groups which don't have enough blocks to recover. For 
example, in (6,3)-Reed-Solomon, these groups have fewer than 6 blocks, so they 
cannot be recovered.
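A minimal sketch of how these counters might hang together (field and method 
names are illustrative, not the actual NamenodeFsck result fields):
{code}
// Illustrative counter holder for the proposed EC metrics; not the actual patch.
class EcFsckResult {
  long totalEcBlockGroups;     // all EC block groups seen
  long minimallyStoredGroups;  // groups with at least the data-block count (6 for RS-6-3)
  long overEcBlockGroups;      // groups with more than data + parity blocks
  long underEcBlockGroups;     // groups that lost blocks but are still recoverable
  long misEcBlockGroups;       // groups whose rack placement is invalid (tracked separately)
  long corruptEcBlockGroups;   // groups with fewer blocks than the data-block count

  void addGroup(int liveBlocks, int dataBlocks, int parityBlocks) {
    totalEcBlockGroups++;
    if (liveBlocks < dataBlocks) {
      corruptEcBlockGroups++;        // cannot be recovered
    } else {
      minimallyStoredGroups++;
      if (liveBlocks < dataBlocks + parityBlocks) {
        underEcBlockGroups++;        // recoverable, but some blocks are lost
      } else if (liveBlocks > dataBlocks + parityBlocks) {
        overEcBlockGroups++;         // excess blocks
      }
    }
  }
}
{code}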

 Change fsck to support EC files
 ---

 Key: HDFS-7687
 URL: https://issues.apache.org/jira/browse/HDFS-7687
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma

 We need to change fsck so that it can detect under replicated and corrupted 
 EC files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7687) Change fsck to support EC files

2015-04-21 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504916#comment-14504916
 ] 

Takanobu Asanuma commented on HDFS-7687:


I also have a couple of other thoughts. Should I create separate tickets for the 
items below?
# {{NamenodeFsck.check}} is a large method. If I add the code to handle EC to 
this method, it will become even larger and more complicated, so we should 
refactor it later.
# We should add some tests of fsck for EC.

 Change fsck to support EC files
 ---

 Key: HDFS-7687
 URL: https://issues.apache.org/jira/browse/HDFS-7687
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma

 We need to change fsck so that it can detect under replicated and corrupted 
 EC files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.

2015-04-21 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Component/s: balancer  mover

 Balancer: 2 replicas ends in same node after running balance.
 -

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Reporter: Walter Su
Assignee: Walter Su
 Attachments: HDFS-8204.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.

2015-04-21 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Attachment: HDFS-8204.001.patch

 Balancer: 2 replicas ends in same node after running balance.
 -

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Reporter: Walter Su
Assignee: Walter Su
 Attachments: HDFS-8204.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.

2015-04-21 Thread Walter Su (JIRA)
Walter Su created HDFS-8204:
---

 Summary: Balancer: 2 replicas ends in same node after running 
balance.
 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Walter Su
Assignee: Walter Su






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.

2015-04-21 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Description: 
Balancer moves blocks between Datanodes (Ver. < 2.6).
Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
new version (Ver. >= 2.6).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed and may cause 2 replicas to end up on the same node after running the 
balancer.

  was:
Balancer moves blocks between Datanodes (Ver. < 2.6).
Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
new version (Ver. >= 2.6).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed, may cause 


 Balancer: 2 replicas ends in same node after running balance.
 -

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Reporter: Walter Su
Assignee: Walter Su
 Attachments: HDFS-8204.001.patch


 Balancer moves blocks between Datanodes (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
 new version (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 is flawed and may cause 2 replicas to end up on the same node after running 
 the balancer.
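 A rough illustration of the suspected flaw (simplified, hypothetical types; the 
 real classes live in the Balancer's Dispatcher/MovedBlocks): if the "already 
 located here?" check only compares storage groups, a block can be moved to a 
 different storage group on a node that already holds a replica.
 {code}
 // Simplified, hypothetical sketch of the suspected flaw; not the actual code.
 import java.util.ArrayList;
 import java.util.List;

 class StorageGroup {
   final String datanode;     // the node this storage group belongs to
   final String storageType;
   StorageGroup(String datanode, String storageType) {
     this.datanode = datanode;
     this.storageType = storageType;
   }
 }

 class DBlockSketch {
   final List<StorageGroup> locations = new ArrayList<>();

   // Flawed check: matches only the exact storage group.
   boolean isLocatedOnGroup(StorageGroup target) {
     return locations.contains(target);
   }

   // Safer check: any storage group on the same datanode counts as a hit.
   boolean isLocatedOnDatanode(StorageGroup target) {
     for (StorageGroup sg : locations) {
       if (sg.datanode.equals(target.datanode)) {
         return true;
       }
     }
     return false;
   }
 }
 {code}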



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7993) Provide each Replica details in fsck

2015-04-21 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7993:

Summary: Provide each Replica details in fsck  (was: Incorrect descriptions 
in fsck when nodes are decommissioned)

 Provide each Replica details in fsck
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck the repl count comes from the live replica 
 count, while the actual nodes come from the LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's 
 verifyBlockPlacement verifies a LocatedBlock that includes decommissioned 
 nodes. However, it seems better to exclude the decommissioned nodes in the 
 verification, just like fsck excludes decommissioned nodes when it checks for 
 under-replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8205) fs -count -q -t -v -h displays wrong information

2015-04-21 Thread Peter Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Shi updated HDFS-8205:

Assignee: Peter Shi
  Status: Patch Available  (was: Open)

 fs -count -q -t -v -h displays wrong information
 

 Key: HDFS-8205
 URL: https://issues.apache.org/jira/browse/HDFS-8205
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Peter Shi
Assignee: Peter Shi
Priority: Minor
 Attachments: HDFS-8205.patch


 {code}./hadoop fs -count -q -t -h -v /
QUOTA   REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT   
 FILE_COUNT   CONTENT_SIZE PATHNAME
 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets 
 dfs.client.block.write.replace-datanode-on-failure.replication to 0
 9223372036854775807 9223372036854775763none inf   
 31   13   1230 /{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8154) Extract WebHDFS protocol out as a specification to allow easier clients and servers

2015-04-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504618#comment-14504618
 ] 

Steve Loughran commented on HDFS-8154:
--

No opinions; I think this would make a good experiment to see which worked best 
and integrated with both the development and build processes. Anything where 
the build could at least verify that the specification was well formed and 
consistent would be nice.

 Extract WebHDFS protocol out as a specification to allow easier clients and 
 servers
 ---

 Key: HDFS-8154
 URL: https://issues.apache.org/jira/browse/HDFS-8154
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: webhdfs
Reporter: Jakob Homan
Assignee: Jakob Homan

 WebHDFS would be more useful if there were a programmatic description of its 
 interface, which would allow one to more easily create servers and clients.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8205) fs -count -q -t -v -h displays wrong information

2015-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504685#comment-14504685
 ] 

Hadoop QA commented on HDFS-8205:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726825/HDFS-8205.patch
  against trunk revision d52de61.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-common-project/hadoop-common:

org.apache.hadoop.crypto.key.TestValueQueue

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10332//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10332//console

This message is automatically generated.

 fs -count -q -t -v -h displays wrong information
 

 Key: HDFS-8205
 URL: https://issues.apache.org/jira/browse/HDFS-8205
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Peter Shi
Assignee: Peter Shi
Priority: Minor
 Attachments: HDFS-8205.patch


 {code}./hadoop fs -count -q -t -h -v /
QUOTA   REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT   
 FILE_COUNT   CONTENT_SIZE PATHNAME
 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets 
 dfs.client.block.write.replace-datanode-on-failure.replication to 0
 9223372036854775807 9223372036854775763none inf   
 31   13   1230 /{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504713#comment-14504713
 ] 

Hudson commented on HDFS-7993:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7623 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7623/])
HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) 
(vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java


 Provide each Replica details in fsck
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Fix For: 2.8.0

 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck the repl count comes from the live replica 
 count, while the actual nodes come from the LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's 
 verifyBlockPlacement verifies a LocatedBlock that includes decommissioned 
 nodes. However, it seems better to exclude the decommissioned nodes in the 
 verification, just like fsck excludes decommissioned nodes when it checks for 
 under-replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.

2015-04-21 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Description: 
Balancer moves blocks between Datanodes (Ver. < 2.6).
Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
new version (Ver. >= 2.6).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed, may cause 

 Balancer: 2 replicas ends in same node after running balance.
 -

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Reporter: Walter Su
Assignee: Walter Su
 Attachments: HDFS-8204.001.patch


 Balancer moves blocks between Datanodes (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the 
 new version (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 is flawed, may cause 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8176) Provide information about the snapshots compared in audit log

2015-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504533#comment-14504533
 ] 

Hadoop QA commented on HDFS-8176:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726157/HDFS-8176.1.patch
  against trunk revision d52de61.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10328//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10328//console

This message is automatically generated.

 Provide information about the snapshots compared in audit log
 -

 Key: HDFS-8176
 URL: https://issues.apache.org/jira/browse/HDFS-8176
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: J.Andreina
Assignee: J.Andreina
 Attachments: HDFS-8176.1.patch


 Provide information about the snapshots compared in the audit log. 
 In the current code the value null is being passed. 
 {code}
 logAuditEvent(diffs != null, computeSnapshotDiff, null, null, null);
 {code}
 {noformat}
 2015-04-15 09:56:49,328 INFO FSNamesystem.audit: allowed=true   ugi=Rex 
 (auth:SIMPLE)   ip=/Xcmd=computeSnapshotDiff src=null
 dst=nullperm=null   proto=rpc
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8205) fs -count -q -t -v -h displays wrong information

2015-04-21 Thread Peter Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504570#comment-14504570
 ] 

Peter Shi commented on HDFS-8205:
-

This bug was introduced by HDFS-7701; I will attach a patch to fix it.

 fs -count -q -t -v -h displays wrong information
 

 Key: HDFS-8205
 URL: https://issues.apache.org/jira/browse/HDFS-8205
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Peter Shi
Priority: Minor

 {code}./hadoop fs -count -q -t -h -v /
QUOTA   REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT   
 FILE_COUNT   CONTENT_SIZE PATHNAME
 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets 
 dfs.client.block.write.replace-datanode-on-failure.replication to 0
 9223372036854775807 9223372036854775763none inf   
 31   13   1230 /{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-04-21 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504576#comment-14504576
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


[~steve_l], thanks for the review.
bq. is there a limit on the #of storage volumes in a cluster? does GET/ return 
all of them?
  Please see my discussion on limits above. The storage volumes are created by 
admins and are therefore not expected to be too numerous.
bq. any way to enum users? e.g. GET /admin/user/ ?
  We don't plan to manage users in ozone; in this respect we deviate from 
popular public object stores. This is because, in a private cluster deployment, 
user management is usually tied to corporate user accounts. Instead we choose 
the storage volume abstraction for certain administrative settings like quota. 
However, admins can choose to allocate a storage volume for each user.
bq. what if I want to GET the 1001st entry in an object store?
   Not sure I understand the use case. Do you mean the users would like to 
query using some sort of entry number or index?
bq. GET on object must support ranges
Agree, we plan to take up this feature in the 2nd phase.
bq.  HEAD should supply content-length
   This should be easily doable. We will keep it in mind for container 
implementation.

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Ozone-architecture-v1.pdf


 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8136) Client gets and uses EC schema when reads and writes a stripping file

2015-04-21 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki updated HDFS-8136:
-
Attachment: HDFS-8136.4.patch

 Client gets and uses EC schema when reads and writes a stripping file
 -

 Key: HDFS-8136
 URL: https://issues.apache.org/jira/browse/HDFS-8136
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7285
Reporter: Kai Zheng
Assignee: Kai Sasaki
 Attachments: HDFS-8136.1.patch, HDFS-8136.2.patch, HDFS-8136.3.patch, 
 HDFS-8136.4.patch


 Discussed with [~umamaheswararao] and [~vinayrpet]: when reading or writing a 
 stripping file, the client can invoke a separate call to the NameNode to 
 request the EC schema associated with the EC zone the file is in. The schema 
 can then be used to guide the reading and writing. Currently hard-coded values 
 are used.
 Optionally, as an optimization, the client may cache schema info per file, per 
 zone, or per schema name. We could add the schema name to {{HdfsFileStatus}} 
 for that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8205) fs -count -q -t -v -h displays wrong information

2015-04-21 Thread Peter Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Shi updated HDFS-8205:

Attachment: HDFS-8205.patch

 fs -count -q -t -v -h displays wrong information
 

 Key: HDFS-8205
 URL: https://issues.apache.org/jira/browse/HDFS-8205
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Peter Shi
Priority: Minor
 Attachments: HDFS-8205.patch


 {code}./hadoop fs -count -q -t -h -v /
QUOTA   REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTADIR_COUNT   
 FILE_COUNT   CONTENT_SIZE PATHNAME
 15/04/21 15:20:19 INFO hdfs.DFSClient: Sets 
 dfs.client.block.write.replace-datanode-on-failure.replication to 0
 9223372036854775807 9223372036854775763none inf   
 31   13   1230 /{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7993) Provide each Replica details in fsck

2015-04-21 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7993:

   Resolution: Fixed
Fix Version/s: 2.8.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.
Thanks [~andreina] for the great contribution.
Thanks [~mingma] and [~cmccabe] for great suggestions and reviews.

 Provide each Replica details in fsck
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Fix For: 2.8.0

 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck the repl count comes from the live replica 
 count, while the actual nodes come from the LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's 
 verifyBlockPlacement verifies a LocatedBlock that includes decommissioned 
 nodes. However, it seems better to exclude the decommissioned nodes in the 
 verification, just like fsck excludes decommissioned nodes when it checks for 
 under-replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7621) Erasure Coding: update the Balancer/Mover data migration logic

2015-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504588#comment-14504588
 ] 

Walter Su commented on HDFS-7621:
-

I'm still reading the code and thinking about how to do it. By the way, I found 
a bug in the balancer: HDFS-8204.

 Erasure Coding: update the Balancer/Mover data migration logic
 --

 Key: HDFS-7621
 URL: https://issues.apache.org/jira/browse/HDFS-7621
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Walter Su

 Currently the Balancer/Mover only considers the distribution of replicas of 
 the same block during data migration: the migration cannot decrease the 
 number of racks. With EC the Balancer and Mover should also take into account 
 the distribution of blocks belonging to the same block group.
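 A toy illustration of the extra constraint being described: before moving a 
 striped block, the mover would also need to look at the other internal blocks 
 of the same block group (class and method names here are hypothetical):
 {code}
 // Hypothetical illustration of the extra EC constraint; not Balancer/Mover code.
 import java.util.HashSet;
 import java.util.List;
 import java.util.Set;

 class EcMoveCheckSketch {
   /**
    * groupMemberNodes: nodes already holding some internal block of the block group.
    * Reject a target node that already holds a member, since that would reduce
    * the group's failure tolerance.
    */
   static boolean groupAllowsMove(List<String> groupMemberNodes, String targetNode) {
     Set<String> nodes = new HashSet<>(groupMemberNodes);
     return !nodes.contains(targetNode);
   }
 }
 {code}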



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8191) Fix byte to integer casting in SimulatedFSDataset#simulatedByte

2015-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504672#comment-14504672
 ] 

Hadoop QA commented on HDFS-8191:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726801/HDFS-8191.001.patch
  against trunk revision d52de61.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10330//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10330//console

This message is automatically generated.

 Fix byte to integer casting in SimulatedFSDataset#simulatedByte
 ---

 Key: HDFS-8191
 URL: https://issues.apache.org/jira/browse/HDFS-8191
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor
 Attachments: HDFS-8191.000.patch, HDFS-8191.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7993) Incorrect descriptions in fsck when nodes are decommissioned

2015-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504674#comment-14504674
 ] 

Hadoop QA commented on HDFS-7993:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726802/HDFS-7993.6.patch
  against trunk revision d52de61.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10331//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10331//console

This message is automatically generated.

 Incorrect descriptions in fsck when nodes are decommissioned
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck the repl count comes from the live replica 
 count, while the actual nodes come from the LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's 
 verifyBlockPlacement verifies a LocatedBlock that includes decommissioned 
 nodes. However, it seems better to exclude the decommissioned nodes in the 
 verification, just like fsck excludes decommissioned nodes when it checks for 
 under-replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-04-21 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504691#comment-14504691
 ] 

Jitendra Nath Pandey commented on HDFS-7240:


[~clamb], thanks for the detailed review and feedback.
Some of the answers are below; for the others I will post the updated document 
with the details you have pointed out.

bq. Is the 1KB key size limit a hard limit or just a design/implementation 
target
 It is a design target. Amazon's S3 limits keys to 1KB, and I doubt there would 
be many use cases that need to go beyond it. I see the point that, instead of a 
hard limit, we could allow for graceful degradation. But at this point in the 
project I would prefer to have stricter limits and relax them later, instead of 
setting user expectations too high to begin with.
bq. Caching to reduce network traffic
I agree that a good caching layer will significantly help the performance. 
Ozone handler seems like a natural place for caching. However, a thick client 
can do its own caching without overloading datanodes. The focus of phase 1 is 
to get the semantics right and lay down the basic architecture in place. We 
plan to attack performance improvements in a later phase of the project.
bq. Security mechanisms
Frankly, I haven't thought about anything other than kerberos. I agree, we 
should evaluate it against what other popular object stores use.
bq. Hot spots in hash partitioning.
It is possible with a pathological sequence of keys, but in practice hash 
partitioning has been used successfully to avoid hot spots, e.g. 
hash-partitioned indexes in databases. We would need to pick hash functions 
with good distribution properties.
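For illustration only, one way a key could be mapped onto a partition with a 
well-mixed hash (the digest choice and key layout are assumptions, not the 
design):
{code}
// Illustrative only: hash-partitioning a key across N partitions.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class KeyPartitioner {
  static int partitionFor(String key, int numPartitions) throws Exception {
    byte[] d = MessageDigest.getInstance("MD5")
        .digest(key.getBytes(StandardCharsets.UTF_8));
    // Fold the first 4 digest bytes into an int, then take a non-negative modulus.
    int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
          | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
    return Math.floorMod(h, numPartitions);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(partitionFor("vol1/bucket1/photos/img001.jpg", 16));
  }
}
{code}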
bq. Secondary indexing consistency
The secondary index need not be strictly consistent with the bucket. That means 
a listing operation with prefix or key range may not reflect the latest of the 
bucket. We will have a more concrete proposal in the second phase of the 
project.
bq. Storage volume GET for admin
  I believe it is not a security concern to allow users to see all storage 
volume names. However, it is possible to conceive of a use case where an admin 
would want to restrict that. Probably we can support both modes.

bq.  no guarantees on partially written objects
The object will not be visible until completely written. Also, no recovery 
is planned for the first phase if the write fails. In future, we would like to 
support multi-part uploads.

bq. Re-using block management implementation for container management.
We intend to reuse the DatanodeProtocol that datanode uses to talk to 
namenode. I will add more details to the document and on the corresponding jira.

bq. storage container prototype using leveldbjni
  We will add a lot more details on this in its own jira. The idea is to use 
leveldbjni in the storage container on the datanodes. We plan to prototype a 
storage container that stores objects as individual files within the container; 
however, that needs an index within the container to map a key to a file, and 
we will use leveldbjni for that index.
  Another possible prototype is to put the entire object in leveldbjni itself. 
It will take some experimentation to zero in on the right approach. We will 
also try to make the storage container implementation pluggable, to make it 
easy to try different implementations.
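A minimal sketch of the kind of per-container key-to-file index this describes, 
using the leveldbjni API (the paths, key layout, and value format are 
illustrative assumptions, not the eventual container format):
{code}
// Illustrative sketch of a per-container key -> local-file index via leveldbjni.
import static org.fusesource.leveldbjni.JniDBFactory.asString;
import static org.fusesource.leveldbjni.JniDBFactory.bytes;
import static org.fusesource.leveldbjni.JniDBFactory.factory;

import java.io.File;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class ContainerIndexSketch {
  public static void main(String[] args) throws Exception {
    Options options = new Options().createIfMissing(true);
    DB index = factory.open(new File("/data/dn1/container-0007/index"), options);
    try {
      // Map an object key to the local file inside the container holding its bytes.
      index.put(bytes("vol1/bucket1/photos/img001.jpg"),
                bytes("objects/0000123.dat"));
      // Look the file up again on read.
      String localFile =
          asString(index.get(bytes("vol1/bucket1/photos/img001.jpg")));
      System.out.println("stored at: " + localFile);
    } finally {
      index.close();
    }
  }
}
{code}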
bq. How are quotas enabled and set? who enforces them
  All the Ozone APIs are implemented in ozone handler. The quota will also be 
enforced by the ozone handler. I will update the document with the APIs.

 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Ozone-architecture-v1.pdf


 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8179) DFSClient#getServerDefaults returns null within 1 hour of system start

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505075#comment-14505075
 ] 

Hudson commented on HDFS-8179:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #171 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/171/])
HDFS-8179. DFSClient#getServerDefaults returns null within 1 hour of system 
start. (Contributed by Xiaoyu Yao) (arp: rev 
c92f6f360515cc21ecb9b9f49b3e59537ef0cb05)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java


 DFSClient#getServerDefaults returns null within 1 hour of system start
 --

 Key: HDFS-8179
 URL: https://issues.apache.org/jira/browse/HDFS-8179
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8179.00.patch, HDFS-8179.01.patch


 We recently hit an NPE during the Ambari Oozie service check. The failed hdfs 
 command is below. It repros sometimes and then goes away after the cluster 
 runs for a while.
 {code}
 [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r 
 /user/ambari-qa/mapredsmokeoutput
 rm: Failed to get server trash configuration: null. Consider using -skipTrash 
 option
 {code}
 With additional tracing, the failure was located to the following stack.
 {code}
 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration
 java.lang.NullPointerException
   at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86)
   at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117)
   at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321)
   at 
 org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259)
   at 
 org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:166)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
 rm: Failed to get server trash configuration: null. Consider using -skipTrash 
 option
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505074#comment-14505074
 ] 

Hudson commented on HDFS-7916:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #171 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/171/])
HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes 
for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev 
ed4137cebf27717e9c79eae515b0b83ab6676465)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for 
 infinite loop
 --

 Key: HDFS-7916
 URL: https://issues.apache.org/jira/browse/HDFS-7916
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-7916-01.patch


 If any bad block is found, the BPSA for the StandbyNode will retry reporting 
 it indefinitely.
 {noformat}2015-03-11 19:43:41,528 WARN 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block 
 BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: 
 stobdtserver3/10.224.54.70:18010
 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed 
 to report bad block 
 BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
 at 
 org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
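 A simplified illustration of the retry behaviour described above (illustrative Java model, not the actual BPServiceActor code): an action that always fails against the standby NameNode is put back on the queue, so it is retried forever unless the failure is treated as non-retriable.
 {code}
 import java.util.ArrayDeque;
 import java.util.Queue;

 public class RequeueLoopSketch {
   interface Action { void reportTo() throws Exception; }

   // Hypothetical queue processor: re-queueing on every failure never
   // terminates when the target (e.g. a standby NN) always rejects the request.
   static void processQueue(Queue<Action> queue) {
     while (!queue.isEmpty()) {
       Action a = queue.poll();
       try {
         a.reportTo();
       } catch (Exception e) {
         queue.offer(a); // without a "give up" condition, this loops indefinitely
       }
     }
   }
 }
 {code}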



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505076#comment-14505076
 ] 

Hudson commented on HDFS-7993:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #171 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/171/])
HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) 
(vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java


 Provide each Replica details in fsck
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Fix For: 2.8.0

 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck, the repl count comes from the live replica 
 count, while the actual nodes come from the LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement 
 verifies a LocatedBlock that includes decommissioned nodes. However, it seems 
 better to exclude the decommissioned nodes in the verification, just like how 
 fsck excludes decommissioned nodes when it checks for under-replicated blocks.
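 A minimal sketch of the exclusion described above (illustrative only; the helper is an assumption, not the HDFS-7993 patch): report and verify against the locations that remain after dropping decommissioned nodes.
 {code}
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
 import org.apache.hadoop.hdfs.protocol.LocatedBlock;

 public class LiveLocationsSketch {
   // Hypothetical helper: locations of a block excluding decommissioned nodes.
   static List<DatanodeInfo> liveLocations(LocatedBlock blk) {
     List<DatanodeInfo> live = new ArrayList<>();
     for (DatanodeInfo dn : blk.getLocations()) {
       if (!dn.isDecommissioned()) {
         live.add(dn);
       }
     }
     return live;
   }
 }
 {code}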



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504745#comment-14504745
 ] 

Hudson commented on HDFS-7993:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #170 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/170/])
HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) 
(vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java


 Provide each Replica details in fsck
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Fix For: 2.8.0

 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck, the repl count comes from the live replica 
 count, while the actual nodes come from the LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement 
 verifies a LocatedBlock that includes decommissioned nodes. However, it seems 
 better to exclude the decommissioned nodes in the verification, just like how 
 fsck excludes decommissioned nodes when it checks for under-replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504743#comment-14504743
 ] 

Hudson commented on HDFS-7916:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #170 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/170/])
HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes 
for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev 
ed4137cebf27717e9c79eae515b0b83ab6676465)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for 
 infinite loop
 --

 Key: HDFS-7916
 URL: https://issues.apache.org/jira/browse/HDFS-7916
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-7916-01.patch


 If any bad block is found, then the BPSA for the StandbyNode will retry 
 reporting it indefinitely.
 {noformat}2015-03-11 19:43:41,528 WARN 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block 
 BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: 
 stobdtserver3/10.224.54.70:18010
 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed 
 to report bad block 
 BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
 at 
 org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8179) DFSClient#getServerDefaults returns null within 1 hour of system start

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504744#comment-14504744
 ] 

Hudson commented on HDFS-8179:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #170 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/170/])
HDFS-8179. DFSClient#getServerDefaults returns null within 1 hour of system 
start. (Contributed by Xiaoyu Yao) (arp: rev 
c92f6f360515cc21ecb9b9f49b3e59537ef0cb05)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java


 DFSClient#getServerDefaults returns null within 1 hour of system start
 --

 Key: HDFS-8179
 URL: https://issues.apache.org/jira/browse/HDFS-8179
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8179.00.patch, HDFS-8179.01.patch


 We recently hit an NPE during the Ambari Oozie service check. The failed hdfs 
 command is below. It repros sometimes and then goes away after the cluster has 
 been running for a while.
 {code}
 [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r 
 /user/ambari-qa/mapredsmokeoutput
 rm: Failed to get server trash configuration: null. Consider using -skipTrash 
 option
 {code}
 With additional tracing, the failure was traced to the following stack.
 {code}
 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration
 java.lang.NullPointerException
   at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86)
   at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117)
   at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321)
   at 
 org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259)
   at 
 org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:166)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
 rm: Failed to get server trash configuration: null. Consider using -skipTrash 
 option
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7993) Provide each Replica details in fsck

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505128#comment-14505128
 ] 

Hudson commented on HDFS-7993:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2120 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2120/])
HDFS-7993. Provide each Replica details in fsck (Contributed by J.Andreina) 
(vinayakumarb: rev 8ddbb8dd433862509bd9b222dddafe2c3a74778a)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java


 Provide each Replica details in fsck
 

 Key: HDFS-7993
 URL: https://issues.apache.org/jira/browse/HDFS-7993
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: J.Andreina
 Fix For: 2.8.0

 Attachments: HDFS-7993.1.patch, HDFS-7993.2.patch, HDFS-7993.3.patch, 
 HDFS-7993.4.patch, HDFS-7993.5.patch, HDFS-7993.6.patch


 When you run fsck with -files or -racks, you will get something like 
 below if one of the replicas is decommissioned.
 {noformat}
 blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
 {noformat}
 That is because in NamenodeFsck, the repl count comes from the live replica 
 count, while the actual nodes come from the LocatedBlock, which includes 
 decommissioned nodes.
 Another issue in NamenodeFsck is that BlockPlacementPolicy's verifyBlockPlacement 
 verifies a LocatedBlock that includes decommissioned nodes. However, it seems 
 better to exclude the decommissioned nodes in the verification, just like how 
 fsck excludes decommissioned nodes when it checks for under-replicated blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8179) DFSClient#getServerDefaults returns null within 1 hour of system start

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505127#comment-14505127
 ] 

Hudson commented on HDFS-8179:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2120 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2120/])
HDFS-8179. DFSClient#getServerDefaults returns null within 1 hour of system 
start. (Contributed by Xiaoyu Yao) (arp: rev 
c92f6f360515cc21ecb9b9f49b3e59537ef0cb05)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/TrashPolicyDefault.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Trash.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java


 DFSClient#getServerDefaults returns null within 1 hour of system start
 --

 Key: HDFS-8179
 URL: https://issues.apache.org/jira/browse/HDFS-8179
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Priority: Blocker
 Fix For: 2.7.1

 Attachments: HDFS-8179.00.patch, HDFS-8179.01.patch


 We recently hit an NPE during the Ambari Oozie service check. The failed hdfs 
 command is below. It repros sometimes and then goes away after the cluster has 
 been running for a while.
 {code}
 [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r 
 /user/ambari-qa/mapredsmokeoutput
 rm: Failed to get server trash configuration: null. Consider using -skipTrash 
 option
 {code}
 With additional tracing, the failure was traced to the following stack.
 {code}
 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration
 java.lang.NullPointerException
   at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86)
   at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117)
   at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321)
   at 
 org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259)
   at 
 org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:166)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
 rm: Failed to get server trash configuration: null. Consider using -skipTrash 
 option
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop

2015-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505126#comment-14505126
 ] 

Hudson commented on HDFS-7916:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2120 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2120/])
HDFS-7916. 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes 
for infinite loop (Contributed by Vinayakumar B) (vinayakumarb: rev 
ed4137cebf27717e9c79eae515b0b83ab6676465)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for 
 infinite loop
 --

 Key: HDFS-7916
 URL: https://issues.apache.org/jira/browse/HDFS-7916
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-7916-01.patch


 If any bad block is found, then the BPSA for the StandbyNode will retry 
 reporting it indefinitely.
 {noformat}2015-03-11 19:43:41,528 WARN 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block 
 BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: 
 stobdtserver3/10.224.54.70:18010
 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed 
 to report bad block 
 BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode:
 at 
 org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8200) Refactor FSDirStatAndListingOp

2015-04-21 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-8200:
-
Attachment: HDFS-8200.001.patch

 Refactor FSDirStatAndListingOp
 --

 Key: HDFS-8200
 URL: https://issues.apache.org/jira/browse/HDFS-8200
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8200.000.patch, HDFS-8200.001.patch


 After HDFS-6826 several functions in {{FSDirStatAndListingOp}} are dead. This 
 jira proposes to clean them up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost

2015-04-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505797#comment-14505797
 ] 

Colin Patrick McCabe commented on HDFS-8213:


Hi Billie,

{{DFSClient}} needs to instantiate {{SpanReceiverHost}} in order to implement 
tracing, in the case where the process using the {{DFSClient}} doesn't 
configure its own span receivers.

If you are concerned about multiple span receivers being instantiated, simply 
set {{hadoop.htrace.span.receiver.classes}} to the empty string, and Hadoop 
won't instantiate any span receivers.  That should be its default anyway.
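A minimal sketch of the workaround described above (illustrative; the property name is quoted from this thread, and clearing it in code is just one way to apply it): leave the span receiver class list empty in the client's configuration so no receivers are instantiated.
{code}
import org.apache.hadoop.conf.Configuration;

public class DisableSpanReceiversSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // With an empty class list, no span receivers should be instantiated.
    conf.set("hadoop.htrace.span.receiver.classes", "");
  }
}
{code}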

 DFSClient should not instantiate SpanReceiverHost
 -

 Key: HDFS-8213
 URL: https://issues.apache.org/jira/browse/HDFS-8213
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Billie Rinaldi
Priority: Critical

 DFSClient initializing SpanReceivers is a problem for Accumulo, which manages 
 SpanReceivers through its own configuration.  This results in the same 
 receivers being registered multiple times and spans being delivered more than 
 once.  The documentation says SpanReceiverHost.getInstance should be issued 
 once per process, so there is no expectation that DFSClient should do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost

2015-04-21 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505854#comment-14505854
 ] 

Billie Rinaldi commented on HDFS-8213:
--

If span receiver initialization in DFSClient is important to the use of the 
hadoop.htrace.sampler configuration property, perhaps a compromise would be to 
perform SpanReceiverHost.getInstance only when the sampler is set to something 
other than NeverSampler.

 DFSClient should not instantiate SpanReceiverHost
 -

 Key: HDFS-8213
 URL: https://issues.apache.org/jira/browse/HDFS-8213
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Billie Rinaldi
Priority: Critical

 DFSClient initializing SpanReceivers is a problem for Accumulo, which manages 
 SpanReceivers through its own configuration.  This results in the same 
 receivers being registered multiple times and spans being delivered more than 
 once.  The documentation says SpanReceiverHost.getInstance should be issued 
 once per process, so there is no expectation that DFSClient should do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8156) Add/implement necessary APIs even we just have the system default schema

2015-04-21 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8156:

Attachment: HDFS-8156-v7.patch

How about this one? All your comments have been addressed.

 Add/implement necessary APIs even we just have the system default schema
 

 Key: HDFS-8156
 URL: https://issues.apache.org/jira/browse/HDFS-8156
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch, 
 HDFS-8156-v3.patch, HDFS-8156-v4.patch, HDFS-8156-v5.patch, 
 HDFS-8156-v6.patch, HDFS-8156-v7.patch


 According to the discussion here, this issue was repurposed and modified.
 This is to add and implement some necessary APIs even though we only have the 
 system default schema, to resolve some TODOs left from HDFS-7859 and HDFS-7866, 
 as they're still subject to further discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7687) Change fsck to support EC files

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505896#comment-14505896
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7687:
---

Yes, refactoring in trunk first.

 Change fsck to support EC files
 ---

 Key: HDFS-7687
 URL: https://issues.apache.org/jira/browse/HDFS-7687
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma

 We need to change fsck so that it can detect under replicated and corrupted 
 EC files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8156) Add/implement necessary APIs even we just have the system default schema

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505663#comment-14505663
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8156:
---

- Please do not add extractChunkSize().  Similar to initWith(..), it is 
unnecessary.
- The other fields numDataUnits,  numParityUnits and chunkSize should also be 
final.
- The javadoc is still incorrect.

 Add/implement necessary APIs even we just have the system default schema
 

 Key: HDFS-8156
 URL: https://issues.apache.org/jira/browse/HDFS-8156
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch, 
 HDFS-8156-v3.patch, HDFS-8156-v4.patch, HDFS-8156-v5.patch, HDFS-8156-v6.patch


 According to the discussion here, this issue was repurposed and modified.
 This is to add and implement some necessary APIs even though we only have the 
 system default schema, to resolve some TODOs left from HDFS-7859 and HDFS-7866, 
 as they're still subject to further discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8194) Add administrative tool to be able to examine the NN's view of DN storages

2015-04-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505748#comment-14505748
 ] 

Chris Nauroth commented on HDFS-8194:
-

There could be some potential overlap here with the work done in HDFS-7604, 
although that feature specifically reported only on volumes/storages that had 
failed, not all volumes.

 Add administrative tool to be able to examine the NN's view of DN storages
 --

 Key: HDFS-8194
 URL: https://issues.apache.org/jira/browse/HDFS-8194
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe

 The NN has long had facilities to list all of the DNs that are 
 registered with it. It would be great if there were an administrative tool to 
 list all of the individual storages that the NN is tracking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7005) DFS input streams do not timeout

2015-04-21 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505636#comment-14505636
 ] 

Nick Dimiduk commented on HDFS-7005:


Any chance of bringing this to a 2.5.x patch release? Over on HBASE-13339 we're 
trying to work out how best to support users with minimal impact on 
dependencies for our next minor release (1.1). Bumping Hadoop minor versions (I 
think) will break our semantic versioning compatibility guidelines.

FYI [~eclark], [~busbey], [~cnauroth]

 DFS input streams do not timeout
 

 Key: HDFS-7005
 URL: https://issues.apache.org/jira/browse/HDFS-7005
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0, 2.5.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.6.0

 Attachments: HDFS-7005.patch


 Input streams lost their timeout.  The problem appears to be that 
 {{DFSClient#newConnectedPeer}} does not set the read timeout.  During a 
 temporary network interruption the server will close the socket, unbeknownst 
 to the client host, which then blocks on a read forever.
 The results are dire.  Services such as the RM, JHS, NMs, Oozie servers, etc. 
 all need to be restarted to recover - unless you want to wait many hours for 
 the TCP stack keepalive to detect the broken socket.
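 A minimal illustration of the failure mode and the kind of fix implied above (plain java.net sketch; the host, port, and timeout values are placeholders, and this is not the actual patch): a socket with no read timeout blocks forever if the remote end silently disappears, while SO_TIMEOUT bounds the wait.
 {code}
 import java.net.InetSocketAddress;
 import java.net.Socket;

 public class ReadTimeoutSketch {
   public static void main(String[] args) throws Exception {
     Socket s = new Socket();
     // connect timeout bounds connection establishment...
     s.connect(new InetSocketAddress("datanode.example.com", 50010), 60_000);
     // ...and SO_TIMEOUT bounds each read; without it a read() on a dead
     // connection can hang indefinitely.
     s.setSoTimeout(60_000);
     s.close();
   }
 }
 {code}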



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7005) DFS input streams do not timeout

2015-04-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505722#comment-14505722
 ] 

Chris Nauroth commented on HDFS-7005:
-

Hi [~ndimiduk].  I'm not aware of any plans for a 2.5.3 patch release.  To do 
so, we'd need someone to volunteer as release manager and conduct a vote on a 
release candidate.  [~kasha], I'm notifying you just FYI, since you had been 
release manager previously on the 2.5.x release line.

 DFS input streams do not timeout
 

 Key: HDFS-7005
 URL: https://issues.apache.org/jira/browse/HDFS-7005
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0, 2.5.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.6.0

 Attachments: HDFS-7005.patch


 Input streams lost their timeout.  The problem appears to be that 
 {{DFSClient#newConnectedPeer}} does not set the read timeout.  During a 
 temporary network interruption the server will close the socket, unbeknownst 
 to the client host, which then blocks on a read forever.
 The results are dire.  Services such as the RM, JHS, NMs, Oozie servers, etc. 
 all need to be restarted to recover - unless you want to wait many hours for 
 the TCP stack keepalive to detect the broken socket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost

2015-04-21 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created HDFS-8213:


 Summary: DFSClient should not instantiate SpanReceiverHost
 Key: HDFS-8213
 URL: https://issues.apache.org/jira/browse/HDFS-8213
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Billie Rinaldi
Priority: Critical


DFSClient initializing SpanReceivers is a problem for Accumulo, which manages 
SpanReceivers through its own configuration.  This results in the same 
receivers being registered multiple times and spans being delivered more than 
once.  The documentation says SpanReceiverHost.getInstance should be issued 
once per process, so there is no expectation that DFSClient should do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode

2015-04-21 Thread Nate Edel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate Edel updated HDFS-8078:

Status: Open  (was: Patch Available)

 HDFS client gets errors trying to connect to IPv6 DataNode
 -

 Key: HDFS-8078
 URL: https://issues.apache.org/jira/browse/HDFS-8078
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Nate Edel
Assignee: Nate Edel
  Labels: ipv6
 Attachments: HDFS-8078.4.patch


 1st exception, on put:
 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
 java.lang.IllegalArgumentException: Does not contain a valid host:port 
 authority: 2401:db00:1010:70ba:face:0:8:0:50010
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
 This appears to stem from code in DatanodeID which assumes it's safe to 
 append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for 
 IPv6.  NetUtils.createSocketAddr() assembles a Java URI object, which 
 requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010
 Currently using InetAddress.getByName() to validate IPv6 (guava 
 InetAddresses.forString has been flaky) but could also use our own parsing. 
 (From logging this, it seems like a low-enough frequency call that the extra 
 object creation shouldn't be problematic, and for me the slight risk of 
 passing in bad input that is not actually an IPv4 or IPv6 address and thus 
 calling an external DNS lookup is outweighed by getting the address 
 normalized and avoiding rewriting parsing.)
 Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress()
 ---
 2nd exception (on datanode)
 15/04/13 13:18:07 ERROR datanode.DataNode: 
 dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown 
 operation  src: /2401:db00:20:7013:face:0:7:0:54152 dst: 
 /2401:db00:11:d010:face:0:2f:0:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
 at java.lang.Thread.run(Thread.java:745)
 This also shows up as a client error on -get: 2401 is not an IP string literal.
 This one has existing parsing logic which needs to shift to the last colon 
 rather than the first.  Should also be a tiny bit faster by using lastIndexOf 
 rather than split.  Could alternatively use the techniques above.
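 A minimal sketch of the bracketed formatting the URI-based parsing expects (illustrative helper, not the actual patch):
 {code}
 public class HostPortSketch {
   // Hypothetical helper: IPv6 literals must be wrapped in brackets before a
   // port is appended, otherwise the extra colons break host:port parsing.
   static String hostPort(String ipAddr, int port) {
     String host = ipAddr.contains(":") ? "[" + ipAddr + "]" : ipAddr;
     return host + ":" + port;
   }
   // hostPort("2401:db00:1010:70ba:face:0:8:0", 50010)
   //   -> "[2401:db00:1010:70ba:face:0:8:0]:50010"
 }
 {code}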



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7687) Change fsck to support EC files

2015-04-21 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505877#comment-14505877
 ] 

Takanobu Asanuma commented on HDFS-7687:


Thanks for your review, Nicholas!

bq. A Corrupt EC block group could have = 6 blocks but some of the blocks are 
corrupted.
bq. Yes, it is possible. E.g. a datanode D0 dies and a EC block in D0 is 
reconstructed in another datanode D1. Later on, D0 comes back. Then, both D0 
and D1 have the same EC block and the block group could have more than 9 blocks.
OK, I understand.

bq. For #1, see if you want to create a JIRA for trunk to do some refactoring 
first.
You mean that if some refactoring is needed, we should do the refactoring in 
trunk first, before we add the logic to handle EC?

 Change fsck to support EC files
 ---

 Key: HDFS-7687
 URL: https://issues.apache.org/jira/browse/HDFS-7687
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma

 We need to change fsck so that it can detect under replicated and corrupted 
 EC files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost

2015-04-21 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505847#comment-14505847
 ] 

Billie Rinaldi commented on HDFS-8213:
--

As documented, each process must configure its own span receivers if it wants 
to use tracing.  If I set hadoop.htrace.span.receiver.classes to the empty 
string, then the NameNode and DataNode will not do any tracing.

 DFSClient should not instantiate SpanReceiverHost
 -

 Key: HDFS-8213
 URL: https://issues.apache.org/jira/browse/HDFS-8213
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Billie Rinaldi
Priority: Critical

 DFSClient initializing SpanReceivers is a problem for Accumulo, which manages 
 SpanReceivers through its own configuration.  This results in the same 
 receivers being registered multiple times and spans being delivered more than 
 once.  The documentation says SpanReceiverHost.getInstance should be issued 
 once per process, so there is no expectation that DFSClient should do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost

2015-04-21 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505860#comment-14505860
 ] 

Nick Dimiduk commented on HDFS-8213:


I think [~billie.rinaldi] is correct here; the client should not instantiate 
its own SpanReceiverHost, but instead depend on the process in which it 
resides to provide one. This is how the HBase client works as well.

 DFSClient should not instantiate SpanReceiverHost
 -

 Key: HDFS-8213
 URL: https://issues.apache.org/jira/browse/HDFS-8213
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Billie Rinaldi
Priority: Critical

 DFSClient initializing SpanReceivers is a problem for Accumulo, which manages 
 SpanReceivers through its own configuration.  This results in the same 
 receivers being registered multiple times and spans being delivered more than 
 once.  The documentation says SpanReceiverHost.getInstance should be issued 
 once per process, so there is no expectation that DFSClient should do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7687) Change fsck to support EC files

2015-04-21 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505925#comment-14505925
 ] 

Takanobu Asanuma commented on HDFS-7687:


I understand. Thank you.

 Change fsck to support EC files
 ---

 Key: HDFS-7687
 URL: https://issues.apache.org/jira/browse/HDFS-7687
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma

 We need to change fsck so that it can detect under replicated and corrupted 
 EC files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8185) Separate client related routines in HAUtil into a new class

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505867#comment-14505867
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8185:
---

- Both DFSUtilClient.locatedBlocks2Locations methods are currently not used.  
Please move them later.
- DFS_NAMENODE_HTTP_ADDRESS_KEY, DFS_NAMENODE_HTTP_ADDRESS_KEY, etc. are 
namenode confs.
-* In DFSConfigKeys, set DFS_NAMENODE_HTTP_ADDRESS_KEY = 
HdfsClientConfigKeys.DFS_NAMENODE_HTTP_PORT_DEFAULT but do not deprecate them 
(see the sketch below)
-* Namenode, datanode, etc. should keep using 
DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY
-* Client will use HdfsClientConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY.
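A minimal sketch of the aliasing pattern suggested above (illustrative only; which constants end up aliased, and the package of HdfsClientConfigKeys, are assumptions rather than what the patch does): the server-side constant simply points at the client-side definition, so nothing is deprecated and server code is unchanged.
{code}
import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys; // package assumed

public class DFSConfigKeysAliasSketch {
  // Server-side code keeps referring to this constant; it resolves to the
  // same key string defined in the hdfs-client module.
  public static final String DFS_NAMENODE_HTTP_ADDRESS_KEY =
      HdfsClientConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY;
}
{code}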



 Separate client related routines in HAUtil into a new class
 ---

 Key: HDFS-8185
 URL: https://issues.apache.org/jira/browse/HDFS-8185
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8185.000.patch, HDFS-8185.001.patch


 This jira proposes to move the routines used by the client implementation in 
 HAUtil to a separate class and to move them into the hdfs-client module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7687) Change fsck to support EC files

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505897#comment-14505897
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7687:
---

Yes, refactoring in trunk first.

 Change fsck to support EC files
 ---

 Key: HDFS-7687
 URL: https://issues.apache.org/jira/browse/HDFS-7687
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Takanobu Asanuma

 We need to change fsck so that it can detect under replicated and corrupted 
 EC files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7005) DFS input streams do not timeout

2015-04-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505739#comment-14505739
 ] 

Karthik Kambatla commented on HDFS-7005:


Thanks for the ping, [~cnauroth]. 

[~ndimiduk] - there are no active plans for 2.5.3. If HDFS committers think 
this issue is serious enough to warrant a point release, I don't mind creating 
the RC and putting it through a vote. 

 DFS input streams do not timeout
 

 Key: HDFS-7005
 URL: https://issues.apache.org/jira/browse/HDFS-7005
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0, 2.5.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Fix For: 2.6.0

 Attachments: HDFS-7005.patch


 Input streams lost their timeout.  The problem appears to be that 
 {{DFSClient#newConnectedPeer}} does not set the read timeout.  During a 
 temporary network interruption the server will close the socket, unbeknownst 
 to the client host, which then blocks on a read forever.
 The results are dire.  Services such as the RM, JHS, NMs, Oozie servers, etc. 
 all need to be restarted to recover - unless you want to wait many hours for 
 the TCP stack keepalive to detect the broken socket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8118) Delay in checkpointing Trash can leave trash for 2 intervals before deleting

2015-04-21 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505791#comment-14505791
 ] 

Harsh J commented on HDFS-8118:
---

Thanks for explaining that, Casey. It makes sense to constant-ise the checkpoint 
date for uniformity - and the fix for this looks alright to me.

It may also make sense that people want to set the checkpoint interval equal to 
the trash interval. I think we can remove the change in the patch that caps it 
to 1/2 of the interval, and just add a small doc note in hdfs-default.xml to the 
trash checkpoint period property describing what the behaviour could end up 
being if it's set equal to the trash clearing interval.

Would it also be possible to come up with a test case for this? For example, 
load some files into trash such that multiple dirs need to be checkpointed, then 
issue a checkpoint (or await its lowered interval) and ensure only one date is 
observed before clearing occurs? It would help avoid regressions in the future, 
just in case.
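A minimal sketch of the "one timestamp per checkpoint run" idea discussed above (illustrative only; the date pattern and the per-user helper are assumptions, not the actual patch):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

public class CheckpointTimestampSketch {
  // Compute the checkpoint name once, before iterating the per-user trash
  // directories, so every user gets the same timestamp even if the clock
  // ticks over to the next second mid-run.
  static void checkpointAll(List<String> users) {
    String name = new SimpleDateFormat("yyMMddHHmmss").format(new Date());
    for (String user : users) {
      createCheckpoint(user, name);
    }
  }

  // Hypothetical per-user helper standing in for TrashPolicyDefault's logic.
  static void createCheckpoint(String user, String checkpointName) {
    // e.g. rename /user/<user>/.Trash/Current to /user/<user>/.Trash/<checkpointName>
  }
}
{code}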

 Delay in checkpointing Trash can leave trash for 2 intervals before deleting
 

 Key: HDFS-8118
 URL: https://issues.apache.org/jira/browse/HDFS-8118
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Casey Brotherton
Assignee: Casey Brotherton
Priority: Trivial
 Attachments: HDFS-8118.patch


 When the fs.trash.checkpoint.interval and the fs.trash.interval are set 
 non-zero and the same, it is possible for trash to be left for two intervals.
 The TrashPolicyDefault will use a floor and ceiling function to ensure that 
 the Trash will be checkpointed every interval of minutes.
 Each user's trash is checkpointed individually.  The time resolution of the 
 checkpoint timestamp is to the second.
 If the seconds switch while one user is checkpointing, then the next user's 
 timestamp will be later.
 This will cause the next user's checkpoint to not be deleted at the next 
 interval.
 I have recreated this in a lab cluster.
 I also have a suggestion for a patch that I can upload later tonight after 
 testing it further.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode

2015-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505884#comment-14505884
 ] 

Hadoop QA commented on HDFS-8078:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726941/HDFS-8078.4.patch
  against trunk revision 424a00d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-client.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10334//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10334//console

This message is automatically generated.

 HDFS client gets errors trying to connect to IPv6 DataNode
 -

 Key: HDFS-8078
 URL: https://issues.apache.org/jira/browse/HDFS-8078
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Nate Edel
Assignee: Nate Edel
  Labels: ipv6
 Attachments: HDFS-8078.4.patch


 1st exception, on put:
 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
 java.lang.IllegalArgumentException: Does not contain a valid host:port 
 authority: 2401:db00:1010:70ba:face:0:8:0:50010
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
   at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
 This appears to stem from code in DatanodeID which assumes it's safe to 
 append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for 
 IPv6.  NetUtils.createSocketAddr() assembles a Java URI object, which 
 requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010
 Currently using InetAddress.getByName() to validate IPv6 (guava 
 InetAddresses.forString has been flaky) but could also use our own parsing. 
 (From logging this, it seems like a low-enough frequency call that the extra 
 object creation shouldn't be problematic, and for me the slight risk of 
 passing in bad input that is not actually an IPv4 or IPv6 address and thus 
 calling an external DNS lookup is outweighed by getting the address 
 normalized and avoiding rewriting parsing.)
 Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress()
 ---
 2nd exception (on datanode)
 15/04/13 13:18:07 ERROR datanode.DataNode: 
 dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown 
 operation  src: /2401:db00:20:7013:face:0:7:0:54152 dst: 
 /2401:db00:11:d010:face:0:2f:0:50010
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:315)
 at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
 at java.lang.Thread.run(Thread.java:745)
 This also shows up as a client error on -get: 2401 is not an IP string literal.
 This one has existing parsing logic which needs to shift to the last colon 
 rather than the first.  Should also be a tiny bit faster by using lastIndexOf 
 rather than split.  Could alternatively use the techniques above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8211) DataNode UUID is always null in the JMX counter

2015-04-21 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505968#comment-14505968
 ] 

Anu Engineer commented on HDFS-8211:


[~aw], would you like to take a look at this to see if it is related to the 
new build changes?


 DataNode UUID is always null in the JMX counter
 ---

 Key: HDFS-8211
 URL: https://issues.apache.org/jira/browse/HDFS-8211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.7.1
Reporter: Anu Engineer
Assignee: Anu Engineer
Priority: Minor

 The DataNode JMX counters are tagged with DataNode UUID, but it always gets a 
 null value instead of the UUID.
 {code}
 Hadoop:service=DataNode,name=FSDatasetState*-null*.
 {code}
 This null is supposed to be the datanode UUID.
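 A minimal illustration of why "null" shows up in the counter name (illustrative only, not the DataNode code): the object name is built by string concatenation, so a UUID that has not been assigned yet appears literally as "null".
 {code}
 public class BeanNameSketch {
   public static void main(String[] args) {
     String datanodeUuid = null; // not yet assigned at registration time
     String beanName = "FSDatasetState-" + datanodeUuid;
     System.out.println(beanName); // prints "FSDatasetState-null"
   }
 }
 {code}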



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8211) DataNode UUID is always null in the JMX counter

2015-04-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-8211:
---
Status: Patch Available  (was: Open)

 DataNode UUID is always null in the JMX counter
 ---

 Key: HDFS-8211
 URL: https://issues.apache.org/jira/browse/HDFS-8211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.7.1
Reporter: Anu Engineer
Assignee: Anu Engineer
Priority: Minor
 Attachments: hdfs-8211.001.patch


 The DataNode JMX counters are tagged with DataNode UUID, but it always gets a 
 null value instead of the UUID.
 {code}
 Hadoop:service=DataNode,name=FSDatasetState*-null*.
 {code}
 This null is supposed to be the datanode UUID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8211) DataNode UUID is always null in the JMX counter

2015-04-21 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505982#comment-14505982
 ] 

Anu Engineer commented on HDFS-8211:


Also verified that FSDatasetState-UUID appears correctly using jconsole.

 DataNode UUID is always null in the JMX counter
 ---

 Key: HDFS-8211
 URL: https://issues.apache.org/jira/browse/HDFS-8211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.7.1
Reporter: Anu Engineer
Assignee: Anu Engineer
Priority: Minor
 Attachments: hdfs-8211.001.patch


 The DataNode JMX counters are tagged with DataNode UUID, but it always gets a 
 null value instead of the UUID.
 {code}
 Hadoop:service=DataNode,name=FSDatasetState*-null*.
 {code}
 This null is supposed to be the datanode UUID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8211) DataNode UUID is always null in the JMX counter

2015-04-21 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506017#comment-14506017
 ] 

Brahma Reddy Battula commented on HDFS-8211:


Nice catch. Patch LGTM, +1 (non-binding).

 DataNode UUID is always null in the JMX counter
 ---

 Key: HDFS-8211
 URL: https://issues.apache.org/jira/browse/HDFS-8211
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.7.1
Reporter: Anu Engineer
Assignee: Anu Engineer
Priority: Minor
 Attachments: hdfs-8211.001.patch


 The DataNode JMX counters are tagged with DataNode UUID, but it always gets a 
 null value instead of the UUID.
 {code}
 Hadoop:service=DataNode,name=FSDatasetState*-null*.
 {code}
 This null is supposed to be the datanode UUID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8185) Separate client related routines in HAUtil into a new class

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8185:
--
 Component/s: hdfs-client
Hadoop Flags: Reviewed

+1 the new patch looks good.  Thanks, Haohui.

 Separate client related routines in HAUtil into a new class
 ---

 Key: HDFS-8185
 URL: https://issues.apache.org/jira/browse/HDFS-8185
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build, hdfs-client
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8185.000.patch, HDFS-8185.001.patch, 
 HDFS-8185.002.patch


 This jira proposes to move the routines used by the client implementation in 
 HAUtil to a separate class and to move them into the hdfs-client module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8216) TestDFSStripedOutputStream should use BlockReaderTestUtil to create BlockReader

2015-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8216:
--
Attachment: h8216_20150421.patch

h8216_20150421.patch: use BlockReaderTestUtil.getBlockReader.

 TestDFSStripedOutputStream should use BlockReaderTestUtil to create 
 BlockReader
 ---

 Key: HDFS-8216
 URL: https://issues.apache.org/jira/browse/HDFS-8216
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h8216_20150421.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8204) Balancer: 2 replicas ends in same node after running balance.

2015-04-21 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su resolved HDFS-8204.
-
Resolution: Duplicate

 Balancer: 2 replicas ends in same node after running balance.
 -

 Key: HDFS-8204
 URL: https://issues.apache.org/jira/browse/HDFS-8204
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Reporter: Walter Su
Assignee: Walter Su
 Attachments: HDFS-8204.001.patch


 Balancer moves blocks between Datanodes in the old versions (Ver. < 2.6).
 Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in 
 the new versions (Ver. >= 2.6).
 The function
 {code}
 class DBlock extends Locations<StorageGroup>
 DBlock.isLocatedOn(StorageGroup loc)
 {code}
 is flawed, and may cause 2 replicas to end up on the same node after running 
 the balancer.
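 A minimal sketch of the node-level check implied above (illustrative only; the interfaces below are a simplified model of the balancer's types, not the actual patch): two storage groups on the same datanode must count as the same location, otherwise a second replica can land on that node.
 {code}
 import java.util.List;

 public class SameNodeCheckSketch {
   // Hypothetical minimal model of the balancer's types, for illustration only.
   interface StorageGroup { String getDatanodeUuid(); }
   interface DBlock { List<StorageGroup> getLocations(); }

   // A block counts as "located on" the target if any existing location is on
   // the same datanode, regardless of which storage (DISK/ARCHIVE) it uses.
   static boolean isLocatedOnSameNode(DBlock block, StorageGroup target) {
     for (StorageGroup loc : block.getLocations()) {
       if (loc.getDatanodeUuid().equals(target.getDatanodeUuid())) {
         return true;
       }
     }
     return false;
   }
 }
 {code}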



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8156) Add/implement necessary APIs even we just have the system default schema

2015-04-21 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-8156:

Attachment: HDFS-8156-v8.patch

Thanks for the review and comments.
bq.Should the DEFAULT_CODEC_NAME be RS? 
Currently we have no chance to use the codec name yet. In the codec framework 
we have {{RSErasureCodec}} for the code, and the codec name RS would help us 
locate it in the map of maintained codecs. 

Updated the patch, adding an extractIntOption method to handle the repeated logic 
and error handling in one place. Another review? Thanks!
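A hypothetical shape for such an extractIntOption helper (illustrative only; the signature and validation rules are assumptions, not the patch itself):
{code}
import java.util.Map;

public class ExtractIntOptionSketch {
  // Pull an integer option out of a schema's option map, keeping the parsing
  // and error handling in one place.
  static int extractIntOption(Map<String, String> options, String key, int defaultValue) {
    String value = options.get(key);
    if (value == null) {
      return defaultValue;
    }
    try {
      int result = Integer.parseInt(value);
      if (result <= 0) {
        throw new IllegalArgumentException("Option " + key + " must be positive: " + value);
      }
      return result;
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Option " + key + " is not an integer: " + value, e);
    }
  }
}
{code}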

 Add/implement necessary APIs even we just have the system default schema
 

 Key: HDFS-8156
 URL: https://issues.apache.org/jira/browse/HDFS-8156
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch, 
 HDFS-8156-v3.patch, HDFS-8156-v4.patch, HDFS-8156-v5.patch, 
 HDFS-8156-v6.patch, HDFS-8156-v7.patch, HDFS-8156-v8.patch


 According to the discussion here, this issue was repurposed and modified.
 This is to add and implement some necessary APIs even we just have the system 
 default schema, to resolve some TODOs left for HDFS-7859 and HDFS-7866 as 
 they're still subject to further discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8147) Mover should not select the DN storage as target where already same replica exists.

2015-04-21 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8147:

Attachment: HDFS-8147_2.patch

 Mover should not select the DN storage as target where already same replica 
 exists.
 ---

 Key: HDFS-8147
 URL: https://issues.apache.org/jira/browse/HDFS-8147
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Attachments: HDFS-8147.patch, HDFS-8147_1.patch, HDFS-8147_2.patch


 *Scenario:*
 1. Three DN cluster.  The DNs' storage types are as follows.
 DN1 : DISK,ARCHIVE
 DN2 : DISK
 DN3 : DISK,ARCHIVE (All DNs are in same rack)
 2. One file with two replicas (In DN1 and DN2)
 3. Set file storage policy COLD
 4. Now execute Mover.
 *Expected Result:* File blocks should move in DN1:ARCHIVE and DN3:ARCHIVE
 *Actual Result:* {{chooseTargetInSameNode()}} moves the DN1:DISK block to 
 DN1:ARCHIVE, but in the next iteration {{chooseTarget()}} for the same rack 
 again selects DN1:ARCHIVE as the target, where the same block already exists.
 {{chooseTargetInSameNode()}} and {{chooseTarget()}} should not select a 
 node as the target where the same replica already exists.
 *Logs*
 {code}
 15/04/15 10:47:17 WARN balancer.Dispatcher: Failed to move 
 blk_1073741852_1028 with size=11990 from 10.19.92.74:50010:DISK to 
 10.19.92.73:50010:ARCHIVE through 10.19.92.73:50010: Got error, status 
 message opReplaceBlock 
 BP-1258709199-10.19.92.74-1428292615636:blk_1073741852_1028 received 
 exception 
 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Replica 
 FinalizedReplica, blk_1073741852_1028, FINALIZED
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >