[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup

2014-12-24 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated HDFS-7547:

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
> -
>
> Key: HDFS-7547
> URL: https://issues.apache.org/jira/browse/HDFS-7547
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: HDFS-7547.001.patch
>
>
> HDFS-7531 changed the implementation of {{FsVolumeList}} but did not update 
> its {{toString}} method to keep the old description string format. The test 
> TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on that 
> format, so it now always fails. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) NameNode support for erasure coding block groups

2014-12-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7339:

Summary: NameNode support for erasure coding block groups  (was: Create 
block groups for initial block encoding)

> NameNode support for erasure coding block groups
> 
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a _Striping+EC_ file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.
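
For illustration only, a minimal sketch of the inode extension described above, 
assuming long block IDs and an integer schema pointer; apart from 
{{BlockGroup}} and the {{INodeFile}} concept, every name here is hypothetical 
and not taken from the attached patch:

{code}
import java.util.ArrayList;
import java.util.List;

// The binary flag switching a file between the two block layouts.
enum BlockLayout { CONTIGUOUS, STRIPING }

// Lightweight record of a coding group: its original and parity blocks,
// plus a pointer to the (pluggable, HDFS-7337) codec schema.
class BlockGroup {
  long[] dataBlockIds;
  long[] parityBlockIds;
  int codecSchemaId;
}

// Hypothetical extension of the file inode: the mode flag plus an array
// of BlockGroups that stays empty for traditional contiguous files.
class StripedFileInodeSketch {
  BlockLayout layout = BlockLayout.CONTIGUOUS;
  List<BlockGroup> blockGroups = new ArrayList<>();
}
{code}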



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding

2014-12-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7339:

Attachment: NN-stripping.jpg

{{ECManager}} design.

> Create block groups for initial block encoding
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, Meta-striping.jpg, NN-stripping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a _Striping+EC_ file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding

2014-12-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7339:

Attachment: (was: Encoding-design-NN.jpg)

> Create block groups for initial block encoding
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, Meta-striping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a _Striping+EC_ file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding

2014-12-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7339:

Attachment: Meta-striping.jpg

Extending {{INodeFile}} for striping.

> Create block groups for initial block encoding
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-7339-001.patch, Meta-striping.jpg
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a _Striping+EC_ file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding

2014-12-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7339:

Status: Open  (was: Patch Available)

Will update the patch for the striping design.

> Create block groups for initial block encoding
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: Encoding-design-NN.jpg, HDFS-7339-001.patch
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a _Striping+EC_ file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding

2014-12-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-7339:

Description: 
All erasure codec operations center around the concept of _block group_; they 
are formed in initial encoding and looked up in recoveries and conversions. A 
lightweight class {{BlockGroup}} is created to record the original and parity 
blocks in a coding group, as well as a pointer to the codec schema (pluggable 
codec schemas will be supported in HDFS-7337). With the striping layout, the 
HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
Therefore we propose to extend a file’s inode to switch between _contiguous_ 
and _striping_ modes, with the current mode recorded in a binary flag. An array 
of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
“traditional” HDFS files with contiguous block layout.

The NameNode creates and maintains {{BlockGroup}} instances through the new 
{{ECManager}} component; the attached figure has an illustration of the 
architecture. As a simple example, when a _Striping+EC_ file is created and 
written to, it will serve requests from the client to allocate new 
{{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
{{BlockGroups}} are allocated both in initial online encoding and in the 
conversion from replication to EC. {{ECManager}} also facilitates the lookup of 
{{BlockGroup}} information for block recovery work.

  was:
All erasure codec operations center around the concept of _block groups_, which 
are formed in encoding and looked up in decoding. This JIRA creates a 
lightweight {{BlockGroup}} class to record the original and parity blocks in an 
encoding group, as well as a pointer to the codec schema. Pluggable codec 
schemas will be supported in HDFS-7337. 

The NameNode creates and maintains {{BlockGroup}} instances through 2 new 
components; the attached figure has an illustration of the architecture.

{{ECManager}}: This module manages {{BlockGroups}} and associated codec 
schemas. As a simple example, it stores the codec schema of Reed-Solomon 
algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each 
{{BlockGroup}} points to the schema it uses. To facilitate lookups during 
recovery requests, {{BlockGroups}} should be organized as a map keyed by 
{{Blocks}}.

{{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events. 
This module analyzes the incoming events, and dispatches tasks to 
{{UnderReplicatedBlocks}} to create parity blocks. A new queue 
({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues to 
maintain the relative order of encoding and replication tasks.
* Whenever a block is finalized and meets EC criteria -- including 1) block 
size is full; 2) the file’s storage policy allows EC -- {{ErasureCodingBlocks}} 
tries to form a {{BlockGroup}}. In order to do so it needs to store a set of 
blocks waiting to be encoded. Different grouping algorithms can be applied -- 
e.g., always grouping blocks in the same file. Blocks in a group should also 
reside on different DataNodes, and ideally on different racks, to tolerate node 
and rack failures. If successful, it records the formed group with 
{{ECManager}} and inserts the parity blocks into {{QUEUE_INITIAL_ENCODING}}.
* When a parity block or a raw block in {{ENCODED}} state is found missing, 
{{ErasureCodingBlocks}} adds it to existing priority queues in 
{{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost, they 
should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be added 
for fine grained differentiation (e.g., loss of a raw block versus a parity 
one).
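
As an illustrative aside on the map-keyed-by-{{Blocks}} organization mentioned 
above, a sketch of the lookup structure might look as follows; it keys by 
block ID rather than the real {{Block}} class, and {{ECManagerSketch}} is a 
hypothetical stand-in, not the proposed component:

{code}
import java.util.HashMap;
import java.util.Map;

class BlockGroup { /* as sketched earlier in this digest */ }

class ECManagerSketch {
  // Every member block of a group keys the same BlockGroup, so a recovery
  // request carrying a block can find its peers and parity blocks directly.
  private final Map<Long, BlockGroup> groupByBlockId = new HashMap<>();

  void register(BlockGroup group, long[] memberBlockIds) {
    for (long id : memberBlockIds) {
      groupByBlockId.put(id, group);
    }
  }

  BlockGroup lookup(long blockId) {
    return groupByBlockId.get(blockId);
  }
}
{code}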


> Create block groups for initial block encoding
> --
>
> Key: HDFS-7339
> URL: https://issues.apache.org/jira/browse/HDFS-7339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: Encoding-design-NN.jpg, HDFS-7339-001.patch
>
>
> All erasure codec operations center around the concept of _block group_; they 
> are formed in initial encoding and looked up in recoveries and conversions. A 
> lightweight class {{BlockGroup}} is created to record the original and parity 
> blocks in a coding group, as well as a pointer to the codec schema (pluggable 
> codec schemas will be supported in HDFS-7337). With the striping layout, the 
> HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
> Therefore we propose to extend a file’s inode to switch between _contiguous_ 
> and _striping_ modes, with the current mode recorded in a binary flag. An 
> array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
> “traditional” HDFS files with contiguous block layout.
> The NameNode creates and maintains {{BlockGroup}} instances through the new 
> {{ECManager}} component; the attached figure has an illustration of the 
> architecture. As a simple example, when a _Striping+EC_ file is created and 
> written to, it will serve requests from the client to allocate new 
> {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
> {{BlockGroups}} are allocated both in initial online encoding and in the 
> conversion from replication to EC. {{ECManager}} also facilitates the lookup 
> of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HDFS-7570) DataXceiver could leak FileDescriptor

2014-12-24 Thread Juan Yu (JIRA)
Juan Yu created HDFS-7570:
-

 Summary: DataXceiver could leak FileDescriptor
 Key: HDFS-7570
 URL: https://issues.apache.org/jira/browse/HDFS-7570
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Juan Yu


DataXceiver does not always close its input stream. This can leak file 
descriptors and, over time, cause the process to exceed its FD limit.

{code}
finally {
  if (LOG.isDebugEnabled()) {
    LOG.debug(datanode.getDisplayName()
        + ":Number of active connections is: " + datanode.getXceiverCount());
  }
  updateCurrentThreadName("Cleaning up");
  if (peer != null) {
    dataXceiverServer.closePeer(peer);
    IOUtils.closeStream(in);
  }
}
{code}
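
One possible shape of a fix, as a hedged sketch rather than the actual patch: 
close the stream unconditionally, relying on {{IOUtils.closeStream}} ignoring 
null arguments and swallowing {{IOException}}s:

{code}
finally {
  if (LOG.isDebugEnabled()) {
    LOG.debug(datanode.getDisplayName()
        + ":Number of active connections is: " + datanode.getXceiverCount());
  }
  updateCurrentThreadName("Cleaning up");
  // Close 'in' whether or not a peer was established; closeStream is
  // null-safe, so this is harmless when the stream was never opened.
  IOUtils.closeStream(in);
  if (peer != null) {
    dataXceiverServer.closePeer(peer);
  }
}
{code}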



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7569) BlockReceiver did not close ReplicaOutputStreams

2014-12-24 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-7569:

Resolution: Invalid
Status: Resolved  (was: Patch Available)

Two {{FileOutputStreams}} are exposed to {{BlockReceiver}} and are closed there 
instead.

> BlockReceiver did not close ReplicaOutputStreams 
> 
>
> Key: HDFS-7569
> URL: https://issues.apache.org/jira/browse/HDFS-7569
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7569.000.patch
>
>
> {{BlockReceiver#streams}} is a {{ReplicaOutputStreams}}, which holds two 
> {{FileOutputStream}}s, i.e., {{ReplicaOutputStream#dataOut}} and 
> {{checksumOut}}. {{ReplicaOutputStreams#close}} is never called in non-test 
> code to close these two streams.
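
A simplified sketch of the ownership pattern behind this resolution, with 
hypothetical names: the wrapper hands out the two streams, and the component 
that obtained them (here {{BlockReceiver}}) closes them, so the wrapper's own 
{{close}} going uncalled does not by itself leak descriptors:

{code}
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;

// Stand-in for ReplicaOutputStreams: holds the data and checksum streams
// and exposes them; whoever retrieves them takes over closing them.
class ReplicaStreamsSketch implements Closeable {
  private final OutputStream dataOut;
  private final OutputStream checksumOut;

  ReplicaStreamsSketch(OutputStream dataOut, OutputStream checksumOut) {
    this.dataOut = dataOut;
    this.checksumOut = checksumOut;
  }

  OutputStream getDataOut() { return dataOut; }
  OutputStream getChecksumOut() { return checksumOut; }

  @Override
  public void close() throws IOException {
    dataOut.close();
    checksumOut.close();
  }
}
{code}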



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7040) HDFS dangerously uses @Beta methods from very old versions of Guava

2014-12-24 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs updated HDFS-7040:

  Resolution: Won't Fix
Target Version/s:   (was: 3.0.0, 2.6.0, 2.7.0, 2.5.1)
  Status: Resolved  (was: Patch Available)

This issue is essentially fixed with HADOOP-11286 for 2.6 and later, so there's 
no reason to keep it open, unless somebody intends to fix versions 2.4 and 2.5, 
which I would argue is probably not worth the effort. So, I'm going to close it.

> HDFS dangerously uses @Beta methods from very old versions of Guava
> ---
>
> Key: HDFS-7040
> URL: https://issues.apache.org/jira/browse/HDFS-7040
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
>Reporter: Christopher Tubbs
>  Labels: beta, deprecated, guava
> Attachments: 0001-HDFS-7040-Avoid-beta-LimitInputStream-in-Guava.patch
>
>
> HDFS uses LimitInputStream from Guava. This was introduced as @Beta and is 
> risky for any application to use.
> The problem is further exacerbated by Hadoop's dependency on Guava version 
> 11.0.2, which is quite old for an active project (Feb. 2012).
> Because Guava is very stable, projects that depend on Hadoop and use Guava 
> themselves can use up through Guava version 14.x.
> However, in version 14, Guava deprecated LimitInputStream and provided a 
> replacement. Because Guava makes no compatibility guarantees for @Beta 
> classes, it removed LimitInputStream in version 15.
> What should be done: Hadoop should update its dependency on Guava to at 
> least version 14 (currently Guava is on version 19). This should have little 
> impact on users, because Guava is so stable.
> HDFS should then be patched to use the provided alternative to 
> LimitInputStream, so that downstream packagers, users, and application 
> developers requiring more recent versions of Guava (to fix bugs, to use new 
> features, etc.) will be able to swap out the Guava dependency without 
> breaking Hadoop.
> Alternative: While Hadoop cannot predict the marking and removal of 
> deprecated code, it can, and should, avoid the use of @Beta classes and 
> methods that do not offer guarantees. If the dependency cannot be bumped, 
> then it should be relatively trivial to provide an internal class with the 
> same functionality, that does not rely on the older version of Guava.
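
For illustration, the suggested migration is essentially a one-line swap; 
{{ByteStreams.limit}} is the replacement Guava 14 provides for 
{{LimitInputStream}}, while the wrapper method here is a hypothetical call 
site:

{code}
import com.google.common.io.ByteStreams;
import java.io.InputStream;

class LimitStreamMigration {
  // Before (Guava <= 14, class removed in 15):
  //   return new LimitInputStream(raw, maxBytes);
  // After (Guava 14+): the returned stream yields at most maxBytes bytes.
  static InputStream bounded(InputStream raw, long maxBytes) {
    return ByteStreams.limit(raw, maxBytes);
  }
}
{code}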



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6681) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops

2014-12-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258199#comment-14258199
 ] 

Hadoop QA commented on HDFS-6681:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655697/HDFS-6681.patch
  against trunk revision 4f18018.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9122//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9122//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9122//console

This message is automatically generated.

> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is 
> flaky and sometimes gets stuck in infinite loops
> --
>
> Key: HDFS-6681
> URL: https://issues.apache.org/jira/browse/HDFS-6681
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.1
> Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
> Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Attachments: HDFS-6681.patch
>
>
> This testcase has 3 infinite loops which break only on certain conditions 
> being satisfied.
> 1st loop checks if there should be a single live replica. It assumes this to 
> be true since it has just corrupted a block on one of the datanodes (the 
> testcase has a replication factor of 2). One scenario in which this loop will 
> never 
> break is if the Namenode invalidates the corrupt replica, schedules a 
> replication command, and the new copied replica is added all before this 
> testcase has the chance to check the live-replica count.
> 2nd loop checks there should be 2 live replicas. It assumes this to be true 
> (in some time) since the first loop has broken implying there is a single 
> replica and now it is only a matter of time when the Namenode schedules a 
> replication command to copy a replica to another datanode. One scenario in 
> which this loop will never break is when the Namenode tries to schedule a new 
> replica on the same node on which we actually corrupted the block. That dst. 
> datanode will not copy the block, complaining that it already has the 
> (corrupted) replica in the create state. The situation that results is that 
> Namenode has scheduled a copy to a datanode, the block is now in the 
> namenode's pending replication queue, this block will never be removed from 
> the pending replication queue because the namenode will never receive a 
> report from the datanodes that the block is 'added'.
> Note: The block can be transferred from the 'pending replication' to the 
> 'needed replication' queue once the pending timeout (5 minutes) expires. The 
> Namenode then actively tries to schedule a replication for blocks in the 
> 'needed replication' queue. This can cause the 2nd loop to break, but it 
> takes more than 5 minutes for this process to kick in.
> 3rd loop: This loop checks if there are no corrupt replicas. I don't see a 
> scenario in which this loop can go on forever, since once the live replica 
> count goes back to normal (2), the corrupted block will be removed.
> Increasing the heartbeat interval, so that the testcase has enough time to 
> check the condition in loop 1 before a datanode reports a successful copy, 
> should help avoid the race condition in loop 1. For loop 2, we can reduce 
> the timeout after which the block is transferred from the pending 
> replication queue to the needed replication queue.
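
A minimal sketch of the tuning suggested in the last paragraph, assuming a 
{{MiniDFSCluster}}-style test configuration; the keys are from 
{{DFSConfigKeys}}, while the values are purely illustrative:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

class TestTuningSketch {
  static Configuration tunedConf() {
    Configuration conf = new Configuration();
    // Slow down heartbeats so loop 1 can observe the single live replica
    // before a datanode reports a successful copy.
    conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 30L);
    // Shrink the 5-minute pending-replication timeout so loop 2 recovers
    // quickly when a scheduled copy can never complete.
    conf.setInt(
        DFSConfigKeys.DFS_NAMENODE_REPLICATION_PENDING_TIMEOUT_SEC_KEY, 10);
    return conf;
  }
}
{code}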



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7040) HDFS dangerously uses @Beta methods from very old versions of Guava

2014-12-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258198#comment-14258198
 ] 

Hadoop QA commented on HDFS-7040:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12667911/12667911_0001-HDFS-7040-Avoid-beta-LimitInputStream-in-Guava.patch
  against trunk revision 4f18018.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9123//console

This message is automatically generated.

> HDFS dangerously uses @Beta methods from very old versions of Guava
> ---
>
> Key: HDFS-7040
> URL: https://issues.apache.org/jira/browse/HDFS-7040
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
>Reporter: Christopher Tubbs
>  Labels: beta, deprecated, guava
> Attachments: 0001-HDFS-7040-Avoid-beta-LimitInputStream-in-Guava.patch
>
>
> HDFS uses LimitInputStream from Guava. This was introduced as @Beta and is 
> risky for any application to use.
> The problem is further exacerbated by Hadoop's dependency on Guava version 
> 11.0.2, which is quite old for an active project (Feb. 2012).
> Because Guava is very stable, projects that depend on Hadoop and use Guava 
> themselves can use up through Guava version 14.x.
> However, in version 14, Guava deprecated LimitInputStream and provided a 
> replacement. Because Guava makes no compatibility guarantees for @Beta 
> classes, it removed LimitInputStream in version 15.
> What should be done: Hadoop should update its dependency on Guava to at 
> least version 14 (currently Guava is on version 19). This should have little 
> impact on users, because Guava is so stable.
> HDFS should then be patched to use the provided alternative to 
> LimitInputStream, so that downstream packagers, users, and application 
> developers requiring more recent versions of Guava (to fix bugs, to use new 
> features, etc.) will be able to swap out the Guava dependency without 
> breaking Hadoop.
> Alternative: While Hadoop cannot predict the marking and removal of 
> deprecated code, it can, and should, avoid the use of @Beta classes and 
> methods that do not offer guarantees. If the dependency cannot be bumped, 
> then it should be relatively trivial to provide an internal class with the 
> same functionality, that does not rely on the older version of Guava.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7467) Provide storage tier information for a directory via fsck

2014-12-24 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258172#comment-14258172
 ] 

Hari Sekhon commented on HDFS-7467:
---

There does need to be a way to figure out whether a given file or directory of 
files is using fallback storage.

There should also be a global way of seeing whether any files are using 
fallback storage, as an indicator that, for example, there isn't enough SSD.

Adding this information to fsck seems like a sensible way to go - the main 
question is how to represent that information concisely.

Are all fallback-storage placements equivalent across storage policies, such 
that this output can always be fully described by the percentages that Tsz has 
suggested?

There should also be warning messages in fsck for all files that are unable to 
meet the requested ideal for their storage policy and are using fallback 
storage, perhaps behind a switch, since that could become overly voluminous 
output.

Regards,

Hari Sekhon
http://www.linkedin.com/in/harisekhon

> Provide storage tier information for a directory via fsck
> -
>
> Key: HDFS-7467
> URL: https://issues.apache.org/jira/browse/HDFS-7467
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-7467.patch
>
>
> Currently _fsck_  provides information regarding blocks for a directory.
> It should be augmented to provide storage tier information (optionally). 
> The sample report could be as follows:
> {code}
> Storage Tier Combination    # of blocks    % of blocks
> DISK:1,ARCHIVE:2                 340730       97.7393%
> ARCHIVE:3                          3928        1.1268%
> DISK:2,ARCHIVE:2                   3122        0.8956%
> DISK:2,ARCHIVE:1                    748        0.2146%
> DISK:1,ARCHIVE:3                     44        0.0126%
> DISK:3,ARCHIVE:2                     30        0.0086%
> DISK:3,ARCHIVE:1                      9        0.0026%
> {code}
>  
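
One way such a report could be aggregated, as a hedged sketch under the 
assumption that each block's placement is summarized as a combination string 
like those above; none of this mirrors the attached patch:

{code}
import java.util.HashMap;
import java.util.Map;

class TierHistogramSketch {
  private final Map<String, Long> counts = new HashMap<>();
  private long total;

  // Record one block's placement, e.g. "DISK:1,ARCHIVE:2".
  void add(String tierCombination) {
    counts.merge(tierCombination, 1L, Long::sum);
    total++;
  }

  // Print each combination with its block count and share of all blocks.
  void print() {
    counts.forEach((combo, n) -> System.out.printf(
        "%-24s %12d %10.4f%%%n", combo, n, 100.0 * n / total));
  }
}
{code}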



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-6681) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops

2014-12-24 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti reassigned HDFS-6681:
-

Assignee: Ratandeep Ratti

> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is 
> flaky and sometimes gets stuck in infinite loops
> --
>
> Key: HDFS-6681
> URL: https://issues.apache.org/jira/browse/HDFS-6681
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.1
> Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
> Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Attachments: HDFS-6681.patch
>
>
> This testcase has 3 infinite loops which break only on certain conditions 
> being satisfied.
> 1st loop checks if there should be a single live replica. It assumes this to 
> be true since it has just corrupted a block on one of the datanodes (the 
> testcase has a replication factor of 2). One scenario in which this loop will 
> never 
> break is if the Namenode invalidates the corrupt replica, schedules a 
> replication command, and the new copied replica is added all before this 
> testcase has the chance to check the live-replica count.
> 2nd loop checks there should be 2 live replicas. It assumes this to be true 
> (in some time) since the first loop has broken implying there is a single 
> replica and now it is only a matter of time when the Namenode schedules a 
> replication command to copy a replica to another datanode. One scenario in 
> which this loop will never break is when the Namenode tries to schedule a new 
> replica on the same node on which we actually corrupted the block. That 
> destination 
> datanode will not copy the block, complaining that it already has the 
> (corrupted) replica in the create state. The situation that results is that 
> Namenode has scheduled a copy to a datanode, the block is now in the 
> namenode's pending replication queue, this block will never be removed from 
> the pending replication queue because the namenode will never receive a 
> report from the datanodes that the block is 'added'.
> Note: The block can be transferred from the 'pending replication' to the 
> 'needed replication' queue once the pending timeout (5 minutes) expires. The 
> Namenode then actively tries to schedule a replication for blocks in the 
> 'needed replication' queue. This can cause the 2nd loop to break, but it 
> takes more than 5 minutes for this process to kick in.
> 3rd loop: This loop checks if there are no corrupt replicas. I don't see a 
> scenario in which this loop can go on forever, since once the live replica 
> count goes back to normal (2), the corrupted block will be removed.
> Increasing the heartbeat interval, so that the testcase has enough time to 
> check the condition in loop 1 before a datanode reports a successful copy, 
> should help avoid the race condition in loop 1. For loop 2, we can reduce 
> the timeout after which the block is transferred from the pending 
> replication queue to the needed replication queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)