[jira] [Commented] (HDFS-9114) NameNode and DataNode metric log file name should follow the other log file name format.

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934604#comment-14934604
 ] 

Hadoop QA commented on HDFS-9114:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764055/HDFS-9114-trunk.01.patch
 |
| Optional Tests | shellcheck javadoc javac unit findbugs checkstyle |
| git revision | trunk / 151fca5 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12729/console |


This message was automatically generated.

> NameNode and DataNode metric log file name should follow the other log file 
> name format.
> 
>
> Key: HDFS-9114
> URL: https://issues.apache.org/jira/browse/HDFS-9114
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9114-branch-2.01.patch, HDFS-9114-trunk.01.patch
>
>
> Currently the datanode and namenode metric log files are named 
> {{datanode-metrics.log}} and {{namenode-metrics.log}}.
> These names should be like {{hadoop-hdfs-namenode-metric-host192.log}}, 
> the same as the namenode log file {{hadoop-hdfs-namenode-host192.log}}.
> This will help when we copy logs from different nodes for issue analysis.
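
For illustration, a minimal sketch of the proposed naming scheme (the user, 
role, and host values are assumed examples, not taken from the patch):

{code}
// Hypothetical illustration of the proposed metric log file naming,
// mirroring the hadoop-<user>-<role>-<host>.log convention.
public class MetricLogNameSketch {
  public static void main(String[] args) {
    String user = "hdfs", role = "namenode", host = "host192";
    String metricsLog = String.format("hadoop-%s-%s-metric-%s.log", user, role, host);
    System.out.println(metricsLog); // hadoop-hdfs-namenode-metric-host192.log
  }
}
{code}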



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9171) [OIV] : ArrayIndexOutOfBoundsException thrown when step is more than maxsize in FileDistribution processor

2015-09-28 Thread Archana T (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934617#comment-14934617
 ] 

Archana T commented on HDFS-9171:
-

Assigning to Nijel as he is already looking into the OIV improvement task.

> [OIV] : ArrayIndexOutOfBoundsException thrown when step is more than maxsize 
> in FileDistribution processor
> --
>
> Key: HDFS-9171
> URL: https://issues.apache.org/jira/browse/HDFS-9171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Archana T
>Assignee: nijel
>Priority: Minor
>
> When the step size is more than the max size in the FileDistribution processor:
> hdfs oiv -i /NAME_DIR/fsimage_0007854 -o out --processor 
> FileDistribution {color:red} -maxSize 1000 -step 5000 {color} ; cat out
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.run(FileDistributionCalculator.java:131)
> at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.visit(FileDistributionCalculator.java:108)
> at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:165)
> at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:124)
> Processed 0 inodes.
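
For reference, a minimal sketch (not the actual FileDistributionCalculator 
source) of how such an out-of-bounds access can arise when the step exceeds 
the max size: if the histogram is sized with integer division while the bucket 
index is rounded up, the index can land past the end of the array:

{code}
// Hypothetical illustration: histogram sized with integer division,
// bucket index computed by rounding up.
public class StepBucketSketch {
  public static void main(String[] args) {
    int maxSize = 1000, step = 5000;
    int[] distribution = new int[maxSize / step + 1]; // 1000/5000 = 0, so length 1
    long fileSize = 800;
    int bucket = (int) Math.ceil((double) fileSize / step); // ceil(0.16) = 1
    distribution[bucket]++; // throws ArrayIndexOutOfBoundsException: 1
  }
}
{code}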



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934634#comment-14934634
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks for the comments, Nicholas.

{quote}
Where are the numbers, especially the 4s, from? Do we assume a 32-bit world?
{quote}

{code}
public class BTree<E> implements Iterable<E> {
  ...
  private final int degree;
  private Node root;
  private int size;
  private transient int modCount = 0;
  ...
}

private final class Node {
  static final int DEFAULT_CAPACITY = 5;
  private Object[] elements;
  private int elementsSize;
  private Object[] children;
  private int childrenSize;
  ...
}
{code}
Sorry, I should have used a 64-bit system/JVM; the details are:

Compared to ArrayList, we add the following:
private final int degree;   <-   4 bytes int
private Node root;          <-   reference, 4 bytes on a 32-bit 
system/JVM, 8 bytes on a 64-bit system/JVM
private int size;           <-   4 bytes int

{{Node}} object overhead    <--  12 bytes
private Object[] children;  <-   null reference, 4 bytes on a 
32-bit system/JVM, 8 bytes on a 64-bit system/JVM
private int childrenSize;   <-   4 bytes int

So in total: 12+4+4+4+4+4 = 32 bytes on a 32-bit system/JVM, and 12+4+8+4+8+4 = 
40 bytes on a 64-bit system/JVM. (I have not counted object alignment.)
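
A back-of-envelope check of these sums, taking the field and header sizes 
above as given:

{code}
// Recomputes the totals above: a 12-byte object header, 4-byte ints, and
// 4-byte (32-bit) or 8-byte (64-bit) references.
public class OverheadSketch {
  public static void main(String[] args) {
    int header = 12, intField = 4;
    for (int ref : new int[] {4, 8}) {
      // BTree adds degree + root + size; its root Node adds the object
      // header, a null children reference, and childrenSize.
      int total = intField + ref + intField + header + ref + intField;
      System.out.println(total + " bytes with " + ref + "-byte references");
    }
  }
}
{code}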


> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For search, the time complexity is 
> O(log n), but insertion/deletion causes re-allocations and copies of big 
> arrays, so those operations are costly.  For example, if the children grow 
> to 1M in size, the ArrayList will resize to > 1M capacity, so it needs 
> > 1M * 4 bytes = 4M of contiguous heap memory; this easily causes full GC in 
> an HDFS cluster where namenode heap memory is already highly used.  To recap 
> the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because of 
> re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous, long-lived heap memory can 
> easily cause full GC problems.
> # Even if most children are removed later, the directory INode still occupies 
> the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to 
> solve the problem, as suggested by [~shv]. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is not large (less than the maximum degree of a 
> B-Tree node), the B-Tree has only one root node, which contains an array for 
> the elements. If the size grows large enough, the node will split 
> automatically, and if elements are removed, B-Tree nodes can merge 
> automatically (see more: https://en.wikipedia.org/wiki/B-tree).  This solves 
> the above 3 issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-09-28 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934589#comment-14934589
 ] 

Brahma Reddy Battula commented on HDFS-8647:


[~mingma] kindly review the latest patch. Not sure why Jenkins was not 
triggered on the latest patch.

> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch
>
>
> Sometimes we want to have the namenode use an alternative block placement 
> policy, such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumptions about rack policy in functions such as 
> useDelHint and blockHasEnoughRacks. That means when we have a new block 
> placement policy, we need to modify BlockManager to account for it. Ideally 
> BlockManager should ask the BlockPlacementPolicy object instead. That will 
> allow us to provide a new BlockPlacementPolicy without changing BlockManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname

2015-09-28 Thread Casey Brotherton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934593#comment-14934593
 ] 

Casey Brotherton commented on HDFS-9100:


I am sorry, the failure appears to be due to changes with HDFS-8053.

Will reapply the patch, and run tests tonight.

> HDFS Balancer does not respect dfs.client.use.datanode.hostname
> ---
>
> Key: HDFS-9100
> URL: https://issues.apache.org/jira/browse/HDFS-9100
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, HDFS
>Reporter: Yongjun Zhang
>Assignee: Casey Brotherton
> Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, 
> HDFS-9100.002.patch
>
>
> In Balancer Dispatch.java:
> {code}
>private void dispatch() {
>   LOG.info("Start moving " + this);
>   Socket sock = new Socket();
>   DataOutputStream out = null;
>   DataInputStream in = null;
>   try {
> sock.connect(
> NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()),
> HdfsConstants.READ_TIMEOUT);
> {code}
> getXferAddr() is called without taking the dfs.client.use.datanode.hostname 
> setting into account; this could cause a balancer run issued from outside 
> the cluster to fail.
> Thanks [~caseyjbrotherton] for reporting the issue.
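
A sketch of one possible direction (not the committed patch): pass the setting 
to the {{DatanodeID#getXferAddr(boolean)}} overload when building the socket 
address:

{code}
// Hypothetical fix sketch: honor dfs.client.use.datanode.hostname when
// resolving the target datanode's transfer address.
final boolean useHostname = conf.getBoolean(
    DFSConfigKeys.DFS_CLIENT_USE_DN_HOSTNAME,
    DFSConfigKeys.DFS_CLIENT_USE_DN_HOSTNAME_DEFAULT);
sock.connect(
    NetUtils.createSocketAddr(
        target.getDatanodeInfo().getXferAddr(useHostname)),
    HdfsConstants.READ_TIMEOUT);
{code}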



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9151) Mover should print the exit status/reason on console like balancer tool.

2015-09-28 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9151:
-
Attachment: HDFS-9151.02.patch

Thanks [~templedf] for the review...
Attached an updated patch, please review...

> Mover should print the exit status/reason on console like balancer tool.
> 
>
> Key: HDFS-9151
> URL: https://issues.apache.org/jira/browse/HDFS-9151
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Archana T
>Assignee: Surendra Singh Lilhore
>Priority: Minor
> Attachments: HDFS-9151.01.patch, HDFS-9151.02.patch
>
>
> Mover should print the exit reason on the console.
> In cases where there are no blocks to move, storages are unavailable, or any 
> other exit condition occurs, the Mover tool gives no information about the 
> exit reason on the console:
> {code}
> # ./hdfs mover
> ...
> Sep 28, 2015 12:31:25 PM Mover took 10sec
> # echo $?
> 0
> # ./hdfs mover
> ...
> Sep 28, 2015 12:33:10 PM Mover took 1sec
> # echo $?
> 254
> {code}
> Unlike Mover, Balancer prints the exit reason, for example:
> #./hdfs balancer
> ...
> {color:red}The cluster is balanced. Exiting...{color}
> Sep 28, 2015 12:18:02 PM Balancing took 1.744 seconds
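
A minimal sketch of the requested behavior (the placement and the {{run(...)}} 
call are illustrative; only {{ExitStatus}} and {{getExitCode()}} are existing 
Balancer names): print the reason before exiting instead of only returning the 
code:

{code}
// Hypothetical sketch: report the exit reason on the console, as Balancer does.
ExitStatus status = run(namenodes, conf);   // illustrative call
if (status != ExitStatus.SUCCESS) {
  System.out.println("Mover exiting: " + status);
}
System.exit(status.getExitCode());
{code}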



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934634#comment-14934634
 ] 

Yi Liu edited comment on HDFS-9053 at 9/29/15 5:15 AM:
---

Thanks for the comments, Nicholas.

{quote}
Where are the numbers, especially the 4s, from? Do we assume a 32-bit world?
{quote}

{code}
public class BTree<E> implements Iterable<E> {
  ...
  private final int degree;
  private Node root;
  private int size;
  private transient int modCount = 0;
  ...
}

private final class Node {
  static final int DEFAULT_CAPACITY = 5;
  private Object[] elements;
  private int elementsSize;
  private Object[] children;
  private int childrenSize;
  ...
}
{code}
Sorry, I should have used a 64-bit system/JVM to describe this; the details are:

Compared to ArrayList, we add the following:
private final int degree;   <-   4 bytes int
private Node root;          <-   reference, 4 bytes on a 32-bit 
system/JVM, 8 bytes on a 64-bit system/JVM
private int size;           <-   4 bytes int

{{Node}} object overhead    <--  12 bytes
private Object[] children;  <-   null reference, 4 bytes on a 
32-bit system/JVM, 8 bytes on a 64-bit system/JVM
private int childrenSize;   <-   4 bytes int

So in total: 12+4+4+4+4+4 = 32 bytes on a 32-bit system/JVM, and 12+4+8+4+8+4 = 
40 bytes on a 64-bit system/JVM. (I have not counted object alignment.)



was (Author: hitliuyi):
Thanks for the comments, Nicholas.

{quote}
Where are the numbers, especially the 4s, from? Do we assume a 32-bit world?
{quote}

{code}
public class BTree<E> implements Iterable<E> {
  ...
  private final int degree;
  private Node root;
  private int size;
  private transient int modCount = 0;
  ...
}

private final class Node {
  static final int DEFAULT_CAPACITY = 5;
  private Object[] elements;
  private int elementsSize;
  private Object[] children;
  private int childrenSize;
  ...
}
{code}
Sorry, I should have used a 64-bit system/JVM; the details are:

Compared to ArrayList, we add the following:
private final int degree;   <-   4 bytes int
private Node root;          <-   reference, 4 bytes on a 32-bit 
system/JVM, 8 bytes on a 64-bit system/JVM
private int size;           <-   4 bytes int

{{Node}} object overhead    <--  12 bytes
private Object[] children;  <-   null reference, 4 bytes on a 
32-bit system/JVM, 8 bytes on a 64-bit system/JVM
private int childrenSize;   <-   4 bytes int

So in total: 12+4+4+4+4+4 = 32 bytes on a 32-bit system/JVM, and 12+4+8+4+8+4 = 
40 bytes on a 64-bit system/JVM. (I have not counted object alignment.)


> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For search, the time complexity is 
> O(log n), but insertion/deletion causes re-allocations and copies of big 
> arrays, so those operations are costly.  For example, if the children grow 
> to 1M in size, the ArrayList will resize to > 1M capacity, so it needs 
> > 1M * 4 bytes = 4M of contiguous heap memory; this easily causes full GC in 
> an HDFS cluster where namenode heap memory is already highly used.  To recap 
> the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because of 
> re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous, long-lived heap memory can 
> easily cause full GC problems.
> # Even if most children are removed later, the directory INode still occupies 
> the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to 
> solve the problem, as suggested by [~shv]. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is not large (less than the maximum degree of a 
> B-Tree node), the B-Tree has only one root node, which contains an array for 
> the elements. If the size grows large enough, it will split automatically, 
> and if elements are removed, then B-Tree nodes can 

[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934598#comment-14934598
 ] 

Yi Liu commented on HDFS-9053:
--

Thanks Jing for the further review. 
{quote}
Yeah, sorry for the delay.
{quote}
NP, thanks for the help :)

I also think it's a critical part of the code; thanks a lot for the review, 
[~szetszwo]! Reviews from [~kihwal] and anyone else are also welcome.

{quote}
For insert/delete, the running time is O(n) for ArrayList.
{quote}
Yeah, that is when we count the re-allocations and copies of big arrays; I 
just meant the search time complexity before we do the insert/delete.

{quote}
How about memory usage? One reason to use ArrayList instead of a more 
complicated data structure is to save memory. It seems to me that using B-Tree 
increases memory usage in general. It increases memory usage dramatically in 
some worst cases such as the average branching factor of the tree is small 
(this may also be a common case).
{quote}
That's a great question.  I gave a brief description in my second 
comment above: 
[here|https://issues.apache.org/jira/browse/HDFS-9053?focusedCommentId=14740814=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14740814],
 and I will give a detailed explanation here:
Generally speaking, the B-Tree in the patch increases memory usage by *very 
little*, and the increase can be ignored:
# *For normal and small directories*, say the (INode) children size is not 
large, so the B-Tree contains only one node. Currently the default degree of 
the B-Tree used in INodeDirectory is 1024, so the max degree is 2047; that 
means one B-Tree node can contain at most 2047 elements. The children array of 
the B-Tree root node is null, so the increased memory usage compared to 
ArrayList is one object overhead + a few variables, a total increase of about 
(12+4+4+4+4+4) = 32 bytes.
# *For large directories*, say the (INode) children size is larger than the 
max degree, so the B-Tree contains more than one node. Any B-Tree node except 
the root contains at least min-degree elements, in our case 1023; furthermore, 
we expand the elements array of a B-Tree node the same way as ArrayList, so it 
is the same as ArrayList in this respect. The typical worst case for the 
B-Tree from a memory point of view is when all elements are inserted in order: 
when we split a B-Tree node, we allocate the elements array and do the 
expanding, but no more elements are added later to the split left B-Tree node, 
so 1/3 of the elements array memory is wasted. That is almost the same as the 
worst case for ArrayList when no or few elements are added after expanding; in 
practice, the elements are not in order.  Now back to the overhead: for every 
B-Tree node, the increased memory usage is one object overhead + one element 
pointer in the parent + 2 variables, a total increase of about (12+4+4+4) = 24 
bytes for every newly added B-Tree node, while one B-Tree node contains at 
least (degree - 1) elements and at most (2 * degree - 1) elements.  Another 
small memory increase is the object overhead of the children array in a B-Tree 
inner node.

In conclusion, the memory increase of the B-Tree compared to ArrayList is very 
small. For small/normal directories, it is only 32 bytes of overhead, which is 
even less than the memory of one block in the NN. For large directories, 
besides the additional 12 bytes for the variables in the B-Tree, it adds about 
24 bytes for every newly added B-Tree node, and each node can hold 1023 ~ 2047 
INodes of the directory in our case.  We can ignore this small memory overhead 
for a directory. 

Lastly, the benefits of the B-Tree are great and obvious, as described in the 
JIRA description; in the patch, I considered the memory overhead carefully 
while implementing the B-Tree. 

Please let me know if you have further questions. Thanks a lot.
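
To make the per-node bounds above concrete, a small check under the stated 
assumption of degree 1024:

{code}
// A non-root node of a B-Tree with minimum degree t holds t-1 to 2t-1
// elements; with t = 1024 that is 1023..2047, so the ~24-byte per-node
// overhead is amortized over at least 1023 children.
public class NodeBoundsSketch {
  public static void main(String[] args) {
    int degree = 1024;
    int minElements = degree - 1;       // 1023
    int maxElements = 2 * degree - 1;   // 2047
    System.out.println(minElements + ".." + maxElements
        + " elements per node, <= " + (24.0 / minElements) + " bytes/child");
  }
}
{code}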

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For search, the time complexity is 
> O(log n), but insertion/deletion causes re-allocations and copies of big 
> arrays, so those operations are costly.  For example, if the children grow 
> to 1M in size, the ArrayList will resize to > 1M capacity, so it needs 
> > 1M * 4 bytes = 4M 

[jira] [Updated] (HDFS-9158) [OEV-Doc] : Document does not mention about "-f" and "-r" options

2015-09-28 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated HDFS-9158:

Attachment: HDFS-9158_03.patch

Updated the patch for the help message fix as well.
Keeping the indent the same to follow the lines above.

Thanks

> [OEV-Doc] : Document does not mention about "-f" and "-r" options
> -
>
> Key: HDFS-9158
> URL: https://issues.apache.org/jira/browse/HDFS-9158
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9158.01.patch, HDFS-9158_02.patch, 
> HDFS-9158_03.patch
>
>
> 1. The document does not mention the "-f" and "-r" options.
> Add these options to the document as well.
> {noformat}
> -f,--fix-txids Renumber the transaction IDs in the input,
>so that there are no gaps or invalid  transaction IDs.
> -r,--recover   When reading binary edit logs, use recovery
>mode.  This will give you the chance to skip
>corrupt parts of the edit log.
> {noformat}
> 2. The help message has some extra white space:
> {code}
> "so that there are no gaps or invalidtransaction IDs."
> {code}
> This can be removed as well.
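
For context, illustrative invocations of the two options being documented (the 
file names are placeholders, not from the issue):

{noformat}
# Renumber transaction IDs so there are no gaps or invalid IDs:
hdfs oev -i edits -o edits.xml -f
# Read a binary edit log in recovery mode, skipping corrupt parts:
hdfs oev -i edits -o edits.xml -r
{noformat}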



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-09-28 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934622#comment-14934622
 ] 

Ming Ma commented on HDFS-8647:
---

Thanks [~brahmareddy]! Here are a couple of comments.

* NetworkTopology knows when the # of racks changes from 1 to 2. So instead of 
having DatanodeManager call {{setHasClusterEverBeenMultiRack}} on 
NetworkTopology, perhaps we can have a new method {{boolean 
NetworkTopology.add(Node node)}} that returns true only when the # of racks 
changes from 1 to 2. Then DatanodeManager can act only if the returned value is 
true (see the sketch after these comments).
* {{verifyBlockPlacement}}'s parameter changes from {{LocatedBlock}} to 
{{DatanodeInfo[]}}. LocatedBlock has other info, such as storage type, that 
DatanodeInfo doesn't have. If later we decide to use storage type to verify 
block placement, we will need to add it back. Is there any way BlockManager 
can construct the LocatedBlock from the block id? For example, maybe it can 
get the DatanodeStorageInfo from {{blocksMap.getStorages(block)}} and then 
construct the block via the {{newLocatedBlock}} method.
* In {{verifyBlockPlacement}}'s {{if (!clusterMap.hasClusterEverBeenMultiRack() 
&& numRacks <= 1)}}, is {{numRacks <= 1}} still needed?
* {{chooseReplicaToDelete}} becomes an internal method. So there is no need to 
make it public.
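
A minimal sketch of the first suggestion (hypothetical names, tracking racks 
as plain strings): {{add}} reports the 1-to-2 rack transition so the caller 
can act on the return value:

{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: add() returns true only when the number of racks
// transitions from 1 to 2.
class MultiRackSketch {
  private final Set<String> racks = new HashSet<>();

  boolean add(String rack) {
    int before = racks.size();
    racks.add(rack);
    return before == 1 && racks.size() == 2;
  }

  public static void main(String[] args) {
    MultiRackSketch t = new MultiRackSketch();
    System.out.println(t.add("/rack1")); // false: 0 -> 1
    System.out.println(t.add("/rack2")); // true:  1 -> 2
    System.out.println(t.add("/rack3")); // false: 2 -> 3
  }
}
{code}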


> Abstract BlockManager's rack policy into BlockPlacementPolicy
> -
>
> Key: HDFS-8647
> URL: https://issues.apache.org/jira/browse/HDFS-8647
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-8647-001.patch, HDFS-8647-002.patch, 
> HDFS-8647-003.patch, HDFS-8647-004.patch, HDFS-8647-004.patch
>
>
> Sometimes we want to have the namenode use an alternative block placement 
> policy, such as upgrade domains in HDFS-7541.
> BlockManager has built-in assumptions about rack policy in functions such as 
> useDelHint and blockHasEnoughRacks. That means when we have a new block 
> placement policy, we need to modify BlockManager to account for it. Ideally 
> BlockManager should ask the BlockPlacementPolicy object instead. That will 
> allow us to provide a new BlockPlacementPolicy without changing BlockManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9092) Nfs silently drops overlapping write requests and causes data copying to fail

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934655#comment-14934655
 ] 

Hudson commented on HDFS-9092:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #457 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/457/])
HDFS-9092. Nfs silently drops overlapping write requests and causes data 
copying to fail. Contributed by Yongjun Zhang. (yzhang: rev 
151fca5032719e561226ef278e002739073c23ec)
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OffsetRange.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Nfs silently drops overlapping write requests and causes data copying to fail
> -
>
> Key: HDFS-9092
> URL: https://issues.apache.org/jira/browse/HDFS-9092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.1
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-9092.001.patch, HDFS-9092.002.patch
>
>
> When NOT using the 'sync' option, NFS writes may issue the following warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write 
> (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of the data copied via NFS will stay at 1248752400.
> What happens is:
> 1. The write requests from the client are sent asynchronously. 
> 2. The NFS gateway has a handler that handles each incoming request by 
> creating an internal write request structure and putting it into a cache.
> 3. In parallel, a separate thread in the NFS gateway takes requests out of 
> the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in 
> step 3. The detection of overlapping write requests happens in step 2, but it 
> only checks the write request against the current offset, and trims the 
> request if necessary. Because the write requests are sent asynchronously, if 
> two requests are beyond the current offset and they overlap, the overlap is 
> not detected and both are put into the cache. This causes the symptom 
> reported in this case at step 3.
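
A minimal sketch of the missing check (hypothetical structure, not the 
OpenFileCtx implementation): an incoming range must be compared against 
queued-but-unwritten ranges, not only the current offset:

{code}
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: detect overlap against pending (cached) write
// ranges in addition to the already-written offset.
class OverlapSketch {
  // start offset -> end offset (exclusive) of queued writes
  private final TreeMap<Long, Long> pending = new TreeMap<>();

  boolean overlapsPending(long start, long end) {
    Map.Entry<Long, Long> prev = pending.floorEntry(start);
    if (prev != null && prev.getValue() > start) {
      return true; // an earlier range runs past our start
    }
    Map.Entry<Long, Long> next = pending.ceilingEntry(start);
    return next != null && next.getKey() < end; // a later range starts before our end
  }

  public static void main(String[] args) {
    OverlapSketch s = new OverlapSketch();
    s.pending.put(1248751616L, 1249677312L);
    System.out.println(s.overlapsPending(1248752400L, 1249000000L)); // true
  }
}
{code}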



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9114) NameNode and DataNode metric log file name should follow the other log file name format.

2015-09-28 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9114:
-
Status: Patch Available  (was: Open)

> NameNode and DataNode metric log file name should follow the other log file 
> name format.
> 
>
> Key: HDFS-9114
> URL: https://issues.apache.org/jira/browse/HDFS-9114
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9114-branch-2.01.patch, HDFS-9114-trunk.01.patch
>
>
> Currently the datanode and namenode metric log files are named 
> {{datanode-metrics.log}} and {{namenode-metrics.log}}.
> These names should be like {{hadoop-hdfs-namenode-metric-host192.log}}, 
> the same as the namenode log file {{hadoop-hdfs-namenode-host192.log}}.
> This will help when we copy logs from different nodes for issue analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9092) Nfs silently drops overlapping write requests and causes data copying to fail

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934625#comment-14934625
 ] 

Hudson commented on HDFS-9092:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2372 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2372/])
HDFS-9092. Nfs silently drops overlapping write requests and causes data 
copying to fail. Contributed by Yongjun Zhang. (yzhang: rev 
151fca5032719e561226ef278e002739073c23ec)
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OffsetRange.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Nfs silently drops overlapping write requests and causes data copying to fail
> -
>
> Key: HDFS-9092
> URL: https://issues.apache.org/jira/browse/HDFS-9092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.1
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-9092.001.patch, HDFS-9092.002.patch
>
>
> When NOT using the 'sync' option, NFS writes may issue the following warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write 
> (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of the data copied via NFS will stay at 1248752400.
> What happens is:
> 1. The write requests from the client are sent asynchronously. 
> 2. The NFS gateway has a handler that handles each incoming request by 
> creating an internal write request structure and putting it into a cache.
> 3. In parallel, a separate thread in the NFS gateway takes requests out of 
> the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in 
> step 3. The detection of overlapping write requests happens in step 2, but it 
> only checks the write request against the current offset, and trims the 
> request if necessary. Because the write requests are sent asynchronously, if 
> two requests are beyond the current offset and they overlap, the overlap is 
> not detected and both are put into the cache. This causes the symptom 
> reported in this case at step 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9172) Erasure Coding: Move DFSStripedIO stream related classes to hadoop-hdfs-client

2015-09-28 Thread Rakesh R (JIRA)
Rakesh R created HDFS-9172:
--

 Summary: Erasure Coding: Move DFSStripedIO stream related classes 
to hadoop-hdfs-client
 Key: HDFS-9172
 URL: https://issues.apache.org/jira/browse/HDFS-9172
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this JIRA is to move the striped-stream-related classes to the 
{{hadoop-hdfs-client}} project. This will help stay in sync with the HDFS-6200 
proposal.

- DFSStripedInputStream
- DFSStripedOutputStream
- StripedDataStreamer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934611#comment-14934611
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9053:
---

> Yeah, if we count the re-allocations and copies of big arrays, I just mean 
> the search time complexity before we do insert/delete.

If we count only the search time complexity before we do the insert/delete, 
it should be called the "search time complexity", not the insert/delete time 
complexity.

> ... total increase about (12+4+4+4+4+4) = 32 bytes.

Where are the numbers, especially the 4s, from?  Do we assume a 32-bit world?

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For search, the time complexity is 
> O(log n), but insertion/deletion causes re-allocations and copies of big 
> arrays, so those operations are costly.  For example, if the children grow 
> to 1M in size, the ArrayList will resize to > 1M capacity, so it needs 
> > 1M * 4 bytes = 4M of contiguous heap memory; this easily causes full GC in 
> an HDFS cluster where namenode heap memory is already highly used.  To recap 
> the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because of 
> re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous, long-lived heap memory can 
> easily cause full GC problems.
> # Even if most children are removed later, the directory INode still occupies 
> the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to 
> solve the problem, as suggested by [~shv]. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is not large (less than the maximum degree of a 
> B-Tree node), the B-Tree has only one root node, which contains an array for 
> the elements. If the size grows large enough, the node will split 
> automatically, and if elements are removed, B-Tree nodes can merge 
> automatically (see more: https://en.wikipedia.org/wiki/B-tree).  This solves 
> the above 3 issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9092) Nfs silently drops overlapping write requests and causes data copying to fail

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934577#comment-14934577
 ] 

Hudson commented on HDFS-9092:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1195 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1195/])
HDFS-9092. Nfs silently drops overlapping write requests and causes data 
copying to fail. Contributed by Yongjun Zhang. (yzhang: rev 
151fca5032719e561226ef278e002739073c23ec)
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* 
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OffsetRange.java


> Nfs silently drops overlapping write requests and causes data copying to fail
> -
>
> Key: HDFS-9092
> URL: https://issues.apache.org/jira/browse/HDFS-9092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.1
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-9092.001.patch, HDFS-9092.002.patch
>
>
> When NOT using the 'sync' option, NFS writes may issue the following warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write 
> (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of the data copied via NFS will stay at 1248752400.
> What happens is:
> 1. The write requests from the client are sent asynchronously. 
> 2. The NFS gateway has a handler that handles each incoming request by 
> creating an internal write request structure and putting it into a cache.
> 3. In parallel, a separate thread in the NFS gateway takes requests out of 
> the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in 
> step 3. The detection of overlapping write requests happens in step 2, but it 
> only checks the write request against the current offset, and trims the 
> request if necessary. Because the write requests are sent asynchronously, if 
> two requests are beyond the current offset and they overlap, the overlap is 
> not detected and both are put into the cache. This causes the symptom 
> reported in this case at step 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly

2015-09-28 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-1172:
---
Attachment: HDFS-1172.010.patch

bq. Also if a block's effective replica number (including pending replica 
number) is >= than its replication factor, the block should not be in 
neededReplication. 

I rethought this and fixed {{checkReplication}} accordingly.

I also fixed the code to address the checkstyle warnings. The warning about 
the file length of BlockManager.java is not introduced here. The failure of 
{{TestBlockManager.testBlocksAreNotUnderreplicatedInSingleRack}} does not seem 
to be related to the patch, and I could not reproduce it in my environment.
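
The rule being applied, as a minimal sketch (hypothetical helper, not the 
actual patch):

{code}
// A block whose live plus pending replicas already reach its replication
// factor should not be queued in neededReplication.
class ReplicationCheckSketch {
  static boolean needsReplication(int liveReplicas, int pendingReplicas,
                                  int replicationFactor) {
    return liveReplicas + pendingReplicas < replicationFactor;
  }
}
{code}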


> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, 
> HDFS-1172.009.patch, HDFS-1172.010.patch, HDFS-1172.patch, hdfs-1172.txt, 
> hdfs-1172.txt, replicateBlocksFUC.patch, replicateBlocksFUC1.patch, 
> replicateBlocksFUC1.patch
>
>
> I've seen this for a long time, and imagine it's a known issue, but couldn't 
> find an existing JIRA. It often happens that we see the NN schedule 
> replication on the last block of files very quickly after they're completed, 
> before the other DNs in the pipeline have a chance to report the new block. 
> This results in a lot of extra replication work on the cluster, as we 
> replicate the block and then end up with multiple excess replicas which are 
> very quickly deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9171) [OIV] : ArrayIndexOutOfBoundsException thrown when step is more than maxsize in FileDistribution processor

2015-09-28 Thread Archana T (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Archana T updated HDFS-9171:

Description: 
When the step size is more than the max size in the FileDistribution processor:

hdfs oiv -i /NAME_DIR/fsimage_0007854 -o out --processor 
FileDistribution {color:red} -maxSize 1000 -step 5000 {color} ; cat out
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.run(FileDistributionCalculator.java:131)
at 
org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.visit(FileDistributionCalculator.java:108)
at 
org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:165)
at 
org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:124)
Processed 0 inodes.

> [OIV] : ArrayIndexOutOfBoundsException thrown when step is more than maxsize 
> in FileDistribution processor
> --
>
> Key: HDFS-9171
> URL: https://issues.apache.org/jira/browse/HDFS-9171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Archana T
>Assignee: nijel
>Priority: Minor
>
> When the step size is more than the max size in the FileDistribution processor:
> hdfs oiv -i /NAME_DIR/fsimage_0007854 -o out --processor 
> FileDistribution {color:red} -maxSize 1000 -step 5000 {color} ; cat out
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.run(FileDistributionCalculator.java:131)
> at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.visit(FileDistributionCalculator.java:108)
> at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:165)
> at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:124)
> Processed 0 inodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934598#comment-14934598
 ] 

Yi Liu edited comment on HDFS-9053 at 9/29/15 4:48 AM:
---

Thanks Jing for the further review. 
{quote}
Yeah, sorry for the delay.
{quote}
NP, thanks for the help :)

I also think it's a critical part of the code; thanks a lot for the review, 
[~szetszwo]! Reviews from [~kihwal] and anyone else are also welcome.

{quote}
For insert/delete, the running time is O(n) for ArrayList.
{quote}
Yeah, that is when we count the re-allocations and copies of big arrays; I 
just meant the search time complexity before we do the insert/delete.

{quote}
How about memory usage? One reason to use ArrayList instead of a more 
complicated data structure is to save memory. It seems to me that using B-Tree 
increases memory usage in general. It increases memory usage dramatically in 
some worst cases such as the average branching factor of the tree is small 
(this may also be a common case).
{quote}
That's a great question.  I gave a brief description in my second 
comment above: 
[here|https://issues.apache.org/jira/browse/HDFS-9053?focusedCommentId=14740814=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14740814],
 and I will give a detailed explanation here:
{quote}
in some worst cases such as the average branching factor of the tree is small
{quote}
Don't worry about this: any B-Tree node except the root contains at least 
min-degree elements; otherwise nodes are merged or elements are shifted to 
guarantee this property. Besides, we expand the elements array of a B-Tree 
node the same way as ArrayList.

Generally speaking, the B-Tree in the patch increases memory usage by *very 
little*, and the increase can be ignored:
# *For normal and small directories*, say the (INode) children size is not 
large, so the B-Tree contains only one node. Currently the default degree of 
the B-Tree used in INodeDirectory is 1024, so the max degree is 2047; that 
means one B-Tree node can contain at most 2047 elements. The children array of 
the B-Tree root node is null, so the increased memory usage compared to 
ArrayList is one object overhead + a few variables, a total increase of about 
(12+4+4+4+4+4) = 32 bytes.
# *For large directories*, say the (INode) children size is larger than the 
max degree, so the B-Tree contains more than one node. Any B-Tree node except 
the root contains at least min-degree elements, in our case 1023; furthermore, 
we expand the elements array of a B-Tree node the same way as ArrayList, so it 
is the same as ArrayList in this respect. The typical worst case for the 
B-Tree from a memory point of view is when all elements are inserted in order: 
when we split a B-Tree node, we allocate the elements array and do the 
expanding, but no more elements are added later to the split left B-Tree node, 
so 1/3 of the elements array memory is wasted. That is almost the same as the 
worst case for ArrayList when no or few elements are added after expanding; in 
practice, the elements are not in order.  Now back to the overhead: for every 
B-Tree node, the increased memory usage is one object overhead + one element 
pointer in the parent + 2 variables, a total increase of about (12+4+4+4) = 24 
bytes for every newly added B-Tree node, while one B-Tree node contains at 
least (degree - 1) elements and at most (2 * degree - 1) elements.  Another 
small memory increase is the object overhead of the children array in a B-Tree 
inner node.

In conclusion, the memory increase of the B-Tree compared to ArrayList is very 
small. For small/normal directories, it is only 32 bytes of overhead, which is 
even less than the memory of one block in the NN. For large directories, 
besides the additional 12 bytes for the variables in the B-Tree, it adds about 
24 bytes for every newly added B-Tree node, and each node can hold 1023 ~ 2047 
INodes of the directory in our case.  We can ignore this small memory overhead 
for a directory. 

Lastly, the benefits of the B-Tree are great and obvious, as described in the 
JIRA description; in the patch, I considered the memory overhead carefully 
while implementing the B-Tree. 

Please let me know if you have further questions. Thanks a lot.


was (Author: hitliuyi):
Thanks Jing for the further review. 
{quote}
Yeah, sorry for the delay.
{quote}
NP, thanks for the help :)

I also think it's a critical part of the code, thanks a lot for the review, 
[~szetszwo]! also welcome [~kihwal] and anyone who can review this.

{quote}
For insert/delete, the running time is O(n) for ArrayList.
{quote}
Yeah, if we count the re-allocations and copies of big arrays, I just mean the 
search time complexity before we do insert/delete.

{quote}
How about memory usage? One reason to use ArrayList instead of a more 
complicated data structure is to save memory. It seems to me that 

[jira] [Created] (HDFS-9171) [OIV] : ArrayIndexOutOfBoundsException thrown when step is more than maxsize in FileDistribution processor

2015-09-28 Thread Archana T (JIRA)
Archana T created HDFS-9171:
---

 Summary: [OIV] : ArrayIndexOutOfBoundsException thrown when step 
is more than maxsize in FileDistribution processor
 Key: HDFS-9171
 URL: https://issues.apache.org/jira/browse/HDFS-9171
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Archana T
Assignee: nijel
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9080) update htrace version to 4.0.1

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933819#comment-14933819
 ] 

Hudson commented on HDFS-9080:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #428 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/428/])
HDFS-9080. Update htrace version to 4.0.1 (cmccabe) (cmccabe: rev 
892ade689f9bcce76daae8f66fc00a49bee8548e)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsTracer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TracerConfigurationManager.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
* hadoop-project/pom.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShell.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolIterator.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSOutputStream.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/SetSpanReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInotifyEventInputStream.java
* hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPacket.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* hadoop-common-project/hadoop-common/src/main/proto/RpcHeader.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
* hadoop-common-project/hadoop-common/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveIterator.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtoUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSPacket.java
* hadoop-common-project/hadoop-common/src/site/markdown/Tracing.md
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java
* 

[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933884#comment-14933884
 ] 

Hudson commented on HDFS-9106:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8532 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8532/])
HDFS-9106. Transfer failure during pipeline recovery causes permanent write 
failures. Contributed by Kihwal Lee. (kihwal: rev 
4c9497cbf02ecc82532a4e79e18912d8e0eb4731)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java


> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-9106-poc.patch, HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.
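
A back-of-envelope reading of the timeout math above (the configured timeout 
value is an assumed example):

{code}
public class TransferBudgetSketch {
  public static void main(String[] args) {
    long configuredTimeoutMs = 60_000;              // assumed example value
    long slackMs = 2 * 5_000;                       // the 2 * 5 seconds slack
    long allowedMs = configuredTimeoutMs + slackMs; // transfer fails past this
    System.out.println("Transfer budget: " + allowedMs + " ms");
  }
}
{code}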



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6264) Provide FileSystem#create() variant which throws exception if parent directory doesn't exist

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933885#comment-14933885
 ] 

Hadoop QA commented on HDFS-6264:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | patch |   0m  1s | The patch file was not named 
according to hadoop's naming conventions. Please see 
https://wiki.apache.org/hadoop/HowToContribute for instructions. |
| {color:red}-1{color} | pre-patch |  21m 42s | Pre-patch trunk has 2 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  5s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 45s | The applied patch generated  1 
new checkstyle issues (total was 230, now 230). |
| {color:red}-1{color} | checkstyle |   4m 13s | The applied patch generated  1 
new checkstyle issues (total was 149, now 149). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 19s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   7m 22s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | tools/hadoop tests |   1m 11s | Tests failed in 
hadoop-azure. |
| {color:red}-1{color} | hdfs tests | 164m 58s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 29s | Tests passed in 
hadoop-hdfs-client. |
| | | 227m 49s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.ipc.TestIPC |
|   | hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked |
|   | hadoop.fs.contract.hdfs.TestHDFSContractCreate |
|   | hadoop.hdfs.server.namenode.TestStorageRestore |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764038/hdfs-6264-v3.txt |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 892ade6 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/artifact/patchprocess/trunkFindbugsWarningshadoop-common.html
 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/artifact/patchprocess/diffcheckstylehadoop-common.txt
 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-azure test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/artifact/patchprocess/testrun_hadoop-azure.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12712/console |


This message was automatically generated.

> Provide FileSystem#create() variant which throws exception if parent 
> directory doesn't exist
> 
>
> Key: HDFS-6264
> URL: https://issues.apache.org/jira/browse/HDFS-6264
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>  Labels: hbase
> Attachments: hdfs-6264-v1.txt, hdfs-6264-v2.txt, hdfs-6264-v3.txt
>
>
> FileSystem#createNonRecursive() is deprecated.
> However, there is no DistributedFileSystem#create() implementation which 
> throws an exception if the parent directory doesn't exist.
> This limits clients' migration away from the deprecated method.
> For 

[jira] [Commented] (HDFS-6264) Provide FileSystem#create() variant which throws exception if parent directory doesn't exist

2015-09-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933894#comment-14933894
 ] 

Ted Yu commented on HDFS-6264:
--

I don't think removing deprecation would result in the following test failure:
{code}
testGlobStatusFilterWithMultiplePathWildcardsAndNonTrivialFilter(org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked)
  Time elapsed: 0.02 sec  <<< ERROR!
java.lang.NullPointerException: null
at org.apache.hadoop.fs.Globber.glob(Globber.java:145)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1688)
at 
org.apache.hadoop.fs.FSMainOperationsBaseTest.testGlobStatusFilterWithMultiplePathWildcardsAndNonTrivialFilter(FSMainOperationsBaseTest.java:624)
{code}

> Provide FileSystem#create() variant which throws exception if parent 
> directory doesn't exist
> 
>
> Key: HDFS-6264
> URL: https://issues.apache.org/jira/browse/HDFS-6264
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>  Labels: hbase
> Attachments: hdfs-6264-v1.txt, hdfs-6264-v2.txt, hdfs-6264-v3.txt
>
>
> FileSystem#createNonRecursive() is deprecated.
> However, there is no DistributedFileSystem#create() implementation which 
> throws an exception if the parent directory doesn't exist.
> This limits clients' migration away from the deprecated method.
> For HBase, IO fencing relies on the behavior of 
> FileSystem#createNonRecursive().
> A variant of the create() method should be added which throws an exception 
> if the parent directory doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933590#comment-14933590
 ] 

Hadoop QA commented on HDFS-8578:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 28s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  5s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 32s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 29s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 11s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 191m 15s | Tests failed in hadoop-hdfs. |
| | | 234m 46s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | 
hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
|   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764013/HDFS-8578-09.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 66dad85 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12708/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12708/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12708/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12708/console |


This message was automatically generated.

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-branch-2.6.0.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assume it takes ~20 mins to process a single storage dir; then a 
> datanode which has ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime() 
>   : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save lots of time during major upgrades if the datanode processed all 
> storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?
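A rough sketch of a parallel variant of the loop quoted above (an illustration under assumptions, not the attached patch): submit one task per storage dir to a thread pool and wait for all of them, so ~10 disks upgrade in roughly the time of one. The names datanode, nsInfo, startOpt, getStorageDir() and doTransition() are taken from the quoted loop.

{code:java}
// Sketch only: parallelize the per-storage-dir upgrade work.
ExecutorService pool = Executors.newFixedThreadPool(getNumStorageDirs());
List<Future<Void>> futures = new ArrayList<>();
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  final StorageDirectory sd = getStorageDir(idx);
  futures.add(pool.submit(() -> {
    doTransition(datanode, sd, nsInfo, startOpt);
    return null;
  }));
}
for (Future<Void> f : futures) {
  f.get(); // surfaces the first upgrade failure, if any
}
pool.shutdown();
assert getCTime() == nsInfo.getCTime()
    : "Data-node and name-node CTimes must be the same.";
{code}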



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9155) [OEV] : The inputFile does not follow case insensitiveness incase of XML file

2015-09-28 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated HDFS-9155:

Attachment: HDFS-9155_01.patch

Updated the change; please review.

> [OEV] : The inputFile does not follow case insensitiveness incase of XML file
> -
>
> Key: HDFS-9155
> URL: https://issues.apache.org/jira/browse/HDFS-9155
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9155_01.patch
>
>
> As in the document and help
> {noformat}
> -i,--inputFile edits file to process, xml (*case
>insensitive*) extension means XML format,
> {noformat}
> But if I give a file with an "XML" extension, it falls back to binary 
> processing.
> This issue is due to this code:
> {code}
>  int org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go()
> .
> boolean xmlInput = inputFileName.endsWith(".xml");
> {code}
> Here we need to check for the xml extension after converting the file name to 
> lower case.
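A minimal sketch of the fix suggested above (an assumption, not the attached patch): normalize the extension check so "XML" and "xml" are treated alike.

{code:java}
import java.util.Locale;

// Sketch only: case-insensitive extension test, as in OfflineEditsViewer#go().
// The explicit Locale pins the lowercasing so it is stable across JVM locales.
public class ExtensionCheck {
  public static void main(String[] args) {
    String inputFileName = args.length > 0 ? args[0] : "edits.XML";
    boolean xmlInput =
        inputFileName.toLowerCase(Locale.ENGLISH).endsWith(".xml");
    System.out.println(inputFileName + " -> xmlInput=" + xmlInput); // true
  }
}
{code}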



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9158) [OEV-Doc] : Document does not mention about "-f" and "-r" options

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933537#comment-14933537
 ] 

Hadoop QA commented on HDFS-9158:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   3m 19s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m 18s | Site still builds. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| | |   7m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764042/HDFS-9158_02.patch |
| Optional Tests | site |
| git revision | trunk / 892ade6 |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12714/console |


This message was automatically generated.

> [OEV-Doc] : Document does not mention about "-f" and "-r" options
> -
>
> Key: HDFS-9158
> URL: https://issues.apache.org/jira/browse/HDFS-9158
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9158.01.patch, HDFS-9158_02.patch
>
>
> 1. The document does not mention the "-f" and "-r" options;
> add these options to the document as well
> {noformat}
> -f,--fix-txids Renumber the transaction IDs in the input,
>so that there are no gaps or invalid  transaction IDs.
> -r,--recover   When reading binary edit logs, use recovery
>mode.  This will give you the chance to skip
>corrupt parts of the edit log.
> {noformat}
> 2. In the help message there are some extra white spaces 
> {code}
> "so that there are no gaps or invalidtransaction IDs."
> {code}
> This can be removed as well



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9159) [OIV] : return value of the command is not correct if invalid value specified in "-p (processor)" option

2015-09-28 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated HDFS-9159:

Attachment: HDFS-9159_01.patch

Updated the patch; please review.

> [OIV] : return value of the command is not correct if invalid value specified 
> in "-p (processor)" option
> 
>
> Key: HDFS-9159
> URL: https://issues.apache.org/jira/browse/HDFS-9159
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9159_01.patch
>
>
> The return value of the OIV command is not correct if an invalid value is 
> specified in the "-p (processor)" option;
> this needs to return an error to the user.
> The code change will be in the switch statement of
> {code}
>  try (PrintStream out = outputFile.equals("-") ?
> System.out : new PrintStream(outputFile, "UTF-8")) {
>   switch (processor) {
> {code}
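One way to do it (a sketch under assumptions, not the attached patch): give the existing switch a default branch that reports the bad "-p" value and returns a non-zero status. The printUsage() helper is assumed to print the tool's usage text.

{code:java}
// Sketch only, inside the try block quoted above; 'processor' is the -p value.
switch (processor) {
  case "FileDistribution":
    // ... existing processor handling ...
    break;
  // ... other processors ...
  default:
    System.err.println("Invalid processor specified: " + processor);
    printUsage();
    return -1; // propagate an error exit code to the user
}
{code}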



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9053) Support large directories efficiently using B-Tree

2015-09-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933696#comment-14933696
 ] 

Jing Zhao commented on HDFS-9053:
-

Yeah, sorry for the delay. I'm currently reviewing the remaining part. Will 
post comments later.

> Support large directories efficiently using B-Tree
> --
>
> Key: HDFS-9053
> URL: https://issues.apache.org/jira/browse/HDFS-9053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Critical
> Attachments: HDFS-9053 (BTree with simple benchmark).patch, HDFS-9053 
> (BTree).patch, HDFS-9053.001.patch, HDFS-9053.002.patch
>
>
> This is a long-standing issue; we have tried to improve it in the past.  
> Currently we use an ArrayList for the children under a directory, and the 
> children are kept ordered in the list. For insert/delete/search, the lookup 
> is O(log n) via binary search, but insertion/deletion causes re-allocations 
> and copies of big arrays, so those operations are costly.  For example, if 
> the children grow to 1M entries, the ArrayList will resize to > 1M capacity, 
> needing > 1M * 4 bytes = 4 MB of contiguous heap memory; this easily causes 
> full GC in an HDFS cluster where namenode heap memory is already highly 
> used.  To recap, the 3 main issues:
> # Insertion/deletion operations in large directories are expensive because 
> of re-allocations and copies of big arrays.
> # Dynamically allocating several MB of contiguous heap memory which will be 
> long-lived can easily cause full GC problems.
> # Even if most children are removed later, the directory INode still 
> occupies the same amount of heap memory, since the ArrayList never shrinks.
> This JIRA is similar to HDFS-7174 created by [~kihwal], but uses a B-Tree to 
> solve the problem, as suggested by [~shv]. 
> So the target of this JIRA is to implement a low-memory-footprint B-Tree and 
> use it to replace the ArrayList. 
> If the number of elements is not large (less than the maximum degree of a 
> B-Tree node), the B-Tree only has one root node which contains an array for 
> the elements. If the size grows large enough, the root splits automatically, 
> and if elements are removed, B-Tree nodes can merge automatically (see 
> more: https://en.wikipedia.org/wiki/B-tree).  This solves the above 3 
> issues.
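For a feel of issues 1-3, a hypothetical micro-benchmark (illustration only, not part of the patch):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: front-inserts into a large ArrayList shift the whole
// backing array on every call, and growth re-allocates one big contiguous
// array (issues 1 and 2 above).
public class ChildrenListDemo {
  public static void main(String[] args) {
    List<Long> children = new ArrayList<>();
    long start = System.nanoTime();
    for (long i = 0; i < 50_000; i++) {
      children.add(0, i); // worst case: O(n) arraycopy per insert
    }
    long ms = (System.nanoTime() - start) / 1_000_000;
    System.out.println("50k front-inserts took " + ms + " ms");
    children.clear(); // the backing array keeps its capacity (issue 3)
  }
}
{code}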



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-28 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933706#comment-14933706
 ] 

Zhe Zhang commented on HDFS-9040:
-

I'm finishing up my review, will post soon.

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Jing Zhao
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040-HDFS-7285.004.patch, 
> HDFS-9040-HDFS-7285.005.patch, HDFS-9040-HDFS-7285.005.patch, 
> HDFS-9040-HDFS-7285.006.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> -Proposal 1:-
> -A BlockGroupDataStreamer to communicate with NN to allocate/update block, 
> and StripedDataStreamer s only have to stream blocks to DNs.-
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-8219) setStoragePolicy with folder behavior is different after cluster restart

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-8219.
-

> setStoragePolicy with folder behavior is different after cluster restart
> 
>
> Key: HDFS-8219
> URL: https://issues.apache.org/jira/browse/HDFS-8219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Peter Shi
>Assignee: Surendra Singh Lilhore
>  Labels: 2.6.1-candidate, 2.7.2-candidate, BB2015-05-RFC
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: HDFS-8219.patch, HDFS-8219.unittest-norepro.patch
>
>
> Reproduce steps:
> 1) mkdir a directory named /temp
> 2) put one file A under /temp
> 3) change /temp's storage policy to COLD
> 4) use -getStoragePolicy to query file A's storage policy; it is the same as 
> /temp's
> 5) change the /temp folder's storage policy again; file A's storage policy 
> stays the same as the parent folder's.
> Then restart the cluster.
> Do 3) and 4) again: file A's storage policy does not change while the parent 
> folder's storage policy changes. It behaves differently.
> As I debugged, I found this code
> in INodeFile.getStoragePolicyID:
> {code}
>   public byte getStoragePolicyID() {
> byte id = getLocalStoragePolicyID();
> if (id == BLOCK_STORAGE_POLICY_ID_UNSPECIFIED) {
>   return this.getParent() != null ?
>   this.getParent().getStoragePolicyID() : id;
> }
> return id;
>   }
> {code}
> If the file does not have its own storage policy, it will use the parent's. 
> But after a cluster restart, the file turns out to have its own storage policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-8431) hdfs crypto class not found in Windows

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-8431.
-

> hdfs crypto class not found in Windows
> --
>
> Key: HDFS-8431
> URL: https://issues.apache.org/jira/browse/HDFS-8431
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.6.0
> Environment: Windows only
>Reporter: Sumana Sathish
>Assignee: Anu Engineer
>Priority: Critical
>  Labels: 2.6.1-candidate, 2.7.2-candidate, encryption, scripts, 
> windows
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: Screen Shot 2015-05-18 at 6.27.11 PM.png, 
> hdfs-8431.001.patch, hdfs-8431.002.patch
>
>
> Attached screenshot shows that hdfs could not find class 'crypto' for Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-7314) When the DFSClient lease cannot be renewed, abort open-for-write files rather than the entire DFSClient

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-7314.
-

> When the DFSClient lease cannot be renewed, abort open-for-write files rather 
> than the entire DFSClient
> ---
>
> Key: HDFS-7314
> URL: https://issues.apache.org/jira/browse/HDFS-7314
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
>  Labels: 2.6.1-candidate, 2.7.2-candidate, BB2015-05-TBR
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: HDFS-7314-2.patch, HDFS-7314-3.patch, HDFS-7314-4.patch, 
> HDFS-7314-5.patch, HDFS-7314-6.patch, HDFS-7314-7.patch, HDFS-7314-8.patch, 
> HDFS-7314-9.patch, HDFS-7314-branch-2.7.2.txt, HDFS-7314.patch
>
>
> It happened in a YARN nodemanager scenario, but it could happen to any 
> long-running service that uses a cached instance of DistributedFileSystem.
> 1. The active NN is under heavy load, so it became unavailable for 10 
> minutes; any DFSClient request will get ConnectTimeoutException.
> 2. The YARN nodemanager uses DFSClient for certain write operations, such as 
> the log aggregator or the shared cache in YARN-1492. The renewLease RPC of 
> the DFSClient used by the YARN NM got ConnectTimeoutException.
> {noformat}
> 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to 
> renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds.  
> Aborting ...
> {noformat}
> 3. After DFSClient is in Aborted state, YARN NM can't use that cached 
> instance of DistributedFileSystem.
> {noformat}
> 2014-10-29 20:26:23,991 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc...
> java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We can make YARN or DFSClient more tolerant to temporary NN unavailability. 
> Given the callstack is YARN -> DistributedFileSystem -> DFSClient, this can 
> be addressed at different layers.
> * YARN closes the DistributedFileSystem object when it receives some 
> well-defined exception. Then the next HDFS call will create a new instance of 
> DistributedFileSystem. We have to fix all the places in YARN, plus other HDFS 
> applications need to address this as well.
> * DistributedFileSystem detects an aborted DFSClient and creates a new 
> instance of DFSClient. We will need to fix all the places where 
> DistributedFileSystem calls DFSClient.
> * After DFSClient gets into the Aborted state, it doesn't have to reject all 
> requests; instead it can retry. If the NN becomes available again, it can 
> transition back to a healthy state.
> Comments?
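For illustration only (not a committed fix), a rough sketch of the first option above: the caller swaps in a fresh instance when it sees the well-known "Filesystem closed" error. FileSystem.newInstance() deliberately bypasses the FileSystem cache; the wrapper class and its single wrapped call are hypothetical.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical wrapper: retry once with a fresh FileSystem if the cached
// DFSClient has aborted and reports "Filesystem closed".
public class RetryingFsClient {
  private final Configuration conf;
  private FileSystem fs;

  public RetryingFsClient(Configuration conf) throws IOException {
    this.conf = conf;
    this.fs = FileSystem.newInstance(conf); // uncached instance
  }

  public synchronized FileStatus getFileStatus(Path p) throws IOException {
    try {
      return fs.getFileStatus(p);
    } catch (IOException e) {
      if ("Filesystem closed".equals(e.getMessage())) {
        fs = FileSystem.newInstance(conf); // replace the aborted client
        return fs.getFileStatus(p);
      }
      throw e;
    }
  }
}
{code}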



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-7489) Incorrect locking in FsVolumeList#checkDirs can hang datanodes

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-7489.
-

> Incorrect locking in FsVolumeList#checkDirs can hang datanodes
> --
>
> Key: HDFS-7489
> URL: https://issues.apache.org/jira/browse/HDFS-7489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0, 2.6.0
>Reporter: Noah Lorang
>Assignee: Noah Lorang
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1
>
> Attachments: HDFS-7489-v1.patch, HDFS-7489-v2.patch, 
> HDFS-7489-v2.patch.1
>
>
> After upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hanging 
> their heartbeats and requests from clients. After some digging, I 
> identified the culprit as the checkDiskError() triggered by catching 
> IOExceptions (in our case, SocketExceptions triggered on one datanode 
> by ReplicaAlreadyExistsExceptions on another datanode).
> Thread dumps reveal that the checkDiskErrors() thread is holding a lock on 
> the FsVolumeList:
> {code}
> "Thread-409" daemon prio=10 tid=0x7f4e50200800 nid=0x5b8e runnable 
> [0x7f4e2f855000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.list(Native Method)
> at java.io.File.list(File.java:973)
> at java.io.File.listFiles(File.java:1051)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
> - locked <0x00063b182ea0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> Other things would then lock the FsDatasetImpl while waiting for the 
> FsVolumeList, e.g.:
> {code}
> "DataXceiver for client  at /10.10.0.52:46643 [Receiving block 
> BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_106962574]" daemon 
> prio=10 tid=0x7f4e55561000 nid=0x406d waiting for monitor entry 
> [0x7f4e3106d000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
> - waiting to lock <0x00063b182ea0> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
> - locked <0x00063b1f9a48> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:167)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> That lock on the FsDatasetImpl then causes other threads to block:
> {code}
> "Thread-127" daemon prio=10 tid=0x7f4e4c67d800 nid=0x2e02 waiting for 
> monitor entry [0x7f4e3339]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:228)
> - waiting to lock <0x00063b1f9a48> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
> at 
> 

[jira] [Closed] (HDFS-7425) NameNode block deletion logging uses incorrect appender.

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-7425.
-

> NameNode block deletion logging uses incorrect appender.
> 
>
> Key: HDFS-7425
> URL: https://issues.apache.org/jira/browse/HDFS-7425
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1
>
> Attachments: HDFS-7425-branch-2.1.patch
>
>
> The NameNode uses 2 separate Log4J appenders for tracking state changes.  The 
> appenders are named "org.apache.hadoop.hdfs.StateChange" and 
> "BlockStateChange".  The intention of BlockStateChange is to separate more 
> verbose block state change logging and allow it to be configured separately.  
> In branch-2, there is some block state change logging that incorrectly goes 
> to the "org.apache.hadoop.hdfs.StateChange" appender though.  The bug is not 
> present in trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-8046) Allow better control of getContentSummary

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-8046.
-

> Allow better control of getContentSummary
> -
>
> Key: HDFS-8046
> URL: https://issues.apache.org/jira/browse/HDFS-8046
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: HDFS-8046-branch-2.6.1.txt, HDFS-8046.v1.patch
>
>
> On busy clusters, users performing quota checks against a big directory 
> structure can affect the namenode performance. It has become a lot better 
> after HDFS-4995, but as clusters get bigger and busier, it is apparent that 
> we need finer grain control to avoid long read lock causing throughput drop.
> Even with unfair namesystem lock setting, a long read lock (10s of 
> milliseconds) can starve many readers and especially writers. So the locking 
> duration should be reduced, which can be done by imposing a lower 
> count-per-iteration limit in the existing implementation.  But HDFS-4995 came 
> with a fixed amount of sleep between locks. This needs to be made 
> configurable, so that {{getContentSummary()}} doesn't get exceedingly slow.
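As a sketch of the knob being asked for here: the two key names below are my assumption of how the count-per-iteration limit and the inter-lock sleep would be exposed, not confirmed configuration names.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only; both key names are assumptions based on this description.
public class ContentSummaryTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt("dfs.content-summary.limit", 5000);          // entries per lock hold
    conf.setLong("dfs.content-summary.sleep-microsec", 500); // pause between holds
    System.out.println(conf.get("dfs.content-summary.limit"));
  }
}
{code}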



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-4882.
-

> Prevent the Namenode's LeaseManager from looping forever in checkLeases
> ---
>
> Key: HDFS-4882
> URL: https://issues.apache.org/jira/browse/HDFS-4882
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Affects Versions: 2.0.0-alpha, 2.5.1
>Reporter: Zesheng Wu
>Assignee: Ravi Prakash
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1
>
> Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, 
> HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, 
> HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch
>
>
> Scenario:
> 1. cluster with 4 DNs
> 2. the size of the file to be written is a little more than one block
> 3. write the first block to 3 DNs, DN1->DN2->DN3
> 4. all the data packets of the first block are successfully acked and the 
> client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't 
> sent out
> 5. DN2 and DN3 are down
> 6. the client recovers the pipeline, but no new DN is added to the pipeline 
> because the current pipeline stage is PIPELINE_CLOSE
> 7. the client continues writing the last block, and tries to close the file 
> after writing all the data
> 8. NN finds that the penultimate block doesn't have enough replicas (our 
> dfs.namenode.replication.min=2), the client's close runs into an indefinite 
> loop (HDFS-2936), and at the same time NN sets the last block's state to 
> COMPLETE
> 9. shut down the client
> 10. the file's lease exceeds the hard limit
> 11. LeaseManager realizes that and begins lease recovery by calling 
> fsnamesystem.internalReleaseLease()
> 12. but the last block's state is COMPLETE, and this triggers the lease 
> manager's infinite loop and prints massive logs like this:
> {noformat}
> 2013-06-05,17:42:25,695 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard
>  limit
> 2013-06-05,17:42:25,695 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=
> /user/h_wuzesheng/test.dat
> 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block 
> blk_-7028017402720175688_1202597,
> lastBLockState=COMPLETE
> 2013-06-05,17:42:25,695 INFO 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery 
> for file /user/h_wuzesheng/test.dat lease [Lease.  Holder: DFSClient_NONM
> APREDUCE_-1252656407_1, pendingcreates: 1]
> {noformat}
> (the 3rd line log is a debug log added by us)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-7733) NFS: readdir/readdirplus return null directory attribute on failure

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-7733.
-

> NFS: readdir/readdirplus return null directory attribute on failure
> ---
>
> Key: HDFS-7733
> URL: https://issues.apache.org/jira/browse/HDFS-7733
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.6.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1
>
> Attachments: HDFS-7733.01.patch
>
>
> NFS readdir and readdirplus operations return a null directory attribute on 
> some failure paths. This causes clients to get a 'Stale file handle' error 
> which can only be fixed by unmounting and remounting the share.
> The issue can be reproduced by running 'ls' against a large directory which 
> is being actively modified, triggering the 'cookie mismatch' failure path.
> {code}
> } else {
>   LOG.error("cookieverf mismatch. request cookieverf: " + cookieVerf
>   + " dir cookieverf: " + dirStatus.getModificationTime());
>   return new READDIRPLUS3Response(Nfs3Status.NFS3ERR_BAD_COOKIE);
> }
> {code}
> Thanks to [~brandonli] for catching the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-7443.
-

> Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are 
> present in the same volume
> --
>
> Key: HDFS-7443
> URL: https://issues.apache.org/jira/browse/HDFS-7443
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kihwal Lee
>Assignee: Colin Patrick McCabe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1
>
> Attachments: HDFS-7443.001.patch, HDFS-7443.002.patch
>
>
> When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of 
> the datanodes were not coming up.  They tried the data file layout upgrade for 
> BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed.
> All failures were caused by {{NativeIO.link()}} throwing IOException saying 
> {{EEXIST}}.  The data nodes didn't die right away, but the upgrade was soon 
> retried when the block pool initialization was retried whenever 
> {{BPServiceActor}} was registering with the namenode.  After many retries, 
> datanodes terminated.  This would leave {{previous.tmp}} and {{current}} with 
> no {{VERSION}} file in the block pool slice storage directory.  
> Although {{previous.tmp}} contained the old {{VERSION}} file, the content was 
> in the new layout and the subdirs were all newly created ones.  This 
> shouldn't have happened because the upgrade-recovery logic in {{Storage}} 
> removes {{current}} and renames {{previous.tmp}} to {{current}} before 
> retrying.  All successfully upgraded volumes had old state preserved in their 
> {{previous}} directory.
> In summary there were two observed issues.
> - Upgrade failure with {{link()}} failing with {{EEXIST}}
> - {{previous.tmp}} contained not the content of original {{current}}, but 
> half-upgraded one.
> We did not see this in smaller scale test clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly

2015-09-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933772#comment-14933772
 ] 

Jing Zhao commented on HDFS-1172:
-

bq. BlockManager#hasEnoughEffectiveReplicas added by HDFS-8938 takes pending 
replicas into account. numCurrentReplica in BlockManager#addStoredBlock was 
fixed to take pending replicas into account by HDFS-8623.

These two jiras mainly do code refactoring; the logic has been 
there for a while.

bq. I think it is better to leave BlockManager#checkReplication as is here. 
Though it may add block having pending replicas to neededReplications, the 
replication will not be scheduled as far as the replica is in 
pendingReplications because BlockManager#hasEnoughEffectiveReplicas takes it 
into account.

The question is, if we expect the replication monitor to later remove the block 
from {{neededReplication}}, why do we add it in the first place? Also, if a 
block's effective replica number (including the pending replica number) is >= 
its replication factor, the block should not be in {{neededReplication}}. This 
is more consistent with the current logic.



> Blocks in newly completed files are considered under-replicated too quickly
> ---
>
> Key: HDFS-1172
> URL: https://issues.apache.org/jira/browse/HDFS-1172
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
> Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, 
> HDFS-1172.009.patch, HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, 
> replicateBlocksFUC.patch, replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
>
>
> I've seen this for a long time, and imagine it's a known issue, but couldn't 
> find an existing JIRA. It often happens that we see the NN schedule 
> replication on the last block of files very quickly after they're completed, 
> before the other DNs in the pipeline have a chance to report the new block. 
> This results in a lot of extra replication work on the cluster, as we 
> replicate the block and then end up with multiple excess replicas which are 
> very quickly deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-28 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-8696:

Attachment: HDFS-8696.009.patch

> Reduce the variances of latency of WebHDFS
> --
>
> Key: HDFS-8696
> URL: https://issues.apache.org/jira/browse/HDFS-8696
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-8696.004.patch, HDFS-8696.005.patch, 
> HDFS-8696.006.patch, HDFS-8696.007.patch, HDFS-8696.008.patch, 
> HDFS-8696.009.patch, HDFS-8696.1.patch, HDFS-8696.2.patch, HDFS-8696.3.patch
>
>
> There is an issue that appears related to the webhdfs server. When making two 
> concurrent requests, the DN will sometimes pause for extended periods (I've 
> seen 1-300 seconds), killing performance and dropping connections. 
> To reproduce: 
> 1. set up a HDFS cluster
> 2. Upload a large file (I was using 10GB). Perform 1-byte reads, writing
> the time out to /tmp/times.txt
> {noformat}
> i=1
> while (true); do 
> echo $i
> let i++
> /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null 
> "http://:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root&length=1"
> done
> {noformat}
> 3. Watch for 1-byte requests that take more than one second:
> tail -F /tmp/times.txt | grep -E "^[^0]"
> 4. After it has had a chance to warm up, start doing large transfers from
> another shell:
> {noformat}
> i=1
> while (true); do 
> echo $i
> let i++
> /usr/bin/time -f %e curl -s -L -o /dev/null 
> "http://:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root"
> done
> {noformat}
> It's easy to find after a minute or two that small reads will sometimes
> pause for 1-300 seconds. In some extreme cases, it appears that the
> transfers timeout and the DN drops the connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9158) [OEV-Doc] : Document does not mention about "-f" and "-r" options

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933815#comment-14933815
 ] 

Hadoop QA commented on HDFS-9158:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 49s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 19s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m  6s | Site still builds. |
| {color:red}-1{color} | checkstyle |   1m 25s | The applied patch generated  1 
new checkstyle issues (total was 40, now 40). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 16s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 163m 58s | Tests passed in hadoop-hdfs. 
|
| | | 215m 49s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764029/HDFS-9158.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 892ade6 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12711/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12711/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12711/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12711/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12711/console |


This message was automatically generated.

> [OEV-Doc] : Document does not mention about "-f" and "-r" options
> -
>
> Key: HDFS-9158
> URL: https://issues.apache.org/jira/browse/HDFS-9158
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9158.01.patch, HDFS-9158_02.patch
>
>
> 1. The document does not mention the "-f" and "-r" options;
> add these options to the document as well
> {noformat}
> -f,--fix-txids Renumber the transaction IDs in the input,
>so that there are no gaps or invalid  transaction IDs.
> -r,--recover   When reading binary edit logs, use recovery
>mode.  This will give you the chance to skip
>corrupt parts of the edit log.
> {noformat}
> 2. In the help message there are some extra white spaces 
> {code}
> "so that there are no gaps or invalidtransaction IDs."
> {code}
> This can be removed as well



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9148) Incorrect assert message in TestWriteToReplica#testWriteToTemporary

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933853#comment-14933853
 ] 

Hadoop QA commented on HDFS-9148:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |   7m 45s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 48s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   1m 15s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 166m 43s | Tests failed in hadoop-hdfs. |
| | | 189m 44s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.web.TestWebHDFS |
|   | hadoop.hdfs.web.TestWebHDFSOAuth2 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12762484/hdfs-9148.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 892ade6 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12713/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12713/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12713/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12713/console |


This message was automatically generated.

> Incorrect assert message in TestWriteToReplica#testWriteToTemporary
> ---
>
> Key: HDFS-9148
> URL: https://issues.apache.org/jira/browse/HDFS-9148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Attachments: hdfs-9148.patch
>
>
> The following assert text in TestWriteToReplica#testWriteToTemporary is not 
> correct:
> {code:java}
>   Assert.fail("createRbw() Should have removed the block with the older "
>   + "genstamp and replaced it with the newer one: " + 
> blocks[NON_EXISTENT]);
> {code}
> If the assert is triggered, it can only be because a temporary replica 
> already exists and has a newer generation stamp. It should have nothing to do 
> with createRbw().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9110) Improve upon HDFS-8480

2015-09-28 Thread Charlie Helin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933877#comment-14933877
 ] 

Charlie Helin commented on HDFS-9110:
-

[~vinod] got it, sorry!


> Improve upon HDFS-8480
> --
>
> Key: HDFS-9110
> URL: https://issues.apache.org/jira/browse/HDFS-9110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Charlie Helin
>Assignee: Charlie Helin
>Priority: Minor
> Attachments: HDFS-9110.00.patch, HDFS-9110.01.patch, 
> HDFS-9110.02.patch, HDFS-9110.03.patch, HDFS-9110.04.patch, HDFS-9110.05.patch
>
>
> This is a request to do some cosmetic improvements on top of HDFS-8480. There 
> are a couple of File -> java.nio.file.Path conversions which are a little bit 
> distracting. 
> The second aspect is more around efficiency; to be perfectly honest, I'm not 
> sure how many files may be processed. However, as HDFS-8480 
> alludes to, it appears that this number could be significantly large. 
> The current implementation is basically collect-and-process, where all files 
> are first examined and put into a collection, and after that processed. 
> HDFS-8480 could simply be further enhanced by employing a single iteration, 
> without creating an intermediary collection of filenames, by using a FileWalker
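For illustration, a minimal sketch of the single-iteration idea, under the assumption that "FileWalker" refers to java.nio.file's walkFileTree/FileVisitor machinery; the process() helper is a stand-in for the real per-file work:

{code:java}
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class SinglePassWalk {
  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args[0]);
    // Each file is handled as it is encountered; no intermediary collection.
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        process(file);
        return FileVisitResult.CONTINUE;
      }
    });
  }

  private static void process(Path file) {
    System.out.println(file); // stand-in for the real per-file work
  }
}
{code}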



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9092) Nfs silently drops overlapping write requests and causes data copying to fail

2015-09-28 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9092:

Attachment: HDFS-9092.002.patch

Thanks a lot [~brandonli]!

I found that trunk has changed such that the patch no longer compiles. Uploading 
rev 002 to address it. In addition, I removed the extra spaces reported by the 
last jenkins job.

I originally planned to get this into 2.6.2; however, I found that both 2.7 and 
2.6 miss some changes, so the patch cannot be applied cleanly. I'm targeting 
this change to 2.8 for now.


> Nfs silently drops overlapping write requests and causes data copying to fail
> -
>
> Key: HDFS-9092
> URL: https://issues.apache.org/jira/browse/HDFS-9092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.1
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-9092.001.patch, HDFS-9092.002.patch
>
>
> When NOT using 'sync' option, the NFS writes may issue the following warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write 
> (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of data copied via NFS will stay at 1248752400.
> What happens is:
> 1. The write requests from the client are sent asynchronously. 
> 2. The NFS gateway has a handler that handles each incoming request by 
> creating an internal write request structure and putting it into a cache;
> 3. In parallel, a separate thread in the NFS gateway takes requests out of 
> the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in 
> 3. The detection of overlapping write requests happens in 2, but it only 
> checks the write request against the current offset, and trims the request if 
> necessary. Because the write requests are sent asynchronously, if two 
> requests are both beyond the current offset and they overlap, the overlap is 
> not detected and both are put into the cache. This causes the symptom 
> reported in this case at step 3.
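A toy illustration (not the attached patch) of the missing check: comparing an incoming request against the cached, not-yet-written requests as well as the current offset would catch the case above. The map layout (offset -> length) and the numbers from the log are assumptions for the demo.

{code:java}
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch only: detect overlap against all cached pending writes.
public class OverlapCheck {
  static boolean overlapsPending(SortedMap<Long, Integer> pending,
                                 long off, int len) {
    for (Map.Entry<Long, Integer> e : pending.entrySet()) {
      long start = e.getKey();
      long end = start + e.getValue();
      if (off < end && off + len > start) {
        return true; // overlaps a cached, not-yet-written request
      }
    }
    return false;
  }

  public static void main(String[] args) {
    SortedMap<Long, Integer> pending = new TreeMap<>();
    pending.put(1248751616L, 925696); // the overlapping write from the log
    System.out.println(overlapsPending(pending, 1248752400L, 1024)); // true
  }
}
{code}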



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6264) Provide FileSystem#create() variant which throws exception if parent directory doesn't exist

2015-09-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933904#comment-14933904
 ] 

Ted Yu commented on HDFS-6264:
--

I ran TestIPC, TestHDFSContractCreate and TestStorageRestore locally with the 
patch; they passed.

TestNativeAzureFileSystemOperationsMocked fails even without the patch.

> Provide FileSystem#create() variant which throws exception if parent 
> directory doesn't exist
> 
>
> Key: HDFS-6264
> URL: https://issues.apache.org/jira/browse/HDFS-6264
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>  Labels: hbase
> Attachments: hdfs-6264-v1.txt, hdfs-6264-v2.txt, hdfs-6264-v3.txt
>
>
> FileSystem#createNonRecursive() is deprecated.
> However, there is no DistributedFileSystem#create() implementation which 
> throws an exception if the parent directory doesn't exist.
> This limits clients' migration away from the deprecated method.
> For HBase, IO fencing relies on the behavior of 
> FileSystem#createNonRecursive().
> A variant of the create() method should be added which throws an exception 
> if the parent directory doesn't exist.
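For reference, a minimal sketch of the deprecated call whose fail-fast behavior HBase depends on (context only; the path and payload are made up, and this is not the proposed new create() variant):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: createNonRecursive() throws if /hbase does not already exist,
// which is the fencing behavior a non-deprecated variant should preserve.
public class NonRecursiveCreate {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    try (FSDataOutputStream out = fs.createNonRecursive(
        new Path("/hbase/lockfile"), true, 4096, (short) 3,
        fs.getDefaultBlockSize(), null)) {
      out.writeUTF("fencing token"); // hypothetical payload
    }
  }
}
{code}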



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9148) Incorrect assert message in TestWriteToReplica#testWriteToTemporary

2015-09-28 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933910#comment-14933910
 ] 

Tony Wu commented on HDFS-9148:
---

I don't think the failed tests (TestDirectoryScanner, TestWebHDFS, 
TestWebHDFSOAuth2) have anything to do with this patch.

> Incorrect assert message in TestWriteToReplica#testWriteToTemporary
> ---
>
> Key: HDFS-9148
> URL: https://issues.apache.org/jira/browse/HDFS-9148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Attachments: hdfs-9148.patch
>
>
> The following assert text in TestWriteToReplica#testWriteToTemporary is not 
> correct:
> {code:java}
>   Assert.fail("createRbw() Should have removed the block with the older "
>   + "genstamp and replaced it with the newer one: " + 
> blocks[NON_EXISTENT]);
> {code}
> If the assert is triggered, it can only be because a temporary replica 
> already exists and has a newer generation stamp. It should have nothing to do 
> with createRbw().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9106:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.2
   Status: Resolved  (was: Patch Available)

> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: HDFS-9106-poc.patch, HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933911#comment-14933911
 ] 

Kihwal Lee commented on HDFS-9106:
--

committed to branch-2 and trunk.

> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-9106-poc.patch, HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9165) Move the rest of the entries in META-INF/services/o.a.h.fs.FileSystem to hdfs-client

2015-09-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu reassigned HDFS-9165:
---

Assignee: Mingliang Liu

> Move the rest of the entries in META-INF/services/o.a.h.fs.FileSystem to 
> hdfs-client
> 
>
> Key: HDFS-9165
> URL: https://issues.apache.org/jira/browse/HDFS-9165
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
>
> After HDFS-8740 the entries in META-INF/services/o.a.h.fs.FileSystem should 
> be updated accordingly similar to HDFS-9041.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9110) Improve upon HDFS-8480

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-9110:
--
Target Version/s: 2.8.0
   Fix Version/s: (was: 2.6.1)
  (was: 2.7.0)

[~chelin], please use the "Target Version" field to express your intention. 
Fix-version is exclusively used by committers when a patch gets committed.

IAC, 2.6.1 and 2.7.0 are both done. Setting "Target Version" to 2.8.0 for this 
improvement.

> Improve upon HDFS-8480
> --
>
> Key: HDFS-9110
> URL: https://issues.apache.org/jira/browse/HDFS-9110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Charlie Helin
>Assignee: Charlie Helin
>Priority: Minor
> Attachments: HDFS-9110.00.patch, HDFS-9110.01.patch, 
> HDFS-9110.02.patch, HDFS-9110.03.patch, HDFS-9110.04.patch, HDFS-9110.05.patch
>
>
> This is a request to do some cosmetic improvements on top of HDFS-8480. There 
> are a couple of File -> java.nio.file.Path conversions which are a little bit 
> distracting. 
> The second aspect is more around efficiency. To be perfectly honest, I'm not 
> sure what number of files may be processed; however, as HDFS-8480 alludes to, 
> it appears that this number could be significantly large. 
> The current implementation is basically collect-and-process, where all files 
> are first examined, put into a collection, and after that processed. 
> HDFS-8480 could simply be further enhanced by employing a single iteration, 
> without creating an intermediary collection of filenames, by using a 
> FileWalker, as sketched below.
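A hedged sketch of that single-pass idea using java.nio (the visitor body is hypothetical; the actual patch may differ):

{code:java}
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Process each file as it is visited instead of first collecting every
// filename into an intermediary collection.
final class SinglePassWalkSketch {
  static void processAll(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        process(file);                      // handle immediately, no buffering
        return FileVisitResult.CONTINUE;
      }
    });
  }

  private static void process(Path file) {
    // per-file work would go here
  }
}
{code}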



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933698#comment-14933698
 ] 

Jing Zhao commented on HDFS-9106:
-

+1 for the latest patch. Thanks [~kihwal].

> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-9106-poc.patch, HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7633) BlockPoolSliceScanner fails when Datanode has too many blocks

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HDFS-7633.
---
   Resolution: Not A Problem
Fix Version/s: (was: 2.6.1)

Resolving this instead as not-a-problem-anymore.

> BlockPoolSliceScanner fails when Datanode has too many blocks
> -
>
> Key: HDFS-7633
> URL: https://issues.apache.org/jira/browse/HDFS-7633
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-7633.patch
>
>
> issue:
> When the total number of blocks on one of my DNs reaches 33554432, it refuses 
> to accept more blocks. This is the ERROR:
> 2015-01-16 15:21:44,571 | ERROR | DataXceiver for client  at /172.1.1.8:50490 
> [Receiving block 
> BP-1976278848-172.1.1.2-1419846518085:blk_1221043436_147936990] | 
> datasight-198:25009:DataXceiver error processing WRITE_BLOCK operation  src: 
> /172.1.1.8:50490 dst: /172.1.1.11:25009 | 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
> java.lang.IllegalArgumentException: n must be positive
> at java.util.Random.nextInt(Random.java:300)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.getNewBlockScanTime(BlockPoolSliceScanner.java:263)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.addBlock(BlockPoolSliceScanner.java:276)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.addBlock(DataBlockScanner.java:193)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.closeBlock(DataNode.java:1733)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:765)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> at java.lang.Thread.run(Thread.java:745)
> analysis:
> In the function 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.getNewBlockScanTime(),
> when blockMap.size() is too big:
> Math.max(blockMap.size(),1) * 600 is int-typed, and negative (overflow);
> Math.max(blockMap.size(),1) * 600 * 1000L is long-typed, and still negative;
> (int)period is Integer.MIN_VALUE;
> Math.abs((int)period) is Integer.MIN_VALUE, which is negative;
> DFSUtil.getRandom().nextInt(periodInt) will throw IllegalArgumentException.
> I use Java HotSpot (build 1.7.0_05-b05).
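The overflow described above is easy to reproduce in isolation; a self-contained sketch (not the scanner code itself):

{code:java}
// Demonstrates the int overflow from the analysis above.
public class ScanPeriodOverflow {
  public static void main(String[] args) {
    int blocks = 33554432;
    // int arithmetic overflows: 33554432 * 600 > Integer.MAX_VALUE
    int badPeriod = Math.max(blocks, 1) * 600;            // negative
    // promoting to long before multiplying avoids the overflow
    long goodPeriod = Math.max(blocks, 1) * 600L * 1000L;
    System.out.println(badPeriod);    // prints a negative value
    System.out.println(goodPeriod);   // prints 20132659200000
  }
}
{code}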



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-09-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933610#comment-14933610
 ] 

Haohui Mai commented on HDFS-8578:
--

We've seen in production that the upgrades take ~3 hours for 24 disks.

I think it makes sense to simply run a thread for each disk instead of having a 
new configuration, unless it is possible to parallelize the hard links for 
individual disks.

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-branch-2.6.0.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assuming it takes ~20 mins to process a single storage dir, a 
> datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime() 
>   : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save a lot of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?
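As a rough, hedged illustration of the parallel approach (a sketch only, not the attached patch; the per-dir doTransition work is abstracted here as a Runnable):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run the transition for each storage dir on its own
// thread and wait for all of them, instead of looping sequentially.
final class ParallelUpgradeSketch {
  static void transitionAll(List<Runnable> perDirTransitions)
      throws InterruptedException, ExecutionException {
    ExecutorService pool =
        Executors.newFixedThreadPool(perDirTransitions.size());
    try {
      List<Future<?>> results = new ArrayList<>();
      for (Runnable t : perDirTransitions) {
        results.add(pool.submit(t));   // one thread per storage dir
      }
      for (Future<?> r : results) {
        r.get();                       // propagate any per-dir failure
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}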



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9151) Mover should print the exit status/reason on console like balancer tool.

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933651#comment-14933651
 ] 

Hadoop QA commented on HDFS-9151:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  4s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  4s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 14s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 26s | The applied patch generated  1 
new checkstyle issues (total was 19, now 20). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 166m 20s | Tests failed in hadoop-hdfs. |
| | | 212m 26s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestParallelShortCircuitRead |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMkdir |
|   | hadoop.cli.TestDeleteCLI |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.TestFileLengthOnClusterRestart |
|   | hadoop.hdfs.TestAppendSnapshotTruncate |
|   | hadoop.fs.contract.hdfs.TestHDFSContractRootDirectory |
|   | hadoop.cli.TestHDFSCLI |
|   | hadoop.hdfs.tools.TestGetGroups |
|   | hadoop.hdfs.TestDFSStorageStateRecovery |
|   | hadoop.fs.contract.hdfs.TestHDFSContractRename |
|   | hadoop.cli.TestCacheAdminCLI |
|   | hadoop.fs.loadGenerator.TestLoadGenerator |
|   | hadoop.fs.TestFcHdfsSetUMask |
|   | hadoop.hdfs.crypto.TestHdfsCryptoStreams |
|   | hadoop.fs.viewfs.TestViewFsFileStatusHdfs |
|   | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForAcl |
|   | hadoop.tracing.TestTracingShortCircuitLocalRead |
|   | hadoop.tracing.TestTracing |
|   | hadoop.hdfs.tools.TestDelegationTokenFetcher |
|   | hadoop.fs.TestWebHdfsFileContextMainOperations |
|   | hadoop.hdfs.TestParallelShortCircuitReadNoChecksum |
|   | hadoop.hdfs.TestParallelShortCircuitLegacyRead |
|   | hadoop.net.TestNetworkTopology |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.fs.TestFcHdfsCreateMkdir |
|   | hadoop.hdfs.TestFetchImage |
|   | hadoop.fs.TestFcHdfsPermission |
|   | hadoop.hdfs.TestQuota |
|   | hadoop.hdfs.tools.TestDebugAdmin |
|   | hadoop.hdfs.TestPersistBlocks |
|   | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer |
|   | hadoop.hdfs.TestDFSStartupVersions |
|   | hadoop.hdfs.TestRenameWhileOpen |
|   | hadoop.fs.viewfs.TestViewFsWithXAttrs |
|   | hadoop.tools.TestJMXGet |
|   | hadoop.TestGenericRefresh |
|   | hadoop.hdfs.TestDataTransferKeepalive |
|   | hadoop.fs.TestSWebHdfsFileContextMainOperations |
|   | hadoop.fs.contract.hdfs.TestHDFSContractDelete |
|   | hadoop.fs.contract.hdfs.TestHDFSContractSetTimes |
|   | hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForXAttr |
|   | hadoop.hdfs.TestConnCache |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.TestFileAppend |
|   | hadoop.hdfs.TestReservedRawPaths |
|   | hadoop.TestRefreshCallQueue |
|   | hadoop.hdfs.TestDisableConnCache |
|   | hadoop.cli.TestXAttrCLI |
|   | hadoop.hdfs.TestParallelShortCircuitReadUnCached |
|   | hadoop.hdfs.tools.TestGetConf |
|   | hadoop.fs.TestUrlStreamHandler |
|   | hadoop.tools.TestTools |
|   | hadoop.fs.TestEnhancedByteBufferAccess |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.fs.viewfs.TestViewFileSystemWithXAttrs |
|   | hadoop.fs.TestHDFSFileContextMainOperations |
|   | hadoop.hdfs.qjournal.client.TestEpochsAreUnique |
|   | hadoop.hdfs.TestWriteRead |
|   | hadoop.hdfs.tools.TestDFSAdminWithHA |
|   | hadoop.hdfs.TestExternalBlockReader |
|   | hadoop.fs.contract.hdfs.TestHDFSContractAppend |
|   | hadoop.hdfs.TestHdfsAdmin |
|   | hadoop.fs.contract.hdfs.TestHDFSContractOpen |
|   | hadoop.hdfs.TestFileCreation |
|   | hadoop.hdfs.TestClientReportBadBlock |
|   | hadoop.tracing.TestTraceAdmin |
|   | hadoop.hdfs.TestAbandonBlock |
|   | 

[jira] [Updated] (HDFS-9114) NameNode and DataNode metric log file name should follow the other log file name format.

2015-09-28 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9114:
-
Attachment: HDFS-9114-trunk.01.patch
HDFS-9114-branch-2.01.patch

Attached patches for branch-2 and trunk.

*v1 patch:*

1. As discussed with [~arpitagarwal] offline, use common log4j properties for 
the NameNode and DataNode metrics loggers.
2. Changed the metrics log file name. 

Please review.

> NameNode and DataNode metric log file name should follow the other log file 
> name format.
> 
>
> Key: HDFS-9114
> URL: https://issues.apache.org/jira/browse/HDFS-9114
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9114-branch-2.01.patch, HDFS-9114-trunk.01.patch
>
>
> Currently the datanode and namenode metric log file names are 
> {{datanode-metrics.log}} and {{namenode-metrics.log}}.
> The file name should be like {{hadoop-hdfs-namenode-metric-host192.log}}, 
> matching the namenode log file {{hadoop-hdfs-namenode-host192.log}}.
> This will help when we copy logs for issue analysis from different nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933703#comment-14933703
 ] 

Jing Zhao commented on HDFS-9040:
-

[~walter.k.su] and [~zhz], any further comments on the patch?

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Jing Zhao
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040-HDFS-7285.004.patch, 
> HDFS-9040-HDFS-7285.005.patch, HDFS-9040-HDFS-7285.005.patch, 
> HDFS-9040-HDFS-7285.006.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> -Proposal 1:-
> -A BlockGroupDataStreamer to communicate with NN to allocate/update block, 
> and StripedDataStreamer s only have to stream blocks to DNs.-
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933718#comment-14933718
 ] 

Kihwal Lee commented on HDFS-9106:
--

Thanks, [~hitliuyi] and [~jingzhao]. I will commit this shortly.

> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-9106-poc.patch, HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9092) Nfs silently drops overlapping write requests and causes data copying to fail

2015-09-28 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9092:

Summary: Nfs silently drops overlapping write requests and causes data 
copying to fail  (was: Nfs silently drops overlapping write requests, thus data 
copying can't complete)

> Nfs silently drops overlapping write requests and causes data copying to fail
> -
>
> Key: HDFS-9092
> URL: https://issues.apache.org/jira/browse/HDFS-9092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.7.1
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-9092.001.patch
>
>
> When NOT using the 'sync' option, the NFS writes may issue the following 
> warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write 
> (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> and the size of data copied via NFS will stay at 1248752400.
> What happens is:
> 1. The write requests from the client are sent asynchronously. 
> 2. The NFS gateway has a handler that handles the incoming requests by 
> creating an internal write request structure and putting it into a cache;
> 3. In parallel, a separate thread in the NFS gateway takes requests out of 
> the cache and writes the data to HDFS.
> The current offset is how much data has been written by the write thread in 
> step 3. The detection of an overlapping write request happens in step 2, but 
> it only checks the write request against the current offset, and trims the 
> request if necessary. Because the write requests are sent asynchronously, if 
> two requests are beyond the current offset and they overlap, the overlap is 
> not detected and both are put into the cache. This causes the symptom 
> reported in this case at step 3.
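A hedged sketch of the missing check (data structure and names hypothetical, not the attached patch): besides comparing against the current offset, an incoming request must also be checked against the requests already queued in the cache.

{code:java}
import java.util.Map;
import java.util.TreeMap;

// Detect whether a new write [start, end) overlaps any pending request,
// where pending requests are kept as start-offset -> end-offset.
final class OverlapCheckSketch {
  static boolean overlapsPending(TreeMap<Long, Long> pending,
                                 long start, long end) {
    for (Map.Entry<Long, Long> e : pending.entrySet()) {
      if (start < e.getValue() && e.getKey() < end) {
        return true;   // the half-open intervals intersect
      }
    }
    return false;
  }
}
{code}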



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-7609.
-

> Avoid retry cache collision when Standby NameNode loading edits
> ---
>
> Key: HDFS-7609
> URL: https://issues.apache.org/jira/browse/HDFS-7609
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.2.0
>Reporter: Carrey Zhan
>Assignee: Ming Ma
>Priority: Critical
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, 
> HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609-branch-2.7.2.txt, 
> HDFS-7609.patch, recovery_do_not_use_retrycache.patch
>
>
> One day my namenode crashed because two journal nodes timed out at the same 
> time under very high load, leaving behind about 100 million transactions in 
> the edits log. (I still have no idea why they were not rolled into the 
> fsimage.)
> I tried to restart the namenode, but it showed that almost 20 hours would be 
> needed to finish, and it was loading fsedits most of the time. I also tried 
> to restart the namenode in recovery mode; the loading speed was no different.
> I looked into the stack trace and judged that it was caused by the retry 
> cache. So I set dfs.namenode.enable.retrycache to false, and the restart 
> process finished in half an hour.
> I think the retry cache is useless during startup, at least during the 
> recovery process.
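For reference, the workaround mentioned above corresponds to an hdfs-site.xml entry like the following (shown only because the reporter names the setting; disabling the retry cache can affect the correctness of retried RPCs in normal operation):

{noformat}
<property>
  <name>dfs.namenode.enable.retrycache</name>
  <value>false</value>
</property>
{noformat}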



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-8384) Allow NN to startup if there are files having a lease but are not under construction

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-8384.
-

> Allow NN to startup if there are files having a lease but are not under 
> construction
> 
>
> Key: HDFS-8384
> URL: https://issues.apache.org/jira/browse/HDFS-8384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Jing Zhao
>Priority: Minor
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: HDFS-8384-branch-2.6.patch, HDFS-8384-branch-2.7.patch, 
> HDFS-8384.000.patch
>
>
> When there are files having a lease but are not under construction, NN will 
> fail to start up with
> {code}
> 15/05/12 00:36:31 ERROR namenode.FSImage: Unable to save image for 
> /hadoop/hdfs/namenode
> java.lang.IllegalStateException
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
> at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.getINodesUnderConstruction(LeaseManager.java:412)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFilesUnderConstruction(FSNamesystem.java:7124)
> ...
> {code}
> The actual problem is that the image could be corrupted by bugs like 
> HDFS-7587. We should have an option/conf to allow the NN to start up so that 
> the problematic files could possibly be deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9165) Move the rest of the entries in META-INF/services/o.a.h.fs.FileSystem to hdfs-client

2015-09-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9165:

Status: Patch Available  (was: Open)

> Move the rest of the entries in META-INF/services/o.a.h.fs.FileSystem to 
> hdfs-client
> 
>
> Key: HDFS-9165
> URL: https://issues.apache.org/jira/browse/HDFS-9165
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9165.000.patch
>
>
> After HDFS-8740 the entries in META-INF/services/o.a.h.fs.FileSystem should 
> be updated accordingly similar to HDFS-9041.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9080) update htrace version to 4.0.1

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933581#comment-14933581
 ] 

Hudson commented on HDFS-9080:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #457 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/457/])
HDFS-9080. Update htrace version to 4.0.1 (cmccabe) (cmccabe: rev 
892ade689f9bcce76daae8f66fc00a49bee8548e)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtoUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java
* hadoop-common-project/hadoop-common/src/main/proto/RpcHeader.proto
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveIterator.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/EncryptionZoneIterator.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/WritableRpcEngine.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsTracer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSPacket.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPacket.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFsShell.java
* hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* hadoop-common-project/hadoop-common/pom.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
* hadoop-common-project/hadoop-common/src/site/markdown/Tracing.md
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShell.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/SetSpanReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolIterator.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
* 

[jira] [Updated] (HDFS-9159) [OIV] : return value of the command is not correct if invalid value specified in "-p (processor)" option

2015-09-28 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated HDFS-9159:

Status: Patch Available  (was: Open)

> [OIV] : return value of the command is not correct if invalid value specified 
> in "-p (processor)" option
> 
>
> Key: HDFS-9159
> URL: https://issues.apache.org/jira/browse/HDFS-9159
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9159_01.patch
>
>
> The return value of the OIV command is not correct if an invalid value is 
> specified in the "-p (processor)" option;
> this needs to return an error to the user.
> The code change will be in the switch statement of
> {code}
>  try (PrintStream out = outputFile.equals("-") ?
> System.out : new PrintStream(outputFile, "UTF-8")) {
>   switch (processor) {
> {code}
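A minimal, hedged sketch of the kind of change implied (processor names abbreviated and dispatch elided; not the attached patch):

{code:java}
// Treat an unknown processor name as an error and surface a non-zero
// status, instead of silently doing nothing.
final class ProcessorSwitchSketch {
  static int run(String processor) {
    switch (processor) {
      case "XML":
      case "FileDistribution":
        return 0;   // dispatch to the real processor here
      default:
        System.err.println("Invalid processor specified: " + processor);
        return -1;  // propagated as the command's exit status
    }
  }
}
{code}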



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-3443) Fix NPE when namenode transition to active during startup by adding checkNNStartup() in NameNodeRpcServer

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-3443.
-

> Fix NPE when namenode transition to active during startup by adding 
> checkNNStartup() in NameNodeRpcServer
> -
>
> Key: HDFS-3443
> URL: https://issues.apache.org/jira/browse/HDFS-3443
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover, ha
>Reporter: suja s
>Assignee: Vinayakumar B
> Fix For: 2.6.1
>
> Attachments: HDFS-3443-003.patch, HDFS-3443-004.patch, 
> HDFS-3443-005.patch, HDFS-3443-006.patch, HDFS-3443-007.patch, 
> HDFS-3443_1.patch, HDFS-3443_1.patch
>
>
> Start NN
> Let NN standby services be started.
> Before the editLogTailer is initialised, start ZKFC and allow the 
> active services start to proceed further.
> Here editLogTailer.catchupDuringFailover() will throw NPE.
> {code}
> void startActiveServices() throws IOException {
> LOG.info("Starting services required for active state");
> writeLock();
> try {
>   FSEditLog editLog = dir.fsImage.getEditLog();
>   
>   if (!editLog.isOpenForWrite()) {
> // During startup, we're already open for write during initialization.
> editLog.initJournalsForWrite();
> // May need to recover
> editLog.recoverUnclosedStreams();
> 
> LOG.info("Catching up to latest edits from old active before " +
> "taking over writer role in edits logs.");
> editLogTailer.catchupDuringFailover();
> {code}
> {noformat}
> 2012-05-18 16:51:27,585 WARN org.apache.hadoop.ipc.Server: IPC Server 
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from 
> XX.XX.XX.55:58003: output error
> 2012-05-18 16:51:27,586 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 8 on 8020, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive 
> from XX.XX.XX.55:58004: error: java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:602)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978)
>   at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>   at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> 2012-05-18 16:51:27,586 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 8020 caught an exception
> java.nio.channels.ClosedChannelException
>   at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>   at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2092)
>   at org.apache.hadoop.ipc.Server.access$2000(Server.java:107)
>   at 
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:930)
>   at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:994)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1738)
> {noformat}
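A hedged sketch of the guard named in the summary (field and message hypothetical): every RPC method would call this first, so calls arriving during startup fail cleanly instead of dereferencing half-initialized state.

{code:java}
import java.io.IOException;

// Reject RPCs until NameNode initialization has completed.
final class StartupGuardSketch {
  private volatile boolean started;        // set once startup completes

  void checkNNStartup() throws IOException {
    if (!started) {
      throw new IOException("NameNode still not started");
    }
  }
}
{code}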



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HDFS-7633) BlockPoolSliceScanner fails when Datanode has too many blocks

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HDFS-7633:
---

> BlockPoolSliceScanner fails when Datanode has too many blocks
> -
>
> Key: HDFS-7633
> URL: https://issues.apache.org/jira/browse/HDFS-7633
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Fix For: 2.6.1
>
> Attachments: HDFS-7633.patch
>
>
> issue:
> When the total number of blocks on one of my DNs reaches 33554432, it refuses 
> to accept more blocks. This is the ERROR:
> 2015-01-16 15:21:44,571 | ERROR | DataXceiver for client  at /172.1.1.8:50490 
> [Receiving block 
> BP-1976278848-172.1.1.2-1419846518085:blk_1221043436_147936990] | 
> datasight-198:25009:DataXceiver error processing WRITE_BLOCK operation  src: 
> /172.1.1.8:50490 dst: /172.1.1.11:25009 | 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
> java.lang.IllegalArgumentException: n must be positive
> at java.util.Random.nextInt(Random.java:300)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.getNewBlockScanTime(BlockPoolSliceScanner.java:263)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.addBlock(BlockPoolSliceScanner.java:276)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.addBlock(DataBlockScanner.java:193)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.closeBlock(DataNode.java:1733)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:765)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> at java.lang.Thread.run(Thread.java:745)
> analysis:
> In the function 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.getNewBlockScanTime(),
> when blockMap.size() is too big:
> Math.max(blockMap.size(),1) * 600 is int-typed, and negative (overflow);
> Math.max(blockMap.size(),1) * 600 * 1000L is long-typed, and still negative;
> (int)period is Integer.MIN_VALUE;
> Math.abs((int)period) is Integer.MIN_VALUE, which is negative;
> DFSUtil.getRandom().nextInt(periodInt) will throw IllegalArgumentException.
> I use Java HotSpot (build 1.7.0_05-b05).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9158) [OEV-Doc] : Document does not mention about "-f" and "-r" options

2015-09-28 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933546#comment-14933546
 ] 

Daniel Templeton commented on HDFS-9158:


Works for me.  +1 (non-binding)

> [OEV-Doc] : Document does not mention about "-f" and "-r" options
> -
>
> Key: HDFS-9158
> URL: https://issues.apache.org/jira/browse/HDFS-9158
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9158.01.patch, HDFS-9158_02.patch
>
>
> 1. The document does not mention the "-f" and "-r" options;
> add these options to the document as well.
> {noformat}
> -f,--fix-txids Renumber the transaction IDs in the input,
>so that there are no gaps or invalid  transaction IDs.
> -r,--recover   When reading binary edit logs, use recovery
>mode.  This will give you the chance to skip
>corrupt parts of the edit log.
> {noformat}
> 2. In the help message there is some extra white space:
> {code}
> "so that there are no gaps or invalidtransaction IDs."
> {code}
> We can remove this as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9155) [OEV] : The inputFile does not follow case insensitiveness incase of XML file

2015-09-28 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated HDFS-9155:

Status: Patch Available  (was: Open)

> [OEV] : The inputFile does not follow case insensitiveness incase of XML file
> -
>
> Key: HDFS-9155
> URL: https://issues.apache.org/jira/browse/HDFS-9155
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9155_01.patch
>
>
> As in the document and help:
> {noformat}
> -i,--inputFileedits file to process, xml (*case
>insensitive*) extension means XML format,
> {noformat}
> But if I give the file with an "XML" extension, it falls back to binary 
> processing.
> This issue is due to the code
> {code}
>  int org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go()
> .
> boolean xmlInput = inputFileName.endsWith(".xml");
> {code}
> Here we need to check for the xml extension after converting the file name 
> to lower case, as sketched below.
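The fix the description points at is a one-line normalization; a hedged sketch:

{code:java}
import java.util.Locale;

// Compare the extension case-insensitively, as the help text promises.
final class XmlExtensionCheck {
  static boolean isXmlInput(String inputFileName) {
    return inputFileName.toLowerCase(Locale.ENGLISH).endsWith(".xml");
  }
}
{code}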



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5795) RemoteBlockReader2#checkSuccess() should print error status

2015-09-28 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-5795:

Labels: supportability  (was: )

> RemoteBlockReader2#checkSuccess() should print error status 
> ---
>
> Key: HDFS-5795
> URL: https://issues.apache.org/jira/browse/HDFS-5795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Xiao Chen
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-5795.001.patch
>
>
> RemoteBlockReader2#checkSuccess() doesn't print the error status, which makes 
> debugging harder when the client can't read from a DataNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-8863) The remaining space check in BlockPlacementPolicyDefault is flawed

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-8863.
-

> The remaining space check in BlockPlacementPolicyDefault is flawed
> --
>
> Key: HDFS-8863
> URL: https://issues.apache.org/jira/browse/HDFS-8863
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.2
>
> Attachments: HDFS-8863-branch-2.6.1.txt, HDFS-8863.patch, 
> HDFS-8863.v2.patch, HDFS-8863.v3.patch
>
>
> The block placement policy calls 
> {{DatanodeDescriptor#getRemaining(StorageType)}} to check whether the block 
> is going to fit. Since the method is adding up all remaining spaces, the 
> namenode can allocate a new block on a full node. This causes pipeline 
> construction failure and {{abandonBlock}}. If the cluster is nearly full, the 
> client might hit this multiple times and the write can fail permanently.
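A hedged toy illustration of the flaw (numbers invented, not the patch): a node whose volumes are each too full for a 128 MB block still passes an aggregate-remaining check.

{code:java}
import java.util.Arrays;

// Ten volumes with 64 MB free each pass a summed-remaining check for a
// 128 MB block, yet no single volume can actually hold it.
final class RemainingSpaceSketch {
  public static void main(String[] args) {
    long[] volumeRemaining = new long[10];
    Arrays.fill(volumeRemaining, 64L * 1024 * 1024);  // 64 MB free per volume
    long blockSize = 128L * 1024 * 1024;              // 128 MB block
    long total = Arrays.stream(volumeRemaining).sum();
    System.out.println(total >= blockSize);           // true: summed check passes
    boolean anyVolumeFits = Arrays.stream(volumeRemaining)
        .anyMatch(r -> r >= blockSize);
    System.out.println(anyVolumeFits);                // false: placement will fail
  }
}
{code}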



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HDFS-8846) Add a unit test for INotify functionality across a layout version upgrade

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed HDFS-8846.
-

> Add a unit test for INotify functionality across a layout version upgrade
> -
>
> Key: HDFS-8846
> URL: https://issues.apache.org/jira/browse/HDFS-8846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: HDFS-8846-branch-2.6.1.txt, HDFS-8846.00.patch, 
> HDFS-8846.01.patch, HDFS-8846.02.patch, HDFS-8846.03.patch
>
>
> Per discussion under HDFS-8480, we should create some edit log files with old 
> layout version, to test whether they can be correctly handled in upgrades.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8584) Support using ramfs partitions on Linux

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-8584:
--
Fix Version/s: (was: 2.6.0)

Dropping fix-version.

> Support using ramfs partitions on Linux
> ---
>
> Key: HDFS-8584
> URL: https://issues.apache.org/jira/browse/HDFS-8584
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>
> Now that the bulk of the work for HDFS-6919 is complete, the memory limit 
> enforcement uses the {{dfs.datanode.max.locked.memory}} setting and not the 
> RAM disk free space availability.
> We can now use ramfs partitions. This will require fixing the free space 
> computation and reservation logic for transient volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9080) update htrace version to 4.0.1

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933556#comment-14933556
 ] 

Hudson commented on HDFS-9080:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2368 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2368/])
HDFS-9080. Update htrace version to 4.0.1 (cmccabe) (cmccabe: rev 
892ade689f9bcce76daae8f66fc00a49bee8548e)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSPacket.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveIterator.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-common-project/hadoop-common/src/main/proto/RpcHeader.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPacket.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/SetSpanReceiver.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/TraceUtils.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java
* hadoop-common-project/hadoop-common/src/site/markdown/Tracing.md
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFsShell.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShell.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolIterator.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java
* hadoop-common-project/hadoop-common/pom.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* hadoop-project/pom.xml
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtoUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/EncryptionZoneIterator.java
* 

[jira] [Assigned] (HDFS-9167) Update pom.xml in other modules to depend on hdfs-client instead of hdfs

2015-09-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu reassigned HDFS-9167:
---

Assignee: Mingliang Liu

> Update pom.xml in other modules to depend on hdfs-client instead of hdfs
> 
>
> Key: HDFS-9167
> URL: https://issues.apache.org/jira/browse/HDFS-9167
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
>
> Now that the implementation of the client has been moved to 
> hadoop-hdfs-client, we should update the poms of the other modules in Hadoop 
> to use hdfs-client instead of hdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9165) Move the rest of the entries in META-INF/services/o.a.h.fs.FileSystem to hdfs-client

2015-09-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9165:

Attachment: HDFS-9165.000.patch

The v0 patch deletes 
{{hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
 and moves the {{DistributedFileSystem}} entry to the client-side counterpart 
file.
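For context, the services file in question is a plain list of FileSystem implementations, one fully-qualified class name per line; after the move, the client-side file would carry an entry like this (sketch):

{noformat}
org.apache.hadoop.hdfs.DistributedFileSystem
{noformat}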

> Move the rest of the entries in META-INF/services/o.a.h.fs.FileSystem to 
> hdfs-client
> 
>
> Key: HDFS-9165
> URL: https://issues.apache.org/jira/browse/HDFS-9165
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9165.000.patch
>
>
> After HDFS-8740 the entries in META-INF/services/o.a.h.fs.FileSystem should 
> be updated accordingly similar to HDFS-9041.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9166) Move hftp / hsftp filesystem to hdfs-client

2015-09-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu reassigned HDFS-9166:
---

Assignee: Mingliang Liu

> Move hftp / hsftp filesystem to hdfs-client
> ---
>
> Key: HDFS-9166
> URL: https://issues.apache.org/jira/browse/HDFS-9166
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
>
> The hftp / hsftp filesystems in branch-2 need to be moved to the hdfs-client 
> module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9148) Incorrect assert message in TestWriteToReplica#testWriteToTemporary

2015-09-28 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9148:

   Resolution: Fixed
Fix Version/s: 2.8.0
   3.0.0
   Status: Resolved  (was: Patch Available)

+1 for the patch. I've committed this to {{trunk}} and {{branch-2}}.

Thanks a lot for working on this [~twu], and thanks for the reviews from 
[~templedf]!

> Incorrect assert message in TestWriteToReplica#testWriteToTemporary
> ---
>
> Key: HDFS-9148
> URL: https://issues.apache.org/jira/browse/HDFS-9148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: hdfs-9148.patch
>
>
> The following assert text in TestWriteToReplica#testWriteToTemporary is not 
> correct:
> {code:java}
>   Assert.fail("createRbw() Should have removed the block with the older "
>   + "genstamp and replaced it with the newer one: " + 
> blocks[NON_EXISTENT]);
> {code}
> If the assert is triggered, it can only be because a temporary replica 
> already exists with a newer generation stamp. It should have nothing to do 
> with createRbw().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934057#comment-14934057
 ] 

Hudson commented on HDFS-9106:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #453 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/453/])
HDFS-9106. Transfer failure during pipeline recovery causes permanent write 
failures. Contributed by Kihwal Lee. (kihwal: rev 
4c9497cbf02ecc82532a4e79e18912d8e0eb4731)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: HDFS-9106-poc.patch, HDFS-9106.branch-2.7.patch, 
> HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9148) Incorrect assert message in TestWriteToReplica#testWriteToTemporary

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934056#comment-14934056
 ] 

Hudson commented on HDFS-9148:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #453 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/453/])
HDFS-9148. Incorrect assert message in TestWriteToReplica#testWriteToTemporary 
(Tony Wu via Lei (Eddy) Xu) (lei: rev 50741cb568d4da30b92d4954928bc3039e583b22)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Incorrect assert message in TestWriteToReplica#testWriteToTemporary
> ---
>
> Key: HDFS-9148
> URL: https://issues.apache.org/jira/browse/HDFS-9148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: hdfs-9148.patch
>
>
> The following assert text in TestWriteToReplica#testWriteToTemporary is not 
> correct:
> {code:java}
>   Assert.fail("createRbw() Should have removed the block with the older "
>   + "genstamp and replaced it with the newer one: " + 
> blocks[NON_EXISTENT]);
> {code}
> If the assert is triggered, it can only be because a temporary replica 
> already exists and has a newer generation stamp. It should have nothing to do 
> with createRbw().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9147) Fix the setting of visibleLength in ExternalBlockReader

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934055#comment-14934055
 ] 

Hudson commented on HDFS-9147:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #453 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/453/])
HDFS-9147. Fix the setting of visibleLength in ExternalBlockReader.  (Colin P. 
McCabe via Lei (Eddy) Xu) (lei: rev e5992ef4df63fbc6a6b8e357b32c647e7837c662)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ExternalBlockReader.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestExternalBlockReader.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java


> Fix the setting of visibleLength in ExternalBlockReader
> ---
>
> Key: HDFS-9147
> URL: https://issues.apache.org/jira/browse/HDFS-9147
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9147.001.patch
>
>
> BlockReaderFactory needs to take the start offset into consideration when 
> setting the visibleLength to use in ExternalBlockReader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9170) Move libhdfs / fuse-dfs / libwebhdfs to a separate module

2015-09-28 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-9170:


 Summary: Move libhdfs / fuse-dfs / libwebhdfs to a separate module
 Key: HDFS-9170
 URL: https://issues.apache.org/jira/browse/HDFS-9170
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


After HDFS-6200 the Java implementation of hdfs-client has been moved to a 
separate hadoop-hdfs-client module.

libhdfs, fuse-dfs and libwebhdfs still reside in the hadoop-hdfs module. 
Ideally these modules should reside in the hadoop-hdfs-client. However, to 
write unit tests for these components, it is often necessary to run 
MiniDFSCluster which resides in the hadoop-hdfs module.

This jira is to discuss how these native modules should be laid out after HDFS-6200.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9064) NN old UI (block_info_xml) not available in 2.7.x

2015-09-28 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9064:
-
Priority: Major  (was: Critical)

> NN old UI (block_info_xml) not available in 2.7.x
> -
>
> Key: HDFS-9064
> URL: https://issues.apache.org/jira/browse/HDFS-9064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Kanaka Kumar Avvaru
>
> In 2.6.x Hadoop deployments, given a blockId it was very easy to find out the 
> file name and the locations of its replicas (and whether they are corrupt or 
> not).
> This was the REST call:
> {noformat}
>  http://<namenode>:<http-port>/block_info_xml.jsp?blockId=xxx
> {noformat}
> But this was removed by HDFS-6252 in 2.7 builds.
> Creating this jira to restore that functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9064) NN old UI (block_info_xml) not available in 2.7.x

2015-09-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934129#comment-14934129
 ] 

Haohui Mai commented on HDFS-9064:
--

HDFS-8246 has discussion of the same topic. My understanding is that the 
conclusion there was that the information is not fully accurate and that fsck 
can cover the use case. Thus IMO we should leave it out.
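For reference, the fsck-based lookup would be along these lines (assuming the {{-blockId}} option from HDFS-6663 is available in the deployed version):
{noformat}
hdfs fsck -blockId blk_1073741825
{noformat}
This prints the file the block belongs to and the health of its replicas.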

> NN old UI (block_info_xml) not available in 2.7.x
> -
>
> Key: HDFS-9064
> URL: https://issues.apache.org/jira/browse/HDFS-9064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.0
>Reporter: Rushabh S Shah
>Assignee: Kanaka Kumar Avvaru
>Priority: Critical
>
> In 2.6.x Hadoop deployments, given a blockId it was very easy to find out the 
> file name and the locations of its replicas (and whether they are corrupt or 
> not).
> This was the REST call:
> {noformat}
>  http://<namenode>:<http-port>/block_info_xml.jsp?blockId=xxx
> {noformat}
> But this was removed by HDFS-6252 in 2.7 builds.
> Creating this jira to restore that functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934128#comment-14934128
 ] 

Hudson commented on HDFS-9106:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2397 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2397/])
HDFS-9106. Transfer failure during pipeline recovery causes permanent write 
failures. Contributed by Kihwal Lee. (kihwal: rev 
4c9497cbf02ecc82532a4e79e18912d8e0eb4731)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: HDFS-9106-poc.patch, HDFS-9106.branch-2.7.patch, 
> HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8859) Improve DataNode ReplicaMap memory footprint to save about 45%

2015-09-28 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933946#comment-14933946
 ] 

Uma Maheswara Rao G commented on HDFS-8859:
---

Yi, the checkstyle warnings are related to the patch. Can you please check them?

> Improve DataNode ReplicaMap memory footprint to save about 45%
> --
>
> Key: HDFS-8859
> URL: https://issues.apache.org/jira/browse/HDFS-8859
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-8859.001.patch, HDFS-8859.002.patch, 
> HDFS-8859.003.patch, HDFS-8859.004.patch, HDFS-8859.005.patch
>
>
> By using the following approach we can save about *45%* of the memory 
> footprint for each block replica in DataNode memory (this JIRA only covers the 
> *ReplicaMap* in the DataNode). The details:
> In ReplicaMap,
> {code}
> private final Map<String, Map<Long, ReplicaInfo>> map =
>     new HashMap<String, Map<Long, ReplicaInfo>>();
> {code}
> Currently we use a HashMap {{Map<Long, ReplicaInfo>}} to store the replicas 
> in memory.  The key is the block id of the replica, which is already 
> included in {{ReplicaInfo}}, so that memory can be saved.  Also each HashMap 
> Entry has object overhead.  We can implement a lightweight set similar 
> to {{LightWeightGSet}}, but not of fixed size ({{LightWeightGSet}} uses a 
> fixed size for its entries array, usually a big value, e.g. in 
> {{BlocksMap}}; this avoids full GC since there is no need to resize). We 
> should also be able to look up an element by its key.
> Following is a comparison of the memory footprint if we implement a 
> lightweight set as described (see the sketch below):
> We can save:
> {noformat}
> SIZE (bytes)   ITEM
> 20             The key: Long (12 bytes object overhead + 8 bytes long)
> 12             HashMap Entry object overhead
> 4              Reference to the key in the Entry
> 4              Reference to the value in the Entry
> 4              Hash in the Entry
> {noformat}
> Total:  -44 bytes
> We need to add:
> {noformat}
> SIZE (bytes)   ITEM
> 4              A reference to the next element in ReplicaInfo
> {noformat}
> Total:  +4 bytes
> So in total we can save 40 bytes for each block replica.
> Currently one finalized replica needs around 46 bytes (note: we ignore 
> memory alignment here).
> We can save 1 - (4 + 46) / (44 + 46) = *45%* of the memory for each block 
> replica in the DataNode.
> 
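A minimal sketch of the lightweight resizable set described above (the intrusive {{LinkedElement}} interface and its accessors on {{ReplicaInfo}} are assumptions for illustration):
{code}
// Sketch only: an intrusive hash set keyed by block id. The next pointer
// lives inside the element itself, so no per-entry wrapper is allocated,
// and the bucket array grows gradually instead of being pre-sized.
class LightWeightResizableSet<E extends LightWeightResizableSet.LinkedElement> {
  interface LinkedElement {
    long getKey();                     // block id, already in ReplicaInfo
    LinkedElement getNext();           // the single extra 4-byte reference
    void setNext(LinkedElement next);
  }

  private LinkedElement[] buckets = new LinkedElement[1 << 4];
  private int size = 0;

  void put(E element) {                // assumes the key is not present yet
    int i = (int) (element.getKey() & (buckets.length - 1));
    element.setNext(buckets[i]);
    buckets[i] = element;
    if (++size > buckets.length * 3 / 4) {
      resize();                        // double the array and rehash
    }
  }

  @SuppressWarnings("unchecked")
  E get(long key) {
    int i = (int) (key & (buckets.length - 1));
    for (LinkedElement e = buckets[i]; e != null; e = e.getNext()) {
      if (e.getKey() == key) {
        return (E) e;
      }
    }
    return null;
  }

  private void resize() { /* double buckets and rehash, elided */ }
}
{code}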



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9147) Fix the setting of visibleLength in ExternalBlockReader

2015-09-28 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9147:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   3.0.0
   Status: Resolved  (was: Patch Available)

+1 for the patch. Committed.

The test failures are not relevant.

Thanks for working on this, [~cmccabe]. Thanks for the reviews, [~hitliuyi].

> Fix the setting of visibleLength in ExternalBlockReader
> ---
>
> Key: HDFS-9147
> URL: https://issues.apache.org/jira/browse/HDFS-9147
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 3.0.0, 2.8.0
>
> Attachments: HDFS-9147.001.patch
>
>
> BlockReaderFactory needs to take the start offset into consideration when 
> setting the visibleLength to use in ExternalBlockReader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934042#comment-14934042
 ] 

Hudson commented on HDFS-9106:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #459 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/459/])
HDFS-9106. Transfer failure during pipeline recovery causes permanent write 
failures. Contributed by Kihwal Lee. (kihwal: rev 
4c9497cbf02ecc82532a4e79e18912d8e0eb4731)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: HDFS-9106-poc.patch, HDFS-9106.branch-2.7.patch, 
> HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9106) Transfer failure during pipeline recovery causes permanent write failures

2015-09-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9106:
-
Attachment: HDFS-9106.branch-2.7.patch

For branch-2.7, the location of the file is different. Otherwise the same logic 
applies. Committing this to branch-2.7.

> Transfer failure during pipeline recovery causes permanent write failures
> -
>
> Key: HDFS-9106
> URL: https://issues.apache.org/jira/browse/HDFS-9106
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: HDFS-9106-poc.patch, HDFS-9106.branch-2.7.patch, 
> HDFS-9106.patch
>
>
> When a new node is added to a write pipeline during flush/sync, if the 
> partial block transfer fails, the write will fail permanently without 
> retrying or continuing with whatever is in the pipeline. 
> The transfer often fails in busy clusters due to timeout. There is no 
> per-packet ACK between client and datanode or between source and target 
> datanodes. If the total transfer time exceeds the configured timeout + 10 
> seconds (2 * 5 seconds slack), it is considered failed.  Naturally, the 
> failure rate is higher with bigger block sizes.
> I propose the following changes:
> - The transfer timeout needs to be different from the per-packet timeout.
> - The transfer should be retried if it fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9169) TestNativeAzureFileSystemOperationsMocked fails in trunk

2015-09-28 Thread Ted Yu (JIRA)
Ted Yu created HDFS-9169:


 Summary: TestNativeAzureFileSystemOperationsMocked fails in trunk
 Key: HDFS-9169
 URL: https://issues.apache.org/jira/browse/HDFS-9169
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


When working on HDFS-6264, the QA bot reported the following:
{code}
testGlobStatusFilterWithMultiplePathWildcardsAndNonTrivialFilter(org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked)
  Time elapsed: 0.02 sec  <<< ERROR!
java.lang.NullPointerException: null
at org.apache.hadoop.fs.Globber.glob(Globber.java:145)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1688)
at 
org.apache.hadoop.fs.FSMainOperationsBaseTest.testGlobStatusFilterWithMultiplePathWildcardsAndNonTrivialFilter(FSMainOp
{code}
On the hadoop trunk branch, the above can be reproduced without any patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934036#comment-14934036
 ] 

Jing Zhao commented on HDFS-9040:
-

Thanks for the review, Zhe! For {{getFollowingBlock}}, currently we have:
# The {{DFSStripedOutputStream}} fetches the new block from NN when it receives 
the first chunk of data for the new block. This is before we enqueue the first 
packet for the new block.
# The data streamer does not call {{getFollowingBlock}} until its data queue is 
no longer empty.

Thus I think when a data streamer calls {{getFollowingBlock}}, the new block 
should already be ready in the queue. Therefore {{poll}} here should be safe. 
Besides, if we make a mistake here, poll can give us an NPE, so it may be easier 
to debug (see the illustration below).
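A tiny illustration of that fail-fast behavior (hypothetical names):
{code}
// poll() is non-blocking and returns null if the queue is unexpectedly
// empty, so a violated invariant surfaces immediately as an NPE here
// instead of the streamer blocking forever on take().
LocatedBlock followingBlock = followingBlocks.poll();
followingBlock.getBlock();   // NPE right away if the assumption is broken
{code}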

I will update the patch to address your comments about {{callUpdatePipeline}} 
and {{updatePipelineInternal}}.

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Jing Zhao
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040-HDFS-7285.004.patch, 
> HDFS-9040-HDFS-7285.005.patch, HDFS-9040-HDFS-7285.005.patch, 
> HDFS-9040-HDFS-7285.006.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> -Proposal 1:-
> -A BlockGroupDataStreamer to communicate with NN to allocate/update block, 
> and StripedDataStreamer s only have to stream blocks to DNs.-
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-28 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934073#comment-14934073
 ] 

Zhe Zhang commented on HDFS-9040:
-

Thanks Jing for the updated patch.

bq. The data streamer does not call getFollowingBlock until its data queue is 
no longer empty.
Right, I missed the logic below:
{code}
  while ((!shouldStop() && dataQueue.size() == 0 &&
  (stage != BlockConstructionStage.DATA_STREAMING ||
  stage == BlockConstructionStage.DATA_STREAMING &&
  now - lastPacket < halfSocketTimeout)) || doSleep ) {
{code}

+1 on the patch. Thanks for the work!

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Jing Zhao
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040-HDFS-7285.004.patch, 
> HDFS-9040-HDFS-7285.005.patch, HDFS-9040-HDFS-7285.005.patch, 
> HDFS-9040-HDFS-7285.006.patch, HDFS-9040-HDFS-7285.007.patch, 
> HDFS-9040.00.patch, HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> -Proposal 1:-
> -A BlockGroupDataStreamer to communicate with NN to allocate/update block, 
> and StripedDataStreamer s only have to stream blocks to DNs.-
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9159) [OIV] : return value of the command is not correct if invalid value specified in "-p (processor)" option

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934101#comment-14934101
 ] 

Hadoop QA commented on HDFS-9159:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 47s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  8s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 25s | The applied patch generated  4 
new checkstyle issues (total was 36, now 39). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 192m 22s | Tests failed in hadoop-hdfs. |
| | | 237m 53s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
| Timed out tests | org.apache.hadoop.hdfs.server.mover.TestStorageMover |
|   | org.apache.hadoop.hdfs.server.mover.TestMover |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764052/HDFS-9159_01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fb2e525 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12716/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/12716/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12716/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12716/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12716/console |


This message was automatically generated.

> [OIV] : return value of the command is not correct if invalid value specified 
> in "-p (processor)" option
> 
>
> Key: HDFS-9159
> URL: https://issues.apache.org/jira/browse/HDFS-9159
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9159_01.patch
>
>
> The return value of the OIV command is not correct if an invalid value is 
> specified in the "-p (processor)" option;
> it needs to return an error to the user.
> The code change will be in the switch statement of:
> {code}
>  try (PrintStream out = outputFile.equals("-") ?
>      System.out : new PrintStream(outputFile, "UTF-8")) {
>    switch (processor) {
> {code}
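One possible shape of the fix, as a sketch (processor names and the help printer are assumptions; the actual patch may differ):
{code}
// Sketch only: report an unknown processor and propagate a non-zero status.
switch (processor) {
  case "XML":
    // ... existing processors handled as before ...
    break;
  default:
    System.err.println("Invalid processor specified: " + processor);
    printUsage();                      // assumed help printer
    return -1;                         // non-zero return value for the user
}
{code}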



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

2015-09-28 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9040:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7285
   Status: Resolved  (was: Patch Available)

I've committed this to the feature branch. Thanks to Walter for the initial 
work! Also thanks for the reviews, Walter and Zhe!

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> ---
>
> Key: HDFS-9040
> URL: https://issues.apache.org/jira/browse/HDFS-9040
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Walter Su
>Assignee: Jing Zhao
> Fix For: HDFS-7285
>
> Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040-HDFS-7285.004.patch, 
> HDFS-9040-HDFS-7285.005.patch, HDFS-9040-HDFS-7285.005.patch, 
> HDFS-9040-HDFS-7285.006.patch, HDFS-9040-HDFS-7285.007.patch, 
> HDFS-9040.00.patch, HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> -Proposal 1:-
> -A BlockGroupDataStreamer to communicate with NN to allocate/update block, 
> and StripedDataStreamer s only have to stream blocks to DNs.-
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-09-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934110#comment-14934110
 ] 

Haohui Mai commented on HDFS-8578:
--

bq. Datanode may *require more memory* to process all volumes/disks in parallel.

That doesn't sound right to me. Can you identify why it requires more memory if 
the task of each thread is to create hardlinks for blocks? Note that in the 
current implementation the lists of blocks have been pre-calculated.

> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-branch-2.6.0.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs 
> sequentially. Assuming it takes ~20 minutes to process a single storage dir, a 
> datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime() 
>   : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save a lot of time during major upgrades if the datanode processed 
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel? (See the 
> sketch below.)
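A sketch of the parallel version (thread-pool sizing, checked exceptions and error handling are elided for brevity):
{code}
// Sketch only: run doTransition() for each storage dir on a thread pool
// instead of sequentially, then wait for every dir before the CTime check.
ExecutorService pool = Executors.newFixedThreadPool(getNumStorageDirs());
List<Future<?>> futures = new ArrayList<>();
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  final StorageDirectory sd = getStorageDir(idx);
  futures.add(pool.submit(() -> {
    doTransition(datanode, sd, nsInfo, startOpt);
    return null;
  }));
}
for (Future<?> f : futures) {
  f.get();                             // surfaces any per-directory failure
}
pool.shutdown();
assert getCTime() == nsInfo.getCTime()
    : "Data-node and name-node CTimes must be the same.";
{code}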



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9155) [OEV] : The inputFile does not follow case insensitiveness incase of XML file

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933988#comment-14933988
 ] 

Hadoop QA commented on HDFS-9155:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 52s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 34s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 16s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 166m  7s | Tests failed in hadoop-hdfs. |
| | | 211m 45s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestBlockManager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764049/HDFS-9155_01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fb2e525 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12715/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12715/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12715/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12715/console |


This message was automatically generated.

> [OEV] : The inputFile does not follow case insensitiveness incase of XML file
> -
>
> Key: HDFS-9155
> URL: https://issues.apache.org/jira/browse/HDFS-9155
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9155_01.patch
>
>
> As in the document and help:
> {noformat}
> -i,--inputFile <arg>   edits file to process, xml (*case
>                        insensitive*) extension means XML format,
> {noformat}
> But if I give the file with an "XML" extension, it falls back to binary 
> processing.
> This issue is due to the code in
> {code}
>  int org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go()
> ...
> boolean xmlInput = inputFileName.endsWith(".xml");
> {code}
> The check needs to convert the file name to lower case before testing the 
> extension (see the sketch below).
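A one-line sketch of the fix (using an explicit locale so the check does not depend on the platform default):
{code}
// Sketch: normalize the file name before testing the extension, so that
// "XML", "Xml", etc. are all recognized as XML input.
boolean xmlInput = inputFileName.toLowerCase(Locale.ENGLISH).endsWith(".xml");
{code}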



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9148) Incorrect assert message in TestWriteToReplica#testWriteToTemporary

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934018#comment-14934018
 ] 

Hudson commented on HDFS-9148:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8533 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8533/])
HDFS-9148. Incorrect assert message in TestWriteToReplica#testWriteToTemporary 
(Tony Wu via Lei (Eddy) Xu) (lei: rev 50741cb568d4da30b92d4954928bc3039e583b22)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java


> Incorrect assert message in TestWriteToReplica#testWriteToTemporary
> ---
>
> Key: HDFS-9148
> URL: https://issues.apache.org/jira/browse/HDFS-9148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Trivial
> Fix For: 3.0.0, 2.8.0
>
> Attachments: hdfs-9148.patch
>
>
> The following assert text in TestWriteToReplica#testWriteToTemporary is not 
> correct:
> {code:java}
>   Assert.fail("createRbw() Should have removed the block with the older "
>   + "genstamp and replaced it with the newer one: " + 
> blocks[NON_EXISTENT]);
> {code}
> If the assert is triggered, it can only be because a temporary replica 
> already exists and has a newer generation stamp. It should have nothing to do 
> with createRbw().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9170) Move libhdfs / fuse-dfs / libwebhdfs to a separate module

2015-09-28 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934102#comment-14934102
 ] 

Haohui Mai commented on HDFS-9170:
--

There are two options available:

(1) Move libhdfs, fuse-dfs, and libwebhdfs to {{hadoop-hdfs-client}} but leave 
the unit tests that involve {{MiniDFSCluster}} in {{hadoop-hdfs}}. This is 
consistent with what we have done in the Java implementation but also separates 
the tests from the real implementation.
(2) Move libhdfs, fuse-dfs, and libwebhdfs to a separate 
{{hadoop-hdfs-native-client}} module. It has the benefit of putting both the 
implementation and the tests together, but it requires several tricks in cmake 
and pom to get things to work.

Thoughts? [~cmccabe], do you have any ideas on this?

> Move libhdfs / fuse-dfs / libwebhdfs to a separate module
> -
>
> Key: HDFS-9170
> URL: https://issues.apache.org/jira/browse/HDFS-9170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>
> After HDFS-6200 the Java implementation of hdfs-client has been moved to a 
> separate hadoop-hdfs-client module.
> libhdfs, fuse-dfs and libwebhdfs still reside in the hadoop-hdfs module. 
> Ideally these modules should reside in the hadoop-hdfs-client. However, to 
> write unit tests for these components, it is often necessary to run 
> MiniDFSCluster which resides in the hadoop-hdfs module.
> This jira is to discuss how these native modules should be laid out after 
> HDFS-6200.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9080) update htrace version to 4.0.1

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933404#comment-14933404
 ] 

Hudson commented on HDFS-9080:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8530 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8530/])
HDFS-9080. Update htrace version to 4.0.1 (cmccabe) (cmccabe: rev 
892ade689f9bcce76daae8f66fc00a49bee8548e)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/WritableRpcEngine.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveIterator.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsShell.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/CachePoolIterator.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPacket.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java
* hadoop-project/pom.xml
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsTracer.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInotifyEventInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
* hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FSOutputSummer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/tracing/TestTraceUtils.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracingShortCircuitLocalRead.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFsShell.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSPacket.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ProtoUtil.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java
* hadoop-common-project/hadoop-common/src/site/markdown/Tracing.md
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-hdfs-project/hadoop-hdfs/pom.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/tracing/SpanReceiverHost.java
* hadoop-common-project/hadoop-common/src/main/proto/RpcHeader.proto
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtoUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java

[jira] [Updated] (HDFS-9158) [OEV-Doc] : Document does not mention about "-f" and "-r" options

2015-09-28 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated HDFS-9158:

Status: Patch Available  (was: Open)

> [OEV-Doc] : Document does not mention about "-f" and "-r" options
> -
>
> Key: HDFS-9158
> URL: https://issues.apache.org/jira/browse/HDFS-9158
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
> Attachments: HDFS-9158.01.patch
>
>
> 1. The document does not mention the "-f" and "-r" options;
> these options should also be added to the document:
> {noformat}
> -f,--fix-txids Renumber the transaction IDs in the input,
>                so that there are no gaps or invalid  transaction IDs.
> -r,--recover   When reading binary edit logs, use recovery
>                mode.  This will give you the chance to skip
>                corrupt parts of the edit log.
> {noformat}
> 2. The help message contains some extra white space:
> {code}
> "so that there are no gaps or invalid  transaction IDs."
> {code}
> This can be removed as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6264) Provide FileSystem#create() variant which throws exception if parent directory doesn't exist

2015-09-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HDFS-6264:
-
Attachment: hdfs-6264-v3.txt

> Provide FileSystem#create() variant which throws exception if parent 
> directory doesn't exist
> 
>
> Key: HDFS-6264
> URL: https://issues.apache.org/jira/browse/HDFS-6264
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>  Labels: hbase
> Attachments: hdfs-6264-v1.txt, hdfs-6264-v2.txt, hdfs-6264-v3.txt
>
>
> FileSystem#createNonRecursive() is deprecated.
> However, there is no DistributedFileSystem#create() implementation which 
> throws an exception if the parent directory doesn't exist.
> This limits clients' migration away from the deprecated method.
> For HBase, IO fencing relies on the behavior of 
> FileSystem#createNonRecursive().
> A variant of the create() method should be added which throws an exception if 
> the parent directory doesn't exist (see the sketch below).
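For context, a sketch of the fencing pattern HBase relies on, written against the deprecated call (the path and parameters are illustrative):
{code}
// Sketch only: createNonRecursive() throws FileNotFoundException when the
// parent directory is missing, instead of silently creating it; the IO
// fencing logic depends on exactly that behavior.
FileSystem fs = FileSystem.get(conf);
Path lockFile = new Path("/hbase/lock/region.lock");   // illustrative path
try {
  FSDataOutputStream out = fs.createNonRecursive(lockFile, true, 4096,
      fs.getDefaultReplication(lockFile), fs.getDefaultBlockSize(lockFile),
      null);
  out.close();
} catch (FileNotFoundException e) {
  // parent directory is gone: another process has already fenced it
}
{code}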



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

