[jira] [Commented] (HDFS-7384) 'getfacl' command and 'getAclStatus' output should be in sync
[ https://issues.apache.org/jira/browse/HDFS-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224171#comment-14224171 ] Vinayakumar B commented on HDFS-7384: - Hi [~cnauroth], if you could look into the latest patch that would be great. Thanks 'getfacl' command and 'getAclStatus' output should be in sync - Key: HDFS-7384 URL: https://issues.apache.org/jira/browse/HDFS-7384 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7384-001.patch, HDFS-7384-002.patch, HDFS-7384-003.patch, HDFS-7384-004.patch, HDFS-7384-005.patch The *getfacl* command prints all of the entries, including basic and extended entries, the mask entry, and effective permissions. But the *getAclStatus* FileSystem API returns only the extended ACL entries set by the user; it includes neither the mask entry nor effective permissions. To benefit clients using the API, it would be better to include the 'mask' entry and effective permissions in the returned list of entries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
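For context, a minimal sketch of the client-side view this improvement targets, using the public FileSystem API (the path is hypothetical, and the comments describe the behavior before this patch):
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclStatus;

public class AclStatusExample {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/user/test/file");  // hypothetical path

    AclStatus status = fs.getAclStatus(path);
    // Today this list holds only the extended entries set by the user; the
    // mask entry and per-entry effective permissions are not included,
    // which is what this JIRA proposes to change.
    List<AclEntry> entries = status.getEntries();
    for (AclEntry entry : entries) {
      System.out.println(entry);
    }
  }
}
{code}
With the proposed change, the same getEntries() list would also carry the mask entry, so callers would not have to re-implement the effective-permission logic that the getfacl shell command already has.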
[jira] [Updated] (HDFS-5146) JspHelper#bestNode() doesn't handle bad datanodes correctly
[ https://issues.apache.org/jira/browse/HDFS-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-5146: Resolution: Cannot Reproduce Status: Resolved (was: Patch Available) Resolving this issue, as this code no longer exists after the migration of the UI to HTML5. JspHelper#bestNode() doesn't handle bad datanodes correctly --- Key: HDFS-5146 URL: https://issues.apache.org/jira/browse/HDFS-5146 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-5146.patch JspHelper#bestNode() doesn't correctly handle the case where the chosen datanode is down.
{code}
while (s == null) {
  if (chosenNode == null) {
    do {
      if (doRandom) {
        index = DFSUtil.getRandom().nextInt(nodes.length);
      } else {
        index++;
      }
      chosenNode = nodes[index];
    } while (deadNodes.contains(chosenNode));
  }
  chosenNode = nodes[index];
{code}
In this part of the code, the datanode is chosen only once. If the chosen datanode is down, an exception will definitely be thrown instead of re-choosing an available node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
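For illustration, a sketch of the retry shape the report implies (this is not a committed fix — the issue was resolved as Cannot Reproduce — and {{connectToNode}} is a hypothetical helper standing in for the connection code in {{bestNode()}}):
{code}
while (s == null) {
  // re-choose a node on every iteration, skipping nodes already known dead
  do {
    if (doRandom) {
      index = DFSUtil.getRandom().nextInt(nodes.length);
    } else {
      index = (index + 1) % nodes.length;  // wrap instead of running off the end
    }
    chosenNode = nodes[index];
  } while (deadNodes.contains(chosenNode));
  try {
    s = connectToNode(chosenNode);  // hypothetical helper: attempt the connection
  } catch (IOException e) {
    deadNodes.add(chosenNode);      // remember the failure and try another node
    s = null;
  }
}
{code}
A real implementation would also need to bail out with an error once every node is in {{deadNodes}}, rather than looping forever.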
[jira] [Commented] (HDFS-7310) Mover can give first priority to local DN if it has target storage type available in local DN
[ https://issues.apache.org/jira/browse/HDFS-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224178#comment-14224178 ] Vinayakumar B commented on HDFS-7310: - Hi [~jingzhao], if you could look at the latest patch that would be great. I'd appreciate it if others also want to take a look. Thanks Mover can give first priority to local DN if it has target storage type available in local DN - Key: HDFS-7310 URL: https://issues.apache.org/jira/browse/HDFS-7310 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Attachments: HDFS-7310-001.patch, HDFS-7310-002.patch, HDFS-7310-003.patch Currently the Mover logic may move blocks to any DN which has the target storage type. But if the src DN has the target storage type, then the Mover can give highest priority to the local DN. If the local DN does not contain the target storage type, then it can assign to any DN, as the current logic does. This is a thought; I have not gone through the code fully yet. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
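A minimal sketch of the selection order being proposed (not the actual Mover code; the {{StorageGroup}} accessors such as {{getDatanodeInfo()}} and {{hasSpaceFor()}} are assumed here for illustration):
{code}
// Prefer a target storage of the wanted type on the source datanode itself;
// only fall back to remote datanodes when the local one cannot take the block.
StorageGroup chooseTarget(DBlock block, StorageGroup source,
    StorageType wanted, List<StorageGroup> candidates) {
  for (StorageGroup g : candidates) {
    if (g.getDatanodeInfo().equals(source.getDatanodeInfo())
        && g.getStorageType() == wanted && g.hasSpaceFor(block)) {
      return g;  // local DN has the target storage type: highest priority
    }
  }
  for (StorageGroup g : candidates) {
    if (g.getStorageType() == wanted && g.hasSpaceFor(block)) {
      return g;  // current behavior: any DN with the target storage type
    }
  }
  return null;   // no suitable target
}
{code}
A local move avoids shipping the block over the network entirely, which is the payoff this JIRA is after.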
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen sometimes
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224213#comment-14224213 ] Yongjun Zhang commented on HDFS-7342: - Hi Guys, Thanks a lot for the comments and new rev. Please see my comments below, one for each of you :-)
{quote} If any COMMITTED blocks reaches minReplication, state will be automatically changed to COMPLETE while processing that IBR itself. Need not be user call. So there is no chance of COMMITTED block state with minReplication met. right? {quote}
Hi [~vinayrpet], indeed the following code in {{BlockManager::addStoredBlock}} may be called when an IBR is processed, which matches what you were saying:
{code}
if (storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
    numLiveReplicas >= minReplication) {
  storedBlock = completeBlock(bc, storedBlock, false);
}
{code}
But the block has to be COMMITTED to be made COMPLETE. If it's not COMMITTED yet (changing to COMMITTED is a request from the client, and it's asynchronous), even if it has the minimum replication number of replicas, it won't be changed to COMPLETE. So I think we may still need to take care of changing the block's state to COMPLETE in {{FSNamesystem#internalReleaseLease}}. Right?
Hi [~kihwal], a summary of my understanding of your comment is that there are two paths, one the regular write and the other recovery:
* for the regular write path, we need to enforce minimal replication
* for the recovery path, we just need to enforce 1 replica and let the replication monitor take care of the rest
* we can make commitBlockSynchronization() change a block to COMMITTED when there is at least one replica, ignoring min-replication. Currently only the client can inform the NN asynchronously to make a block COMMITTED.
I think it makes sense. Am I understanding you correctly?
Hi Ravi, thanks for the new rev. While we are still discussing the final solution, I noticed a couple of things in your rev3 per my original suggested solution:
1. Change
{code}
4471  * <li>If the penultimate/last block is COMMITTED or COMPLETE - force the
4472  * block to be COMPLETE even if it is not minimally replicated</li>
{code}
To
{code}
4471  * <li>If the penultimate/last block is COMMITTED - force the
4472  * block to be COMPLETE if it is minimally replicated</li>
{code}
2. You forgot to add {{setBlockCollection(blk.getBlockCollection());}} in the BlockInfoDesired constructor, thus a NullPointerException will happen.
Let's not rush into addressing those, but see if we can work out a solution toward the direction Kihwal stated. Thank you all again.
Lease Recovery doesn't happen sometimes Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, the LeaseManager tries to recover a lease but is not able to. HDFS-4882 describes one possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
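To make the state check under discussion concrete, here is a self-contained toy model (not HDFS code and not Ravi's patch; the names and the minReplication value are illustrative) of forcing a COMMITTED last block to COMPLETE during lease recovery once it is minimally replicated:
{code}
/** Toy model of the block-state check discussed above; not HDFS code. */
enum BlockUCState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMMITTED, COMPLETE }

class LastBlock {
  BlockUCState state = BlockUCState.COMMITTED;
  int liveReplicas = 1;
}

class LeaseRecoveryModel {
  static final int MIN_REPLICATION = 1;  // illustrative; dfs.namenode.replication.min

  /** Returns true if the block could be completed so the lease can be released. */
  static boolean tryComplete(LastBlock last) {
    if (last.state == BlockUCState.COMMITTED
        && last.liveReplicas >= MIN_REPLICATION) {
      last.state = BlockUCState.COMPLETE;  // file can now be closed
      return true;
    }
    return false;  // not COMMITTED (or under-replicated): recovery must retry
  }
}
{code}
The open question in the thread is exactly which replica count to demand here: minReplication on the regular write path, but possibly just one replica on the recovery path, leaving the replication monitor to restore the rest.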
[jira] [Updated] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-7429: --- Summary: DomainSocketWatcher.kick stuck (was: DomainSocketWatcher.doPoll0 stuck) DomainSocketWatcher.kick stuck -- Key: HDFS-7429 URL: https://issues.apache.org/jira/browse/HDFS-7429 Project: Hadoop HDFS Issue Type: Bug Reporter: zhaoyunjiong Attachments: 11241021, 11241023, 11241025 I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that DomainSocketWatcher.doPoll0 is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5575000 nid=0x37b3 runnable [0x7f558d3d2000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
Thread-163852 daemon prio=10 tid=0x7f55c811c800 nid=0x6757 runnable
[jira] [Updated] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-7429: --- Description: I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5575000 nid=0x37b3 runnable [0x7f558d3d2000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
Thread-163852 daemon prio=10 tid=0x7f55c811c800 nid=0x6757 runnable [0x7f55aef6e000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.access$800(DomainSocketWatcher.java:52)
    at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:457)
    at java.lang.Thread.run(Thread.java:745)
{quote}
was: I found some of our
[jira] [Commented] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224249#comment-14224249 ] zhaoyunjiong commented on HDFS-7429: The previous description is not right. The stuck thread is at org.apache.hadoop.net.unix.DomainSocket.writeArray0, as shown below.
{quote}
$ grep -B2 -A10 DomainSocket.writeArray 1124102*
11241021-DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
11241021- java.lang.Thread.State: RUNNABLE
11241021: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
11241021- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
11241021- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
11241021- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
11241021- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
11241021- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
11241021- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
11241021- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
11241021- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
11241021- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
11241021- at java.lang.Thread.run(Thread.java:745)
--
--
11241023-DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
11241023- java.lang.Thread.State: RUNNABLE
11241023: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
11241023- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
11241023- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
11241023- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
11241023- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
11241023- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
11241023- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
11241023- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
11241023- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
11241023- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
11241023- at java.lang.Thread.run(Thread.java:745)
--
--
11241025-DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
11241025- java.lang.Thread.State: RUNNABLE
11241025: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
11241025- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
11241025- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
11241025- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
11241025- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
11241025- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
11241025- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
11241025- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
11241025- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
11241025- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
11241025- at java.lang.Thread.run(Thread.java:745)
{quote}
DomainSocketWatcher.kick stuck -- Key: HDFS-7429 URL: https://issues.apache.org/jira/browse/HDFS-7429 Project: Hadoop HDFS Issue Type: Bug Reporter: zhaoyunjiong Attachments: 11241021, 11241023, 11241025 I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by
[jira] [Updated] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-7429: --- Description: I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
Thread-163852 daemon prio=10 tid=0x7f55c811c800 nid=0x6757 runnable [0x7f55aef6e000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.access$800(DomainSocketWatcher.java:52)
    at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:457)
    at
[jira] [Commented] (HDFS-7310) Mover can give first priority to local DN if it has target storage type available in local DN
[ https://issues.apache.org/jira/browse/HDFS-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224282#comment-14224282 ] Uma Maheswara Rao G commented on HDFS-7310: --- Thanks a lot, Vinay. Let me take a look at it. Mover can give first priority to local DN if it has target storage type available in local DN - Key: HDFS-7310 URL: https://issues.apache.org/jira/browse/HDFS-7310 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Attachments: HDFS-7310-001.patch, HDFS-7310-002.patch, HDFS-7310-003.patch Currently the Mover logic may move blocks to any DN which has the target storage type. But if the src DN has the target storage type, then the Mover can give highest priority to the local DN. If the local DN does not contain the target storage type, then it can assign to any DN, as the current logic does. This is a thought; I have not gone through the code fully yet. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong reassigned HDFS-7429: -- Assignee: zhaoyunjiong DomainSocketWatcher.kick stuck -- Key: HDFS-7429 URL: https://issues.apache.org/jira/browse/HDFS-7429 Project: Hadoop HDFS Issue Type: Bug Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: 11241021, 11241023, 11241025 I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
Thread-163852
[jira] [Commented] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224325#comment-14224325 ] zhaoyunjiong commented on HDFS-7429: The problem here is that on our machines we can only send 299 bytes to the domain socket. When it tries to send the 300th byte, it blocks, and DomainSocketWatcher.add(DomainSocket sock, Handler handler) holds the lock, so watcherThread.run can't get the lock and clear the buffer; it's a live lock. I'm not sure which configuration controls the buffer size of 299 for now. I now suspect net.core.netdev_budget, which is 300 on our machines. I'll upload a patch to control the sent bytes to prevent the live lock later. By the way, should I move this to the HADOOP COMMON project? DomainSocketWatcher.kick stuck -- Key: HDFS-7429 URL: https://issues.apache.org/jira/browse/HDFS-7429 Project: Hadoop HDFS Issue Type: Bug Reporter: zhaoyunjiong Attachments: 11241021, 11241023, 11241025 I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at
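To make the proposed mitigation concrete, here is a self-contained sketch of the "control the send bytes" idea (all names are illustrative; this is not the actual DomainSocketWatcher code or the eventual patch): allow at most one wake-up byte to be outstanding, so kick() can never fill the notification socket's buffer and block while the watcher lock is held.
{code}
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only: at most one wake-up byte in flight at a time. */
class KickOnce {
  private final OutputStream notificationSink;  // stands in for the notification domain socket
  private final AtomicBoolean kickPending = new AtomicBoolean(false);

  KickOnce(OutputStream notificationSink) {
    this.notificationSink = notificationSink;
  }

  /** Wake the poll loop; a no-op if a wake-up byte is already in flight. */
  void kick() {
    if (kickPending.compareAndSet(false, true)) {
      try {
        notificationSink.write(0);
      } catch (IOException e) {
        kickPending.set(false);  // allow a later retry
      }
    }
  }

  /** Called by the watcher thread after it drains the notification socket. */
  void kickHandled() {
    kickPending.set(false);
  }
}
{code}
Since the poll loop only needs to be woken, not counted, a single pending byte carries the same information as hundreds of them, and the buffer can never fill up.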
[jira] [Commented] (HDFS-6633) Support reading new data in a file being written until the file is closed
[ https://issues.apache.org/jira/browse/HDFS-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224333#comment-14224333 ] Hadoop QA commented on HDFS-6633: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683496/HDFS-6633-002.patch against trunk revision 61a2510.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8831//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8831//console
This message is automatically generated. Support reading new data in a file being written until the file is closed - Key: HDFS-6633 URL: https://issues.apache.org/jira/browse/HDFS-6633 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Vinayakumar B Attachments: HDFS-6633-001.patch, HDFS-6633-002.patch, h6633_20140707.patch, h6633_20140708.patch When a file is being written, the file length keeps increasing. If the file is opened for read, the reader first gets the file length and then reads only up to that length. The reader will not be able to read the new data written afterward. We propose adding a new feature so that readers will be able to read all the data until the writer closes the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
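For context, a minimal sketch of the polling workaround readers need without this feature: re-check the visible length and reopen to pick up new data (the path and byte handling are illustrative, and a real reader would also need a termination condition once the writer closes the file):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TailBeingWrittenFile {
  public static void main(String[] args) throws IOException, InterruptedException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/being-written.dat");  // hypothetical file
    long readSoFar = 0;
    byte[] buf = new byte[4096];
    while (true) {
      long len = fs.getFileStatus(path).getLen();    // length visible right now
      if (len > readSoFar) {
        try (FSDataInputStream in = fs.open(path)) { // reopen to see newer data
          in.seek(readSoFar);
          int n;
          while (readSoFar < len && (n = in.read(buf)) > 0) {
            readSoFar += n;                          // process buf[0..n) here
          }
        }
      }
      Thread.sleep(1000);  // poll; loop forever in this sketch
    }
  }
}
{code}
The feature proposed here would let a single open stream keep returning new bytes as the writer appends them, instead of forcing this reopen-and-seek loop.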
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() being blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224353#comment-14224353 ] Steve Loughran commented on HDFS-6735: -- You can fix the findbugs warning by tweaking {{hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml}} and including that diff in the patch. A minor optimization to avoid pread() being blocked by read() inside the same DFSInputStream - Key: HDFS-6735 URL: https://issues.apache.org/jira/browse/HDFS-6735 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735.txt In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and they have become an HBase read latency pain point. In HDFS-6698, I made a minor patch against the first encountered lock, around getFileLength; indeed, after reading the code and testing, it shows there are still other locks we could improve. In this jira, I'll make a patch against the other locks, and a simple test case to show the issue and the improved result. This is important for the HBase application, since in the current HFile read path, we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story I had planned to do, but it will probably take more time than I expected.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
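For reference, exclusions in that file use the FindBugs filter format; a hypothetical entry might look like the following (the class and bug pattern here are placeholders, not the actual warning flagged on this patch):
{code}
<FindBugsFilter>
  <!-- Placeholder entry: suppress one named bug pattern for one class. -->
  <Match>
    <Class name="org.apache.hadoop.hdfs.DFSInputStream"/>
    <Bug pattern="IS2_INCONSISTENT_SYNC"/>
  </Match>
</FindBugsFilter>
{code}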
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224358#comment-14224358 ] Hudson commented on HDFS-7436: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} is scattered across both {{FSNamesystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224368#comment-14224368 ] Hudson commented on HDFS-7436: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} is scattered across both {{FSNamesystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224372#comment-14224372 ] Hudson commented on HDFS-7412: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java
Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer. It would be nice to separate it from the implementation of {{FSNamesystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224362#comment-14224362 ] Hudson commented on HDFS-7412: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer. It would be nice to separate it from the implementation of {{FSNamesystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224361#comment-14224361 ] Hudson commented on HDFS-4882: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch
Scenario:
1. a cluster with 4 DNs
2. the size of the file to be written is a little more than one block
3. write the first block to 3 DNs, DN1-DN2-DN3
4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
5. DN2 and DN3 go down
6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
7. the client continuously writes the last block, and tries to close the file after writing all the data
8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE
9. shut down the client
10. the file's lease exceeds the hard limit
11. the LeaseManager realizes that and begins lease recovery by calling fsnamesystem.internalReleaseLease()
12. but the last block's state is COMPLETE, and this triggers the lease manager's infinite loop and prints massive logs like this:
{noformat}
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
{noformat}
(the 3rd log line is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224357#comment-14224357 ] Hudson commented on HDFS-7303: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java
NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224371#comment-14224371 ] Hudson commented on HDFS-4882: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch
Scenario:
1. a cluster with 4 DNs
2. the size of the file to be written is a little more than one block
3. write the first block to 3 DNs, DN1-DN2-DN3
4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
5. DN2 and DN3 go down
6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
7. the client continuously writes the last block, and tries to close the file after writing all the data
8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE
9. shut down the client
10. the file's lease exceeds the hard limit
11. the LeaseManager realizes that and begins lease recovery by calling fsnamesystem.internalReleaseLease()
12. but the last block's state is COMPLETE, and this triggers the lease manager's infinite loop and prints massive logs like this:
{noformat}
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
{noformat}
(the 3rd log line is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224359#comment-14224359 ] Hudson commented on HDFS-7419: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61)
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java
CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When the DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in the DataNode's log; they are only emitted to clients. This JIRA makes {{DataNode}} report detailed failures in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224369#comment-14224369 ] Hudson commented on HDFS-7419: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61)
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java
CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When the DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in the DataNode's log; they are only emitted to clients. This JIRA makes {{DataNode}} report detailed failures in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224367#comment-14224367 ] Hudson commented on HDFS-7303: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
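The underlying idea can be sketched as keying the UI/JMX datanode map by host:port instead of bare hostname, so two datanodes on one host stay distinct. The types below are illustrative, not the actual FSNamesystem MXBean code.
{code}
import java.util.HashMap;
import java.util.Map;

class DatanodeUiMapSketch {
  static final class DatanodeInfo {
    final String host;
    final int xferPort;
    DatanodeInfo(String host, int xferPort) { this.host = host; this.xferPort = xferPort; }
    String uiKey() { return host + ":" + xferPort; } // unique per datanode instance
  }

  public static void main(String[] args) {
    Map<String, DatanodeInfo> byKey = new HashMap<>();
    DatanodeInfo dn1 = new DatanodeInfo("host1.example.com", 50010);
    DatanodeInfo dn2 = new DatanodeInfo("host1.example.com", 50011);
    byKey.put(dn1.uiKey(), dn1);
    byKey.put(dn2.uiKey(), dn2); // keyed by host alone, this would overwrite dn1
    System.out.println(byKey.size()); // 2: both datanodes listed
  }
}
{code}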
[jira] [Commented] (HDFS-7210) Avoid two separate RPC's namenode.append() and namenode.getFileInfo() for an append call from DFSClient
[ https://issues.apache.org/jira/browse/HDFS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224395#comment-14224395 ] Hadoop QA commented on HDFS-7210: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683498/HDFS-7210-005.patch against trunk revision 61a2510. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestRenameWhileOpen {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8832//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8832//console This message is automatically generated. Avoid two separate RPC's namenode.append() and namenode.getFileInfo() for an append call from DFSClient --- Key: HDFS-7210 URL: https://issues.apache.org/jira/browse/HDFS-7210 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7210-001.patch, HDFS-7210-002.patch, HDFS-7210-003.patch, HDFS-7210-004.patch, HDFS-7210-005.patch Currently DFSClient does 2 RPCs to the namenode for an append operation: {{append()}} for re-opening the file and getting the last block, and {{getFileInfo()}} for getting the HdfsFileStatus. If we can combine the results of these 2 calls into one RPC, it can reduce load on the NameNode. For backward compatibility we need to keep the existing {{append()}} call as is -- This message was sent by Atlassian JIRA (v6.3.4#6332)
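A minimal sketch of the combined call the issue proposes, bundling the reopened last block and the file status into one response so the client makes a single RPC; the value-object shape below is an assumption for illustration, not necessarily the final API.
{code}
class AppendRpcSketch {
  static final class LocatedBlock { /* last partial block, if any */ }
  static final class HdfsFileStatus { /* file length, permissions, ... */ }

  /** Bundles both results of opening a file for append. */
  static final class LastBlockWithStatus {
    final LocatedBlock lastBlock; // may be null if the last block is full
    final HdfsFileStatus stat;
    LastBlockWithStatus(LocatedBlock lastBlock, HdfsFileStatus stat) {
      this.lastBlock = lastBlock;
      this.stat = stat;
    }
  }

  /** One RPC instead of namenode.append() followed by namenode.getFileInfo(). */
  LastBlockWithStatus append(String src, String clientName) {
    LocatedBlock last = reopenForAppend(src, clientName);
    HdfsFileStatus stat = lookupStatus(src);
    return new LastBlockWithStatus(last, stat);
  }

  private LocatedBlock reopenForAppend(String src, String clientName) { return new LocatedBlock(); }
  private HdfsFileStatus lookupStatus(String src) { return new HdfsFileStatus(); }
}
{code}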
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224407#comment-14224407 ] Vinayakumar B commented on HDFS-7342: - {quote}But the block has to be COMMITTED to be made COMPLETE. If it's not COMMITTED yet (changing to COMMITTED is a request from the client and it's asynchronous), even if it has the min replication number of replicas, it won't be changed to COMPLETE. So I think we may still need to take care of changing the block's state to COMPLETE in FSNamesystem#internalReleaseLease. Right?{quote} I agree that the client request and the Datanode's IBR are asynchronous. But both will update the block state under the write lock. The penultimate block will be COMMITTED in the {{getAdditionalBlock()}} client request. Here there are 3 possibilities: 1. All IBRs come before the block is even COMMITTED. At this time, if the block is FINALIZED in the DN, the replica will be accepted. {code}if (ucBlock.reportedState == ReplicaState.FINALIZED && !block.findDatanode(storageInfo.getDatanodeDescriptor())) { addStoredBlock(block, storageInfo, null, true); }{code} 2. If the client request comes after receiving 2 (= minReplication) IBRs, then the client request alone will make the state COMPLETE immediately after making it COMMITTED, in the following code of {{BlockManager#commitOrCompleteLastBlock()}} {code}final boolean b = commitBlock((BlockInfoUnderConstruction)lastBlock, commitBlock); if(countNodes(lastBlock).liveReplicas() >= minReplication) completeBlock(bc, bc.numBlocks()-1, false); return b;{code} At this time, if the IBRs received are not enough, then the block will be just COMMITTED. 3. If the IBRs are received after the client request, i.e. after COMMITTED, then while processing the second IBR the block will be COMPLETED in the code below. {code}if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED && numLiveReplicas >= minReplication) { storedBlock = completeBlock(bc, storedBlock, false);{code} So I couldn't find the possibility of a block in COMMITTED state with minReplication met. {quote}Changes to {{recoverLeaseInternal()}} and {{internalReleaseLease()}} will need to be made to distinguish the on-demand recovery from normal lease expiration. For on-demand recovery, we might want it to fail if there are no live replicas, as a file lease is normally recovered for a subsequent append or copy (read). If there is no data, they will fail.{quote} I understood [~kihwal]'s suggestions as below. The {{recoverLease()}} call from the client passes a {{force}} flag to {{recoverLeaseInternal()}}. Based on this flag, we can check the blocks' states (excluding the last block) and the # of replicas, and decide whether to go ahead with recovery without even initiating a request to the DataNode. So we need not worry about this case in commitBlockSynchronization. In {{commitBlockSynchronization()}}, directly complete all blocks and close the file. Am I right, [~kihwal]? Lease Recovery doesn't happen some times Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, the LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes one possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
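The force-flag idea discussed above could look roughly like this: an on-demand recovery checks replica availability up front and fails fast, while recovery driven by lease expiry proceeds regardless. All names are hypothetical stand-ins, not the committed change.
{code}
import java.io.IOException;

class LeaseRecoverySketch {
  boolean recoverLease(String src, boolean onDemand) throws IOException {
    if (onDemand && liveReplicasOfLastBlock(src) == 0) {
      // A client recovering a lease for a subsequent append or read gains
      // nothing from a file whose last block has no live replicas.
      throw new IOException("Cannot recover lease for " + src
          + ": last block has no live replicas");
    }
    return internalReleaseLease(src);
  }

  private int liveReplicasOfLastBlock(String src) { return 0; }      // placeholder
  private boolean internalReleaseLease(String src) { return true; }  // placeholder
}
{code}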
[jira] [Commented] (HDFS-7210) Avoid two separate RPC's namenode.append() and namenode.getFileInfo() for an append call from DFSClient
[ https://issues.apache.org/jira/browse/HDFS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224412#comment-14224412 ] Vinayakumar B commented on HDFS-7210: - The failure is unrelated. It seems it failed due to a corrupted hadoop-common.jar, and so failed to load core-default.xml: {noformat}2014-11-25 09:01:00,976 FATAL conf.Configuration (Configuration.java:loadResource(2518)) - error parsing conf core-default.xml java.util.zip.ZipException: invalid block type at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147) at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:105){noformat} Avoid two separate RPC's namenode.append() and namenode.getFileInfo() for an append call from DFSClient --- Key: HDFS-7210 URL: https://issues.apache.org/jira/browse/HDFS-7210 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7210-001.patch, HDFS-7210-002.patch, HDFS-7210-003.patch, HDFS-7210-004.patch, HDFS-7210-005.patch Currently DFSClient does 2 RPCs to the namenode for an append operation: {{append()}} for re-opening the file and getting the last block, and {{getFileInfo()}} for getting the HdfsFileStatus. If we can combine the results of these 2 calls into one RPC, it can reduce load on the NameNode. For backward compatibility we need to keep the existing {{append()}} call as is -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
Kihwal Lee created HDFS-7443: Summary: Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Priority: Blocker When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of the datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing an IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, the datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary, there were two observed issues: - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7443: - Description: When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of the datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing an IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, the datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary, there were two observed issues: - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller scale test clusters. was: When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of the datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing an IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, the datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary, there were two observed issues: - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Priority: Blocker When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of the datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing an IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, the datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. 
Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary, there were two observed issues: - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
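Independent of whatever the eventual fix turns out to be, the failure mode suggests the hard-link step must be idempotent across retries. A generic sketch of that property (not the HDFS code) using java.nio:
{code}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class IdempotentLinkSketch {
  /** Creates target as a hard link to existing, tolerating a link left
   *  behind by a previously interrupted attempt. */
  static void linkBlock(Path target, Path existing) throws IOException {
    try {
      Files.createLink(target, existing);
    } catch (FileAlreadyExistsException e) {
      // Accept the leftover only if it really is the same file (same
      // inode); otherwise the EEXIST is a genuine conflict.
      if (!Files.isSameFile(target, existing)) {
        throw e;
      }
    }
  }
}
{code}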
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224588#comment-14224588 ] Hudson commented on HDFS-7419: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in DataNode's log and they are emitted from clients. This JIRA makes {{DataNode}} reports detailed failure in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224586#comment-14224586 ] Hudson commented on HDFS-7303: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224590#comment-14224590 ] Hudson commented on HDFS-4882: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario: 1. cluster with 4 DNs 2. the size of the file to be written is a little more than one block 3. write the first block to 3 DNs, DN1-DN2-DN3 4. all the data packets of first block is successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out 5. DN2 and DN3 are down 6. client recovers the pipeline, but no new DN is added to the pipeline because of the current pipeline stage is PIPELINE_CLOSE 7. client continuously writes the last block, and try to close the file after written all the data 8. NN finds that the penultimate block doesn't has enough replica(our dfs.namenode.replication.min=2), and the client's close runs into indefinite loop(HDFS-2936), and at the same time, NN makes the last block's state to COMPLETE 9. shutdown the client 10. the file's lease exceeds hard limit 11. LeaseManager realizes that and begin to do lease recovery by call fsnamesystem.internalReleaseLease() 12. but the last block's state is COMPLETE, and this triggers lease manager's infinite loop and prints massive logs like this: {noformat} 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM APREDUCE_-1252656407_1, pendingcreates: 1] {noformat} (the 3rd line log is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224587#comment-14224587 ] Hudson commented on HDFS-7436: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224591#comment-14224591 ] Hudson commented on HDFS-7412: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer.It would be nice to separate it from the implementation of {{FSNameSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224596#comment-14224596 ] Hudson commented on HDFS-7436: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224600#comment-14224600 ] Hudson commented on HDFS-7412: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer.It would be nice to separate it from the implementation of {{FSNameSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224599#comment-14224599 ] Hudson commented on HDFS-4882: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario: 1. cluster with 4 DNs 2. the size of the file to be written is a little more than one block 3. write the first block to 3 DNs, DN1-DN2-DN3 4. all the data packets of first block is successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out 5. DN2 and DN3 are down 6. client recovers the pipeline, but no new DN is added to the pipeline because of the current pipeline stage is PIPELINE_CLOSE 7. client continuously writes the last block, and try to close the file after written all the data 8. NN finds that the penultimate block doesn't has enough replica(our dfs.namenode.replication.min=2), and the client's close runs into indefinite loop(HDFS-2936), and at the same time, NN makes the last block's state to COMPLETE 9. shutdown the client 10. the file's lease exceeds hard limit 11. LeaseManager realizes that and begin to do lease recovery by call fsnamesystem.internalReleaseLease() 12. but the last block's state is COMPLETE, and this triggers lease manager's infinite loop and prints massive logs like this: {noformat} 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM APREDUCE_-1252656407_1, pendingcreates: 1] {noformat} (the 3rd line log is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224597#comment-14224597 ] Hudson commented on HDFS-7419: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in DataNode's log and they are emitted from clients. This JIRA makes {{DataNode}} reports detailed failure in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224595#comment-14224595 ] Hudson commented on HDFS-7303: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224617#comment-14224617 ] Kihwal Lee commented on HDFS-7443: -- This is the first error seen. {noformat} ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This was after successful upgrade of several volumes. Since the hard link summary was not printed and it was multiple seconds after starting upgrade of this volume (did not fail right away), the error must have come from {{DataStorage.linkBlocks()}} when it was checking the result with {{Futures.get()}}. Then it was retried and failed the same way. {noformat} INFO common.Storage: Analyzing storage directories for bpid BP- INFO common.Storage: Recovering storage directory /a/b/hadoop/var/hdfs/data/current/BP- from previous upgrade INFO common.Storage: Upgrading block pool storage directory /a/b/hadoop/var/hdfs/data/current/BP- old LV = -55; old CTime = 12345678. new LV = -56; new CTime = 45678989 ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This indicates {{Storage.analyzeStorage()}} correctly returning {{RECOVER_UPGRADE}} and the partial upgrade is undone before retrying. This repeated hundreds of times before termination of datanode, which logged the stack trace. {noformat} FATAL datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020. Exiting. java.io.IOException: EEXIST: File exists at sun.reflect.GeneratedConstructorAccessor18.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at com.google.common.util.concurrent.Futures.newFromConstructor(Futures.java:1258) at com.google.common.util.concurrent.Futures.newWithCause(Futures.java:1218) at com.google.common.util.concurrent.Futures.wrapAndThrowExceptionOrError(Futures.java:1131) at com.google.common.util.concurrent.Futures.get(Futures.java:1048) at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:999) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.linkAllBlocks(BlockPoolSliceStorage.java:594) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doUpgrade(BlockPoolSliceStorage.java:403) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:337) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:197) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:438) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1312) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1277) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:722) Caused by: EEXIST: File exists at org.apache.hadoop.io.nativeio.NativeIO.link0(Native Method) at org.apache.hadoop.io.nativeio.NativeIO.link(NativeIO.java:836) at 
org.apache.hadoop.hdfs.server.datanode.DataStorage$2.call(DataStorage.java:991) at org.apache.hadoop.hdfs.server.datanode.DataStorage$2.call(DataStorage.java:984) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ... 1 more {noformat} At this point, {{previous.tmp}} contained the new directory structure with blocks and meta files placed in the ID-based directory. Some orphaned meta and block files were observed. Restarting datanode does not reproduce the issue, but I suspect data loss based on the missing files and the number of missing blocks. Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0
[jira] [Comment Edited] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224617#comment-14224617 ] Kihwal Lee edited comment on HDFS-7443 at 11/25/14 2:44 PM: This is the first error seen. {noformat} ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This was after successful upgrade of several volumes. Since the hard link summary was not printed and it was multiple seconds after starting upgrade of this volume (did not fail right away), the error must have come from {{DataStorage.linkBlocks()}} when it was checking the result with {{Futures.get()}}. Then it was retried and failed the same way. {noformat} INFO common.Storage: Analyzing storage directories for bpid BP- INFO common.Storage: Recovering storage directory /a/b/hadoop/var/hdfs/data/current/BP- from previous upgrade INFO common.Storage: Upgrading block pool storage directory /a/b/hadoop/var/hdfs/data/current/BP- old LV = -55; old CTime = 12345678. new LV = -56; new CTime = 45678989 ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This indicates {{Storage.analyzeStorage()}} correctly returning {{RECOVER_UPGRADE}} and the partial upgrade is undone before retrying. This repeated hundreds of times before termination of datanode, which logged the stack trace. {noformat} FATAL datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020. Exiting. java.io.IOException: EEXIST: File exists at sun.reflect.GeneratedConstructorAccessor18.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at com.google.common.util.concurrent.Futures.newFromConstructor(Futures.java:1258) at com.google.common.util.concurrent.Futures.newWithCause(Futures.java:1218) at com.google.common.util.concurrent.Futures.wrapAndThrowExceptionOrError(Futures.java:1131) at com.google.common.util.concurrent.Futures.get(Futures.java:1048) at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:999) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.linkAllBlocks(BlockPoolSliceStorage.java:594) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doUpgrade(BlockPoolSliceStorage.java:403) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:337) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:197) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:438) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1312) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1277) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:722) Caused by: EEXIST: File exists at org.apache.hadoop.io.nativeio.NativeIO.link0(Native Method) at 
org.apache.hadoop.io.nativeio.NativeIO.link(NativeIO.java:836) at org.apache.hadoop.hdfs.server.datanode.DataStorage$2.call(DataStorage.java:991) at org.apache.hadoop.hdfs.server.datanode.DataStorage$2.call(DataStorage.java:984) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ... 1 more {noformat} At this point, {{previous.tmp}} contained the new directory structure with blocks and meta files placed in the ID-based directory. Some orphaned meta and block files were observed. Restarting datanode does not reproduce the issue, but I suspect data loss based on the missing files and the number of missing blocks. was (Author: kihwal): This is the first error seen. {noformat} ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This was after successful upgrade of
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224638#comment-14224638 ] Daryn Sharp commented on HDFS-7435: --- We're definitely on the same page. I meant questionably in the sense that the garbage generation rate is so high that CMS (you mention fragmentation) will be slow, bloated with free lists, and not keeping up. Now granted, a 20M allocation is likely to be prematurely tenured - along with the IPC protobuf containing the large report. One solution/workaround to both is reducing the size of the individual BR via multiple storages. The storages don't have to be individual drives but just subdirs. Segmenting shouldn't be an outright replacement. The decode will emit a long[][], which requires updating {{StorageBlockReport}} and {{BlockListAsLongs}}. Similar changes to the datanode, although not required for the namenode changes, will be more complex. I already considered segmenting. :) I found the benefit, versus the complexity and time, to be much lower than improvements to other subsystems. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with a default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive, since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
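To make the cost concrete: the report is conceptually three longs per replica, so it can be scanned directly over a primitive array with no boxing and no ArrayList growth. A sketch under that assumption (the conceptual layout from the issue text, not the exact wire format):
{code}
class BlockReportScanSketch {
  /** Walks a report encoded as (blockId, numBytes, genStamp) triples. */
  static void scan(long[] blocksAsLongs) {
    for (int i = 0; i + 2 < blocksAsLongs.length; i += 3) {
      long blockId  = blocksAsLongs[i];
      long numBytes = blocksAsLongs[i + 1];
      long genStamp = blocksAsLongs[i + 2];
      process(blockId, numBytes, genStamp); // no Long boxing anywhere
    }
  }

  static void process(long blockId, long numBytes, long genStamp) {
    // placeholder for per-replica handling
  }
}
{code}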
[jira] [Commented] (HDFS-7433) DatanodeMap lookups & DatanodeID hashCodes are inefficient
[ https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224667#comment-14224667 ] Daryn Sharp commented on HDFS-7433: --- My bad mentioning {{datanodeMap}} - juggling too many changes. {{DatanodeIDs}} are added to collections in many other places, and equality checks occur often. My more general point is that mutable hashCodes are a hidden landmine, which is why I filed another jira. Dynamic computation of the xfer addr (and by extension the hash) is inefficient and generates a lot of garbage. I'm checking out the odd test failures. They don't appear related, at least the xml parsing and class-def-not-found ones. DatanodeMap lookups & DatanodeID hashCodes are inefficient -- Key: HDFS-7433 URL: https://issues.apache.org/jira/browse/HDFS-7433 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7433.patch The datanode map is currently a {{TreeMap}}. For many thousands of datanodes, tree lookups are ~10X more expensive than a {{HashMap}}. Insertions and removals are up to 100X more expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
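A small, self-contained demonstration of the mutable-hashCode landmine mentioned above; DatanodeKey is an illustrative stand-in, not the real DatanodeID.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableHashKeyDemo {
  static final class DatanodeKey {
    String xferAddr; // mutable, e.g. changed on re-registration
    DatanodeKey(String xferAddr) { this.xferAddr = xferAddr; }
    @Override public boolean equals(Object o) {
      return o instanceof DatanodeKey
          && Objects.equals(xferAddr, ((DatanodeKey) o).xferAddr);
    }
    @Override public int hashCode() { return Objects.hashCode(xferAddr); }
  }

  public static void main(String[] args) {
    Map<DatanodeKey, String> map = new HashMap<>();
    DatanodeKey key = new DatanodeKey("host1:50010");
    map.put(key, "dn1");
    key.xferAddr = "host1:50011"; // hash changes while the key is in the map
    System.out.println(map.get(key)); // null: the entry is stranded in its old bucket
  }
}
{code}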
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224687#comment-14224687 ] Hudson commented on HDFS-7303: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224689#comment-14224689 ] Hudson commented on HDFS-7419: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in DataNode's log and they are emitted from clients. This JIRA makes {{DataNode}} reports detailed failure in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224692#comment-14224692 ] Hudson commented on HDFS-7412: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer.It would be nice to separate it from the implementation of {{FSNameSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224688#comment-14224688 ] Hudson commented on HDFS-7436: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224691#comment-14224691 ] Hudson commented on HDFS-4882: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario: 1. cluster with 4 DNs 2. the size of the file to be written is a little more than one block 3. write the first block to 3 DNs, DN1-DN2-DN3 4. all the data packets of first block is successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out 5. DN2 and DN3 are down 6. client recovers the pipeline, but no new DN is added to the pipeline because of the current pipeline stage is PIPELINE_CLOSE 7. client continuously writes the last block, and try to close the file after written all the data 8. NN finds that the penultimate block doesn't has enough replica(our dfs.namenode.replication.min=2), and the client's close runs into indefinite loop(HDFS-2936), and at the same time, NN makes the last block's state to COMPLETE 9. shutdown the client 10. the file's lease exceeds hard limit 11. LeaseManager realizes that and begin to do lease recovery by call fsnamesystem.internalReleaseLease() 12. but the last block's state is COMPLETE, and this triggers lease manager's infinite loop and prints massive logs like this: {noformat} 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM APREDUCE_-1252656407_1, pendingcreates: 1] {noformat} (the 3rd line log is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224700#comment-14224700 ] Hudson commented on HDFS-7303: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224702#comment-14224702 ] Hudson commented on HDFS-7419: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When the DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in the DataNode's log; they only surface on the client side. This JIRA makes the {{DataNode}} report detailed failures in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
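A hedged sketch of the logging behavior the JIRA asks for; the helper names and log format here are assumptions, not the actual patch. Each per-volume failure is logged on the DataNode with the full stack trace, while the short summary for the client is still collected.
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RefreshVolumesSketch {
  // Stand-in for the DataNode logger.
  static void logError(String msg, Exception e) {
    System.err.println("ERROR " + msg);
    e.printStackTrace();
  }

  // Hypothetical per-volume add that fails for a bad path.
  static void addVolume(String dir) throws IOException {
    if (dir.contains("bad")) {
      throw new IOException("Cannot lock storage " + dir);
    }
  }

  public static void main(String[] args) {
    List<String> volumes = Arrays.asList("/data/1", "/data/bad", "/data/3");
    StringBuilder errorMessageBuilder = new StringBuilder();
    for (String v : volumes) {
      try {
        addVolume(v);
      } catch (IOException e) {
        // Log the detailed failure on the DataNode itself (the JIRA's point)...
        logError("Failed to add volume " + v, e);
        // ...and keep the short summary that is returned to the client.
        errorMessageBuilder.append("FAILED to add: ").append(v).append('\n');
      }
    }
    if (errorMessageBuilder.length() > 0) {
      System.out.println(errorMessageBuilder);
    }
  }
}
{code}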
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224704#comment-14224704 ] Hudson commented on HDFS-4882: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario:
1. cluster with 4 DNs
2. the size of the file to be written is a little more than one block
3. write the first block to 3 DNs, DN1-DN2-DN3
4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
5. DN2 and DN3 go down
6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
7. the client keeps writing the last block, and tries to close the file after writing all the data
8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE
9. shut down the client
10. the file's lease exceeds the hard limit
11. the LeaseManager realizes that and begins lease recovery by calling fsnamesystem.internalReleaseLease()
12. but the last block's state is COMPLETE, which triggers the lease manager's infinite loop and prints massive logs like this:
{noformat}
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat
2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
{noformat}
(the 3rd log line is a debug log we added) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224705#comment-14224705 ] Hudson commented on HDFS-7412: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer. It would be nice to separate it from the implementation of {{FSNameSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
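The pattern being moved is the standard RPC retry cache: before executing a non-idempotent call, look up the (clientId, callId) pair; on a retry, replay the stored result instead of re-executing. A toy sketch with invented names (not Hadoop's RetryCache API), showing why the cache naturally lives in the RPC server rather than the namesystem:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class RetryCacheSketch {
  private final Map<String, Object> cache = new ConcurrentHashMap<String, Object>();

  // Invoked by the RPC layer; the namesystem operation is just the payload.
  Object invoke(String clientId, int callId, Supplier<Object> op) {
    String key = clientId + "#" + callId;
    // First attempt executes the operation; a retried RPC with the same
    // (clientId, callId) replays the cached result instead.
    return cache.computeIfAbsent(key, k -> op.get());
  }

  public static void main(String[] args) {
    RetryCacheSketch rpc = new RetryCacheSketch();
    System.out.println(rpc.invoke("client-1", 7, () -> "created /a"));
    System.out.println(rpc.invoke("client-1", 7, () -> "created /a AGAIN")); // replayed
  }
}
{code}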
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224701#comment-14224701 ] Hudson commented on HDFS-7436: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
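The consolidation pattern itself is simple to picture. A hedged sketch with invented names and a toy file map (not the HDFS patch): a single final class owns both the validation and the mutation for the operation, so nothing is split across two files.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FSDirConcatOpSketch {
  private FSDirConcatOpSketch() {}

  /** files: path -> list of block ids; srcs are appended onto target. */
  static void concat(Map<String, List<String>> files, String target, String... srcs) {
    // All checks happen up front, in the same class...
    if (!files.containsKey(target)) throw new IllegalArgumentException("no target " + target);
    for (String s : srcs) {
      if (!files.containsKey(s)) throw new IllegalArgumentException("no src " + s);
      if (s.equals(target)) throw new IllegalArgumentException("src == target");
    }
    // ...and the mutation happens only after every check has passed.
    for (String s : srcs) {
      files.get(target).addAll(files.remove(s));
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> fs = new HashMap<String, List<String>>();
    fs.put("/a", new ArrayList<String>(Arrays.asList("blk_1")));
    fs.put("/b", new ArrayList<String>(Arrays.asList("blk_2")));
    concat(fs, "/a", "/b");
    System.out.println(fs); // {/a=[blk_1, blk_2]}
  }
}
{code}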
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224827#comment-14224827 ] Yongjun Zhang commented on HDFS-7342: - Thanks a lot for your detailed explanation [~vinayrpet]. {quote} 2. If client request comes after receiving 2 (=minReplication) IBRs, ... {quote} It seems that lease recovery could happen before the client request arrives here; when this happens, the block state would be COMMITTED with minReplication met, right? Lease Recovery doesn't happen some times Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, the LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes one possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224876#comment-14224876 ] Benoy Antony commented on HDFS-6407: [~wheat9], Could you please review this enhancement? new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Minor Attachments: HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png The old UI supported clicking on a column header to sort on that column. The new UI seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanode information, directory listings and snapshots. When there are many items in the tables, it is useful to have the ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7310) Mover can give first priority to local DN if it has target storage type available in local DN
[ https://issues.apache.org/jira/browse/HDFS-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224891#comment-14224891 ] Uma Maheswara Rao G commented on HDFS-7310: --- Hi Vinay, the patch looks good to me. I have one question before my +1 on this. It seems that when you plan to move a block across storages within a DN and it is not on transient storage, you are not doing any checksum calculation. My point is, we may need to compute the checksum to ensure data integrity when moving across storages. If for some reason that is not necessary, please let me know; in that case we also need to change the log below, though:
{code}
if (LOG.isDebugEnabled()) {
  LOG.debug("Copied " + srcMeta + " to " + dstMeta + " and calculated checksum");
  LOG.debug("Copied " + srcFile + " to " + dstFile);
}
{code}
Mover can give first priority to local DN if it has target storage type available in local DN - Key: HDFS-7310 URL: https://issues.apache.org/jira/browse/HDFS-7310 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Attachments: HDFS-7310-001.patch, HDFS-7310-002.patch, HDFS-7310-003.patch Currently the Mover logic may move blocks to any DN which has the target storage type. But if the src DN has the target storage type, then the mover can give highest priority to the local DN. If the local DN does not contain the target storage type, then it can assign to any DN as the current logic does. This is a thought; I have not gone through the code fully yet. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
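If checksum computation during the intra-DN move does turn out to be needed, it can be folded into the copy loop at essentially no extra I/O cost. A small self-contained sketch using plain {{java.util.zip.CRC32}} (HDFS actually uses its own per-chunk checksum format, so this is only an illustration of the copy-and-checksum idea):
{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class CopyWithChecksum {
  static long copyAndChecksum(Path src, Path dst) throws IOException {
    CRC32 crc = new CRC32();
    try (InputStream in = Files.newInputStream(src);
         OutputStream out = Files.newOutputStream(dst)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) > 0) {
        crc.update(buf, 0, n); // checksum the bytes as they stream through
        out.write(buf, 0, n);
      }
    }
    return crc.getValue();
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempFile("blk", ".data");
    Files.write(src, "block bytes".getBytes());
    Path dst = Files.createTempFile("blk", ".copy");
    System.out.printf("Copied %s to %s and calculated checksum %d%n",
        src, dst, copyAndChecksum(src, dst));
  }
}
{code}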
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224893#comment-14224893 ] Benoy Antony commented on HDFS-7303: Thank you [~wheat9] for reviewing and committing. NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225094#comment-14225094 ] Jing Zhao commented on HDFS-7440: - The patch looks good to me. One small suggestion is that I think the changes related to the audit log can be separated into another jira, since they change the current audit log semantics. +1 after addressing the comments. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7440: - Attachment: HDFS-7440.001.patch Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6735: Attachment: HDFS-6735-v6.txt Thanks [~ste...@apache.org]. New patch with findbugs tweak. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream - Key: HDFS-6735 URL: https://issues.apache.org/jira/browse/HDFS-6735 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and they have become an HBase read-latency pain point. In HDFS-6698, I made a minor patch against the first encountered lock, around getFileLength; indeed, after reading the code and testing, it shows there are still other locks we could improve. In this jira, I'll make a patch against the other locks, and a simple test case to show the issue and the improved result. This is important for HBase applications, since in the current HFile read path, we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story I had planned to do, but it will probably take more time than I expected) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7097: - Attachment: HDFS-7097.ultimate.trunk.patch Here is the updated patch. Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the namespace is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big namespace can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports are not modifying any state that is being saved to fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
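The locking idea in the proposal can be pictured with two locks: the checkpointer gives up the namesystem read lock and instead serializes only against checkpoint-sensitive operations with a dedicated lock. A toy sketch with assumed lock names (not the committed patch):
{code}
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final ReentrantLock cpLock = new ReentrantLock();

  void saveCheckpoint() {
    cpLock.lock(); // blocks only other checkpoint-sensitive operations
    try {
      // write the fsimage; fsLock is not held, so RPC handlers that
      // process block reports keep running
    } finally {
      cpLock.unlock();
    }
  }

  void processIncrementalBlockReport() {
    fsLock.writeLock().lock(); // unaffected by an in-progress checkpoint
    try {
      // apply the report; this state is not part of the fsimage snapshot
    } finally {
      fsLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    CheckpointLockSketch s = new CheckpointLockSketch();
    s.saveCheckpoint();
    s.processIncrementalBlockReport(); // would not block even mid-checkpoint
  }
}
{code}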
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225346#comment-14225346 ] Hadoop QA commented on HDFS-7440: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683635/HDFS-7440.001.patch against trunk revision 61a2510. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8833//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8833//console This message is automatically generated. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
Haohui Mai created HDFS-7444: Summary: convertToBlockUnderConstruction should preserve BlockCollection Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
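The shape of the fix, in a simplified sketch with toy types standing in for the HDFS classes: the conversion method itself copies the {{BlockCollection}} reference onto the new object, so no caller can forget to.
{code}
class BlockInfoSketch {
  Object blockCollection; // stand-in for the BlockCollection field

  BlockInfoUnderConstructionSketch convertToBlockUnderConstruction() {
    BlockInfoUnderConstructionSketch uc = new BlockInfoUnderConstructionSketch();
    uc.blockCollection = this.blockCollection; // callee preserves the field
    return uc;
  }
}

class BlockInfoUnderConstructionSketch extends BlockInfoSketch {
  public static void main(String[] args) {
    BlockInfoSketch complete = new BlockInfoSketch();
    complete.blockCollection = "INodeFile for /a"; // stand-in collection
    BlockInfoUnderConstructionSketch uc = complete.convertToBlockUnderConstruction();
    System.out.println(uc.blockCollection); // carried over by the callee
  }
}
{code}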
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225363#comment-14225363 ] Jing Zhao commented on HDFS-7440: - With the change, FSNamesystem#removeBlocks is now moved inside the fsn write lock (in {{deleteSnapshot}}). I think we'd better still keep it out of the write lock. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7444: - Attachment: HDFS-7444.000.patch convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7444: - Status: Patch Available (was: Open) convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7440: - Attachment: HDFS-7440.002.patch Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225393#comment-14225393 ] Andrew Wang commented on HDFS-7097: --- +1 LGTM as well, thanks Kihwal for the patch, ATM, Vinay, and Ming for reviewing. I'll commit this shortly. Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the namespace is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big namespace can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports are not modifying any state that is being saved to fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7424) Add web UI for NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7424: - Attachment: HDFS-7424.002.patch Rebased the patch. Add web UI for NFS gateway -- Key: HDFS-7424 URL: https://issues.apache.org/jira/browse/HDFS-7424 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7424.001.patch, HDFS-7424.002.patch This JIRA is to track the effort to add web UI for NFS gateway to show some metrics and configuration related information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225424#comment-14225424 ] Hudson commented on HDFS-7097: -- FAILURE: Integrated in Hadoop-trunk-Commit #6605 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6605/]) HDFS-7097. Allow block reports to be processed during checkpointing on standby name node. (kihwal via wang) (wang: rev f43a20c529ac3f104add95b222de6580757b3763) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the namespace is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big namespace can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports are not modifying any state that is being saved to fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7444: - Attachment: HDFS-7444.001.patch convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch, HDFS-7444.001.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225428#comment-14225428 ] Hadoop QA commented on HDFS-7444: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683689/HDFS-7444.000.patch against trunk revision 56f3eec. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. Failed to build the native portion of hadoop-common prior to running the unit tests in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8836//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8836//console This message is automatically generated. convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch, HDFS-7444.001.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7424) Add web UI for NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225474#comment-14225474 ] Hadoop QA commented on HDFS-7424: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683693/HDFS-7424.002.patch against trunk revision f43a20c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. Failed to build the native portion of hadoop-common prior to running the unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8838//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8838//console This message is automatically generated. Add web UI for NFS gateway -- Key: HDFS-7424 URL: https://issues.apache.org/jira/browse/HDFS-7424 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7424.001.patch, HDFS-7424.002.patch This JIRA is to track the effort to add web UI for NFS gateway to show some metrics and configuration related information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225473#comment-14225473 ] Hadoop QA commented on HDFS-6735: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683657/HDFS-6735-v6.txt against trunk revision 78f7cdb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8834//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8834//console This message is automatically generated. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream - Key: HDFS-6735 URL: https://issues.apache.org/jira/browse/HDFS-6735 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and they have become an HBase read-latency pain point. In HDFS-6698, I made a minor patch against the first encountered lock, around getFileLength; indeed, after reading the code and testing, it shows there are still other locks we could improve. In this jira, I'll make a patch against the other locks, and a simple test case to show the issue and the improved result. This is important for HBase applications, since in the current HFile read path, we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story I had planned to do, but it will probably take more time than I expected) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225476#comment-14225476 ] Hadoop QA commented on HDFS-7097: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683658/HDFS-7097.ultimate.trunk.patch against trunk revision 78f7cdb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.TestEnhancedByteBufferAccess {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8835//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8835//console This message is automatically generated. Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the namespace is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big namespace can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports are not modifying any state that is being saved to fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7424) Add web UI for NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225523#comment-14225523 ] Haohui Mai commented on HDFS-7424: -- Good work. Some comments:
{code}
+/**
+ * Encapsulates the HTTP server started by the NFS3 gateway.
+ */
+@InterfaceAudience.Private
+public class Nfs3HttpServer {
{code}
You can simply mark the class as package-local.
{code}
+  void start() throws IOException {
+    HttpServer2.Builder builder = new HttpServer2.Builder().setName("nfs3")
+        .setConf(conf).setACL(new AccessControlList(conf.get(DFS_ADMIN, "")));
{code}
Please see {{DFSUtil.httpServerTemplateForNNAndJN}}.
{code}
+  public int getSecurePort() {
+    return this.infoSecurePort;
+  }
+
{code}
This is unused.
{code}
+      URL url = new URL(scheme + "://" + NetUtils.getHostPortString(addr)
+          + "/jmx");
+      URLConnection conn = connectionFactory.openConnection(url);
+      conn.connect();
+
+      InputStream is = conn.getInputStream();
+      InputStreamReader isr = new InputStreamReader(is);
+
+      int numCharsRead;
+      char[] charArray = new char[1024];
+      StringBuffer sb = new StringBuffer();
+      while ((numCharsRead = isr.read(charArray)) > 0) {
+        sb.append(charArray, 0, numCharsRead);
+      }
+      result = sb.toString();
+
+    } catch (Exception e) {
+      e.printStackTrace();
+      return null;
+    }
+    return result;
{code}
See {{DFSTestUtil.urlGet()}}. Add web UI for NFS gateway -- Key: HDFS-7424 URL: https://issues.apache.org/jira/browse/HDFS-7424 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7424.001.patch, HDFS-7424.002.patch This JIRA is to track the effort to add a web UI for the NFS gateway to show some metrics and configuration-related information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7445) Implement packet memory pool in output stream in libhdfs3
Zhanwei Wang created HDFS-7445: -- Summary: Implement packet memory pool in output stream in libhdfs3 Key: HDFS-7445 URL: https://issues.apache.org/jira/browse/HDFS-7445 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Implement a packet memory pool instead of allocating packets dynamically. A packet memory pool can guard against overcommit and avoid the cost of allocation for the output stream in libhdfs3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
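libhdfs3 is C++, so the following Java sketch is only a language-neutral illustration of the idea, with all names invented: pre-allocating a fixed set of packet buffers bounds the stream's memory (no overcommit) and turns per-packet allocation into a queue operation.
{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PacketPoolSketch {
  private final BlockingQueue<byte[]> pool;

  PacketPoolSketch(int packets, int packetSize) {
    pool = new ArrayBlockingQueue<byte[]>(packets);
    for (int i = 0; i < packets; i++) {
      pool.add(new byte[packetSize]); // all packet memory allocated up front
    }
  }

  byte[] acquire() throws InterruptedException {
    return pool.take(); // blocks when every packet is in flight: no overcommit
  }

  void release(byte[] packet) {
    pool.offer(packet); // return the buffer for reuse instead of freeing it
  }

  public static void main(String[] args) throws InterruptedException {
    PacketPoolSketch p = new PacketPoolSketch(4, 64 * 1024);
    byte[] pkt = p.acquire();
    p.release(pkt);
    System.out.println("pool bounds memory to 4 packets of 64 KiB each");
  }
}
{code}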
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225541#comment-14225541 ] Zhanwei Wang commented on HDFS-7017: JIRA HDFS-7445 is opened for the packet memory pool. Hi [~wheat9], [~cmccabe], any comments on the new patch? Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7023) use libexpat instead of libxml2 for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225548#comment-14225548 ] Zhanwei Wang commented on HDFS-7023: Hi [~cmccabe] The patch looks good, but the compiler failed to build the binary.
{code}
Undefined symbols for architecture x86_64:
  "_XML_ErrorString", referenced from:
      hdfs::internal::XmlConfigParser::ParseXml(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in XmlConfigParser.cc.o
  "_XML_GetCurrentLineNumber", referenced from:
      hdfs::internal::XmlData::endElement(void*, char const*) in XmlConfigParser.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
  "_XML_GetErrorCode", referenced from:
      hdfs::internal::XmlConfigParser::ParseXml(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in XmlConfigParser.cc.o
  "_XML_Parse", referenced from:
      hdfs::internal::XmlConfigParser::ParseXml(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in XmlConfigParser.cc.o
  "_XML_ParserCreate", referenced from:
      hdfs::internal::XmlConfigParser::ParseXml(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in XmlConfigParser.cc.o
  "_XML_ParserFree", referenced from:
      hdfs::internal::XmlData::~XmlData() in XmlConfigParser.cc.o
  "_XML_SetCharacterDataHandler", referenced from:
      hdfs::internal::XmlData::XmlData(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, XML_ParserStruct*) in XmlConfigParser.cc.o
  "_XML_SetElementHandler", referenced from:
      hdfs::internal::XmlData::XmlData(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, XML_ParserStruct*) in XmlConfigParser.cc.o
  "_XML_SetUserData", referenced from:
      hdfs::internal::XmlData::XmlData(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, XML_ParserStruct*) in XmlConfigParser.cc.o
  "hdfs::internal::StrToInt32(char const*, int*)", referenced from:
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int*) const in Config.cc.o
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int*) const in Config.cc.o
  "hdfs::internal::StrToInt64(char const*, long long*)", referenced from:
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long*) const in Config.cc.o
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, long long*) const in Config.cc.o
  "hdfs::internal::StrToDouble(char const*, double*)", referenced from:
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double*) const in Config.cc.o
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double, double*) const in Config.cc.o
  "hdfs::internal::StrToBool(char const*, bool*)", referenced from:
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) const in Config.cc.o
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool*) const in Config.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
{code}
Seems that you forgot to modify the CMake file to link libexpat.
use libexpat instead of libxml2 for libhdfs3 Key: HDFS-7023 URL: https://issues.apache.org/jira/browse/HDFS-7023 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Colin Patrick McCabe Attachments: HDFS-7023.001.pnative.patch As commented in HDFS-6994, libxml2 may have some thread-safety issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
Colin Patrick McCabe created HDFS-7446: -- Summary: HDFS inotify should have the ability to determine what txid it has read up to Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225571#comment-14225571 ] Colin Patrick McCabe commented on HDFS-7446: This patch adds a txid field to all {{Event}} objects. We have to send this over the wire, since the existing information (start txid and stop txid for a group of txids we got in an RPC) is not enough. Not every edit log txid maps to an event. Another complication is the fact that some txids map to more than one event. This makes it somewhat difficult for clients to know what they've read up to when using a one-event-at-a-time interface. This patch solves that by having the {{DFSInotifyEventInputStream}} return an array of events. In the cases where a single txid maps to multiple events, we return an array of all those events. So the client knows that after it has finished processing this batch, it is done with that transaction id. This interface is marked as unstable, so changing it is not a problem. Miscellaneous cleanups: I made some fields final in the {{Event}} structures. In cases where I modified a unit test, I replaced assertTrue(1 == foo) with assertEquals(1, foo). The latter gives nicer error messages when the test fails. HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
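Under the batched interface described above, a client that wants exactly-once resumption would persist the txid only after consuming a whole batch. A hedged usage sketch — {{EventBatch}}, {{getTxid()}}, and the helper functions are assumptions drawn from this discussion, not a settled API:
{code}
// Assumed API shapes from the discussion above; loadSavedTxid/saveTxid/handle
// are hypothetical application helpers.
long lastReadTxid = loadSavedTxid();
DFSInotifyEventInputStream stream = hdfsAdmin.getInotifyEventStream(lastReadTxid);
while (true) {
  EventBatch batch = stream.take();  // all events of one transaction together
  for (Event e : batch.getEvents()) {
    handle(e);                       // process every event in the batch
  }
  saveTxid(batch.getTxid());         // only now is this txid fully consumed
}
{code}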
[jira] [Updated] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7446: --- Attachment: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7446: --- Status: Patch Available (was: Open) HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225605#comment-14225605 ] Andrew Wang commented on HDFS-7446: --- Hey Colin, thanks for working on this. You definitely bring up a good point about the txids. Since this is marked as unstable and still quite new, I think it's okay to make sweeping changes to the API. I had just a few high-level review comments; the code itself looks fine: * It feels like we have a mismatch between the underlying data and our objects. The need for the VHS-rewind in getTxidBatchSize is one example; what we really want there is an iterator of EditEvents, with one EditEvents per txid (the name is just a suggestion). * The txid could also be moved into EditEvents, which would also save some bytes. I'm hoping this isn't too bad to do, since the edit log translator already returns an Event[] per op, and it seems like most of the PB code can be reused. HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225608#comment-14225608 ] Jing Zhao commented on HDFS-7440: - The latest patch looks good to me. +1 pending Jenkins. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7435: Attachment: HDFS-7435.000.patch Uploading a demo patch (based on Daryn's patch) for chunking. Still need to do more testing. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with a default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive, since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
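The cost being described is easy to see in isolation: decoding a repeated long field into an {{ArrayList}} boxes every value and regrows the backing array from its default capacity of 10, while a presized primitive array does neither. A toy comparison (plain Java, not protobuf code):
{code}
import java.util.ArrayList;
import java.util.List;

public class BlockReportDecodeSketch {
  public static void main(String[] args) {
    int nLongs = 300_000; // e.g. 100k replicas x 3 longs each

    List<Long> boxed = new ArrayList<Long>(); // default capacity 10, regrows ~1.5x
    for (long i = 0; i < nLongs; i++) {
      boxed.add(i);                           // one Long object per value
    }

    long[] primitive = new long[nLongs];      // one allocation, no boxing
    for (int i = 0; i < nLongs; i++) {
      primitive[i] = i;
    }

    System.out.println(boxed.size() + " boxed vs " + primitive.length + " primitive");
  }
}
{code}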
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225620#comment-14225620 ] Hadoop QA commented on HDFS-7440: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683691/HDFS-7440.002.patch against trunk revision 56f3eec. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8837//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8837//console This message is automatically generated. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225637#comment-14225637 ] Hadoop QA commented on HDFS-7444: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683697/HDFS-7444.001.patch against trunk revision f43a20c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8839//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8839//console This message is automatically generated. convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch, HDFS-7444.001.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
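For readers following along, the contract change under review can be sketched as follows. This is a simplified model; the real HDFS classes have different fields and constructors.

{code}
// Simplified model of the proposed contract: the callee preserves
// the BlockCollection field, so no caller can forget to.
interface BlockCollection {}

class BlockInfo {
  protected BlockCollection bc;  // the file this block belongs to

  BlockInfoUnderConstruction convertToBlockUnderConstruction() {
    BlockInfoUnderConstruction uc = new BlockInfoUnderConstruction();
    uc.bc = this.bc;  // preserved here, in the callee
    return uc;
  }
}

class BlockInfoUnderConstruction extends BlockInfo {}
{code}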
[jira] [Commented] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225641#comment-14225641 ] Hadoop QA commented on HDFS-7444: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683697/HDFS-7444.001.patch against trunk revision f43a20c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8840//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8840//console This message is automatically generated. convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch, HDFS-7444.001.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225666#comment-14225666 ] Yi Liu commented on HDFS-7435: -- Hi guys, I think this is a good improvement. I found a similar issue related to this while doing Hadoop RPC optimization in my local branch. As we all know, block reports from DNs may become very large in a big cluster, and there is a chance of a full GC if there is not enough contiguous space in the old generation. We reuse the connection for RPC calls, but when we process each RPC on the same connection, we allocate a fresh heap byte buffer to store the RPC bytes. The RPC message may be very large, so it causes the same issue. My thought is to reuse the data buffer in the connection; I will open a new JIRA to track it. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
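The buffer-reuse idea can be sketched like this (my own illustration, not code from Hadoop or from any patch; the obvious trade-off is that a connection's buffer never shrinks, so per-connection memory stays at the high-water mark):

{code}
import java.nio.ByteBuffer;

// Sketch of per-connection buffer reuse: allocate once, grow only
// when an incoming RPC is larger than anything seen so far.
class ReusableRpcBuffer {
  private ByteBuffer data = ByteBuffer.allocate(8 * 1024);

  ByteBuffer bufferFor(int rpcLength) {
    if (data.capacity() < rpcLength) {
      // Round up to the next power of two to amortize future growth.
      data = ByteBuffer.allocate(Integer.highestOneBit(rpcLength - 1) << 1);
    }
    data.clear();
    data.limit(rpcLength);
    return data;
  }
}
{code}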
[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225710#comment-14225710 ] Hadoop QA commented on HDFS-7446: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683729/HDFS-7446.001.patch against trunk revision a655973. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.mapreduce.v2.app.TestCheckpointPreemptionPolicy org.apache.hadoop.mapreduce.v2.app.TestMRClientService The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.mapreduce.v2.TestUberAM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8841//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8841//console This message is automatically generated. HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
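As a usage sketch, a consumer that never wants to miss an event would look roughly like this. The signatures below reflect my reading of where the API is heading ({{HdfsAdmin#getInotifyEventStream(long)}} and a batch object that exposes its txid), so treat the exact names as assumptions; the checkpoint persistence is a hypothetical application helper.

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class InotifyTailer {
  // Tail the edit stream, remembering the txid read so far so a
  // restart can resume exactly where it left off.
  public static void tail(long lastReadTxid) throws Exception {
    HdfsAdmin admin =
        new HdfsAdmin(URI.create("hdfs://nn:8020"), new Configuration());
    DFSInotifyEventInputStream stream =
        admin.getInotifyEventStream(lastReadTxid);
    while (true) {
      EventBatch batch = stream.poll();  // non-blocking; null if empty
      if (batch == null) {
        Thread.sleep(100);
        continue;
      }
      // ... handle batch.getEvents() here ...
      lastReadTxid = batch.getTxid();    // what this jira exposes
      // persist lastReadTxid (e.g. to ZooKeeper or a local file)
    }
  }
}
{code}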
[jira] [Updated] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7440: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the reviews. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7438) Consolidate implementation of rename()
[ https://issues.apache.org/jira/browse/HDFS-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7438: - Attachment: HDFS-7438.001.patch Consolidate implementation of rename() -- Key: HDFS-7438 URL: https://issues.apache.org/jira/browse/HDFS-7438 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7438.000.patch, HDFS-7438.001.patch The implementation of {{rename()}} resides in both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate them in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225757#comment-14225757 ] Hudson commented on HDFS-7440: -- FAILURE: Integrated in Hadoop-trunk-Commit #6608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6608/]) HDFS-7440. Consolidate snapshot related operations in a single class. Contributed by Haohui Mai. (wheat9: rev 4a3161182905afaf450a60d02528161ed1f97471) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSnapshotOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
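For context, the consolidation pattern introduced by FSDirSnapshotOp (and pursued by sibling jiras such as HDFS-7438 for rename) looks roughly like the sketch below. The interfaces are stubs standing in for the real HDFS types, and the actual method signatures differ.

{code}
import java.io.IOException;

// Stubs standing in for the real HDFS types.
interface FSDirectory { void writeLock(); void writeUnlock(); }
interface SnapshotManager {
  String createSnapshot(String path, String name) throws IOException;
}

// All snapshot logic lives in one class of static helpers;
// FSNamesystem keeps only locking, auditing, and edit logging.
final class FSDirSnapshotOp {
  private FSDirSnapshotOp() {}  // static helpers only

  static String createSnapshot(FSDirectory fsd, SnapshotManager sm,
      String path, String name) throws IOException {
    fsd.writeLock();
    try {
      // resolve the path, check permissions, then delegate
      return sm.createSnapshot(path, name);
    } finally {
      fsd.writeUnlock();
    }
  }
}
{code}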
[jira] [Commented] (HDFS-6407) New namenode UI lost the ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225760#comment-14225760 ] Haohui Mai commented on HDFS-6407: -- Thanks for working on this. My understanding is that only the datanode tab needs to be sorted and paginated, so it should not affect other tables. Therefore I'm leaning towards a simpler solution instead of introducing a plugin. Let me experiment a little bit. New namenode UI lost the ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Minor Attachments: HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png The old UI supported clicking on a column header to sort by that column. The new UI seems to have dropped this very useful feature. There are a few tables in the NameNode UI that display datanode information, directory listings, and snapshots. When there are many items in the tables, it is useful to be able to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225768#comment-14225768 ] Haohui Mai edited comment on HDFS-7017 at 11/26/14 5:23 AM: {code} virtual ~LeaseRenewer(); {code} It no longer needs to have a virtual destructor since no class inherits {{LeaseRenewer}}. Similar to the comments on {{LeaseRenewer}} and {{LeaseRenewerImpl}}, maybe the code can be further simplified by merging {{PipelineImpl}} into {{Pipeline}}. was (Author: wheat9): {code} virtual ~LeaseRenewer(); {code} It no longer needs to have a virtual destructor since no class inherits {{LeaseRenewer}}. Similar to the comments on {{LeaseRenewer}} and {{LeaseRenewerImpl}}, maybe the code can be further simplified by merging {{PipelineImpl}} into {{Pipeline}}, Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen sometimes
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225767#comment-14225767 ] Vinayakumar B commented on HDFS-7342: - bq. It seems that lease recovery could happen before the client request comes here, when this happens, the block state would be COMMITTED with minReplication met, right? We are talking about the state of the penultimate block, not the last block, which is the cause found for this issue. 1. For the penultimate block, only a client request (a request for another block) will make it COMMITTED, as the client will still be alive and adds one more block. 2. For the last block, the client makes it COMMITTED during normal closure; otherwise {{commitBlockSynchronization()}} does so during the lease-recovery closure. I see no other places where a block gets COMMITTED. Lease Recovery doesn't happen sometimes Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes a possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
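To summarize the two commit paths above as a toy model ({{BlockUCState}} is a real HDFS enum; the methods here are deliberate simplifications, not the actual NameNode code):

{code}
enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

class FileUnderConstruction {
  BlockUCState penultimate = BlockUCState.COMPLETE;
  BlockUCState last = BlockUCState.UNDER_CONSTRUCTION;

  // Path 1: the live client asks for one more block; the block that
  // becomes penultimate is committed as part of that request.
  void addBlock() {
    penultimate = BlockUCState.COMMITTED;
    last = BlockUCState.UNDER_CONSTRUCTION;
  }

  // Path 2a: normal close by the client commits the last block.
  void completeFile() { last = BlockUCState.COMMITTED; }

  // Path 2b: lease recovery closes the file without the client.
  void commitBlockSynchronization() { last = BlockUCState.COMMITTED; }
}
{code}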
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225768#comment-14225768 ] Haohui Mai commented on HDFS-7017: -- {code} virtual ~LeaseRenewer(); {code} It no longer needs to have a virtual destructor since no class inherits {{LeaseRenewer}}. Similar to the comments on {{LeaseRenewer}} and {{LeaseRenewerImpl}}, maybe the code can be further simplified by merging {{PipelineImpl}} into {{Pipeline}}. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7023) use libexpat instead of libxml2 for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225771#comment-14225771 ] Haohui Mai commented on HDFS-7023: -- I wonder, is there any interest in creating a configuration mechanism that does not depend on an XML parser at all? What we can do is create an {{Options}} class which captures all configuration parameters directly. The XML parser can then set the configuration parameters accordingly. use libexpat instead of libxml2 for libhdfs3 Key: HDFS-7023 URL: https://issues.apache.org/jira/browse/HDFS-7023 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Colin Patrick McCabe Attachments: HDFS-7023.001.pnative.patch As commented in HDFS-6994, libxml2 may have some thread safety issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
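The proposed decoupling could look like the following. libhdfs3 itself is C++; this sketch is in Java purely for illustration, and every name in it is made up.

{code}
// The core library consumes a plain, typed Options object ...
class Options {
  long blockSize = 128L * 1024 * 1024;
  short replication = 3;
  boolean useDatanodeHostname = false;
  // ... every tunable the client understands, typed and defaulted
}

// ... and the XML parser becomes an optional front end that merely
// populates it, so clients configured programmatically never touch
// an XML library at all.
class XmlConfigLoader {
  static Options load(String path) {
    Options opts = new Options();
    // parse the XML file at 'path' and assign fields on opts
    return opts;
  }
}
{code}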
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225775#comment-14225775 ] Zhanwei Wang commented on HDFS-7017: Hi [~wheat9] {{MockLeaseRenewer}} will inherit {{LeaseRenewer}} and will be used in unit tests. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225784#comment-14225784 ] Haohui Mai commented on HDFS-7017: -- bq. MockLeaseRenewer will inherit LeaseRenewer and will be used in unit tests. What about putting this change in the patch that adds unit tests in this branch? Otherwise it might look confusing from the perspective of this patch alone. The other parts of the patch look good to me. +1 once it is addressed. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)