[jira] [Commented] (HDFS-7384) 'getfacl' command and 'getAclStatus' output should be in sync
[ https://issues.apache.org/jira/browse/HDFS-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224171#comment-14224171 ] Vinayakumar B commented on HDFS-7384: - Hi [~cnauroth], if you could look into the latest patch that would be great. Thanks 'getfacl' command and 'getAclStatus' output should be in sync - Key: HDFS-7384 URL: https://issues.apache.org/jira/browse/HDFS-7384 Project: Hadoop HDFS Issue Type: Improvement Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7384-001.patch, HDFS-7384-002.patch, HDFS-7384-003.patch, HDFS-7384-004.patch, HDFS-7384-005.patch The *getfacl* command prints all of the entries, including basic and extended entries, the mask entry, and effective permissions. But the *getAclStatus* FileSystem API returns only the extended ACL entries set by the user; it includes neither the mask entry nor effective permissions. To benefit clients using the API, it would be better to include the 'mask' entry and effective permissions in the returned list of entries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
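For context, a minimal sketch of the client-side view this improvement targets, using the public FileSystem API (the path is hypothetical, and the comments describe the behavior before this patch):
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclStatus;

public class AclStatusExample {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/user/test/file");  // hypothetical path

    AclStatus status = fs.getAclStatus(path);
    // Today this list holds only the extended entries set by the user; the
    // mask entry and per-entry effective permissions are not included,
    // which is what this JIRA proposes to change.
    List<AclEntry> entries = status.getEntries();
    for (AclEntry entry : entries) {
      System.out.println(entry);
    }
  }
}
{code}
With the proposed change, the same getEntries() list would also carry the mask entry, so callers would not have to re-implement the effective-permission logic that the getfacl shell command already has.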
[jira] [Updated] (HDFS-5146) JspHelper#bestNode() doesn't handle bad datanodes correctly
[ https://issues.apache.org/jira/browse/HDFS-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-5146: Resolution: Cannot Reproduce Status: Resolved (was: Patch Available) Resolving this issue, as this code no longer exists after the migration of the UI to HTML5. JspHelper#bestNode() doesn't handle bad datanodes correctly --- Key: HDFS-5146 URL: https://issues.apache.org/jira/browse/HDFS-5146 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-5146.patch JspHelper#bestNode() doesn't correctly handle the case where the chosen datanode is down.
{code}
while (s == null) {
  if (chosenNode == null) {
    do {
      if (doRandom) {
        index = DFSUtil.getRandom().nextInt(nodes.length);
      } else {
        index++;
      }
      chosenNode = nodes[index];
    } while (deadNodes.contains(chosenNode));
  }
  chosenNode = nodes[index];
{code}
In this part of the code, the datanode is chosen only once. If the chosen datanode is down, an exception will definitely be thrown instead of re-choosing an available node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
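For illustration, a sketch of the retry shape the report implies (this is not a committed fix — the issue was resolved as Cannot Reproduce — and {{connectToNode}} is a hypothetical helper standing in for the connection code in {{bestNode()}}):
{code}
while (s == null) {
  // re-choose a node on every iteration, skipping nodes already known dead
  do {
    if (doRandom) {
      index = DFSUtil.getRandom().nextInt(nodes.length);
    } else {
      index = (index + 1) % nodes.length;  // wrap instead of running off the end
    }
    chosenNode = nodes[index];
  } while (deadNodes.contains(chosenNode));
  try {
    s = connectToNode(chosenNode);  // hypothetical helper: attempt the connection
  } catch (IOException e) {
    deadNodes.add(chosenNode);      // remember the failure and try another node
    s = null;
  }
}
{code}
A real implementation would also need to bail out with an error once every node is in {{deadNodes}}, rather than looping forever.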
[jira] [Commented] (HDFS-7310) Mover can give first priority to local DN if it has target storage type available in local DN
[ https://issues.apache.org/jira/browse/HDFS-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224178#comment-14224178 ] Vinayakumar B commented on HDFS-7310: - Hi [~jingzhao], if you could look at the latest patch that would be great. I'd appreciate it if others also want to take a look. Thanks Mover can give first priority to local DN if it has target storage type available in local DN - Key: HDFS-7310 URL: https://issues.apache.org/jira/browse/HDFS-7310 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Attachments: HDFS-7310-001.patch, HDFS-7310-002.patch, HDFS-7310-003.patch Currently the Mover logic may move blocks to any DN which has the target storage type. But if the src DN has the target storage type, then the Mover can give highest priority to the local DN. If the local DN does not contain the target storage type, then it can assign to any DN, as the current logic does. This is a thought; I have not gone through the code fully yet. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
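A minimal sketch of the selection order being proposed (not the actual Mover code; the {{StorageGroup}} accessors such as {{getDatanodeInfo()}} and {{hasSpaceFor()}} are assumed here for illustration):
{code}
// Prefer a target storage of the wanted type on the source datanode itself;
// only fall back to remote datanodes when the local one cannot take the block.
StorageGroup chooseTarget(DBlock block, StorageGroup source,
    StorageType wanted, List<StorageGroup> candidates) {
  for (StorageGroup g : candidates) {
    if (g.getDatanodeInfo().equals(source.getDatanodeInfo())
        && g.getStorageType() == wanted && g.hasSpaceFor(block)) {
      return g;  // local DN has the target storage type: highest priority
    }
  }
  for (StorageGroup g : candidates) {
    if (g.getStorageType() == wanted && g.hasSpaceFor(block)) {
      return g;  // current behavior: any DN with the target storage type
    }
  }
  return null;   // no suitable target
}
{code}
A local move avoids shipping the block over the network entirely, which is the payoff this JIRA is after.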
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen sometimes
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224213#comment-14224213 ] Yongjun Zhang commented on HDFS-7342: - Hi Guys, Thanks a lot for the comments and new rev. Please see my comments below, one for each of you :-)
{quote} If any COMMITTED blocks reaches minReplication, state will be automatically changed to COMPLETE while processing that IBR itself. Need not be user call. So there is no chance of COMMITTED block state with minReplication met. right? {quote}
Hi [~vinayrpet], indeed the following code in {{BlockManager::addStoredBlock}} may be called when an IBR is processed, which matches what you were saying:
{code}
if (storedBlock.getBlockUCState() == BlockUCState.COMMITTED &&
    numLiveReplicas >= minReplication) {
  storedBlock = completeBlock(bc, storedBlock, false);
}
{code}
But the block has to be COMMITTED to be made COMPLETE. If it's not COMMITTED yet (changing to COMMITTED is a request from the client, and it's asynchronous), even if it has the minimum replication number of replicas, it won't be changed to COMPLETE. So I think we may still need to take care of changing the block's state to COMPLETE in {{FSNamesystem#internalReleaseLease}}. Right?
Hi [~kihwal], a summary of my understanding of your comment is that there are two paths, one the regular write and the other recovery:
* for the regular write path, we need to enforce minimal replication
* for the recovery path, we just need to enforce 1 replica and let the replication monitor take care of the rest
* we can make commitBlockSynchronization() change a block to COMMITTED when there is at least one replica, ignoring min-replication. Currently only the client can inform the NN asynchronously to make a block COMMITTED.
I think it makes sense. Am I understanding you correctly?
Hi Ravi, thanks for the new rev. While we are still discussing the final solution, I noticed a couple of things in your rev3 per my original suggested solution:
1. Change
{code}
4471  * <li>If the penultimate/last block is COMMITTED or COMPLETE - force the
4472  * block to be COMPLETE even if it is not minimally replicated</li>
{code}
To
{code}
4471  * <li>If the penultimate/last block is COMMITTED - force the
4472  * block to be COMPLETE if it is minimally replicated</li>
{code}
2. You forgot to add {{setBlockCollection(blk.getBlockCollection());}} in the BlockInfoDesired constructor, thus a NullPointerException will happen.
Let's not rush into addressing those, but see if we can work out a solution toward the direction Kihwal stated. Thank you all again.
Lease Recovery doesn't happen sometimes Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, the LeaseManager tries to recover a lease but is not able to. HDFS-4882 describes one possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
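To make the state check under discussion concrete, here is a self-contained toy model (not HDFS code and not Ravi's patch; the names and the minReplication value are illustrative) of forcing a COMMITTED last block to COMPLETE during lease recovery once it is minimally replicated:
{code}
/** Toy model of the block-state check discussed above; not HDFS code. */
enum BlockUCState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMMITTED, COMPLETE }

class LastBlock {
  BlockUCState state = BlockUCState.COMMITTED;
  int liveReplicas = 1;
}

class LeaseRecoveryModel {
  static final int MIN_REPLICATION = 1;  // illustrative; dfs.namenode.replication.min

  /** Returns true if the block could be completed so the lease can be released. */
  static boolean tryComplete(LastBlock last) {
    if (last.state == BlockUCState.COMMITTED
        && last.liveReplicas >= MIN_REPLICATION) {
      last.state = BlockUCState.COMPLETE;  // file can now be closed
      return true;
    }
    return false;  // not COMMITTED (or under-replicated): recovery must retry
  }
}
{code}
The open question in the thread is exactly which replica count to demand here: minReplication on the regular write path, but possibly just one replica on the recovery path, leaving the replication monitor to restore the rest.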
[jira] [Updated] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-7429: --- Summary: DomainSocketWatcher.kick stuck (was: DomainSocketWatcher.doPoll0 stuck) DomainSocketWatcher.kick stuck -- Key: HDFS-7429 URL: https://issues.apache.org/jira/browse/HDFS-7429 Project: Hadoop HDFS Issue Type: Bug Reporter: zhaoyunjiong Attachments: 11241021, 11241023, 11241025 I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that DomainSocketWatcher.doPoll0 is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5575000 nid=0x37b3 runnable [0x7f558d3d2000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
Thread-163852 daemon prio=10 tid=0x7f55c811c800 nid=0x6757 runnable
[jira] [Updated] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-7429: --- Description: I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5575000 nid=0x37b3 runnable [0x7f558d3d2000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
Thread-163852 daemon prio=10 tid=0x7f55c811c800 nid=0x6757 runnable [0x7f55aef6e000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.access$800(DomainSocketWatcher.java:52)
    at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:457)
    at java.lang.Thread.run(Thread.java:745)
{quote}
was: I found some of our
[jira] [Commented] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224249#comment-14224249 ] zhaoyunjiong commented on HDFS-7429: The previous description is not right. The stuck thread is at org.apache.hadoop.net.unix.DomainSocket.writeArray0, as shown below.
{quote}
$ grep -B2 -A10 DomainSocket.writeArray 1124102*
11241021-DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
11241021- java.lang.Thread.State: RUNNABLE
11241021: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
11241021- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
11241021- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
11241021- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
11241021- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
11241021- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
11241021- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
11241021- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
11241021- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
11241021- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
11241021- at java.lang.Thread.run(Thread.java:745)
--
--
11241023-DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
11241023- java.lang.Thread.State: RUNNABLE
11241023: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
11241023- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
11241023- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
11241023- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
11241023- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
11241023- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
11241023- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
11241023- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
11241023- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
11241023- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
11241023- at java.lang.Thread.run(Thread.java:745)
--
--
11241025-DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
11241025- java.lang.Thread.State: RUNNABLE
11241025: at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
11241025- at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
11241025- at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
11241025- at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
11241025- at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
11241025- at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
11241025- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
11241025- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
11241025- at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
11241025- at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
11241025- at java.lang.Thread.run(Thread.java:745)
{quote}
DomainSocketWatcher.kick stuck -- Key: HDFS-7429 URL: https://issues.apache.org/jira/browse/HDFS-7429 Project: Hadoop HDFS Issue Type: Bug Reporter: zhaoyunjiong Attachments: 11241021, 11241023, 11241025 I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by
[jira] [Updated] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong updated HDFS-7429: --- Description: I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
Thread-163852 daemon prio=10 tid=0x7f55c811c800 nid=0x6757 runnable [0x7f55aef6e000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.access$800(DomainSocketWatcher.java:52)
    at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:457)
    at
[jira] [Commented] (HDFS-7310) Mover can give first priority to local DN if it has target storage type available in local DN
[ https://issues.apache.org/jira/browse/HDFS-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224282#comment-14224282 ] Uma Maheswara Rao G commented on HDFS-7310: --- Thanks a lot, Vinay. Let me take a look at it. Mover can give first priority to local DN if it has target storage type available in local DN - Key: HDFS-7310 URL: https://issues.apache.org/jira/browse/HDFS-7310 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Attachments: HDFS-7310-001.patch, HDFS-7310-002.patch, HDFS-7310-003.patch Currently the Mover logic may move blocks to any DN which has the target storage type. But if the src DN has the target storage type, then the Mover can give highest priority to the local DN. If the local DN does not contain the target storage type, then it can assign to any DN, as the current logic does. This is a thought; I have not gone through the code fully yet. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaoyunjiong reassigned HDFS-7429: -- Assignee: zhaoyunjiong DomainSocketWatcher.kick stuck -- Key: HDFS-7429 URL: https://issues.apache.org/jira/browse/HDFS-7429 Project: Hadoop HDFS Issue Type: Bug Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: 11241021, 11241023, 11241025 I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
Thread-163852
[jira] [Commented] (HDFS-7429) DomainSocketWatcher.kick stuck
[ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224325#comment-14224325 ] zhaoyunjiong commented on HDFS-7429: The problem here is that on our machines we can only send 299 bytes to the domain socket. When it tries to send the 300th byte, it blocks, and DomainSocketWatcher.add(DomainSocket sock, Handler handler) holds the lock, so watcherThread.run can't get the lock and clear the buffer; it's a live lock. I'm not sure which configuration controls the buffer size of 299 for now. I now suspect net.core.netdev_budget, which is 300 on our machines. I'll upload a patch to control the sent bytes to prevent the live lock later. By the way, should I move this to the HADOOP COMMON project? DomainSocketWatcher.kick stuck -- Key: HDFS-7429 URL: https://issues.apache.org/jira/browse/HDFS-7429 Project: Hadoop HDFS Issue Type: Bug Reporter: zhaoyunjiong Attachments: 11241021, 11241023, 11241025 I found that some of our DataNodes exceed the limit of concurrent xcievers (the limit is 4K). After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
{quote}
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5576000 nid=0x385d waiting on condition [0x7f558d5d4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9c90 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
--
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f7de034c800 nid=0x7b7 runnable [0x7f7db06c5000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1] daemon prio=10 tid=0x7f55c5574000 nid=0x377a waiting on condition [0x7f558d7d6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000740df9cb0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
    at
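To make the proposed mitigation concrete, here is a self-contained sketch of the "control the send bytes" idea (all names are illustrative; this is not the actual DomainSocketWatcher code or the eventual patch): allow at most one wake-up byte to be outstanding, so kick() can never fill the notification socket's buffer and block while the watcher lock is held.
{code}
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only: at most one wake-up byte in flight at a time. */
class KickOnce {
  private final OutputStream notificationSink;  // stands in for the notification domain socket
  private final AtomicBoolean kickPending = new AtomicBoolean(false);

  KickOnce(OutputStream notificationSink) {
    this.notificationSink = notificationSink;
  }

  /** Wake the poll loop; a no-op if a wake-up byte is already in flight. */
  void kick() {
    if (kickPending.compareAndSet(false, true)) {
      try {
        notificationSink.write(0);
      } catch (IOException e) {
        kickPending.set(false);  // allow a later retry
      }
    }
  }

  /** Called by the watcher thread after it drains the notification socket. */
  void kickHandled() {
    kickPending.set(false);
  }
}
{code}
Since the poll loop only needs to be woken, not counted, a single pending byte carries the same information as hundreds of them, and the buffer can never fill up.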
[jira] [Commented] (HDFS-6633) Support reading new data in a file being written until the file is closed
[ https://issues.apache.org/jira/browse/HDFS-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224333#comment-14224333 ] Hadoop QA commented on HDFS-6633: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683496/HDFS-6633-002.patch against trunk revision 61a2510.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8831//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8831//console
This message is automatically generated. Support reading new data in a file being written until the file is closed - Key: HDFS-6633 URL: https://issues.apache.org/jira/browse/HDFS-6633 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Vinayakumar B Attachments: HDFS-6633-001.patch, HDFS-6633-002.patch, h6633_20140707.patch, h6633_20140708.patch When a file is being written, the file length keeps increasing. If the file is opened for read, the reader first gets the file length and then reads only up to that length. The reader will not be able to read the new data written afterward. We propose adding a new feature so that readers will be able to read all the data until the writer closes the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
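For context, a minimal sketch of the polling workaround readers need without this feature: re-check the visible length and reopen to pick up new data (the path and byte handling are illustrative, and a real reader would also need a termination condition once the writer closes the file):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TailBeingWrittenFile {
  public static void main(String[] args) throws IOException, InterruptedException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/being-written.dat");  // hypothetical file
    long readSoFar = 0;
    byte[] buf = new byte[4096];
    while (true) {
      long len = fs.getFileStatus(path).getLen();    // length visible right now
      if (len > readSoFar) {
        try (FSDataInputStream in = fs.open(path)) { // reopen to see newer data
          in.seek(readSoFar);
          int n;
          while (readSoFar < len && (n = in.read(buf)) > 0) {
            readSoFar += n;                          // process buf[0..n) here
          }
        }
      }
      Thread.sleep(1000);  // poll; loop forever in this sketch
    }
  }
}
{code}
The feature proposed here would let a single open stream keep returning new bytes as the writer appends them, instead of forcing this reopen-and-seek loop.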
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() being blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224353#comment-14224353 ] Steve Loughran commented on HDFS-6735: -- You can fix the findbugs warning by tweaking {{hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml}} and including that diff in the patch. A minor optimization to avoid pread() being blocked by read() inside the same DFSInputStream - Key: HDFS-6735 URL: https://issues.apache.org/jira/browse/HDFS-6735 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735.txt In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and they have become an HBase read latency pain point. In HDFS-6698, I made a minor patch against the first encountered lock, around getFileLength; indeed, after reading the code and testing, it shows there are still other locks we could improve. In this jira, I'll make a patch against the other locks, and a simple test case to show the issue and the improved result. This is important for the HBase application, since in the current HFile read path, we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story I had planned to do, but it will probably take more time than I expected.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
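For reference, exclusions in that file use the FindBugs filter format; a hypothetical entry might look like the following (the class and bug pattern here are placeholders, not the actual warning flagged on this patch):
{code}
<FindBugsFilter>
  <!-- Placeholder entry: suppress one named bug pattern for one class. -->
  <Match>
    <Class name="org.apache.hadoop.hdfs.DFSInputStream"/>
    <Bug pattern="IS2_INCONSISTENT_SYNC"/>
  </Match>
</FindBugsFilter>
{code}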
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224358#comment-14224358 ] Hudson commented on HDFS-7436: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} is scattered across both {{FSNamesystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224368#comment-14224368 ] Hudson commented on HDFS-7436: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} is scattered across both {{FSNamesystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224372#comment-14224372 ] Hudson commented on HDFS-7412: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java
Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer. It would be nice to separate it from the implementation of {{FSNamesystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224362#comment-14224362 ] Hudson commented on HDFS-7412: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer. It would be nice to separate it from the implementation of {{FSNamesystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224361#comment-14224361 ] Hudson commented on HDFS-4882: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch
Scenario:
1. a cluster with 4 DNs
2. the size of the file to be written is a little more than one block
3. write the first block to 3 DNs, DN1-DN2-DN3
4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
5. DN2 and DN3 go down
6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
7. the client continuously writes the last block, and tries to close the file after writing all the data
8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE
9. shut down the client
10. the file's lease exceeds the hard limit
11. the LeaseManager realizes that and begins lease recovery by calling fsnamesystem.internalReleaseLease()
12. but the last block's state is COMPLETE, and this triggers the lease manager's infinite loop and prints massive logs like this:
{noformat}
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
{noformat}
(the 3rd log line is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224357#comment-14224357 ] Hudson commented on HDFS-7303: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java
NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224371#comment-14224371 ] Hudson commented on HDFS-4882: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch
Scenario:
1. a cluster with 4 DNs
2. the size of the file to be written is a little more than one block
3. write the first block to 3 DNs, DN1-DN2-DN3
4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
5. DN2 and DN3 go down
6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
7. the client continuously writes the last block, and tries to close the file after writing all the data
8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE
9. shut down the client
10. the file's lease exceeds the hard limit
11. the LeaseManager realizes that and begins lease recovery by calling fsnamesystem.internalReleaseLease()
12. but the last block's state is COMPLETE, and this triggers the lease manager's infinite loop and prints massive logs like this:
{noformat}
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
{noformat}
(the 3rd log line is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224359#comment-14224359 ] Hudson commented on HDFS-7419: -- FAILURE: Integrated in Hadoop-Yarn-trunk #754 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/754/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61)
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java
CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When the DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in the DataNode's log; they are only emitted to clients. This JIRA makes {{DataNode}} report detailed failures in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224369#comment-14224369 ] Hudson commented on HDFS-7419: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61)
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java
CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When the DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in the DataNode's log; they are only emitted to clients. This JIRA makes {{DataNode}} report detailed failures in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224367#comment-14224367 ] Hudson commented on HDFS-7303: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/16/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
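The underlying idea can be sketched as keying the UI/JMX datanode map by host:port instead of bare hostname, so two datanodes on one host stay distinct. The types below are illustrative, not the actual FSNamesystem MXBean code.
{code}
import java.util.HashMap;
import java.util.Map;

class DatanodeUiMapSketch {
  static final class DatanodeInfo {
    final String host;
    final int xferPort;
    DatanodeInfo(String host, int xferPort) { this.host = host; this.xferPort = xferPort; }
    String uiKey() { return host + ":" + xferPort; } // unique per datanode instance
  }

  public static void main(String[] args) {
    Map<String, DatanodeInfo> byKey = new HashMap<>();
    DatanodeInfo dn1 = new DatanodeInfo("host1.example.com", 50010);
    DatanodeInfo dn2 = new DatanodeInfo("host1.example.com", 50011);
    byKey.put(dn1.uiKey(), dn1);
    byKey.put(dn2.uiKey(), dn2); // keyed by host alone, this would overwrite dn1
    System.out.println(byKey.size()); // 2: both datanodes listed
  }
}
{code}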
[jira] [Commented] (HDFS-7210) Avoid two separate RPC's namenode.append() and namenode.getFileInfo() for an append call from DFSClient
[ https://issues.apache.org/jira/browse/HDFS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224395#comment-14224395 ] Hadoop QA commented on HDFS-7210: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683498/HDFS-7210-005.patch against trunk revision 61a2510. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestRenameWhileOpen {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8832//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8832//console This message is automatically generated. Avoid two separate RPC's namenode.append() and namenode.getFileInfo() for an append call from DFSClient --- Key: HDFS-7210 URL: https://issues.apache.org/jira/browse/HDFS-7210 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7210-001.patch, HDFS-7210-002.patch, HDFS-7210-003.patch, HDFS-7210-004.patch, HDFS-7210-005.patch Currently DFSClient does 2 RPCs to the namenode for an append operation: {{append()}} for re-opening the file and getting the last block, and {{getFileInfo()}} for getting the HdfsFileStatus. If we can combine the results of these 2 calls into one RPC, it can reduce load on the NameNode. For backward compatibility we need to keep the existing {{append()}} call as is -- This message was sent by Atlassian JIRA (v6.3.4#6332)
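A minimal sketch of the combined call the issue proposes, bundling the reopened last block and the file status into one response so the client makes a single RPC; the value-object shape below is an assumption for illustration, not necessarily the final API.
{code}
class AppendRpcSketch {
  static final class LocatedBlock { /* last partial block, if any */ }
  static final class HdfsFileStatus { /* file length, permissions, ... */ }

  /** Bundles both results of opening a file for append. */
  static final class LastBlockWithStatus {
    final LocatedBlock lastBlock; // may be null if the last block is full
    final HdfsFileStatus stat;
    LastBlockWithStatus(LocatedBlock lastBlock, HdfsFileStatus stat) {
      this.lastBlock = lastBlock;
      this.stat = stat;
    }
  }

  /** One RPC instead of namenode.append() followed by namenode.getFileInfo(). */
  LastBlockWithStatus append(String src, String clientName) {
    LocatedBlock last = reopenForAppend(src, clientName);
    HdfsFileStatus stat = lookupStatus(src);
    return new LastBlockWithStatus(last, stat);
  }

  private LocatedBlock reopenForAppend(String src, String clientName) { return new LocatedBlock(); }
  private HdfsFileStatus lookupStatus(String src) { return new HdfsFileStatus(); }
}
{code}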
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224407#comment-14224407 ] Vinayakumar B commented on HDFS-7342: - {quote}But the block has to be COMMITTED to be made COMPLETE. If it's not COMMITTED yet (changing to COMMITTED is a request from the client and it's asynchronous), even if it has the min replication number of replicas, it won't be changed to COMPLETE. So I think we may still need to take care of changing the block's state to COMPLETE in FSNamesystem#internalReleaseLease. Right?{quote} I agree that the client request and the Datanode's IBR are asynchronous. But both will update the block state under the write lock. The penultimate block will be COMMITTED in the {{getAdditionalBlock()}} client request. Here there are 3 possibilities: 1. All IBRs come before the block is even COMMITTED. At this time, if the block is FINALIZED in the DN, the replica will be accepted. {code}if (ucBlock.reportedState == ReplicaState.FINALIZED && !block.findDatanode(storageInfo.getDatanodeDescriptor())) { addStoredBlock(block, storageInfo, null, true); }{code} 2. If the client request comes after receiving 2 (= minReplication) IBRs, then the client request alone will make the state COMPLETE immediately after making it COMMITTED, in the following code of {{BlockManager#commitOrCompleteLastBlock()}} {code}final boolean b = commitBlock((BlockInfoUnderConstruction)lastBlock, commitBlock); if(countNodes(lastBlock).liveReplicas() >= minReplication) completeBlock(bc, bc.numBlocks()-1, false); return b;{code} At this time, if the IBRs received are not enough, then the block will be just COMMITTED. 3. If the IBRs are received after the client request, i.e. after COMMITTED, then while processing the second IBR the block will be COMPLETED in the code below. {code}if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED && numLiveReplicas >= minReplication) { storedBlock = completeBlock(bc, storedBlock, false);{code} So I couldn't find the possibility of a block in COMMITTED state with minReplication met. {quote}Changes to {{recoverLeaseInternal()}} and {{internalReleaseLease()}} will need to be made to distinguish the on-demand recovery from normal lease expiration. For on-demand recovery, we might want it to fail if there are no live replicas, as a file lease is normally recovered for a subsequent append or copy (read). If there is no data, they will fail.{quote} I understood [~kihwal]'s suggestions as below. The {{recoverLease()}} call from the client passes a {{force}} flag to {{recoverLeaseInternal()}}. Based on this flag, we can check the blocks' states (excluding the last block) and the # of replicas, and decide whether to go ahead with recovery without even initiating a request to the DataNode. So we need not worry about this case in commitBlockSynchronization. In {{commitBlockSynchronization()}}, directly complete all blocks and close the file. Am I right, [~kihwal]? Lease Recovery doesn't happen some times Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, the LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes one possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
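The force-flag idea discussed above could look roughly like this: an on-demand recovery checks replica availability up front and fails fast, while recovery driven by lease expiry proceeds regardless. All names are hypothetical stand-ins, not the committed change.
{code}
import java.io.IOException;

class LeaseRecoverySketch {
  boolean recoverLease(String src, boolean onDemand) throws IOException {
    if (onDemand && liveReplicasOfLastBlock(src) == 0) {
      // A client recovering a lease for a subsequent append or read gains
      // nothing from a file whose last block has no live replicas.
      throw new IOException("Cannot recover lease for " + src
          + ": last block has no live replicas");
    }
    return internalReleaseLease(src);
  }

  private int liveReplicasOfLastBlock(String src) { return 0; }      // placeholder
  private boolean internalReleaseLease(String src) { return true; }  // placeholder
}
{code}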
[jira] [Commented] (HDFS-7210) Avoid two separate RPC's namenode.append() and namenode.getFileInfo() for an append call from DFSClient
[ https://issues.apache.org/jira/browse/HDFS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224412#comment-14224412 ] Vinayakumar B commented on HDFS-7210: - The failure is unrelated. It seems it failed due to a corrupted hadoop-common.jar, and so failed to load core-default.xml: {noformat}2014-11-25 09:01:00,976 FATAL conf.Configuration (Configuration.java:loadResource(2518)) - error parsing conf core-default.xml java.util.zip.ZipException: invalid block type at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147) at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:105){noformat} Avoid two separate RPC's namenode.append() and namenode.getFileInfo() for an append call from DFSClient --- Key: HDFS-7210 URL: https://issues.apache.org/jira/browse/HDFS-7210 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7210-001.patch, HDFS-7210-002.patch, HDFS-7210-003.patch, HDFS-7210-004.patch, HDFS-7210-005.patch Currently DFSClient does 2 RPCs to the namenode for an append operation: {{append()}} for re-opening the file and getting the last block, and {{getFileInfo()}} for getting the HdfsFileStatus. If we can combine the results of these 2 calls into one RPC, it can reduce load on the NameNode. For backward compatibility we need to keep the existing {{append()}} call as is -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
Kihwal Lee created HDFS-7443: Summary: Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Priority: Blocker When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of the datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing an IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, the datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary, there were two observed issues: - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7443: - Description: When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of the datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing an IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, the datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary, there were two observed issues: - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller scale test clusters. was: When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of the datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing an IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, the datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary, there were two observed issues: - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Priority: Blocker When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of the datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing an IOException saying {{EEXIST}}. The datanodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, the datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. 
Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary, there were two observed issues: - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
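Independent of whatever the eventual fix turns out to be, the failure mode suggests the hard-link step must be idempotent across retries. A generic sketch of that property (not the HDFS code) using java.nio:
{code}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class IdempotentLinkSketch {
  /** Creates target as a hard link to existing, tolerating a link left
   *  behind by a previously interrupted attempt. */
  static void linkBlock(Path target, Path existing) throws IOException {
    try {
      Files.createLink(target, existing);
    } catch (FileAlreadyExistsException e) {
      // Accept the leftover only if it really is the same file (same
      // inode); otherwise the EEXIST is a genuine conflict.
      if (!Files.isSameFile(target, existing)) {
        throw e;
      }
    }
  }
}
{code}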
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224588#comment-14224588 ] Hudson commented on HDFS-7419: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in DataNode's log and they are emitted from clients. This JIRA makes {{DataNode}} reports detailed failure in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224586#comment-14224586 ] Hudson commented on HDFS-7303: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224590#comment-14224590 ] Hudson commented on HDFS-4882: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario: 1. cluster with 4 DNs 2. the size of the file to be written is a little more than one block 3. write the first block to 3 DNs, DN1-DN2-DN3 4. all the data packets of first block is successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out 5. DN2 and DN3 are down 6. client recovers the pipeline, but no new DN is added to the pipeline because of the current pipeline stage is PIPELINE_CLOSE 7. client continuously writes the last block, and try to close the file after written all the data 8. NN finds that the penultimate block doesn't has enough replica(our dfs.namenode.replication.min=2), and the client's close runs into indefinite loop(HDFS-2936), and at the same time, NN makes the last block's state to COMPLETE 9. shutdown the client 10. the file's lease exceeds hard limit 11. LeaseManager realizes that and begin to do lease recovery by call fsnamesystem.internalReleaseLease() 12. but the last block's state is COMPLETE, and this triggers lease manager's infinite loop and prints massive logs like this: {noformat} 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM APREDUCE_-1252656407_1, pendingcreates: 1] {noformat} (the 3rd line log is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224587#comment-14224587 ] Hudson commented on HDFS-7436: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224591#comment-14224591 ] Hudson commented on HDFS-7412: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1944 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1944/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer.It would be nice to separate it from the implementation of {{FSNameSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224596#comment-14224596 ] Hudson commented on HDFS-7436: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224600#comment-14224600 ] Hudson commented on HDFS-7412: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer.It would be nice to separate it from the implementation of {{FSNameSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224599#comment-14224599 ] Hudson commented on HDFS-4882: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario: 1. cluster with 4 DNs 2. the size of the file to be written is a little more than one block 3. write the first block to 3 DNs, DN1-DN2-DN3 4. all the data packets of first block is successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out 5. DN2 and DN3 are down 6. client recovers the pipeline, but no new DN is added to the pipeline because of the current pipeline stage is PIPELINE_CLOSE 7. client continuously writes the last block, and try to close the file after written all the data 8. NN finds that the penultimate block doesn't has enough replica(our dfs.namenode.replication.min=2), and the client's close runs into indefinite loop(HDFS-2936), and at the same time, NN makes the last block's state to COMPLETE 9. shutdown the client 10. the file's lease exceeds hard limit 11. LeaseManager realizes that and begin to do lease recovery by call fsnamesystem.internalReleaseLease() 12. but the last block's state is COMPLETE, and this triggers lease manager's infinite loop and prints massive logs like this: {noformat} 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM APREDUCE_-1252656407_1, pendingcreates: 1] {noformat} (the 3rd line log is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224597#comment-14224597 ] Hudson commented on HDFS-7419: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in DataNode's log and they are emitted from clients. This JIRA makes {{DataNode}} reports detailed failure in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224595#comment-14224595 ] Hudson commented on HDFS-7303: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/16/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224617#comment-14224617 ] Kihwal Lee commented on HDFS-7443: -- This is the first error seen. {noformat} ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This was after successful upgrade of several volumes. Since the hard link summary was not printed and it was multiple seconds after starting upgrade of this volume (did not fail right away), the error must have come from {{DataStorage.linkBlocks()}} when it was checking the result with {{Futures.get()}}. Then it was retried and failed the same way. {noformat} INFO common.Storage: Analyzing storage directories for bpid BP- INFO common.Storage: Recovering storage directory /a/b/hadoop/var/hdfs/data/current/BP- from previous upgrade INFO common.Storage: Upgrading block pool storage directory /a/b/hadoop/var/hdfs/data/current/BP- old LV = -55; old CTime = 12345678. new LV = -56; new CTime = 45678989 ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This indicates {{Storage.analyzeStorage()}} correctly returning {{RECOVER_UPGRADE}} and the partial upgrade is undone before retrying. This repeated hundreds of times before termination of datanode, which logged the stack trace. {noformat} FATAL datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020. Exiting. java.io.IOException: EEXIST: File exists at sun.reflect.GeneratedConstructorAccessor18.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at com.google.common.util.concurrent.Futures.newFromConstructor(Futures.java:1258) at com.google.common.util.concurrent.Futures.newWithCause(Futures.java:1218) at com.google.common.util.concurrent.Futures.wrapAndThrowExceptionOrError(Futures.java:1131) at com.google.common.util.concurrent.Futures.get(Futures.java:1048) at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:999) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.linkAllBlocks(BlockPoolSliceStorage.java:594) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doUpgrade(BlockPoolSliceStorage.java:403) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:337) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:197) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:438) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1312) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1277) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:722) Caused by: EEXIST: File exists at org.apache.hadoop.io.nativeio.NativeIO.link0(Native Method) at org.apache.hadoop.io.nativeio.NativeIO.link(NativeIO.java:836) at 
org.apache.hadoop.hdfs.server.datanode.DataStorage$2.call(DataStorage.java:991) at org.apache.hadoop.hdfs.server.datanode.DataStorage$2.call(DataStorage.java:984) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ... 1 more {noformat} At this point, {{previous.tmp}} contained the new directory structure with blocks and meta files placed in the ID-based directory. Some orphaned meta and block files were observed. Restarting datanode does not reproduce the issue, but I suspect data loss based on the missing files and the number of missing blocks. Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0
[jira] [Comment Edited] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224617#comment-14224617 ] Kihwal Lee edited comment on HDFS-7443 at 11/25/14 2:44 PM: This is the first error seen. {noformat} ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This was after successful upgrade of several volumes. Since the hard link summary was not printed and it was multiple seconds after starting upgrade of this volume (did not fail right away), the error must have come from {{DataStorage.linkBlocks()}} when it was checking the result with {{Futures.get()}}. Then it was retried and failed the same way. {noformat} INFO common.Storage: Analyzing storage directories for bpid BP- INFO common.Storage: Recovering storage directory /a/b/hadoop/var/hdfs/data/current/BP- from previous upgrade INFO common.Storage: Upgrading block pool storage directory /a/b/hadoop/var/hdfs/data/current/BP- old LV = -55; old CTime = 12345678. new LV = -56; new CTime = 45678989 ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This indicates {{Storage.analyzeStorage()}} correctly returning {{RECOVER_UPGRADE}} and the partial upgrade is undone before retrying. This repeated hundreds of times before termination of datanode, which logged the stack trace. {noformat} FATAL datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020. Exiting. java.io.IOException: EEXIST: File exists at sun.reflect.GeneratedConstructorAccessor18.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at com.google.common.util.concurrent.Futures.newFromConstructor(Futures.java:1258) at com.google.common.util.concurrent.Futures.newWithCause(Futures.java:1218) at com.google.common.util.concurrent.Futures.wrapAndThrowExceptionOrError(Futures.java:1131) at com.google.common.util.concurrent.Futures.get(Futures.java:1048) at org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocks(DataStorage.java:999) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.linkAllBlocks(BlockPoolSliceStorage.java:594) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doUpgrade(BlockPoolSliceStorage.java:403) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:337) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:197) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:438) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1312) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1277) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:722) Caused by: EEXIST: File exists at org.apache.hadoop.io.nativeio.NativeIO.link0(Native Method) at 
org.apache.hadoop.io.nativeio.NativeIO.link(NativeIO.java:836) at org.apache.hadoop.hdfs.server.datanode.DataStorage$2.call(DataStorage.java:991) at org.apache.hadoop.hdfs.server.datanode.DataStorage$2.call(DataStorage.java:984) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ... 1 more {noformat} At this point, {{previous.tmp}} contained the new directory structure with blocks and meta files placed in the ID-based directory. Some orphaned meta and block files were observed. Restarting datanode does not reproduce the issue, but I suspect data loss based on the missing files and the number of missing blocks. was (Author: kihwal): This is the first error seen. {noformat} ERROR datanode.DataNode: Initialization failed for Block pool registering (Datanode Uuid unassigned) service to some.host:8020 EEXIST: File exists {noformat} This was after successful upgrade of
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224638#comment-14224638 ] Daryn Sharp commented on HDFS-7435: --- We're definitely on the same page. I meant questionably in the sense that the garbage generation rate is so high that CMS (you mention fragmentation) will be slow, bloated with free lists, and not keeping up. Now granted, a 20M allocation is likely to be prematurely tenured - along with the IPC protobuf containing the large report. One solution/workaround to both is reducing the size of the individual BR via multiple storages. The storages don't have to be individual drives but just subdirs. Segmenting shouldn't be an outright replacement. The decode will emit a long[][], which requires updating {{StorageBlockReport}} and {{BlockListAsLongs}}. Similar changes to the datanode, although not required for the namenode changes, will be more complex. I already considered segmenting. :) I found the benefit, versus the complexity and time, to be much lower than improvements to other subsystems. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with a default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive, since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
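To make the cost concrete: the report is conceptually three longs per replica, so it can be scanned directly over a primitive array with no boxing and no ArrayList growth. A sketch under that assumption (the conceptual layout from the issue text, not the exact wire format):
{code}
class BlockReportScanSketch {
  /** Walks a report encoded as (blockId, numBytes, genStamp) triples. */
  static void scan(long[] blocksAsLongs) {
    for (int i = 0; i + 2 < blocksAsLongs.length; i += 3) {
      long blockId  = blocksAsLongs[i];
      long numBytes = blocksAsLongs[i + 1];
      long genStamp = blocksAsLongs[i + 2];
      process(blockId, numBytes, genStamp); // no Long boxing anywhere
    }
  }

  static void process(long blockId, long numBytes, long genStamp) {
    // placeholder for per-replica handling
  }
}
{code}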
[jira] [Commented] (HDFS-7433) DatanodeMap lookups & DatanodeID hashCodes are inefficient
[ https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224667#comment-14224667 ] Daryn Sharp commented on HDFS-7433: --- My bad mentioning {{datanodeMap}} - juggling too many changes. {{DatanodeIDs}} are added to collections in many other places, and equality checks occur often. My more general point is that mutable hashCodes are a hidden landmine, which is why I filed another jira. Dynamic computation of the xfer addr (and by extension the hash) is inefficient and generates a lot of garbage. I'm checking out the odd test failures. They don't appear related, at least the xml parsing and class-def-not-found ones. DatanodeMap lookups & DatanodeID hashCodes are inefficient -- Key: HDFS-7433 URL: https://issues.apache.org/jira/browse/HDFS-7433 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7433.patch The datanode map is currently a {{TreeMap}}. For many thousands of datanodes, tree lookups are ~10X more expensive than a {{HashMap}}. Insertions and removals are up to 100X more expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
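A small, self-contained demonstration of the mutable-hashCode landmine mentioned above; DatanodeKey is an illustrative stand-in, not the real DatanodeID.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableHashKeyDemo {
  static final class DatanodeKey {
    String xferAddr; // mutable, e.g. changed on re-registration
    DatanodeKey(String xferAddr) { this.xferAddr = xferAddr; }
    @Override public boolean equals(Object o) {
      return o instanceof DatanodeKey
          && Objects.equals(xferAddr, ((DatanodeKey) o).xferAddr);
    }
    @Override public int hashCode() { return Objects.hashCode(xferAddr); }
  }

  public static void main(String[] args) {
    Map<DatanodeKey, String> map = new HashMap<>();
    DatanodeKey key = new DatanodeKey("host1:50010");
    map.put(key, "dn1");
    key.xferAddr = "host1:50011"; // hash changes while the key is in the map
    System.out.println(map.get(key)); // null: the entry is stranded in its old bucket
  }
}
{code}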
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224687#comment-14224687 ] Hudson commented on HDFS-7303: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224689#comment-14224689 ] Hudson commented on HDFS-7419: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in DataNode's log and they are emitted from clients. This JIRA makes {{DataNode}} reports detailed failure in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224692#comment-14224692 ] Hudson commented on HDFS-7412: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer.It would be nice to separate it from the implementation of {{FSNameSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224688#comment-14224688 ] Hudson commented on HDFS-7436: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224691#comment-14224691 ] Hudson commented on HDFS-4882: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #16 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/16/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario: 1. cluster with 4 DNs 2. the size of the file to be written is a little more than one block 3. write the first block to 3 DNs, DN1-DN2-DN3 4. all the data packets of first block is successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out 5. DN2 and DN3 are down 6. client recovers the pipeline, but no new DN is added to the pipeline because of the current pipeline stage is PIPELINE_CLOSE 7. client continuously writes the last block, and try to close the file after written all the data 8. NN finds that the penultimate block doesn't has enough replica(our dfs.namenode.replication.min=2), and the client's close runs into indefinite loop(HDFS-2936), and at the same time, NN makes the last block's state to COMPLETE 9. shutdown the client 10. the file's lease exceeds hard limit 11. LeaseManager realizes that and begin to do lease recovery by call fsnamesystem.internalReleaseLease() 12. but the last block's state is COMPLETE, and this triggers lease manager's infinite loop and prints massive logs like this: {noformat} 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM APREDUCE_-1252656407_1, pendingcreates: 1] {noformat} (the 3rd line log is a debug log added by us) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224700#comment-14224700 ] Hudson commented on HDFS-7303: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-7303. NN UI fails to distinguish datanodes on the same host. Contributed by Benoy Antony. (wheat9: rev 45fa7f023532e79dff3cf381056eff717dc4ecc7) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7419) Improve error messages for DataNode hot swap drive feature
[ https://issues.apache.org/jira/browse/HDFS-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224702#comment-14224702 ] Hudson commented on HDFS-7419: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-7419. Improve error messages for DataNode hot swap drive feature (Lei Xu via Colin P. Mccabe) (cmccabe: rev f636f9d9439742d7ebaaf21f7e22652403572c61) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestReconfiguration.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/ReconfigurableBase.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java CHANGES.txt: add HDFS-7419 (cmccabe: rev 380a361cfaafaab42614f5f26fac9668d99f8073) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Improve error messages for DataNode hot swap drive feature -- Key: HDFS-7419 URL: https://issues.apache.org/jira/browse/HDFS-7419 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7419.000.patch, HDFS-7419.001.patch, HDFS-7419.002.patch, HDFS-7419.003.patch When the DataNode fails to add a volume, it adds one failure message to {{errorMessageBuilder}} in {{DataNode#refreshVolumes}}. However, the detailed error messages are not logged in the DataNode's log; they only surface on the client side. This JIRA makes the {{DataNode}} report detailed failures in its log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
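A hedged sketch of the logging behavior the JIRA asks for; the helper names and log format here are assumptions, not the actual patch. Each per-volume failure is logged on the DataNode with the full stack trace, while the short summary for the client is still collected.
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RefreshVolumesSketch {
  // Stand-in for the DataNode logger.
  static void logError(String msg, Exception e) {
    System.err.println("ERROR " + msg);
    e.printStackTrace();
  }

  // Hypothetical per-volume add that fails for a bad path.
  static void addVolume(String dir) throws IOException {
    if (dir.contains("bad")) {
      throw new IOException("Cannot lock storage " + dir);
    }
  }

  public static void main(String[] args) {
    List<String> volumes = Arrays.asList("/data/1", "/data/bad", "/data/3");
    StringBuilder errorMessageBuilder = new StringBuilder();
    for (String v : volumes) {
      try {
        addVolume(v);
      } catch (IOException e) {
        // Log the detailed failure on the DataNode itself (the JIRA's point)...
        logError("Failed to add volume " + v, e);
        // ...and keep the short summary that is returned to the client.
        errorMessageBuilder.append("FAILED to add: ").append(v).append('\n');
      }
    }
    if (errorMessageBuilder.length() > 0) {
      System.out.println(errorMessageBuilder);
    }
  }
}
{code}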
[jira] [Commented] (HDFS-4882) Prevent the Namenode's LeaseManager from looping forever in checkLeases
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224704#comment-14224704 ] Hudson commented on HDFS-4882: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-4882. Prevent the Namenode's LeaseManager from looping forever in checkLeases (Ravi Prakash via Colin P. McCabe) (cmccabe: rev daacbc18d739d030822df0b75205eeb067f89850) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java CHANGES.txt: add HDFS-4882 (cmccabe: rev 6970dbf3669b2906ea71c97acbc5a0dcdb715283) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Prevent the Namenode's LeaseManager from looping forever in checkLeases --- Key: HDFS-4882 URL: https://issues.apache.org/jira/browse/HDFS-4882 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.0.0-alpha, 2.5.1 Reporter: Zesheng Wu Assignee: Ravi Prakash Priority: Critical Fix For: 2.6.1 Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.5.patch, HDFS-4882.6.patch, HDFS-4882.7.patch, HDFS-4882.patch Scenario:
1. cluster with 4 DNs
2. the size of the file to be written is a little more than one block
3. write the first block to 3 DNs, DN1-DN2-DN3
4. all the data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
5. DN2 and DN3 go down
6. the client recovers the pipeline, but no new DN is added to the pipeline because the current pipeline stage is PIPELINE_CLOSE
7. the client keeps writing the last block, and tries to close the file after writing all the data
8. the NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), the client's close runs into an indefinite loop (HDFS-2936), and at the same time the NN sets the last block's state to COMPLETE
9. shut down the client
10. the file's lease exceeds the hard limit
11. the LeaseManager realizes that and begins lease recovery by calling fsnamesystem.internalReleaseLease()
12. but the last block's state is COMPLETE, which triggers the lease manager's infinite loop and prints massive logs like this:
{noformat}
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= /user/h_wuzesheng/test.dat
2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
{noformat}
(the 3rd log line is a debug log we added) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7412) Move RetryCache to NameNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224705#comment-14224705 ] Hudson commented on HDFS-7412: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-7412. Move RetryCache to NameNodeRpcServer. Contributed by Haohui Mai. (wheat9: rev 8e253cb93030642f5a7324bad0f161cd0ad33206) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java Move RetryCache to NameNodeRpcServer Key: HDFS-7412 URL: https://issues.apache.org/jira/browse/HDFS-7412 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7412.000.patch, HDFS-7412.001.patch The concept of RetryCache belongs to the RPC layer. It would be nice to separate it from the implementation of {{FSNameSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
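The pattern being moved is the standard RPC retry cache: before executing a non-idempotent call, look up the (clientId, callId) pair; on a retry, replay the stored result instead of re-executing. A toy sketch with invented names (not Hadoop's RetryCache API), showing why the cache naturally lives in the RPC server rather than the namesystem:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class RetryCacheSketch {
  private final Map<String, Object> cache = new ConcurrentHashMap<String, Object>();

  // Invoked by the RPC layer; the namesystem operation is just the payload.
  Object invoke(String clientId, int callId, Supplier<Object> op) {
    String key = clientId + "#" + callId;
    // First attempt executes the operation; a retried RPC with the same
    // (clientId, callId) replays the cached result instead.
    return cache.computeIfAbsent(key, k -> op.get());
  }

  public static void main(String[] args) {
    RetryCacheSketch rpc = new RetryCacheSketch();
    System.out.println(rpc.invoke("client-1", 7, () -> "created /a"));
    System.out.println(rpc.invoke("client-1", 7, () -> "created /a AGAIN")); // replayed
  }
}
{code}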
[jira] [Commented] (HDFS-7436) Consolidate implementation of concat()
[ https://issues.apache.org/jira/browse/HDFS-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224701#comment-14224701 ] Hudson commented on HDFS-7436: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1968 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1968/]) HDFS-7436. Consolidate implementation of concat(). Contributed by Haohui Mai. (wheat9: rev 8caf537afabc70b0c74e0a29aea0cc2935ecb162) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirConcatOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Consolidate implementation of concat() -- Key: HDFS-7436 URL: https://issues.apache.org/jira/browse/HDFS-7436 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7436.000.patch The implementation of {{concat()}} scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the implementation in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
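The consolidation pattern itself is simple to picture. A hedged sketch with invented names and a toy file map (not the HDFS patch): a single final class owns both the validation and the mutation for the operation, so nothing is split across two files.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FSDirConcatOpSketch {
  private FSDirConcatOpSketch() {}

  /** files: path -> list of block ids; srcs are appended onto target. */
  static void concat(Map<String, List<String>> files, String target, String... srcs) {
    // All checks happen up front, in the same class...
    if (!files.containsKey(target)) throw new IllegalArgumentException("no target " + target);
    for (String s : srcs) {
      if (!files.containsKey(s)) throw new IllegalArgumentException("no src " + s);
      if (s.equals(target)) throw new IllegalArgumentException("src == target");
    }
    // ...and the mutation happens only after every check has passed.
    for (String s : srcs) {
      files.get(target).addAll(files.remove(s));
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> fs = new HashMap<String, List<String>>();
    fs.put("/a", new ArrayList<String>(Arrays.asList("blk_1")));
    fs.put("/b", new ArrayList<String>(Arrays.asList("blk_2")));
    concat(fs, "/a", "/b");
    System.out.println(fs); // {/a=[blk_1, blk_2]}
  }
}
{code}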
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224827#comment-14224827 ] Yongjun Zhang commented on HDFS-7342: - Thanks a lot for your detailed explanation [~vinayrpet]. {quote} 2. If client request comes after receiving 2 (=minReplication) IBRs, ... {quote} It seems that lease recovery could happen before the client request arrives here; when this happens, the block state would be COMMITTED with minReplication met, right? Lease Recovery doesn't happen some times Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, the LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes one possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224876#comment-14224876 ] Benoy Antony commented on HDFS-6407: [~wheat9], Could you please review this enhancement? new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Minor Attachments: HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png The old UI supported clicking on a column header to sort on that column. The new UI seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanode information, directory listings and snapshots. When there are many items in the tables, it is useful to have the ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7310) Mover can give first priority to local DN if it has target storage type available in local DN
[ https://issues.apache.org/jira/browse/HDFS-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224891#comment-14224891 ] Uma Maheswara Rao G commented on HDFS-7310: --- Hi Vinay, the patch looks good to me. I have one question before my +1 on this. It seems that when you plan to move a block across storages within a DN and it is not on transient storage, you are not doing any checksum calculation. My point is, we may need to compute the checksum to ensure data integrity when moving across storages. If for some reason that is not necessary, please let me know; in that case we also need to change the log below, though:
{code}
if (LOG.isDebugEnabled()) {
  LOG.debug("Copied " + srcMeta + " to " + dstMeta + " and calculated checksum");
  LOG.debug("Copied " + srcFile + " to " + dstFile);
}
{code}
Mover can give first priority to local DN if it has target storage type available in local DN - Key: HDFS-7310 URL: https://issues.apache.org/jira/browse/HDFS-7310 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Attachments: HDFS-7310-001.patch, HDFS-7310-002.patch, HDFS-7310-003.patch Currently the Mover logic may move blocks to any DN which has the target storage type. But if the src DN has the target storage type, then the mover can give highest priority to the local DN. If the local DN does not contain the target storage type, then it can assign to any DN as the current logic does. This is a thought; I have not gone through the code fully yet. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
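If checksum computation during the intra-DN move does turn out to be needed, it can be folded into the copy loop at essentially no extra I/O cost. A small self-contained sketch using plain {{java.util.zip.CRC32}} (HDFS actually uses its own per-chunk checksum format, so this is only an illustration of the copy-and-checksum idea):
{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class CopyWithChecksum {
  static long copyAndChecksum(Path src, Path dst) throws IOException {
    CRC32 crc = new CRC32();
    try (InputStream in = Files.newInputStream(src);
         OutputStream out = Files.newOutputStream(dst)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) > 0) {
        crc.update(buf, 0, n); // checksum the bytes as they stream through
        out.write(buf, 0, n);
      }
    }
    return crc.getValue();
  }

  public static void main(String[] args) throws IOException {
    Path src = Files.createTempFile("blk", ".data");
    Files.write(src, "block bytes".getBytes());
    Path dst = Files.createTempFile("blk", ".copy");
    System.out.printf("Copied %s to %s and calculated checksum %d%n",
        src, dst, copyAndChecksum(src, dst));
  }
}
{code}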
[jira] [Commented] (HDFS-7303) NN UI fails to distinguish datanodes on the same host
[ https://issues.apache.org/jira/browse/HDFS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224893#comment-14224893 ] Benoy Antony commented on HDFS-7303: Thank you [~wheat9] for reviewing and committing. NN UI fails to distinguish datanodes on the same host - Key: HDFS-7303 URL: https://issues.apache.org/jira/browse/HDFS-7303 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch, HDFS-7303.patch If you start multiple datanodes on different ports on the same host, only one of them appears in the NN UI’s datanode tab. While this is not a common scenario, there are still scenarios where you need to start multiple datanodes on the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225094#comment-14225094 ] Jing Zhao commented on HDFS-7440: - The patch looks good to me. One small suggestion is that I think the changes related to the audit log can be separated into another jira, since they change the current audit log semantics. +1 after addressing the comments. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7440: - Attachment: HDFS-7440.001.patch Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6735: Attachment: HDFS-6735-v6.txt Thanks [~ste...@apache.org]. New patch with findbugs tweak. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream - Key: HDFS-6735 URL: https://issues.apache.org/jira/browse/HDFS-6735 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and they have become an HBase read-latency pain point. In HDFS-6698, I made a minor patch against the first encountered lock, around getFileLength; indeed, after reading the code and testing, it shows there are still other locks we could improve. In this jira, I'll make a patch against the other locks, and a simple test case to show the issue and the improved result. This is important for HBase applications, since in the current HFile read path, we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story I had planned to do, but it will probably take more time than I expected) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-7097: - Attachment: HDFS-7097.ultimate.trunk.patch Here is the updated patch. Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the namespace is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big namespace can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports are not modifying any state that is being saved to fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
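The locking idea in the proposal can be pictured with two locks: the checkpointer gives up the namesystem read lock and instead serializes only against checkpoint-sensitive operations with a dedicated lock. A toy sketch with assumed lock names (not the committed patch):
{code}
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final ReentrantLock cpLock = new ReentrantLock();

  void saveCheckpoint() {
    cpLock.lock(); // blocks only other checkpoint-sensitive operations
    try {
      // write the fsimage; fsLock is not held, so RPC handlers that
      // process block reports keep running
    } finally {
      cpLock.unlock();
    }
  }

  void processIncrementalBlockReport() {
    fsLock.writeLock().lock(); // unaffected by an in-progress checkpoint
    try {
      // apply the report; this state is not part of the fsimage snapshot
    } finally {
      fsLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    CheckpointLockSketch s = new CheckpointLockSketch();
    s.saveCheckpoint();
    s.processIncrementalBlockReport(); // would not block even mid-checkpoint
  }
}
{code}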
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225346#comment-14225346 ] Hadoop QA commented on HDFS-7440: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683635/HDFS-7440.001.patch against trunk revision 61a2510. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8833//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8833//console This message is automatically generated. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
Haohui Mai created HDFS-7444: Summary: convertToBlockUnderConstruction should preserve BlockCollection Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
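The shape of the fix, in a simplified sketch with toy types standing in for the HDFS classes: the conversion method itself copies the {{BlockCollection}} reference onto the new object, so no caller can forget to.
{code}
class BlockInfoSketch {
  Object blockCollection; // stand-in for the BlockCollection field

  BlockInfoUnderConstructionSketch convertToBlockUnderConstruction() {
    BlockInfoUnderConstructionSketch uc = new BlockInfoUnderConstructionSketch();
    uc.blockCollection = this.blockCollection; // callee preserves the field
    return uc;
  }
}

class BlockInfoUnderConstructionSketch extends BlockInfoSketch {
  public static void main(String[] args) {
    BlockInfoSketch complete = new BlockInfoSketch();
    complete.blockCollection = "INodeFile for /a"; // stand-in collection
    BlockInfoUnderConstructionSketch uc = complete.convertToBlockUnderConstruction();
    System.out.println(uc.blockCollection); // carried over by the callee
  }
}
{code}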
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225363#comment-14225363 ] Jing Zhao commented on HDFS-7440: - With the change, FSNamesystem#removeBlocks is now moved inside the fsn write lock (in {{deleteSnapshot}}). I think we'd better still keep it out of the write lock. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7444: - Attachment: HDFS-7444.000.patch convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7444: - Status: Patch Available (was: Open) convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7440: - Attachment: HDFS-7440.002.patch Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225393#comment-14225393 ] Andrew Wang commented on HDFS-7097: --- +1 LGTM as well, thanks Kihwal for the patch, ATM, Vinay, and Ming for reviewing. I'll commit this shortly. Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the namespace is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big namespace can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports are not modifying any state that is being saved to fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7424) Add web UI for NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7424: - Attachment: HDFS-7424.002.patch Rebased the patch. Add web UI for NFS gateway -- Key: HDFS-7424 URL: https://issues.apache.org/jira/browse/HDFS-7424 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7424.001.patch, HDFS-7424.002.patch This JIRA is to track the effort to add web UI for NFS gateway to show some metrics and configuration related information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225424#comment-14225424 ] Hudson commented on HDFS-7097: -- FAILURE: Integrated in Hadoop-trunk-Commit #6605 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6605/]) HDFS-7097. Allow block reports to be processed during checkpointing on standby name node. (kihwal via wang) (wang: rev f43a20c529ac3f104add95b222de6580757b3763) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the namespace is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big namespace can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports are not modifying any state that is being saved to fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7444: - Attachment: HDFS-7444.001.patch convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch, HDFS-7444.001.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225428#comment-14225428 ] Hadoop QA commented on HDFS-7444: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683689/HDFS-7444.000.patch against trunk revision 56f3eec. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. Failed to build the native portion of hadoop-common prior to running the unit tests in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8836//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8836//console This message is automatically generated. convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch, HDFS-7444.001.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7424) Add web UI for NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225474#comment-14225474 ] Hadoop QA commented on HDFS-7424: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683693/HDFS-7424.002.patch against trunk revision f43a20c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. Failed to build the native portion of hadoop-common prior to running the unit tests in hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8838//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8838//console This message is automatically generated. Add web UI for NFS gateway -- Key: HDFS-7424 URL: https://issues.apache.org/jira/browse/HDFS-7424 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7424.001.patch, HDFS-7424.002.patch This JIRA is to track the effort to add web UI for NFS gateway to show some metrics and configuration related information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225473#comment-14225473 ] Hadoop QA commented on HDFS-6735: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683657/HDFS-6735-v6.txt against trunk revision 78f7cdb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8834//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8834//console This message is automatically generated. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream - Key: HDFS-6735 URL: https://issues.apache.org/jira/browse/HDFS-6735 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and they have become an HBase read-latency pain point. In HDFS-6698, I made a minor patch against the first encountered lock, around getFileLength; indeed, after reading the code and testing, it shows there are still other locks we could improve. In this jira, I'll make a patch against the other locks, and a simple test case to show the issue and the improved result. This is important for HBase applications, since in the current HFile read path, we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story I had planned to do, but it will probably take more time than I expected) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225476#comment-14225476 ] Hadoop QA commented on HDFS-7097: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683658/HDFS-7097.ultimate.trunk.patch against trunk revision 78f7cdb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.TestEnhancedByteBufferAccess {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8835//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8835//console This message is automatically generated. Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the namespace is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big namespace can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports are not modifying any state that is being saved to fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7424) Add web UI for NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225523#comment-14225523 ] Haohui Mai commented on HDFS-7424: -- Good work. Some comments:
{code}
+/**
+ * Encapsulates the HTTP server started by the NFS3 gateway.
+ */
+@InterfaceAudience.Private
+public class Nfs3HttpServer {
{code}
You can simply mark the class as package-local.
{code}
+  void start() throws IOException {
+    HttpServer2.Builder builder = new HttpServer2.Builder().setName("nfs3")
+        .setConf(conf).setACL(new AccessControlList(conf.get(DFS_ADMIN, "")));
{code}
Please see {{DFSUtil.httpServerTemplateForNNAndJN}}.
{code}
+  public int getSecurePort() {
+    return this.infoSecurePort;
+  }
+
{code}
This is unused.
{code}
+      URL url = new URL(scheme + "://" + NetUtils.getHostPortString(addr)
+          + "/jmx");
+      URLConnection conn = connectionFactory.openConnection(url);
+      conn.connect();
+
+      InputStream is = conn.getInputStream();
+      InputStreamReader isr = new InputStreamReader(is);
+
+      int numCharsRead;
+      char[] charArray = new char[1024];
+      StringBuffer sb = new StringBuffer();
+      while ((numCharsRead = isr.read(charArray)) > 0) {
+        sb.append(charArray, 0, numCharsRead);
+      }
+      result = sb.toString();
+
+    } catch (Exception e) {
+      e.printStackTrace();
+      return null;
+    }
+    return result;
{code}
See {{DFSTestUtil.urlGet()}}. Add web UI for NFS gateway -- Key: HDFS-7424 URL: https://issues.apache.org/jira/browse/HDFS-7424 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7424.001.patch, HDFS-7424.002.patch This JIRA is to track the effort to add a web UI for the NFS gateway to show some metrics and configuration-related information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7445) Implement packet memory pool in output stream in libhdfs3
Zhanwei Wang created HDFS-7445: -- Summary: Implement packet memory pool in output stream in libhdfs3 Key: HDFS-7445 URL: https://issues.apache.org/jira/browse/HDFS-7445 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Implement a packet memory pool instead of allocating packets dynamically. A packet memory pool can guard against overcommit and avoid the cost of allocation for the output stream in libhdfs3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
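libhdfs3 is C++, so the following Java sketch is only a language-neutral illustration of the idea, with all names invented: pre-allocating a fixed set of packet buffers bounds the stream's memory (no overcommit) and turns per-packet allocation into a queue operation.
{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PacketPoolSketch {
  private final BlockingQueue<byte[]> pool;

  PacketPoolSketch(int packets, int packetSize) {
    pool = new ArrayBlockingQueue<byte[]>(packets);
    for (int i = 0; i < packets; i++) {
      pool.add(new byte[packetSize]); // all packet memory allocated up front
    }
  }

  byte[] acquire() throws InterruptedException {
    return pool.take(); // blocks when every packet is in flight: no overcommit
  }

  void release(byte[] packet) {
    pool.offer(packet); // return the buffer for reuse instead of freeing it
  }

  public static void main(String[] args) throws InterruptedException {
    PacketPoolSketch p = new PacketPoolSketch(4, 64 * 1024);
    byte[] pkt = p.acquire();
    p.release(pkt);
    System.out.println("pool bounds memory to 4 packets of 64 KiB each");
  }
}
{code}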
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225541#comment-14225541 ] Zhanwei Wang commented on HDFS-7017: JIRA HDFS-7445 is opened for the packet memory pool. Hi [~wheat9], [~cmccabe], any comments on the new patch? Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7023) use libexpat instead of libxml2 for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225548#comment-14225548 ] Zhanwei Wang commented on HDFS-7023: Hi [~cmccabe] The patch looks good, but the compiler failed to build the binary.
{code}
Undefined symbols for architecture x86_64:
  "_XML_ErrorString", referenced from:
      hdfs::internal::XmlConfigParser::ParseXml(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in XmlConfigParser.cc.o
  "_XML_GetCurrentLineNumber", referenced from:
      hdfs::internal::XmlData::endElement(void*, char const*) in XmlConfigParser.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
  "_XML_GetErrorCode", referenced from:
      hdfs::internal::XmlConfigParser::ParseXml(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in XmlConfigParser.cc.o
  "_XML_Parse", referenced from:
      hdfs::internal::XmlConfigParser::ParseXml(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in XmlConfigParser.cc.o
  "_XML_ParserCreate", referenced from:
      hdfs::internal::XmlConfigParser::ParseXml(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in XmlConfigParser.cc.o
  "_XML_ParserFree", referenced from:
      hdfs::internal::XmlData::~XmlData() in XmlConfigParser.cc.o
  "_XML_SetCharacterDataHandler", referenced from:
      hdfs::internal::XmlData::XmlData(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, XML_ParserStruct*) in XmlConfigParser.cc.o
  "_XML_SetElementHandler", referenced from:
      hdfs::internal::XmlData::XmlData(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, XML_ParserStruct*) in XmlConfigParser.cc.o
  "_XML_SetUserData", referenced from:
      hdfs::internal::XmlData::XmlData(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, XML_ParserStruct*) in XmlConfigParser.cc.o
  "hdfs::internal::StrToInt32(char const*, int*)", referenced from:
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int*) const in Config.cc.o
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int*) const in Config.cc.o
  "hdfs::internal::StrToInt64(char const*, long long*)", referenced from:
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long*) const in Config.cc.o
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, long long*) const in Config.cc.o
  "hdfs::internal::StrToDouble(char const*, double*)", referenced from:
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double*) const in Config.cc.o
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double, double*) const in Config.cc.o
  "hdfs::internal::StrToBool(char const*, bool*)", referenced from:
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) const in Config.cc.o
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool*) const in Config.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
{code}
Seems that you forgot to modify the CMake file to link libexpat.
use libexpat instead of libxml2 for libhdfs3 Key: HDFS-7023 URL: https://issues.apache.org/jira/browse/HDFS-7023 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Colin Patrick McCabe Attachments: HDFS-7023.001.pnative.patch As commented in HDFS-6994, libxml2 may have some thread-safety issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
Colin Patrick McCabe created HDFS-7446: -- Summary: HDFS inotify should have the ability to determine what txid it has read up to Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225571#comment-14225571 ] Colin Patrick McCabe commented on HDFS-7446: This patch adds a txid field to all {{Event}} objects. We have to send this over the wire, since the existing information (start txid and stop txid for a group of txids we got in an RPC) is not enough. Not every edit log txid maps to an event. Another complication is the fact that some txids map to more than one event. This makes it somewhat difficult for clients to know what they've read up to when using a one-event-at-a-time interface. This patch solves that by having the {{DFSInotifyEventInputStream}} return an array of events. In the cases where a single txid maps to multiple events, we return an array of all those events. So the client knows that after it has finished processing this batch, it is done with that transaction id. This interface is marked as unstable, so changing it is not a problem. Miscellaneous cleanups: I made some fields final in the {{Event}} structures. In cases where I modified a unit test, I replaced assertTrue(1 == foo) with assertEquals(1, foo). The latter gives nicer error messages when the test fails. HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
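Under the batched interface described above, a client that wants exactly-once resumption would persist the txid only after consuming a whole batch. A hedged usage sketch — {{EventBatch}}, {{getTxid()}}, and the helper functions are assumptions drawn from this discussion, not a settled API:
{code}
// Assumed API shapes from the discussion above; loadSavedTxid/saveTxid/handle
// are hypothetical application helpers.
long lastReadTxid = loadSavedTxid();
DFSInotifyEventInputStream stream = hdfsAdmin.getInotifyEventStream(lastReadTxid);
while (true) {
  EventBatch batch = stream.take();  // all events of one transaction together
  for (Event e : batch.getEvents()) {
    handle(e);                       // process every event in the batch
  }
  saveTxid(batch.getTxid());         // only now is this txid fully consumed
}
{code}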
[jira] [Updated] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7446: --- Attachment: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7446: --- Status: Patch Available (was: Open) HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225605#comment-14225605 ] Andrew Wang commented on HDFS-7446: --- Hey Colin, thanks for working on this. You definitely bring up a good point about the txids. Since this is marked as unstable and still quite new, I think it's okay to make sweeping changes to the API. I had just a few high-level review comments; the code itself looks fine: * It feels like we have a mismatch between the underlying data and our objects. The need for the VHS-rewind in getTxidBatchSize is one example; what we really want there is an iterator of EditEvents, with one EditEvents per txid (the name is just a suggestion). * The txid could also be moved into EditEvents, which would also save some bytes. I'm hoping this isn't too bad to do, since the edit log translator already returns an Event[] per op, and it seems like most of the PB code can be reused. HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225608#comment-14225608 ] Jing Zhao commented on HDFS-7440: - The latest patch looks good to me. +1 pending Jenkins. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7435: Attachment: HDFS-7435.000.patch Uploading a demo patch (based on Daryn's patch) for chunking. Still need to do more testing. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with a default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive, since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
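The cost being described is easy to see in isolation: decoding a repeated long field into an {{ArrayList}} boxes every value and regrows the backing array from its default capacity of 10, while a presized primitive array does neither. A toy comparison (plain Java, not protobuf code):
{code}
import java.util.ArrayList;
import java.util.List;

public class BlockReportDecodeSketch {
  public static void main(String[] args) {
    int nLongs = 300_000; // e.g. 100k replicas x 3 longs each

    List<Long> boxed = new ArrayList<Long>(); // default capacity 10, regrows ~1.5x
    for (long i = 0; i < nLongs; i++) {
      boxed.add(i);                           // one Long object per value
    }

    long[] primitive = new long[nLongs];      // one allocation, no boxing
    for (int i = 0; i < nLongs; i++) {
      primitive[i] = i;
    }

    System.out.println(boxed.size() + " boxed vs " + primitive.length + " primitive");
  }
}
{code}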
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225620#comment-14225620 ] Hadoop QA commented on HDFS-7440: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683691/HDFS-7440.002.patch against trunk revision 56f3eec. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8837//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8837//console This message is automatically generated. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225637#comment-14225637 ] Hadoop QA commented on HDFS-7444: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683697/HDFS-7444.001.patch against trunk revision f43a20c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8839//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8839//console This message is automatically generated. convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch, HDFS-7444.001.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
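For readers following along, the contract change under review can be sketched as follows. This is a simplified model; the real HDFS classes have different fields and constructors.

{code}
// Simplified model of the proposed contract: the callee preserves
// the BlockCollection field, so no caller can forget to.
interface BlockCollection {}

class BlockInfo {
  protected BlockCollection bc;  // the file this block belongs to

  BlockInfoUnderConstruction convertToBlockUnderConstruction() {
    BlockInfoUnderConstruction uc = new BlockInfoUnderConstruction();
    uc.bc = this.bc;  // preserved here, in the callee
    return uc;
  }
}

class BlockInfoUnderConstruction extends BlockInfo {}
{code}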
[jira] [Commented] (HDFS-7444) convertToBlockUnderConstruction should preserve BlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225641#comment-14225641 ] Hadoop QA commented on HDFS-7444: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683697/HDFS-7444.001.patch against trunk revision f43a20c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8840//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8840//console This message is automatically generated. convertToBlockUnderConstruction should preserve BlockCollection --- Key: HDFS-7444 URL: https://issues.apache.org/jira/browse/HDFS-7444 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7444.000.patch, HDFS-7444.001.patch {{BlockInfo#convertToBlockUnderConstruction}} converts a {{BlockInfo}} object to a {{BlockInfoUnderConstruction}} object. The callee instead of the caller should preserve the {{BlockCollection}} field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225666#comment-14225666 ] Yi Liu commented on HDFS-7435: -- Hi guys, I think this is a good improvement. I found a similar issue related to this while doing Hadoop RPC optimization in my local branch. As we all know, block reports from DNs may become very large in a big cluster, and there is a chance of a full GC if there is not enough contiguous space in the old generation. We reuse the connection for RPC calls, but when we process each RPC on the same connection, we allocate a fresh heap byte buffer to store the RPC bytes. The RPC message may be very large, so it causes the same issue. My thought is to reuse the data buffer in the connection; I will open a new JIRA to track it. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
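The buffer-reuse idea can be sketched like this (my own illustration, not code from Hadoop or from any patch; the obvious trade-off is that a connection's buffer never shrinks, so per-connection memory stays at the high-water mark):

{code}
import java.nio.ByteBuffer;

// Sketch of per-connection buffer reuse: allocate once, grow only
// when an incoming RPC is larger than anything seen so far.
class ReusableRpcBuffer {
  private ByteBuffer data = ByteBuffer.allocate(8 * 1024);

  ByteBuffer bufferFor(int rpcLength) {
    if (data.capacity() < rpcLength) {
      // Round up to the next power of two to amortize future growth.
      data = ByteBuffer.allocate(Integer.highestOneBit(rpcLength - 1) << 1);
    }
    data.clear();
    data.limit(rpcLength);
    return data;
  }
}
{code}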
[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225710#comment-14225710 ] Hadoop QA commented on HDFS-7446: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683729/HDFS-7446.001.patch against trunk revision a655973. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.mapreduce.v2.app.TestCheckpointPreemptionPolicy org.apache.hadoop.mapreduce.v2.app.TestMRClientService The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.mapreduce.v2.TestUberAM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8841//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8841//console This message is automatically generated. HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7446.001.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
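As a usage sketch, a consumer that never wants to miss an event would look roughly like this. The signatures below reflect my reading of where the API is heading ({{HdfsAdmin#getInotifyEventStream(long)}} and a batch object that exposes its txid), so treat the exact names as assumptions; the checkpoint persistence is a hypothetical application helper.

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class InotifyTailer {
  // Tail the edit stream, remembering the txid read so far so a
  // restart can resume exactly where it left off.
  public static void tail(long lastReadTxid) throws Exception {
    HdfsAdmin admin =
        new HdfsAdmin(URI.create("hdfs://nn:8020"), new Configuration());
    DFSInotifyEventInputStream stream =
        admin.getInotifyEventStream(lastReadTxid);
    while (true) {
      EventBatch batch = stream.poll();  // non-blocking; null if empty
      if (batch == null) {
        Thread.sleep(100);
        continue;
      }
      // ... handle batch.getEvents() here ...
      lastReadTxid = batch.getTxid();    // what this jira exposes
      // persist lastReadTxid (e.g. to ZooKeeper or a local file)
    }
  }
}
{code}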
[jira] [Updated] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7440: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the reviews. Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7438) Consolidate implementation of rename()
[ https://issues.apache.org/jira/browse/HDFS-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7438: - Attachment: HDFS-7438.001.patch Consolidate implementation of rename() -- Key: HDFS-7438 URL: https://issues.apache.org/jira/browse/HDFS-7438 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7438.000.patch, HDFS-7438.001.patch The implementation of {{rename()}} resides in both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate them in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7440) Consolidate snapshot related operations in a single class
[ https://issues.apache.org/jira/browse/HDFS-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225757#comment-14225757 ] Hudson commented on HDFS-7440: -- FAILURE: Integrated in Hadoop-trunk-Commit #6608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6608/]) HDFS-7440. Consolidate snapshot related operations in a single class. Contributed by Haohui Mai. (wheat9: rev 4a3161182905afaf450a60d02528161ed1f97471) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSnapshotOp.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Consolidate snapshot related operations in a single class - Key: HDFS-7440 URL: https://issues.apache.org/jira/browse/HDFS-7440 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7440.000.patch, HDFS-7440.001.patch, HDFS-7440.002.patch Currently the snapshot-related code scatters across both {{FSNameSystem}} and {{FSDirectory}}. This jira proposes to consolidate the logic in a single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
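For context, the consolidation pattern introduced by FSDirSnapshotOp (and pursued by sibling jiras such as HDFS-7438 for rename) looks roughly like the sketch below. The interfaces are stubs standing in for the real HDFS types, and the actual method signatures differ.

{code}
import java.io.IOException;

// Stubs standing in for the real HDFS types.
interface FSDirectory { void writeLock(); void writeUnlock(); }
interface SnapshotManager {
  String createSnapshot(String path, String name) throws IOException;
}

// All snapshot logic lives in one class of static helpers;
// FSNamesystem keeps only locking, auditing, and edit logging.
final class FSDirSnapshotOp {
  private FSDirSnapshotOp() {}  // static helpers only

  static String createSnapshot(FSDirectory fsd, SnapshotManager sm,
      String path, String name) throws IOException {
    fsd.writeLock();
    try {
      // resolve the path, check permissions, then delegate
      return sm.createSnapshot(path, name);
    } finally {
      fsd.writeUnlock();
    }
  }
}
{code}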
[jira] [Commented] (HDFS-6407) New namenode UI lost the ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225760#comment-14225760 ] Haohui Mai commented on HDFS-6407: -- Thanks for working on this. My understanding is that only the datanode tab needs to be sorted and paginated, so it should not affect other tables. Therefore I'm leaning towards a simpler solution instead of introducing a plugin. Let me experiment a little bit. New namenode UI lost the ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Benoy Antony Priority: Minor Attachments: HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png The old UI supported clicking on a column header to sort by that column. The new UI seems to have dropped this very useful feature. There are a few tables in the NameNode UI that display datanode information, directory listings, and snapshots. When there are many items in the tables, it is useful to be able to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225768#comment-14225768 ] Haohui Mai edited comment on HDFS-7017 at 11/26/14 5:23 AM: {code} virtual ~LeaseRenewer(); {code} It no longer needs to have a virtual destructor since no class inherits {{LeaseRenewer}}. Similar to the comments on {{LeaseRenewer}} and {{LeaseRenewerImpl}}, maybe the code can be further simplified by merging {{PipelineImpl}} into {{Pipeline}}. was (Author: wheat9): {code} virtual ~LeaseRenewer(); {code} It no longer needs to have a virtual destructor since no class inherits {{LeaseRenewer}}. Similar to the comments on {{LeaseRenewer}} and {{LeaseRenewerImpl}}, maybe the code can be further simplified by merging {{PipelineImpl}} into {{Pipeline}}, Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen sometimes
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225767#comment-14225767 ] Vinayakumar B commented on HDFS-7342: - bq. It seems that lease recovery could happen before the client request comes here, when this happens, the block state would be COMMITTED with minReplication met, right? We are talking about the state of the penultimate block, not the last block, which is the cause found for this issue. 1. For the penultimate block, only a client request (a request for another block) will make it COMMITTED, as the client will still be alive and adds one more block. 2. For the last block, the client makes it COMMITTED during normal closure; otherwise {{commitBlockSynchronization()}} does so during the lease-recovery closure. I see no other places where a block gets COMMITTED. Lease Recovery doesn't happen sometimes Key: HDFS-7342 URL: https://issues.apache.org/jira/browse/HDFS-7342 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch, HDFS-7342.3.patch In some cases, LeaseManager tries to recover a lease, but is not able to. HDFS-4882 describes a possibility of that. We should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
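To summarize the two commit paths above as a toy model ({{BlockUCState}} is a real HDFS enum; the methods here are deliberate simplifications, not the actual NameNode code):

{code}
enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

class FileUnderConstruction {
  BlockUCState penultimate = BlockUCState.COMPLETE;
  BlockUCState last = BlockUCState.UNDER_CONSTRUCTION;

  // Path 1: the live client asks for one more block; the block that
  // becomes penultimate is committed as part of that request.
  void addBlock() {
    penultimate = BlockUCState.COMMITTED;
    last = BlockUCState.UNDER_CONSTRUCTION;
  }

  // Path 2a: normal close by the client commits the last block.
  void completeFile() { last = BlockUCState.COMMITTED; }

  // Path 2b: lease recovery closes the file without the client.
  void commitBlockSynchronization() { last = BlockUCState.COMMITTED; }
}
{code}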
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225768#comment-14225768 ] Haohui Mai commented on HDFS-7017: -- {code} virtual ~LeaseRenewer(); {code} It no longer needs to have a virtual destructor since no class inherits {{LeaseRenewer}}. Similar to the comments on {{LeaseRenewer}} and {{LeaseRenewerImpl}}, maybe the code can be further simplified by merging {{PipelineImpl}} into {{Pipeline}}. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7023) use libexpat instead of libxml2 for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225771#comment-14225771 ] Haohui Mai commented on HDFS-7023: -- I wonder, is there any interest in creating a configuration mechanism that does not depend on an XML parser at all? What we can do is create an {{Options}} class which captures all configuration parameters directly. The XML parser can then set the configuration parameters accordingly. use libexpat instead of libxml2 for libhdfs3 Key: HDFS-7023 URL: https://issues.apache.org/jira/browse/HDFS-7023 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Colin Patrick McCabe Attachments: HDFS-7023.001.pnative.patch As commented in HDFS-6994, libxml2 may have some thread safety issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
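The proposed decoupling could look like the following. libhdfs3 itself is C++; this sketch is in Java purely for illustration, and every name in it is made up.

{code}
// The core library consumes a plain, typed Options object ...
class Options {
  long blockSize = 128L * 1024 * 1024;
  short replication = 3;
  boolean useDatanodeHostname = false;
  // ... every tunable the client understands, typed and defaulted
}

// ... and the XML parser becomes an optional front end that merely
// populates it, so clients configured programmatically never touch
// an XML library at all.
class XmlConfigLoader {
  static Options load(String path) {
    Options opts = new Options();
    // parse the XML file at 'path' and assign fields on opts
    return opts;
  }
}
{code}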
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225775#comment-14225775 ] Zhanwei Wang commented on HDFS-7017: Hi [~wheat9] {{MockLeaseRenewer}} will inherit {{LeaseRenewer}} and will be used in unit tests. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225784#comment-14225784 ] Haohui Mai commented on HDFS-7017: -- bq. MockLeaseRenewer will inherit LeaseRenewer and will be used in unit tests. What about putting this change in the patch that adds unit tests in this branch? Otherwise it might look confusing from the perspective of this patch alone. The other parts of the patch look good to me. +1 once it is addressed. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017-pnative.005.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)