[jira] [Resolved] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-26 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10322.
--
Resolution: Duplicate

[~chenfolin], thank you for investigating this further.  I'm just updating 
status on this issue to indicate it's a duplicate of a prior issue.

> DomainSocket error leads to more and more DataNode threads waiting 
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Fix For: 2.6.4
>
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error 
> happens, the DataNode produces more and more waiting threads.
> It is similar to HADOOP-11802, but I do not think they are the same problem, 
> because the DomainSocket thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for 
> operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on 
> condition [0x7f2d6e4a5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00061c493500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>   at java.lang.Thread.run(Thread.java:745)
> =DomainSocketWatcher
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable 
> [0x7f2dbe4cb000]
>java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>   at java.lang.Thread.run(Thread.java:745)
> ===datanode error log
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
> datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM 
> operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst: 
> java.net.SocketException: write(2) error: Broken pipe
> at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> at 
> org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
> at 
> com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at 
> com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-26 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-10322:
--

> DomainSocket error leads to more and more DataNode threads waiting 
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Fix For: 2.6.4
>
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error 
> happens, the DataNode produces more and more waiting threads.
> It is similar to HADOOP-11802, but I do not think they are the same problem, 
> because the DomainSocket thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for 
> operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on 
> condition [0x7f2d6e4a5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00061c493500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>   at java.lang.Thread.run(Thread.java:745)
> =DomainSocketWatcher
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable 
> [0x7f2dbe4cb000]
>java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>   at java.lang.Thread.run(Thread.java:745)
> ===datanode error log
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
> datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM 
> operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst: 
> java.net.SocketException: write(2) error: Broken pipe
> at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> at 
> org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
> at 
> com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at 
> com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259529#comment-15259529
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

bq. Hey Colin, I reviewed your patch more thoroughly. There is still a problem 
with interleaving reports. See updateBlockReportContext(). Suppose that block 
reports interleave like this: . Then br1-s2 
will reset curBlockReportRpcsSeen since curBlockReportId is not the same as in 
the report, which will discard the bit set for s1 in br2-s1, and the count of 
rpcsSeen = 0 will be wrong for br2-s2. So possibly unreported (zombie) storages 
will not be removed. LMK if you see what I see.

Thanks for looking at the patch.  I agree that in the case of interleaving, 
zombie storages will not be removed.  I don't consider that a problem, since we 
will eventually get a non-interleaved full block report that will do the zombie 
storage removal.  In practice, interleaved block reports are extremely rare (we 
have never seen the problem described in this JIRA, after deploying to 
thousands of clusters).

bq. May be we should go with a different approach for this problem.  Single 
block report can be split into multiple RPCs. Within single block-report-RPC 
NameNode processes each storage under a lock, but then releases and re-acquires 
the lock for the next storage, so that multiple RPC reports can interleave due 
to multi-threading.

Maybe I'm misunderstanding the proposal, but don't we already do all of this?  
We split block reports into multiple RPCs when the storage reports grow beyond 
a certain size.

bq. Approach. DN should report the full list of its storages in the first 
block-report-RPC. The NameNode first cleans up unreported storages and the replicas 
belonging to them, then starts processing the rest of the block reports as usual. So 
DataNodes explicitly report the storages that they have, which eliminates the NameNode 
guessing which storage is the last in the block report RPC.

What does the NameNode do if the DataNode is restarted while sending these 
RPCs, so that it never gets a chance to send all the storages that it claimed 
existed?  It seems like you will get stuck and not be able to accept any new 
reports.  Or, you can take the same approach the current patch does, and clear 
the current state every time you see a new ID (but then you can't do zombie 
storage elimination in the presence of interleaving.)

One approach that avoids all these problems is to avoid doing zombie storage 
elimination during FBRs entirely, and do it instead during DN heartbeats (for 
example).  DN heartbeats are small messages that are never split, and their 
processing is not interleaved with anything.
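To make the heartbeat idea concrete, here is a deliberately simplified, hypothetical sketch (none of these class or method names come from the actual NameNode code); it assumes the heartbeat carries the complete set of storage IDs the DataNode currently has:
{code}
import java.util.*;

// Hypothetical sketch only; not the HDFS implementation.
class HeartbeatZombiePruneSketch {
  // NameNode-side view: datanode UUID -> storage IDs it is believed to have
  private final Map<String, Set<String>> storagesByDatanode = new HashMap<>();

  // Heartbeats are small, never split across RPCs, and processed one at a time,
  // so the reported set is always complete and there is no interleaving to handle.
  void processHeartbeat(String datanodeUuid, Set<String> reportedStorageIds) {
    Set<String> known = storagesByDatanode
        .computeIfAbsent(datanodeUuid, k -> new HashSet<>());
    // Anything the NameNode remembers but the DataNode no longer reports is a zombie.
    known.removeIf(id -> !reportedStorageIds.contains(id));
    known.addAll(reportedStorageIds);
  }
}
{code}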

We agree that the current patch solves the problem of storages falsely being 
declared as zombies, I hope.  I think that's a good enough reason to get this 
patch in, and then think about alternate approaches later.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report and then 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes the NameNode think that some 
> storages are zombies. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10336) TestBalancer failing intermittently because of not resetting UserGroupInformation completely

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10336:
-
Affects Version/s: 3.0.0

> TestBalancer failing intermittently because of not resetting 
> UserGroupInformation completely
> ---
>
> Key: HDFS-10336
> URL: https://issues.apache.org/jira/browse/HDFS-10336
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10336.001.patch
>
>
> The unit test {{TestBalancer}} fails intermittently. 
> I looked into the reason and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} times out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
> Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not 
> completely reset the {{UGI}} in its finally block, which causes other 
> unit tests to throw {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
>   Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
>   at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following line 
> before restoring the {{UGI}} configuration in the finally block, to avoid the 
> potential exception:
> {code}
> UserGroupInformation.reset();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10313) Distcp need to enforce the order of snapshot names passed to -diff

2016-04-26 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259500#comment-15259500
 ] 

Lin Yiqun commented on HDFS-10313:
--

Thanks [~yzhangal] for the commit!

> Distcp need to enforce the order of snapshot names passed to -diff
> --
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira proposes adding a check to distcp: when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1; otherwise, abort with an 
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10336) TestBalancer failing intermittently because of not resetting UserGroupInformation completely

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10336:
-
Attachment: HDFS-10336.001.patch

> TestBalancer failing intermittently because of not resetting 
> UserGroupInformation completely
> ---
>
> Key: HDFS-10336
> URL: https://issues.apache.org/jira/browse/HDFS-10336
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10336.001.patch
>
>
> The unit test {{TestBalancer}} fails intermittently. 
> I looked into the reason and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} times out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
> Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not 
> completely reset the {{UGI}} in its finally block, which causes other 
> unit tests to throw {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
>   Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
>   at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following line 
> before restoring the {{UGI}} configuration in the finally block, to avoid the 
> potential exception:
> {code}
> UserGroupInformation.reset();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10336) TestBalancer failing intermittently because of not resetting UserGroupInformation completely

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10336:
-
Status: Patch Available  (was: Open)

Attaching an initial patch; reviews are welcome.

> TestBalancer failing intermittently because of not resetting 
> UserGroupInformation completely
> ---
>
> Key: HDFS-10336
> URL: https://issues.apache.org/jira/browse/HDFS-10336
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>
> The unit test {{TestBalancer}} fails intermittently. 
> I looked into the reason and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} times out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
> Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not 
> completely reset the {{UGI}} in its finally block, which causes other 
> unit tests to throw {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
>   Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
>   at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following line 
> before restoring the {{UGI}} configuration in the finally block, to avoid the 
> potential exception:
> {code}
> UserGroupInformation.reset();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10336) TestBalancer failing intermittently because of not resetting UserGroupInformation completely

2016-04-26 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-10336:


 Summary: TestBalancer failing intermittently because of not 
resetting UserGroupInformation completely
 Key: HDFS-10336
 URL: https://issues.apache.org/jira/browse/HDFS-10336
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Lin Yiqun
Assignee: Lin Yiqun


The unit test {{TestBalancer}} fails intermittently. 

I looked into the reason and found two main causes.

* 1st. The test {{TestBalancer#testBalancerWithKeytabs}} times out.
{code}
org.apache.hadoop.hdfs.server.balancer.TestBalancer
testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
Time elapsed: 300.41 sec  <<< ERROR!
java.lang.Exception: test timed out after 30 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
at 
org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
at 
org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
{code}

* 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not completely 
reset the {{UGI}} in its finally block, which causes other unit 
tests to throw {{IOException}}, like this:
{code}
testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
  Time elapsed: 0 sec  <<< ERROR!
java.io.IOException: Running in secure mode, but config doesn't have a keytab
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
{code}
More than one test can be affected by this. We should add the following line 
before restoring the {{UGI}} configuration in the finally block, to avoid the potential 
exception:
{code}
UserGroupInformation.reset();
{code}
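For illustration, here is a minimal sketch (an assumed shape of the fix, not the attached patch) of how the keytab-based test could clean up global security state in its finally block. The key point is calling {{UserGroupInformation.reset()}} before restoring the original configuration, so later tests no longer fail with "Running in secure mode, but config doesn't have a keytab":
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative sketch only; the method and variable names are hypothetical.
public class UgiRestoreSketch {
  static void runSecureTestBody(Configuration secureConf,
                                Configuration originalConf) throws Exception {
    UserGroupInformation.setConfiguration(secureConf);
    try {
      // ... log in from the keytab and run the balancer test body ...
    } finally {
      UserGroupInformation.reset();                        // clear static login state
      UserGroupInformation.setConfiguration(originalConf); // restore non-secure config
    }
  }
}
{code}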



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5280) Corrupted meta files on data nodes prevent DFSClient from connecting to data nodes and updating corruption status to name node.

2016-04-26 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259490#comment-15259490
 ] 

Walter Su commented on HDFS-5280:
-

There are other IOExceptions that will cause the readBlock RPC call to fail and 
the dn to be marked as dead. We could fix them as well.
If I understand correctly, your approach is to use a fake checksum. When the client 
reads the data, the checksum check fails, and the client marks the block as corrupted 
instead of marking the dn as dead. I think: can we let the client not read from this dn 
in the first place? If the client fails to create the block reader, it can tell whether 
the dn is dead or it's just the block that is corrupted.

{code}
// DFSInputStream.java (excerpt)
try {
  blockReader = getBlockReader(targetBlock, offsetIntoBlock,
      targetBlock.getBlockSize() - offsetIntoBlock, targetAddr,
      storageType, chosenNode);
  if (connectFailedOnce) {
    DFSClient.LOG.info("Successfully connected to " + targetAddr +
        " for " + targetBlock.getBlock());
  }
  return chosenNode;
} catch (IOException ex) {
  if (ex instanceof InvalidEncryptionKeyException &&
      refetchEncryptionKey > 0) {
    ...
  } else {
    ...
    addToDeadNodes(chosenNode);
  }
}
{code}
Instead of going into the {{else}} clause, can we have another exception like 
{{InvalidEncryptionKeyException}}? If we catch it, we skip the dn and do not 
add it to the dead nodes, as in the sketch below.
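A rough sketch of that suggestion ({{CorruptMetaHeaderException}} is a hypothetical name invented here for illustration; only {{getBlockReader}}, {{InvalidEncryptionKeyException}} and {{addToDeadNodes}} come from the snippet above):
{code}
try {
  blockReader = getBlockReader(targetBlock, offsetIntoBlock,
      targetBlock.getBlockSize() - offsetIntoBlock, targetAddr,
      storageType, chosenNode);
  return chosenNode;
} catch (IOException ex) {
  if (ex instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) {
    // existing handling: refetch the encryption key and retry
  } else if (ex instanceof CorruptMetaHeaderException) { // hypothetical exception type
    // The replica's meta file is bad, but the DataNode itself is healthy:
    // report the corrupt block and try the next location instead of
    // calling addToDeadNodes(chosenNode).
  } else {
    addToDeadNodes(chosenNode);
  }
}
{code}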

> Corrupted meta files on data nodes prevent DFSClient from connecting to data 
> nodes and updating corruption status to name node.
> ---
>
> Key: HDFS-5280
> URL: https://issues.apache.org/jira/browse/HDFS-5280
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs-client
>Affects Versions: 1.1.1, 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.7.2
> Environment: Red hat enterprise 6.4
> Hadoop-2.1.0
>Reporter: Jinghui Wang
>Assignee: Andres Perez
> Attachments: HDFS-5280.patch
>
>
> Corrupted meta files leave the DFSClient unable to connect to the 
> datanodes to access the blocks, so the DFSClient never performs a read on the 
> block. The read is what would throw the ChecksumException when file blocks are 
> corrupted and report to the namenode to mark the block as corrupt. Since the 
> client never gets that far, the file status remains healthy, and so 
> do all the blocks.
> To replicate the error, put a file onto HDFS.
> Running {{hadoop fsck /tmp/bogus.csv -files -blocks -locations}} gives the 
> following output:
> FSCK started for path /tmp/bogus.csv at 11:33:29
> /tmp/bogus.csv 109 bytes, 1 block(s):  OK
> 0. blk_-4255166695856420554_5292 len=109 repl=3
> Find the block/meta files for 4255166695856420554 by running 
> {{ssh datanode1.address find /hadoop/ -name "*4255166695856420554*"}}, which gives 
> the following output:
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta
> Now corrupt the meta file by running 
> {{ssh datanode1.address "sed -i -e '1i 1234567891' 
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta"}}
> Now running {{hadoop fs -cat /tmp/bogus.csv}}
> will show the stack trace of the DFSClient failing to connect to the data node 
> with the corrupted meta file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10329) Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java

2016-04-26 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259471#comment-15259471
 ] 

Lin Yiqun commented on HDFS-10329:
--

Thanks [~kihwal] for the commit!

> Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java
> --
>
> Key: HDFS-10329
> URL: https://issues.apache.org/jira/browse/HDFS-10329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Max Schaefer
>Assignee: Lin Yiqun
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-10329.001.patch
>
>
> On [line 
> 167|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java#L167]
>  of {{RequestHedgingProxyProvider.java}}, a {{StringBuilder}} is initialised 
> like this:
> {code}
> StringBuilder combinedInfo = new StringBuilder('[');
> {code}
> This won't have the (presumably) desired effect of creating a 
> {{StringBuilder}} containing the string {{"["}}; instead, it will create a 
> {{StringBuilder}} with capacity 91 (the character code of '['). See 
> [here|http://what-when-how.com/Tutorial/topic-90315a/Java-Puzzlers-Traps-Pitfalls-and-Corner-Cases-69.html]
>  for an explanation.
> To fix this, pass a string literal instead of the character literal:
> {code}
> StringBuilder combinedInfo = new StringBuilder("[");
> {code}
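A small self-contained example of the pitfall (plain Java behaviour, not project code):
{code}
public class StringBuilderPitfall {
  public static void main(String[] args) {
    StringBuilder byChar = new StringBuilder('[');   // '[' widens to int 91: an initial capacity, no content
    StringBuilder byString = new StringBuilder("["); // actually contains "["
    System.out.println(byChar.length());   // 0
    System.out.println(byString.length()); // 1
  }
}
{code}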



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-26 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259457#comment-15259457
 ] 

Walter Su commented on HDFS-9958:
-

{code}
@@ -1320,11 +1320,22 @@ public void findAndMarkBlockAsCorrupt(final 
ExtendedBlock blk,
  
+if (storage == null) {
+  storage = storedBlock.findStorageInfo(node);
+}
{code}
I'm surprised that most of the time {{storageID}} is null. It makes the code 
above error-prone, because the block can be added/moved to another healthy 
storage on the same node. I suppose we should add the storageID to 
the request message.

+1. re-trigger the jenkins.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes whose storage state is NORMAL, which, in the 
> case where the corrupt replica is on a failed storage, will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch assertion errors; 
> otherwise it goes ahead and tries to create a LocatedBlock object for an entry 
> that is never put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}
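A toy illustration (plain Java, not HDFS code) of the count mismatch described above: the corrupt replica on a FAILED storage is invisible to {{countNodes()}} but still counted by {{blocksMap.numNodes()}}, so the {{machines}} array ends up with an unfilled slot:
{code}
public class MachinesArrayGap {
  public static void main(String[] args) {
    int numNodes = 3;        // blocksMap counts every storage, including the failed one
    int numCorruptNodes = 0; // only NORMAL storages are inspected, so this stays 0
    int numMachines = numNodes - numCorruptNodes; // 3, but only 2 replicas are usable
    String[] machines = new String[numMachines];
    machines[0] = "dn1:DS-normal";
    machines[1] = "dn2:DS-normal";
    // machines[2] stays null; building a LocatedBlock from it is what later NPEs.
    System.out.println(java.util.Arrays.toString(machines));
  }
}
{code}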



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10334) trunk test failures: TestHFlush#testHFlushInterrupted and TestFsDatasetImpl#testCleanShutdownOfVolume

2016-04-26 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259445#comment-15259445
 ] 

Brahma Reddy Battula commented on HDFS-10334:
-

So this can be closed, right? HDFS-2043 and HDFS-10260 will track these 
failures.

> trunk test failures: TestHFlush#testHFlushInterrupted and 
> TestFsDatasetImpl#testCleanShutdownOfVolume
> -
>
> Key: HDFS-10334
> URL: https://issues.apache.org/jira/browse/HDFS-10334
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Xiaobing Zhou
>Priority: Critical
>
> It's been noticed that these tests fail on trunk:
> TestHFlush#testHFlushInterrupted and TestFsDatasetImpl#testCleanShutdownOfVolume
> {noformat}
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.417 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestHFlush
> testHFlushInterrupted(org.apache.hadoop.hdfs.TestHFlush)  Time elapsed: 0.646 
> sec  <<< ERROR!
> java.nio.channels.ClosedByInterruptException: null
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>   at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:653)
> {noformat}
> {noformat}
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.438 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
> testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
>   Time elapsed: 6.882 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testCleanShutdownOfVolume(TestFsDatasetImpl.java:683)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10216) distcp -diff relative path exception

2016-04-26 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259444#comment-15259444
 ] 

John Zhuge commented on HDFS-10216:
---

Thanks [~yzhangal].

> distcp -diff relative path exception
> 
>
> Key: HDFS-10216
> URL: https://issues.apache.org/jira/browse/HDFS-10216
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>Assignee: Takashi Ohnishi
> Fix For: 2.8.0
>
> Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, 
> HDFS-10216.3.patch, HDFS-10216.4.patch
>
>
> Got this exception when running {{distcp -diff}} with relative paths:
> {code}
> $ hadoop distcp -update -diff s1 s2 d1 d2
> 16/03/25 09:45:40 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], 
> targetPath=d2, targetPathExists=true, preserveRawXattrs=false, 
> filtersFile='null'}
> 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at 
> jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032
> 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2
>   at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:197)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:123)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:436)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
> hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2
>   at java.net.URI.checkPath(URI.java:1804)
>   at java.net.URI.<init>(URI.java:752)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
>   ... 11 more
> {code}
> But these commands worked:
> * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 
> /user/systest/d2}}
> * No {{-diff}}: {{hadoop distcp -update d1 d2}}
> However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 
> d1 d2}} again. I am not sure whether the problem only exists with option {{-diff}}. 
> Trying to reproduce.
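A toy demonstration (plain Java, not DistCp code) of the exception quoted above: {{java.net.URI}} rejects a URI that has a scheme but a relative path, which is what happens when the relative {{./d1/.snapshot/s2}} is combined with the {{hdfs://host:8020}} prefix instead of an absolute path (the host name below is made up):
{code}
import java.net.URI;

public class RelativePathInAbsoluteUri {
  public static void main(String[] args) throws Exception {
    // absolute path: fine
    System.out.println(new URI("hdfs", "example-host:8020", "/d1/.snapshot/s2", null, null));
    // relative path with a scheme/authority: throws
    // java.net.URISyntaxException: Relative path in absolute URI
    System.out.println(new URI("hdfs", "example-host:8020", "./d1/.snapshot/s2", null, null));
  }
}
{code}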



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9885) In distcp cmd output, Display name should be given for org.apache.hadoop.tools.mapred.CopyMapper$Counter.

2016-04-26 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259417#comment-15259417
 ] 

John Zhuge commented on HDFS-9885:
--

Can anyone commit this reviewed jira? Thanks.

> In distcp cmd output, Display name should be given for 
> org.apache.hadoop.tools.mapred.CopyMapper$Counter.
> 
>
> Key: HDFS-9885
> URL: https://issues.apache.org/jira/browse/HDFS-9885
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Archana T
>Assignee: Surendra Singh Lilhore
>Priority: Minor
> Attachments: HDFS-9885.001.patch, HDFS-9885.002.patch
>
>
> In distcp cmd output,
> hadoop distcp hdfs://NN1:port/file1 hdfs://NN2:port/file2
> 16/02/29 07:05:55 INFO tools.DistCp: DistCp job-id: job_1456729398560_0002
> 16/02/29 07:05:55 INFO mapreduce.Job: Running job: job_1456729398560_0002
> 16/02/29 07:06:01 INFO mapreduce.Job: Job job_1456729398560_0002 running in 
> uber mode : false
> 16/02/29 07:06:01 INFO mapreduce.Job: map 0% reduce 0%
> 16/02/29 07:06:06 INFO mapreduce.Job: map 100% reduce 0%
> 16/02/29 07:06:07 INFO mapreduce.Job: Job job_1456729398560_0002 completed 
> successfully
> ...
> ...
> File Input Format Counters
> Bytes Read=212
> File Output Format Counters
> Bytes Written=0{color:red} 
> org.apache.hadoop.tools.mapred.CopyMapper$Counter
> {color}
> BANDWIDTH_IN_BYTES=12418
> BYTESCOPIED=12418
> BYTESEXPECTED=12418
> COPY=1
> Expected:
> A display name should be given instead of 
> {color:red}"org.apache.hadoop.tools.mapred.CopyMapper$Counter"{color}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10335:
-
Attachment: HDFS-10335.000.patch

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of the other available storage groups. We need an algorithm that can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-26 Thread ChenFolin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenFolin resolved HDFS-10322.
--
   Resolution: Fixed
Fix Version/s: 2.6.4

It is the same as HADOOP-11802 
(https://issues.apache.org/jira/browse/HADOOP-11802).
I was not sure at first, because I saw a RUNNABLE DomainSocketWatcher thread. 
Now I know that the RUNNABLE DomainSocketWatcher thread serves webhdfs: if webhdfs 
is enabled, the datanode process may contain two DomainSocketWatcher threads. And 
now I am sure the other thread had terminated.

> DomainSocket error leads to more and more DataNode threads waiting 
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Fix For: 2.6.4
>
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error 
> happens, the DataNode produces more and more waiting threads.
> It is similar to HADOOP-11802, but I do not think they are the same problem, 
> because the DomainSocket thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for 
> operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on 
> condition [0x7f2d6e4a5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00061c493500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>   at java.lang.Thread.run(Thread.java:745)
> =DomainSocketWatcher
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable 
> [0x7f2dbe4cb000]
>java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>   at java.lang.Thread.run(Thread.java:745)
> ===datanode error log
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
> datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM 
> operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst: 
> java.net.SocketException: write(2) error: Broken pipe
> at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> at 
> org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
> at 
> com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at 
> com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259401#comment-15259401
 ] 

Mingliang Liu commented on HDFS-10335:
--

The code is as follows:
{code}
boolean chooseTarget(DBlock db, Source source,
    List<StorageType> targetTypes, Matcher matcher) {
  final NetworkTopology cluster = dispatcher.getCluster();
  for (StorageType t : targetTypes) {
    for (StorageGroup target : storages.getTargetStorages(t)) {
      if (matcher.match(cluster, source.getDatanodeInfo(),
          target.getDatanodeInfo())) {
        final PendingMove pm = source.addPendingMove(db, target);
        if (pm != null) {
          dispatcher.executePendingMove(pm);
          return true;
        }
      }
    }
  }
  return false;
}
{code}

To address this, we can pick a random matching storage group for the given 
storage type. One implementation is to shuffle the candidate target storages 
before iterating them. Will post a patch shortly.
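A minimal sketch of that idea (an assumed shape only; the actual patch may differ), reusing the names from the snippet above:
{code}
boolean chooseTarget(DBlock db, Source source,
    List<StorageType> targetTypes, Matcher matcher) {
  final NetworkTopology cluster = dispatcher.getCluster();
  for (StorageType t : targetTypes) {
    // Copy and shuffle the candidates so matching targets are chosen roughly
    // uniformly instead of always favouring the head of the list.
    List<StorageGroup> candidates =
        new ArrayList<>(storages.getTargetStorages(t));
    Collections.shuffle(candidates);
    for (StorageGroup target : candidates) {
      if (matcher.match(cluster, source.getDatanodeInfo(),
          target.getDatanodeInfo())) {
        final PendingMove pm = source.addPendingMove(db, target);
        if (pm != null) {
          dispatcher.executePendingMove(pm);
          return true;
        }
      }
    }
  }
  return false;
}
{code}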

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of the other available storage groups. We need an algorithm that can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10335:
-
Description: 
Currently the 
{{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
chooses the first matching target datanode from the candidate list. This may 
make the mover schedule a lot of tasks to a few of the datanodes (the first several 
datanodes of the candidate list). The overall performance will suffer 
significantly from this because of the saturated network/disk usage. In particular, 
if {{dfs.datanode.balance.max.concurrent.moves}} is set, the scheduled move 
tasks will be queued on a few of the storage groups, regardless of the other 
available storage groups. We need an algorithm that can distribute the move 
tasks approximately evenly across all the candidate target storage groups.

Thanks [~szetszwo] for offline discussion.

  was:
Currently the 
{{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
chooses the first matching target datanode from the candidate list. This may 
make the mover schedule a lot of tasks to a few of the datanodes (the first several 
datanodes of the candidate list). The overall performance will suffer 
significantly from this because of the saturated network/disk usage. In particular, 
if {{dfs.datanode.balance.max.concurrent.moves}} is set, the scheduled move 
tasks will be queued on a few of the datanodes, regardless of the other available 
storage resources. We need an algorithm that can distribute the move tasks 
approximately evenly across all the candidate target datanodes (storages).

Thanks [~szetszwo] for offline discussion.


> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of the other available storage groups. We need an algorithm that can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-10335 started by Mingliang Liu.

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of the other available storage groups. We need an algorithm that can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target datanode

2016-04-26 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-10335:


 Summary: Mover$Processor#chooseTarget() always chooses the first 
matching target datanode
 Key: HDFS-10335
 URL: https://issues.apache.org/jira/browse/HDFS-10335
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Affects Versions: 2.8.0
Reporter: Mingliang Liu
Assignee: Mingliang Liu
Priority: Critical


Currently the 
{{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
chooses the first matching target datanode from the candidate list. This may 
make the mover schedule a lot of tasks to a few of the datanodes (the first several 
datanodes of the candidate list). The overall performance will suffer 
significantly from this because of the saturated network/disk usage. In particular, 
if {{dfs.datanode.balance.max.concurrent.moves}} is set, the scheduled move 
tasks will be queued on a few of the datanodes, regardless of the other available 
storage resources. We need an algorithm that can distribute the move tasks 
approximately evenly across all the candidate target datanodes (storages).

Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10335:
-
Summary: Mover$Processor#chooseTarget() always chooses the first matching 
target storage group  (was: Mover$Processor#chooseTarget() always chooses the 
first matching target datanode)

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the datanodes, regardless of 
> the other available storage resources. We need an algorithm that can distribute 
> the move tasks approximately evenly across all the candidate target datanodes 
> (storages).
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-26 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259384#comment-15259384
 ] 

Walter Su commented on HDFS-10220:
--

bq. I think it add some readability and also because it is used twice.
I only took a peek last time. Yeah, I'm OK with that.
Another problem I noticed while going through the details:
{code}
while(!sortedLeases.isEmpty() && sortedLeases.peek().expiredHardLimit()
  && !isMaxLockHoldToReleaseLease(start)) {
  Lease leaseToCheck = sortedLeases.poll();
  ...
  Collection files = leaseToCheck.getFiles();
 ...
  for(Long id : leaseINodeIds) {
...
} finally {
  filesLeasesChecked++;
  if (isMaxLockHoldToReleaseLease(start)) {
LOG.debug("Breaking out of checkLeases() after " +
filesLeasesChecked + " file leases checked.");
break;
  }
  }
{code}
You can't just break out of the inner for-loop: the {{leaseToCheck}} has already been 
polled out of the queue, so breaking there means some of its files will never be closed.
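
To make the concern concrete, here is a small self-contained sketch (plain 
Java, not the actual LeaseManager code; {{Lease}}, the queue and the per-cycle 
limit are simplified stand-ins) of one way to stop early without losing the 
polled lease: re-queue it if it still has unprocessed files.

{code}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Sketch only: stop releasing leases once the per-cycle limit is hit, but put
// a partially processed lease back on the queue so its remaining files are
// handled in the next cycle instead of being dropped.
public class LeaseRequeueSketch {
  static class Lease {
    final String holder;
    final Queue<Long> fileIds;
    Lease(String holder, List<Long> ids) {
      this.holder = holder;
      this.fileIds = new ArrayDeque<>(ids);
    }
  }

  public static void main(String[] args) {
    Queue<Lease> sortedLeases = new ArrayDeque<>();
    sortedLeases.add(new Lease("client-1", Arrays.asList(1L, 2L, 3L, 4L)));

    final int maxFilesPerLockHold = 2;  // stand-in for isMaxLockHoldToReleaseLease(start)
    int checked = 0;

    while (!sortedLeases.isEmpty() && checked < maxFilesPerLockHold) {
      Lease lease = sortedLeases.poll();
      while (!lease.fileIds.isEmpty() && checked < maxFilesPerLockHold) {
        Long id = lease.fileIds.poll();
        System.out.println(lease.holder + ": released file lease for inode " + id);
        checked++;
      }
      if (!lease.fileIds.isEmpty()) {
        // Still has unprocessed files: re-queue so the next monitor cycle
        // continues with this lease instead of silently dropping its files.
        sortedLeases.add(lease);
      }
    }
    System.out.println("leases still pending: " + sortedLeases.size());
  }
}
{code}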

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, 
> threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by the 
> zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the threaddump taken by the zkfc there are lots of threads blocked due to a 
> lock.
> Looking at the code, there is a lock taken by the LeaseManager.Monitor when 
> some leases must be released. Due to the really large number of leases to be 
> released, the namenode took too much time to release them, blocking all 
> other tasks and making the zkfc think that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check for leases, so the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259368#comment-15259368
 ] 

John Zhuge commented on HDFS-10297:
---

[~andrew.wang], I uploaded a patch for branch-2.8, please merge.

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.branch-2.8.patch, HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.
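
For reference, a small sketch (an assumption about usage, not part of the 
patch) of setting the two keys programmatically to the proposed values; in a 
real deployment they would normally go into hdfs-site.xml on the DataNodes and 
the Balancer host:

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: override the two balancer-related keys discussed above.
public class BalancerDefaultsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setLong("dfs.datanode.balance.bandwidthPerSec", 10L * 1024 * 1024);  // 10 MB/s
    conf.setInt("dfs.datanode.balance.max.concurrent.moves", 50);
    System.out.println(conf.get("dfs.datanode.balance.bandwidthPerSec"));
  }
}
{code}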



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10201) Implement undo log in parity datanode for hflush operations

2016-04-26 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259341#comment-15259341
 ] 

GAO Rui commented on HDFS-10201:


[~liuml07], I have attached the demo patch of the new UndoLog design we 
discussed last week. The point is to keep the undo-log-related handling inside 
FlushUndoLogManager; for the specific undo log of each internal parity block 
file, both the writer and the reader access it via the same instance of 
FlushUndoLog. This way the reader and writer are affected as little as 
possible: almost all writing and reading of the undo log is handled on the 
datanode side and is nearly transparent to the writer and reader.
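
A rough sketch of that idea (the class names follow the demo patch's naming, 
but the bodies below are assumptions, not the attached patch): the manager 
hands out one {{FlushUndoLog}} instance per parity block, so the writer that 
saves the old cell and a reader that later needs it always see the same state.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: one shared undo-log instance per parity block, managed on the
// datanode side so it stays mostly transparent to the writer and reader.
public class FlushUndoLogManagerSketch {
  static class FlushUndoLog {
    private byte[] savedCell;          // last flushed cell, saved before overwrite
    private long blockGroupLength;     // BG length the saved cell belongs to

    synchronized void save(byte[] cell, long bgLength) {
      this.savedCell = cell.clone();
      this.blockGroupLength = bgLength;
    }

    synchronized byte[] read() {
      return savedCell == null ? null : savedCell.clone();
    }

    synchronized long getBlockGroupLength() {
      return blockGroupLength;
    }
  }

  private final Map<Long, FlushUndoLog> logs = new ConcurrentHashMap<>();

  // Writer and reader both call this with the same block id and get the same instance.
  FlushUndoLog getUndoLog(long parityBlockId) {
    return logs.computeIfAbsent(parityBlockId, id -> new FlushUndoLog());
  }

  public static void main(String[] args) {
    FlushUndoLogManagerSketch manager = new FlushUndoLogManagerSketch();
    // Writer saves the old cell (and its BG length) before overwriting it on hflush.
    manager.getUndoLog(1001L).save(new byte[]{1, 2, 3}, 64 * 1024);
    // A reader sees the same instance and therefore the saved cell.
    System.out.println("saved cell length = " + manager.getUndoLog(1001L).read().length);
  }
}
{code}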
 

> Implement undo log in parity datanode for hflush operations
> ---
>
> Key: HDFS-10201
> URL: https://issues.apache.org/jira/browse/HDFS-10201
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10201-demo.patch
>
>
> According to the current design doc for hflush support in erasure coding (see 
> [HDFS-7661]), the parity datanode (DN) needs an undo log for flush 
> operations. After hflush/hsync, the last cell will be overwritten when 1) the 
> current stripe is full, 2) the file is closed, or 3) hflush/hsync is 
> called again for the current non-full stripe. To serve new reader clients and 
> to tolerate failures between a successful hflush/hsync and the overwrite operation, 
> the parity DN should preserve the old cell in the undo log before overwriting 
> it.
> As parity data corresponds to the block group (BG) length, and parity data of 
> different BG lengths may have the same block length, the undo log should also 
> save the respective block group (BG) length information for the flushed data.
> This jira is to track the effort of designing and implementing an undo log in 
> parity DN to support hflush/hsync operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10201) Implement undo log in parity datanode for hflush operations

2016-04-26 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-10201:
---
Attachment: HDFS-10201-demo.patch

> Implement undo log in parity datanode for hflush operations
> ---
>
> Key: HDFS-10201
> URL: https://issues.apache.org/jira/browse/HDFS-10201
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-10201-demo.patch
>
>
> According to the current design doc for hflush support in erasure coding (see 
> [HDFS-7661]), the parity datanode (DN) needs an undo log for flush 
> operations. After hflush/hsync, the last cell will be overwritten when 1) the 
> current stripe is full, 2) the file is closed, or 3) hflush/hsync is 
> called again for the current non-full stripe. To serve new reader clients and 
> to tolerate failures between a successful hflush/hsync and the overwrite operation, 
> the parity DN should preserve the old cell in the undo log before overwriting 
> it.
> As parity data corresponds to the block group (BG) length, and parity data of 
> different BG lengths may have the same block length, the undo log should also 
> save the respective block group (BG) length information for the flushed data.
> This jira is to track the effort of designing and implementing an undo log in 
> parity DN to support hflush/hsync operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit

2016-04-26 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259311#comment-15259311
 ] 

Xiaoyu Yao commented on HDFS-10324:
---

Thanks [~andrew.wang] and [~jojochuang]. I agree with your analysis that we 
probably don't need a separate provisionTrash API. I meant to have an overloaded 
version of HdfsAdmin#createEncryptionZone with a provisionTrash parameter and 
to switch the crypto CLI to call the new API. This way, we can deprecate the existing 
API that does not create .Trash with the right permissions over the next few releases. 
Given that this will be an Admin wrapper API over the DFS API with an opt-out parameter, 
I think it is an acceptable solution to save future documentation/support cost. What do 
you think?

> Trash directory in an encryption zone should be pre-created with sticky bit
> ---
>
> Key: HDFS-10324
> URL: https://issues.apache.org/jira/browse/HDFS-10324
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.8.0
> Environment: CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch, 
> HDFS-10324.003.patch
>
>
> We encountered a bug in HDFS-8831:
> After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash 
> subdirectory within the encryption zone.
> However, if this .Trash subdirectory is not created beforehand, it will be 
> created and owned by the first user who deleted a file, with permission 
> drwx------. This creates a serious bug because any other non-privileged user 
> will not be able to delete any files within the encryption zone, because they 
> do not have the permission to move directories to the trash directory.
> We should fix this bug, by pre-creating the .Trash directory with sticky bit.
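
For context, a minimal sketch (assuming superuser rights and a hypothetical 
zone root {{/ez}}; this is not the patch itself) of pre-creating the trash 
directory with the sticky bit through the FileSystem API:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch only: create <zone>/.Trash as world-writable with the sticky bit set
// (1777, like /tmp on POSIX), so every user can move files in but only owners
// can remove their own entries.
public class ProvisionTrashSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    Path zoneRoot = new Path("/ez");            // hypothetical encryption zone root
    Path trash = new Path(zoneRoot, ".Trash");

    fs.mkdirs(trash);
    fs.setPermission(trash, new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL, true));
  }
}
{code}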



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10334) trunk test failures: TestHFlush#testHFlushInterrupted and TestFsDatasetImpl#testCleanShutdownOfVolume

2016-04-26 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259303#comment-15259303
 ] 

Xiaobing Zhou commented on HDFS-10334:
--

Thanks [~jojochuang] for the updates. I did a search; there are a bunch of JIRAs 
related to the testHFlushInterrupted failure: HDFS-2043, HDFS-3041 and HDFS-4504.

> trunk test failures: TestHFlush#testHFlushInterrupted and 
> TestFsDatasetImpl#testCleanShutdownOfVolume
> -
>
> Key: HDFS-10334
> URL: https://issues.apache.org/jira/browse/HDFS-10334
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Xiaobing Zhou
>Priority: Critical
>
> It's been noticed these tests failed on trunk:
> TestHFlush#testHFlushInterrupted TestFsDatasetImpl#testCleanShutdownOfVolume
> {noformat}
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.417 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestHFlush
> testHFlushInterrupted(org.apache.hadoop.hdfs.TestHFlush)  Time elapsed: 0.646 
> sec  <<< ERROR!
> java.nio.channels.ClosedByInterruptException: null
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>   at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:653)
> {noformat}
> {noformat}
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.438 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
> testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
>   Time elapsed: 6.882 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testCleanShutdownOfVolume(TestFsDatasetImpl.java:683)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10331) Use java.util.zip.CRC32 for java8 or above in libhadoop

2016-04-26 Thread He Tianyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Tianyi updated HDFS-10331:
-
Description: 
In java8, performance of intrinsic CRC32 has been dramatically improved.
See: https://bugs.openjdk.java.net/browse/JDK-7088419

I carried out an in-memory throughput benchmark on a server with two E5-2630 v2 
CPUs; results (single threaded):
java7  java.util.zip.CRC32: 0.81GB/s
hdfs DataChecksum, native: 1.46GB/s
java8  java.util.zip.CRC32: 2.39GB/s
hdfs DataChecksum, CRC32 on java8: 2.39GB/s

IMHO I think we could either:
A) provide a configuration for user to switch CRC32 implementations;
or B) On java8 or above, always use intrinsic CRC32.


  was:
In java8, performance of intrinsic CRC32 has been dramatically improved.
See: https://bugs.openjdk.java.net/browse/JDK-7088419

I carried an in-memory benchmark of throughput, on a server with two E5-2630 v2 
cpus, results:
java7  java.util.zip.CRC32: 0.81GB/s
hdfs DataChecksum, native: 1.46GB/s
java8  java.util.zip.CRC32: 2.39GB/s
hdfs DataChecksum, CRC32 on java8: 2.39GB/s

IMHO I think we could either:
A) provide a configuration for user to switch CRC32 implementations;
or B) On java8 or above, always use intrinsic CRC32.



> Use java.util.zip.CRC32 for java8 or above in libhadoop
> ---
>
> Key: HDFS-10331
> URL: https://issues.apache.org/jira/browse/HDFS-10331
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs, hdfs-client
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>
> In java8, performance of intrinsic CRC32 has been dramatically improved.
> See: https://bugs.openjdk.java.net/browse/JDK-7088419
> I carried an in-memory benchmark of throughput, on a server with two E5-2630 
> v2 cpus, results:
> (single threaded)
> java7  java.util.zip.CRC32: 0.81GB/s
> hdfs DataChecksum, native: 1.46GB/s
> java8  java.util.zip.CRC32: 2.39GB/s
> hdfs DataChecksum, CRC32 on java8: 2.39GB/s
> IMHO I think we could either:
> A) provide a configuration for user to switch CRC32 implementations;
> or B) On java8 or above, always use intrinsic CRC32.
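
For anyone who wants to reproduce numbers of this kind, a rough single-threaded 
throughput check using plain {{java.util.zip.CRC32}} (an illustration, not the 
benchmark used above; exact figures depend on JDK version, CPU and buffer size):

{code}
import java.util.Random;
import java.util.zip.CRC32;

// Sketch only: measure single-threaded CRC32 throughput over a large in-memory buffer.
public class Crc32Bench {
  public static void main(String[] args) {
    byte[] buf = new byte[64 * 1024 * 1024];    // 64 MB of pseudo-random data
    new Random(42).nextBytes(buf);

    CRC32 crc = new CRC32();
    // Warm up so the JIT can reach the intrinsic path before measuring.
    for (int i = 0; i < 5; i++) {
      crc.reset();
      crc.update(buf, 0, buf.length);
    }

    int rounds = 20;
    long start = System.nanoTime();
    for (int i = 0; i < rounds; i++) {
      crc.reset();
      crc.update(buf, 0, buf.length);
    }
    double seconds = (System.nanoTime() - start) / 1e9;
    double gbPerSec = (double) buf.length * rounds / seconds / (1L << 30);
    System.out.printf("CRC32 value=%08x, throughput=%.2f GB/s%n", crc.getValue(), gbPerSec);
  }
}
{code}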



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10334) trunk test failures: TestHFlush#testHFlushInterrupted and TestFsDatasetImpl#testCleanShutdownOfVolume

2016-04-26 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259293#comment-15259293
 ] 

Wei-Chiu Chuang commented on HDFS-10334:


Thanks [~xiaobingo] for reporting the test failures.
Please see HDFS-10260 for Rushabh's patch for 
TestFsDatasetImpl#testCleanShutdownOfVolume

> trunk test failures: TestHFlush#testHFlushInterrupted and 
> TestFsDatasetImpl#testCleanShutdownOfVolume
> -
>
> Key: HDFS-10334
> URL: https://issues.apache.org/jira/browse/HDFS-10334
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Xiaobing Zhou
>Priority: Critical
>
> It's been noticed these tests failed on trunk:
> TestHFlush#testHFlushInterrupted TestFsDatasetImpl#testCleanShutdownOfVolume
> {noformat}
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.417 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestHFlush
> testHFlushInterrupted(org.apache.hadoop.hdfs.TestHFlush)  Time elapsed: 0.646 
> sec  <<< ERROR!
> java.nio.channels.ClosedByInterruptException: null
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>   at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:653)
> {noformat}
> {noformat}
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.438 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
> testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
>   Time elapsed: 6.882 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testCleanShutdownOfVolume(TestFsDatasetImpl.java:683)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10304) Implement moveToLocal

2016-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259291#comment-15259291
 ] 

Hadoop QA commented on HDFS-10304:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 54s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 52s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 55s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s 
{color} | {color:red} hadoop-common-project/hadoop-common: patch generated 1 
new + 12 unchanged - 0 fixed = 13 total (was 12) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 51s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_92. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 37s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_92 Failed junit tests | hadoop.cli.TestCLI |
| JDK v1.7.0_95 Failed junit tests | hadoop.cli.TestCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800913/HDFS-10304.001.patch |
| JIRA Issue | HDFS-10304 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 7e78b77fab4d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| 

[jira] [Commented] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259285#comment-15259285
 ] 

Hudson commented on HDFS-10224:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9674 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9674/])
HDFS-10224. Implement asynchronous rename for DistributedFileSystem.  
(szetszwo: rev fc94810d3f537e51e826fc21ade7867892b9d8dc)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/AsyncDistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestAsyncIPC.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestAsyncDFSRename.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java


> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Fix For: 2.8.0
>
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-HDFS-9924.009.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-26 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259284#comment-15259284
 ] 

Konstantin Shvachko commented on HDFS-10301:


Maybe we should go with a different approach for this problem.
- *The problem.* The NameNode thinks that the reporting DN has a certain set of 
storages, but the DataNode reports a different set, because one of 
its drives was replaced, reformatted, or taken out of service. The NameNode 
should update the list of storages to the ones reported by the DataNode, 
potentially removing some of them.
- *Constraints.* A single block report can be split into multiple RPCs. Within a 
single block-report RPC the NameNode processes each storage under a lock, but then 
releases and re-acquires the lock for the next storage, so that multiple RPC 
reports can interleave due to multi-threading.
- *Approach.* The DN should report the full list of its storages in the first 
block-report RPC. The NameNode first cleans up unreported storages and the replicas 
belonging to them, then starts processing the rest of the block report as usual.
So DataNodes explicitly report the storages that they have, which eliminates the 
NameNode guessing which storage is the last one in the block report RPC.

I did not look at whether any changes in the RPC message structure are needed, but I 
think that all necessary fields should already be present.
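
A simplified illustration of that approach (plain Java, not the actual 
BlockManager code; the map of known storages and the method name are 
assumptions): prune any tracked storage that is absent from the full set 
reported in the first block-report RPC, then process the report as usual.

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: the DN reports the full list of its storage IDs up front, and
// the NameNode removes anything it still tracks that is no longer reported.
public class PruneStoragesSketch {
  // datanodeUuid -> storage IDs the NameNode currently tracks for that node
  private static final Map<String, Set<String>> knownStorages = new ConcurrentHashMap<>();

  static void processFirstReportRpc(String datanodeUuid, Set<String> reportedStorageIds) {
    Set<String> known = knownStorages.computeIfAbsent(datanodeUuid, k -> new HashSet<>());
    Set<String> stale = new HashSet<>(known);
    stale.removeAll(reportedStorageIds);        // storages the DN no longer has
    for (String storageId : stale) {
      System.out.println("removing stale storage " + storageId + " and its replicas");
      known.remove(storageId);
    }
    known.addAll(reportedStorageIds);           // then process the report as usual
  }

  public static void main(String[] args) {
    knownStorages.put("dn-1", new HashSet<>(Arrays.asList("s1", "s2", "s3")));
    // The DN reports only s1 and s3 in its first block-report RPC: s2 was reformatted.
    processFirstReportRpc("dn-1", new HashSet<>(Arrays.asList("s1", "s3")));
    System.out.println("remaining storages: " + knownStorages.get("dn-1"));
  }
}
{code}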

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-26 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259278#comment-15259278
 ] 

Konstantin Shvachko commented on HDFS-10301:


Hey Colin, I reviewed your patch more thoroughly. There is still a problem with 
interleaving reports. See {{updateBlockReportContext()}}. Suppose that block 
reports interleave like this: (br1-s1, br2-s1, br1-s2, br2-s2). Then br1-s2 
will reset {{curBlockReportRpcsSeen}} since {{curBlockReportId}} is not the 
same as in the report, which will discard the bit set for s1 in br2-s1, and the 
count of {{rpcsSeen = 0}} will be wrong for br2-s2. So possibly unreported 
(zombie) storages will not be removed. LMK if you see what I see.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then the NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10334) TestHFlush#testHFlushInterrupted and TestFsDatasetImpl#testCleanShutdownOfVolume failed on trunk

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10334:
-
Description: 
It's been noticed these tests failed on trunk:
TestHFlush#testHFlushInterrupted TestFsDatasetImpl#testCleanShutdownOfVolume

{noformat}
Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.417 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestHFlush
testHFlushInterrupted(org.apache.hadoop.hdfs.TestHFlush)  Time elapsed: 0.646 
sec  <<< ERROR!
java.nio.channels.ClosedByInterruptException: null
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:653)
{noformat}

{noformat}
Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.438 sec <<< 
FAILURE! - in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
  Time elapsed: 6.882 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testCleanShutdownOfVolume(TestFsDatasetImpl.java:683)
{noformat}

  was:
It's been noticed these tests failed on trunk:
TestHFlush#testHFlushInterrupted TestFsDatasetImpl#testCleanShutdownOfVolume

{format}
Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.417 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestHFlush
testHFlushInterrupted(org.apache.hadoop.hdfs.TestHFlush)  Time elapsed: 0.646 
sec  <<< ERROR!
java.nio.channels.ClosedByInterruptException: null
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:653)
{format}

{format}
Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.438 sec <<< 
FAILURE! - in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
  Time elapsed: 6.882 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testCleanShutdownOfVolume(TestFsDatasetImpl.java:683)
{format}


> TestHFlush#testHFlushInterrupted and 
> TestFsDatasetImpl#testCleanShutdownOfVolume failed on trunk
> 
>
> Key: HDFS-10334
> URL: https://issues.apache.org/jira/browse/HDFS-10334
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Xiaobing Zhou
>Priority: Critical
>
> It's been noticed these tests failed on trunk:
> TestHFlush#testHFlushInterrupted TestFsDatasetImpl#testCleanShutdownOfVolume
> {noformat}
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.417 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestHFlush
> testHFlushInterrupted(org.apache.hadoop.hdfs.TestHFlush)  Time elapsed: 0.646 
> sec  <<< ERROR!
> java.nio.channels.ClosedByInterruptException: null
>   at 
> 

[jira] [Updated] (HDFS-10334) trunk test failures: TestHFlush#testHFlushInterrupted and TestFsDatasetImpl#testCleanShutdownOfVolume

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10334:
-
Summary: trunk test failures: TestHFlush#testHFlushInterrupted and 
TestFsDatasetImpl#testCleanShutdownOfVolume  (was: 
TestHFlush#testHFlushInterrupted and 
TestFsDatasetImpl#testCleanShutdownOfVolume failed on trunk)

> trunk test failures: TestHFlush#testHFlushInterrupted and 
> TestFsDatasetImpl#testCleanShutdownOfVolume
> -
>
> Key: HDFS-10334
> URL: https://issues.apache.org/jira/browse/HDFS-10334
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Xiaobing Zhou
>Priority: Critical
>
> It's been noticed these tests failed on trunk:
> TestHFlush#testHFlushInterrupted TestFsDatasetImpl#testCleanShutdownOfVolume
> {noformat}
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.417 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestHFlush
> testHFlushInterrupted(org.apache.hadoop.hdfs.TestHFlush)  Time elapsed: 0.646 
> sec  <<< ERROR!
> java.nio.channels.ClosedByInterruptException: null
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>   at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:653)
> {noformat}
> {noformat}
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.438 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
> testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
>   Time elapsed: 6.882 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testCleanShutdownOfVolume(TestFsDatasetImpl.java:683)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10334) TestHFlush#testHFlushInterrupted and TestFsDatasetImpl#testCleanShutdownOfVolume failed on trunk

2016-04-26 Thread Xiaobing Zhou (JIRA)
Xiaobing Zhou created HDFS-10334:


 Summary: TestHFlush#testHFlushInterrupted and 
TestFsDatasetImpl#testCleanShutdownOfVolume failed on trunk
 Key: HDFS-10334
 URL: https://issues.apache.org/jira/browse/HDFS-10334
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Xiaobing Zhou
Priority: Critical


It's been noticed these tests failed on trunk:
TestHFlush#testHFlushInterrupted TestFsDatasetImpl#testCleanShutdownOfVolume

{format}
Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 25.417 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestHFlush
testHFlushInterrupted(org.apache.hadoop.hdfs.TestHFlush)  Time elapsed: 0.646 
sec  <<< ERROR!
java.nio.channels.ClosedByInterruptException: null
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:653)
{format}

{format}
Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.438 sec <<< 
FAILURE! - in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
  Time elapsed: 6.882 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testCleanShutdownOfVolume(TestFsDatasetImpl.java:683)
{format}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10331) Use java.util.zip.CRC32 for java8 or above in libhadoop

2016-04-26 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10331:
---
Summary: Use java.util.zip.CRC32 for java8 or above in libhadoop  (was: Use 
java.util.zip.CRC32 for checksum in java8 or above)

(Revised Summary.)

> Use java.util.zip.CRC32 for java8 or above in libhadoop
> ---
>
> Key: HDFS-10331
> URL: https://issues.apache.org/jira/browse/HDFS-10331
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs, hdfs-client
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>
> In java8, performance of intrinsic CRC32 has been dramatically improved.
> See: https://bugs.openjdk.java.net/browse/JDK-7088419
> I carried an in-memory benchmark of throughput, on a server with two E5-2630 
> v2 cpus, results:
> java7  java.util.zip.CRC32: 0.81GB/s
> hdfs DataChecksum, native: 1.46GB/s
> java8  java.util.zip.CRC32: 2.39GB/s
> hdfs DataChecksum, CRC32 on java8: 2.39GB/s
> IMHO I think we could either:
> A) provide a configuration for user to switch CRC32 implementations;
> or B) On java8 or above, always use intrinsic CRC32.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10224:
---
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Xiaobing!

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Fix For: 2.8.0
>
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-HDFS-9924.009.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259234#comment-15259234
 ] 

Mingliang Liu commented on HDFS-10175:
--

Either tomorrow or next week will work for me. Thanks.

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10313) Distcp need to enforce the order of snapshot names passed to -diff

2016-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259233#comment-15259233
 ] 

Hudson commented on HDFS-10313:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9673 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9673/])
HDFS-10313. Distcp need to enforce the order of snapshot names passed to 
(yzhang: rev 959a28dd1216dfac78d05b438828e8503108d963)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java


> Distcp need to enforce the order of snapshot names passed to -diff
> --
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1; otherwise, abort with an 
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit

2016-04-26 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259231#comment-15259231
 ] 

Andrew Wang commented on HDFS-10324:


Thanks for the discussion all, and Wei-Chiu for the patch.

HdfsAdmin is a public API, so I don't think we should modify createZone there 
to also provisionTrash. It'd mean we'd have no way of creating a zone without a 
Trash directory (the old behavior), and it also gets into my earlier concerns about 
atomicity.

We can revisit this later if we find we need Java API access to provisionTrash, 
but that should be rare since you can just do it with mkdir and chmod.

Review comments:

* Shall we error if provisionTrash isn't called on an EZ root? That more 
clearly reflects that the Trash directory is an EZ-level construct.
* If a Trash directory already exists, it might already be set up properly, so 
it seems harsh to tell the user to rename it. Instead, how about e.g. "Will not 
provision new trash directory for encryption zone /ez/, path already exists."
* We can get fancy with the above, and print additional warnings if .Trash is a 
file, or the permissions are set wrong (and what the right permissions are).
* In the md file, let's not add a step that will return an error return code to 
the example usage. Instead, we should add some help text to the "createZone" 
command, and update the md file too.
* The provisionTrash help description also needs to be added to the md file.
* We can talk about the pre-2.8.0 behavior, sticky bit, and provisionTrash in 
the "Rename and Trash considerations" section.
* One of the unit tests should verify that the sticky bit is set on a 
provisioned Trash directory.

> Trash directory in an encryption zone should be pre-created with sticky bit
> ---
>
> Key: HDFS-10324
> URL: https://issues.apache.org/jira/browse/HDFS-10324
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.8.0
> Environment: CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch, 
> HDFS-10324.003.patch
>
>
> We encountered a bug in HDFS-8831:
> After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash 
> subdirectory within the encryption zone.
> However, if this .Trash subdirectory is not created beforehand, it will be 
> created and owned by the first user who deleted a file, with permission 
> drwx------. This creates a serious bug because any other non-privileged user 
> will not be able to delete any files within the encryption zone, because they 
> do not have the permission to move directories to the trash directory.
> We should fix this bug, by pre-creating the .Trash directory with sticky bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259223#comment-15259223
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

Hi [~steve_l], does 10:30AM work tomorrow?  Unfortunately I'll be out on 
Thursday and most of Friday, so if we can't do tomorrow we'd have to do Friday 
afternoon or early next week.

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10304) Implement moveToLocal

2016-04-26 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259208#comment-15259208
 ] 

Xiaobing Zhou commented on HDFS-10304:
--

Thank you for the review, [~steve_l]. Patch v001 is posted; it addresses all the 
comments above. Post-processing should be called only on success, and 
Command#processPaths makes sure of that. If an IOException occurs for 
whatever reason, postProcessPath won't be executed.

> Implement moveToLocal
> -
>
> Key: HDFS-10304
> URL: https://issues.apache.org/jira/browse/HDFS-10304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Xiaobing Zhou
>Priority: Minor
> Attachments: HDFS-10304.000.patch, HDFS-10304.001.patch
>
>
> if you get the usage list of {{hdfs dfs}} it tells you of "-moveToLocal". 
> If you try to use the command, it tells you off "Option '-moveToLocal' is not 
> implemented yet."
> Either the command should be implemented, or it should be removed from the 
> usage list, as it is not technically a command you can use, except in the 
> special case of "I want my shell to print "not implemented yet""



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp need to enforce the order of snapshot names passed to -diff

2016-04-26 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-10313:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

I have committed this to trunk, branch-2, and branch-2.8. I also cherry-picked 
HDFS-10216 to branch-2.8 so that this change cherry-picks cleanly.

Thanks [~linyiqun] for the contribution, and  [~jingzhao] for comments.





> Distcp need to enforce the order of snapshot names passed to -diff
> --
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1; otherwise, abort with an 
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10304) Implement moveToLocal

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10304:
-
Attachment: HDFS-10304.001.patch

> Implement moveToLocal
> -
>
> Key: HDFS-10304
> URL: https://issues.apache.org/jira/browse/HDFS-10304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Xiaobing Zhou
>Priority: Minor
> Attachments: HDFS-10304.000.patch, HDFS-10304.001.patch
>
>
> if you get the usage list of {{hdfs dfs}} it tells you of "-moveToLocal". 
> If you try to use the command, it tells you off "Option '-moveToLocal' is not 
> implemented yet."
> Either the command should be implemented, or it should be removed from the 
> usage list, as it is not technically a command you can use, except in the 
> special case of "I want my shell to print "not implemented yet""



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit

2016-04-26 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259195#comment-15259195
 ] 

Wei-Chiu Chuang commented on HDFS-10324:


It is true that this API is used only in unit tests in Hadoop, but downstream 
projects might use it. What do we typically consider when changing a public API?

> Trash directory in an encryption zone should be pre-created with sticky bit
> ---
>
> Key: HDFS-10324
> URL: https://issues.apache.org/jira/browse/HDFS-10324
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.8.0
> Environment: CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch, 
> HDFS-10324.003.patch
>
>
> We encountered a bug in HDFS-8831:
> After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash 
> subdirectory within the encryption zone.
> However, if this .Trash subdirectory is not created beforehand, it will be 
> created and owned by the first user who deleted a file, with permission 
> drwx------. This creates a serious bug because any other non-privileged user 
> will not be able to delete any files within the encryption zone, because they 
> do not have the permission to move directories to the trash directory.
> We should fix this bug, by pre-creating the .Trash directory with sticky bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10216) distcp -diff relative path exception

2016-04-26 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-10216:
-
Fix Version/s: (was: 2.9.0)
   2.8.0

Thank you guys for the work here. I just cherry-picked it to branch-2.8.


> distcp -diff relative path exception
> 
>
> Key: HDFS-10216
> URL: https://issues.apache.org/jira/browse/HDFS-10216
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>Assignee: Takashi Ohnishi
> Fix For: 2.8.0
>
> Attachments: HDFS-10216.1.patch, HDFS-10216.2.patch, 
> HDFS-10216.3.patch, HDFS-10216.4.patch
>
>
> Got this exception when running {{distcp -diff}} with relative paths:
> {code}
> $ hadoop distcp -update -diff s1 s2 d1 d2
> 16/03/25 09:45:40 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[d1], 
> targetPath=d2, targetPathExists=true, preserveRawXattrs=false, 
> filtersFile='null'}
> 16/03/25 09:45:40 INFO client.RMProxy: Connecting to ResourceManager at 
> jzhuge-balancer-1.vpc.cloudera.com/172.26.21.70:8032
> 16/03/25 09:45:41 ERROR tools.DistCp: Exception encountered 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2
>   at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>   at org.apache.hadoop.fs.Path.(Path.java:197)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.getPathWithSchemeAndAuthority(SimpleCopyListing.java:193)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.addToFileListing(SimpleCopyListing.java:202)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListingWithSnapshotDiff(SimpleCopyListing.java:243)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:172)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.DistCp.createInputFileListingWithDiff(DistCp.java:388)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:164)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:123)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:436)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
> hdfs://jzhuge-balancer-1.vpc.cloudera.com:8020./d1/.snapshot/s2
>   at java.net.URI.checkPath(URI.java:1804)
>   at java.net.URI.(URI.java:752)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
>   ... 11 more
> {code}
> But these commands worked:
> * Absolute path: {{hadoop distcp -update -diff s1 s2 /user/systest/d1 
> /user/systest/d2}}
> * No {{-diff}}: {{hadoop distcp -update d1 d2}}
> However, everything was fine when I ran {{hadoop distcp -update -diff s1 s2 
> d1 d2}} again. I am not sure whether the problem exists only with option 
> {{-diff}}. Trying to reproduce.
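
For reference, the error above comes from building a URI out of an authority plus 
a still-relative path; a hedged illustration of the kind of qualification that 
avoids it, assuming a FileSystem handle {{fs}} is in scope (this is not the actual 
HDFS-10216 fix):

{code}
// Qualify a relative Path against the FileSystem's URI and working directory
// before doing any further scheme/authority manipulation on it.
Path relative = new Path("d1");
Path qualified = relative.makeQualified(fs.getUri(), fs.getWorkingDirectory());
// e.g. hdfs://host:8020/user/systest/d1 rather than hdfs://host:8020./d1
{code}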



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10331) Use java.util.zip.CRC32 for checksum in java8 or above

2016-04-26 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259182#comment-15259182
 ] 

He Tianyi commented on HDFS-10331:
--

Yes, okay. Now I see that for array-backed buffers, the native implementation in 
libhadoop is not used.

> Use java.util.zip.CRC32 for checksum in java8 or above
> --
>
> Key: HDFS-10331
> URL: https://issues.apache.org/jira/browse/HDFS-10331
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs, hdfs-client
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>
> In java8, performance of intrinsic CRC32 has been dramatically improved.
> See: https://bugs.openjdk.java.net/browse/JDK-7088419
> I carried out an in-memory throughput benchmark on a server with two E5-2630 
> v2 CPUs; results:
> java7  java.util.zip.CRC32: 0.81GB/s
> hdfs DataChecksum, native: 1.46GB/s
> java8  java.util.zip.CRC32: 2.39GB/s
> hdfs DataChecksum, CRC32 on java8: 2.39GB/s
> IMHO we could either:
> A) provide a configuration for users to switch CRC32 implementations;
> or B) on java8 or above, always use the intrinsic CRC32.
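
A minimal, self-contained sketch of the kind of in-memory throughput measurement 
quoted above; buffer size and iteration count are arbitrary, and the absolute 
numbers will of course differ by JVM and CPU:

{code}
import java.util.zip.CRC32;

public class Crc32Bench {
  public static void main(String[] args) {
    byte[] buf = new byte[64 * 1024 * 1024];   // 64 MB buffer, contents irrelevant
    CRC32 crc = new CRC32();
    int iterations = 32;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      crc.reset();
      crc.update(buf, 0, buf.length);
    }
    double seconds = (System.nanoTime() - start) / 1e9;
    double gbPerSec = (double) buf.length * iterations / seconds / (1L << 30);
    System.out.printf("java.util.zip.CRC32: %.2f GB/s (checksum %08x)%n",
        gbPerSec, crc.getValue());
  }
}
{code}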



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259179#comment-15259179
 ] 

Hadoop QA commented on HDFS-10224:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 56s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 19s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
8s {color} | {color:green} root: patch generated 0 new + 273 unchanged - 1 
fixed = 273 total (was 274) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 13s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_92. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_92. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 26s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_92. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 14s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 41s {color} 
| {color:red} hadoop-hdfs in the 

[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines

2016-04-26 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259173#comment-15259173
 ] 

He Tianyi commented on HDFS-10326:
--

[~cmccabe] [~mingma]
https://www.kernel.org/doc/ols/2009/ols2009-pages-169-184.pdf
This document suggests auto-tuning was not present in Linux 2.4 before 2.4.27 
or in Linux 2.6 before 2.6.7.
Those kernels are very old.

So maybe it's appropriate to enable auto-tuning by default.

> Disable setting tcp socket send/receive buffers for write pipelines
> ---
>
> Key: HDFS-10326
> URL: https://issues.apache.org/jira/browse/HDFS-10326
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> The DataStreamer and the Datanode use a hardcoded 
> DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write 
> pipeline.  Explicitly setting tcp buffer sizes disables tcp stack 
> auto-tuning.  
> The hardcoded value will saturate a 1Gb link with 1ms RTT.  105Mb/s at 10ms.  
> A paltry 11Mb/s over a 100ms long haul.  10Gb networks are underutilized.
> There should either be a configuration to completely disable setting the 
> buffers, or the setReceiveBuffer and setSendBuffer calls should be removed 
> entirely.
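
The throughput figures in the description follow from the bandwidth-delay 
product: with a fixed 128 KB socket buffer, a single connection carries roughly 
bufferSize / RTT. A small sketch of that arithmetic:

{code}
// Rough throughput ceiling for a fixed 128 KB socket buffer at various RTTs.
public class SocketBufferCeiling {
  public static void main(String[] args) {
    int bufferBytes = 128 * 1024;                  // DEFAULT_DATA_SOCKET_SIZE
    double[] rttSeconds = {0.001, 0.010, 0.100};
    for (double rtt : rttSeconds) {
      double megabitsPerSec = bufferBytes * 8 / rtt / 1e6;
      System.out.printf("RTT %3.0f ms -> ~%6.1f Mb/s%n", rtt * 1000, megabitsPerSec);
    }
  }
}
{code}

This prints roughly 1049, 105 and 10.5 Mb/s, matching the 1Gb / 105Mb/s / 11Mb/s 
figures above.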



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp need to enforce the order of snapshot names passed to -diff

2016-04-26 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-10313:
-
Summary: Distcp need to enforce the order of snapshot names passed to -diff 
 (was: Distcp does not check the order of snapshot names passed to -diff)

> Distcp need to enforce the order of snapshot names passed to -diff
> --
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira proposes adding a check to distcp: when {{-diff s1 s2}} is passed, 
> we need to ensure that s2 is newer than s1; otherwise, abort with an 
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit

2016-04-26 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259118#comment-15259118
 ] 

Xiaoyu Yao commented on HDFS-10324:
---

Thanks [~jojochuang] for updating the patch. Patch v003 looks pretty good to 
me. 

One last comment: do you want to update the HdfsAdmin API/documentation for 
HdfsAdmin#createEncryptionZone() as well? Though I can only find callers of this 
public API in unit tests, it seems to be a good place to implement the 
permission-change logic, based on the class description below.

{code}
HdfsAdmin.java
 * The public API for performing administrative functions on HDFS. Those writing
 * applications against HDFS should prefer this interface to directly accessing
 * functionality in DistributedFileSystem or DFSClient.
{code} 

If we decide not to change the Admin API, I still suggest updating the Javadoc 
of HdfsAdmin#createEncryptionZone() to describe the permission changes needed 
for trash support in encryption zones. 
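
A hedged sketch of what a caller of the public API could do today, independent of 
whether HdfsAdmin itself changes; it assumes {{fs}} and {{conf}} are in scope, the 
zone path and key name are placeholders, and the trash-provisioning step mirrors 
the mkdirs/setPermission sketch shown earlier for HDFS-10324:

{code}
// Create the zone via the public admin API, then pre-create .Trash with mode 1777.
HdfsAdmin admin = new HdfsAdmin(fs.getUri(), conf);
Path zoneRoot = new Path("/zones/project");        // placeholder zone root
admin.createEncryptionZone(zoneRoot, "myKey");     // placeholder key name
Path trash = new Path(zoneRoot, ".Trash");
FsPermission sticky1777 =
    new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL, true);
fs.mkdirs(trash, sticky1777);
fs.setPermission(trash, sticky1777);               // mkdirs applies the umask
{code}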
 

> Trash directory in an encryption zone should be pre-created with sticky bit
> ---
>
> Key: HDFS-10324
> URL: https://issues.apache.org/jira/browse/HDFS-10324
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.8.0
> Environment: CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch, 
> HDFS-10324.003.patch
>
>
> We encountered a bug in HDFS-8831:
> After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash 
> subdirectory within the encryption zone.
> However, if this .Trash subdirectory is not created beforehand, it will be 
> created and owned by the first user who deletes a file, with permission 
> drwx--. This is a serious bug: any other non-privileged user will not be able 
> to delete files within the encryption zone, because they do not have 
> permission to move directories into the trash directory.
> We should fix this bug by pre-creating the .Trash directory with the sticky bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output

2016-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259055#comment-15259055
 ] 

Hadoop QA commented on HDFS-10330:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 15s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_92. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 21s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 20s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_92 Failed junit tests | 
hadoop.hdfs.server.datanode.TestLargeBlockReport |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.TestFileAppend |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.TestDistributedFileSystem |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800857/HDFS-10330.002.patch |
| JIRA Issue | HDFS-10330 |
| Optional Tests |  asflicense  compile  javac  javadoc  

[jira] [Updated] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-10297:
--
Attachment: HDFS-10297.003.branch-2.8.patch

Uploaded HDFS-10297.003.branch-2.8.patch for branch-2.8.

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.branch-2.8.patch, HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.
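
For operators who prefer to pin these values explicitly rather than rely on the 
new defaults, a minimal sketch of setting the two properties programmatically; the 
keys are the real ones named above, and an equivalent pair of entries in 
hdfs-site.xml works the same way:

{code}
// Sketch: pinning the two balancer-related settings to the values discussed above.
Configuration conf = new HdfsConfiguration();
conf.setLong("dfs.datanode.balance.bandwidthPerSec", 10L * 1024 * 1024);  // 10 MB/s
conf.setInt("dfs.datanode.balance.max.concurrent.moves", 50);
{code}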



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259009#comment-15259009
 ] 

John Zhuge commented on HDFS-10297:
---

Thanks [~rchiang].

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-26 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259004#comment-15259004
 ] 

Yongjun Zhang commented on HDFS-9732:
-

Thanks a lot [~ste...@apache.org]!

I reran all the failed tests locally and they all passed, except that the 
TestFileAppend failure is intermittent; I created HDFS-10333 for it.

Would you please see if you could +1 the last rev you reviewed, unless [~aw] 
has additional comments?

Thanks again.



> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch, 
> HDFS-9732.004.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostic info 
> (owner, sequence number), but its superclass, 
> {{AbstractDelegationTokenIdentifier}}, contains a lot more information, 
> including token issue and expiry times.
> Because {{DelegationTokenIdentifier.toString()}} doesn't include this data, 
> information that is potentially useful for Kerberos diagnostics is lost.
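
A sketch of the direction the summary points at, assuming no callers depend on the 
current string format; delegating to (or simply not overriding) the superclass 
method makes the extra diagnostics visible in logs:

{code}
// Behaviourally equivalent to removing the override: the superclass string,
// which includes details such as the token issue and expiry times, is used.
@Override
public String toString() {
  return super.toString();
}
{code}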



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259005#comment-15259005
 ] 

Ray Chiang commented on HDFS-10297:
---

Yeah, I only did trunk and branch-2 versions for HDFS-8356.  I didn't backport 
it to all the branch-2 minor versions.

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10333) Intermittent org.apache.hadoop.hdfs.TestFileAppend failure in trunk

2016-04-26 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10333:


 Summary: Intermittent org.apache.hadoop.hdfs.TestFileAppend 
failure in trunk
 Key: HDFS-10333
 URL: https://issues.apache.org/jira/browse/HDFS-10333
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Yongjun Zhang


Java8 (I used JAVA_HOME=/opt/toolchain/jdk1.8.0_25):

{code}
--
 T E S T S
---
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
support was removed in 8.0
Running org.apache.hadoop.hdfs.TestFileAppend
Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 27.75 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestFileAppend
testMultipleAppends(org.apache.hadoop.hdfs.TestFileAppend)  Time elapsed: 3.674 
sec  <<< ERROR!
java.io.IOException: Failed to replace a bad datanode on the existing pipeline 
due to no more good datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[127.0.0.1:43067,DS-cf80da41-3697-4afa-8f89-93693cd5035d,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:32946,DS-3b08422c-959e-42f0-a624-91b2524c4371,DISK]],
 
original=[DatanodeInfoWithStorage[127.0.0.1:43067,DS-cf80da41-3697-4afa-8f89-93693cd5035d,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:32946,DS-3b08422c-959e-42f0-a624-91b2524c4371,DISK]]).
 The current failed datanode replacement policy is DEFAULT, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration.
at 
org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1166)
at 
org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
at 
org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)


{code}

However, when I run with Java 1.7, the test sometimes passes, and it 
sometimes fails with 
{code}
Tests run: 12, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 41.32 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestFileAppend
testMultipleAppends(org.apache.hadoop.hdfs.TestFileAppend)  Time elapsed: 9.099 
sec  <<< ERROR!
java.io.IOException: Failed to replace a bad datanode on the existing pipeline 
due to no more good datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[127.0.0.1:49006,DS-498240fa-d1c7-4ba1-b97e-a1761cbbefa5,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:43097,DS-b83b49ce-fc14-4b9e-a3fc-7df2cd9fc753,DISK]],
 
original=[DatanodeInfoWithStorage[127.0.0.1:49006,DS-498240fa-d1c7-4ba1-b97e-a1761cbbefa5,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:43097,DS-b83b49ce-fc14-4b9e-a3fc-7df2cd9fc753,DISK]]).
 The current failed datanode replacement policy is DEFAULT, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration.
at 
org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1162)
at 
org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1232)
at 
org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1423)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1321)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:599)

{code}


The failure of this test is intermittent, but it fails pretty often.
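
For reference, the policy named in the exception is controlled by client 
configuration; a hedged sketch of relaxing it in a test setup (this masks the 
symptom rather than fixing the flaky pipeline recovery itself):

{code}
// Sketch: disable datanode replacement on pipeline failure for a small test cluster.
Configuration conf = new HdfsConfiguration();
conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
{code}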







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258988#comment-15258988
 ] 

John Zhuge commented on HDFS-10297:
---

[~rchiang]'s HDFS-8356 added the missing properties. Is it applicable to 2.8 
and 2.7? My guess is no.

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258978#comment-15258978
 ] 

John Zhuge commented on HDFS-10297:
---

Wow, so many properties in file {{hdfs-default.xml}} on branch-2 (lines 2972 to 
3984) are NOT in branch-2.8 or 2.7, including 
{{dfs.datanode.balance.max.concurrent.moves}}?!

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines

2016-04-26 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258948#comment-15258948
 ] 

Ming Ma commented on HDFS-10326:


Never mind about the backward-compatibility comment; HDFS-9259 was added in 2.8, 
which hasn't been released yet.

> Disable setting tcp socket send/receive buffers for write pipelines
> ---
>
> Key: HDFS-10326
> URL: https://issues.apache.org/jira/browse/HDFS-10326
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> The DataStreamer and the Datanode use a hardcoded 
> DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write 
> pipeline.  Explicitly setting tcp buffer sizes disables tcp stack 
> auto-tuning.  
> The hardcoded value will saturate a 1Gb link with 1ms RTT.  105Mb/s at 10ms.  
> A paltry 11Mb/s over a 100ms long haul.  10Gb networks are underutilized.
> There should either be a configuration to completely disable setting the 
> buffers, or the setReceiveBuffer and setSendBuffer calls should be removed 
> entirely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258924#comment-15258924
 ] 

John Zhuge commented on HDFS-10297:
---

Thank you [~andrew.wang]. I will create a patch for 2.8.

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10331) Use java.util.zip.CRC32 for checksum in java8 or above

2016-04-26 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258885#comment-15258885
 ] 

Tsz Wo Nicholas Sze commented on HDFS-10331:


It seems that we are already using java.util.zip.CRC32 for Java 7 or above, 
including Java 8.
{code}
//org.apache.hadoop.util.DataChecksum
  public static Checksum newCrc32() {
return Shell.isJava7OrAbove()? new CRC32(): new PureJavaCrc32();
  }
{code}
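
Given that, option A from the description could be a thin wrapper around the same 
choice; a hedged sketch only, using a hypothetical configuration key that does not 
exist in Hadoop today:

{code}
// Option A sketch: let users pick the CRC32 implementation explicitly, while
// keeping today's Java-7-or-above behaviour as the default.
public static Checksum newCrc32(Configuration conf) {
  String impl = conf.get("dfs.checksum.crc32.impl", "auto");  // hypothetical key
  if ("zip".equals(impl)) {
    return new CRC32();                 // java.util.zip.CRC32 (JDK intrinsic)
  } else if ("pure-java".equals(impl)) {
    return new PureJavaCrc32();         // Hadoop's pure-Java fallback
  }
  return Shell.isJava7OrAbove() ? new CRC32() : new PureJavaCrc32();
}
{code}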


> Use java.util.zip.CRC32 for checksum in java8 or above
> --
>
> Key: HDFS-10331
> URL: https://issues.apache.org/jira/browse/HDFS-10331
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs, hdfs-client
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>
> In java8, performance of intrinsic CRC32 has been dramatically improved.
> See: https://bugs.openjdk.java.net/browse/JDK-7088419
> I carried out an in-memory throughput benchmark on a server with two E5-2630 
> v2 CPUs; results:
> java7  java.util.zip.CRC32: 0.81GB/s
> hdfs DataChecksum, native: 1.46GB/s
> java8  java.util.zip.CRC32: 2.39GB/s
> hdfs DataChecksum, CRC32 on java8: 2.39GB/s
> IMHO we could either:
> A) provide a configuration for users to switch CRC32 implementations;
> or B) on java8 or above, always use the intrinsic CRC32.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258859#comment-15258859
 ] 

Hudson commented on HDFS-10297:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9672 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9672/])
HDFS-10297. Increase default balance bandwidth and concurrent moves. (wang: rev 
6be22ddbf1b6f3c19ec70c63ddb8f5519d18dd72)
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java


> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-10297:
---
Fix Version/s: 2.9.0

Committed to trunk and branch-2; the cherry-pick to branch-2.8 wasn't clean. 
John, if you want it in 2.8, do you mind providing a branch-2.8 patch? Otherwise 
we can just resolve.

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-26 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258821#comment-15258821
 ] 

Andrew Wang commented on HDFS-10297:


LGTM +1, will commit shortly.

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output

2016-04-26 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-10330:
---
Attachment: HDFS-10330.002.patch

Updated the patch to address the testMetasave failure. It also fixes the nit 
from checkstyle.

> Add Corrupt Blocks Information in Metasave Output
> -
>
> Key: HDFS-10330
> URL: https://issues.apache.org/jira/browse/HDFS-10330
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch
>
>
> Along with Datanode information and other vital block information, it would 
> be useful to have corrupt blocks' detailed info as part of metasave, since 
> currently JMX tracks only the count of corrupt nodes. This JIRA addresses 
> this improvement. CC: [~kihwal], [~daryn].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Attachment: (was: HDFS-10224-HDFS-9924.009.patch)

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-HDFS-9924.009.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is a proposal to implement an asynchronous DistributedFileSystem based 
> on the AsyncFileSystem APIs in HADOOP-12910. In addition, rename is 
> implemented in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Attachment: HDFS-10224-HDFS-9924.009.patch

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-HDFS-9924.009.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is a proposal to implement an asynchronous DistributedFileSystem based 
> on the AsyncFileSystem APIs in HADOOP-12910. In addition, rename is 
> implemented in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-26 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258779#comment-15258779
 ] 

Ravi Prakash commented on HDFS-10220:
-

Thanks Nicolas! One nit: in {{MAX_LOCK_HOLD_TO_RELEASE_LAESE_MS}} you've 
misspelt LEASE. Could you please fix it?

If there are no more objections, I'll commit the patch once you fix the nit.

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, 
> threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by 
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the threaddump taken by the zkfc there are lots of threads blocked on a 
> lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when leases 
> must be released. Due to the really big number of leases to be released, the 
> namenode took too long to release them, blocking all other tasks and making 
> the zkfc think that the namenode was unavailable/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check, so the lock won't be held for too long.
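
A minimal sketch of the batching idea described above, not the actual HDFS-10220 
patch; the constant and the helper methods are illustrative only:

{code}
// Release at most a bounded number of expired leases per Monitor pass so the
// namesystem write lock is held only briefly; the rest wait for the next pass.
private static final int MAX_LEASES_PER_CHECK = 1000;   // illustrative value

void checkLeases() {
  int released = 0;
  fsnamesystem.writeLock();
  try {
    while (hasExpiredLeases() && released < MAX_LEASES_PER_CHECK) {
      releaseNextExpiredLease();   // hypothetical helper for illustration
      released++;
    }
  } finally {
    fsnamesystem.writeUnlock();
  }
}
{code}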



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines

2016-04-26 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258771#comment-15258771
 ] 

Ming Ma commented on HDFS-10326:


I tend to agree it should be fine to change the default to enable TCP auto 
tuning. But would like to hear more from [~He Tianyi] about the scenario 
mentioned first. In addition, is changing the default considered back 
incompatible?

> Disable setting tcp socket send/receive buffers for write pipelines
> ---
>
> Key: HDFS-10326
> URL: https://issues.apache.org/jira/browse/HDFS-10326
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> The DataStreamer and the Datanode use a hardcoded 
> DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write 
> pipeline.  Explicitly setting tcp buffer sizes disables tcp stack 
> auto-tuning.  
> The hardcoded value will saturate a 1Gb link with 1ms RTT.  105Mb/s at 10ms.  
> A paltry 11Mb/s over a 100ms long haul.  10Gb networks are underutilized.
> There should either be a configuration to completely disable setting the 
> buffers, or the setReceiveBuffer and setSendBuffer calls should be removed 
> entirely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce

2016-04-26 Thread Mitchell Gudmundson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258734#comment-15258734
 ] 

Mitchell Gudmundson commented on HDFS-10327:


Greetings,

Unless I'm mistaken, this is not a Spark-specific issue. Even when running 
simple MapReduce jobs you end up with a directory of part files (part-r-*, 
where the suffix is the reducer number). These directories are generally meant 
to be interpreted as one logical "file". In the Spark world, when writing out 
an RDD or DataFrame you get a part file per partition (just as you would per 
reducer in the MR framework), but the concept is no different than on other 
distributed processing engines. It seems that one would want to be able to 
retrieve the contents of the various parts as a whole.

Regards,
-Mitch
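
To make the "one logical file" point concrete, a hedged sketch of how a client can 
read such a directory today with the Java FileSystem API, concatenating the parts 
in name order; the output path is a placeholder, and this is only an illustration 
of the pattern, not a proposed WebHDFS change:

{code}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatPartFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path outputDir = new Path("/user/example/output");        // placeholder path
    FileStatus[] parts = fs.globStatus(new Path(outputDir, "part-*"));
    Arrays.sort(parts);                                        // order by path name
    for (FileStatus part : parts) {
      try (FSDataInputStream in = fs.open(part.getPath())) {
        IOUtils.copyBytes(in, System.out, conf, false);        // don't close stdout
      }
    }
  }
}
{code}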

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> 
>
> Key: HDFS-10327
> URL: https://issues.apache.org/jira/browse/HDFS-10327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Thomas Hille
>  Labels: features
>
> When Spark saves a file in HDFS it creates a directory which includes many 
> parts of the file. When you read it with Spark programmatically, you can read 
> this directory as if it were a normal file.
> If you try to read this directory-style file in webhdfs, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
>  is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258726#comment-15258726
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

bq. I see that, but stream-level counters are essential at least for the tests 
which verify forward and lazy seeks. Which means that yes, they do have to go 
into the 2.8.0 release. What I can do is set up the scope so that they are 
package private, then, in the test code, implement the assertions about 
metric-derived state into that package.

I guess my hope here is that whatever mechanism we come up with is something 
that could easily be integrated into the upcoming 2.8 release.  Since we have 
talked about requiring our new metrics to not modify existing stable public 
interfaces, that seems very reasonable.

One thing that is a bit concerning about metrics2 is that I think people feel 
that this interface should be stable (i.e. don't remove or alter things once 
they're in), which would be a big constraint on us.  Perhaps we could document 
that per-fs stats were \@Public \@Evolving rather than stable?

bq. Regarding the metrics2 instrumentation in HADOOP-13028, I'm aggregating the 
stream statistics back into the metrics 2 data. That's something which isn't 
needed for the hadoop tests, but which I'm logging in spark test runs, such as 
(formatted for readability):

Do we have any ideas about how Spark will consume these metrics in the longer 
term?  Do they prefer to go through metrics2, for example?  I definitely don't 
object to putting this kind of stuff in metrics2, but if we go that route, we 
have to accept that we'll just get global (or at best per-fs-type) statistics, 
rather than per-fs-instance statistics.  Is that acceptable?  So far, nobody 
has spoken up strongly in favor of per-fs-instance statistics.

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.
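
A rough sketch of the per-operation counter shape the proposal describes, kept 
deliberately separate from the real FileSystem.Statistics class; the class and 
method names here are illustrative only:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// One thread-safe counter per operation name (create, mkdirs, rename, ...).
public class PerOperationStatistics {
  private final ConcurrentHashMap<String, AtomicLong> counters =
      new ConcurrentHashMap<String, AtomicLong>();

  public void increment(String op) {
    AtomicLong counter = counters.get(op);
    if (counter == null) {
      AtomicLong fresh = new AtomicLong();
      counter = counters.putIfAbsent(op, fresh);
      if (counter == null) {
        counter = fresh;                 // this thread won the race
      }
    }
    counter.incrementAndGet();
  }

  public long count(String op) {
    AtomicLong counter = counters.get(op);
    return counter == null ? 0L : counter.get();
  }
}
{code}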



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258706#comment-15258706
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

bq. I prefer earlier (being in UK time and all); I could do the first half hour 
of the webex \[at 12:30pm\]

How about 10:30AM PST to noon tomorrow, Wednesday?

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258696#comment-15258696
 ] 

Hadoop QA commented on HDFS-10332:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 2s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
29s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 47s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 47s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
8s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 28s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.8.0_92. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 42s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 39s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800831/HDFS-10332.HDFS-8707.001.patch
 |
| JIRA Issue | HDFS-10332 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux aa1c2ffc9c3f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / e6d17da |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_92 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15298/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15298/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop 

[jira] [Commented] (HDFS-9545) DiskBalancer : Add Plan Command

2016-04-26 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258671#comment-15258671
 ] 

Anu Engineer commented on HDFS-9545:


Hi [~eddyxu] Thanks for your comments. Please see my responses to your comments.

bq. Should we consider cluster URI “file://" as illegal for DiskBalancer? Since 
there is no DN?
You are absolutely right, but this allows me to load a snapshot of a node and 
see how diskBalancer is working in case of issues. This is more like a bug 
reporting tool, and I envisage being its principal user, mainly to debug issues 
reported by users.

bq. If we just let diskBalancerLogs = path, in this way, it might be easier for 
admins to write scripts against DiskBalancer as they can fully control the 
output dirs.
Good catch, thank you for bringing this up.  I would hate it if a tool did this 
to me :(

bq. A general question regarding generating logs dir, are the commands usually 
issued against the cluster or just one specific DN? If it only works for one 
specific DN, we might want to put DN hostname or IP into diskBalancerLogs path
We are right now doing it against a specific datanode; in the future, some commands 
might work against the cluster. We currently write LOGDIR/*nodename*.before.json and 
LOGDIR/*nodename*.plan.json.


I will fix all the other issues mentioned by you and upload a new patch.


> DiskBalancer : Add Plan Command
> ---
>
> Key: HDFS-9545
> URL: https://issues.apache.org/jira/browse/HDFS-9545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9545-HDFS-1312.001.patch
>
>
> Allows a user to create a Plan and persist it. This is useful if users want 
> to evaluate the actions of the disk balancer before running the balancing job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9943) Support reconfiguring namenode replication confs

2016-04-26 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258656#comment-15258656
 ] 

Xiaobing Zhou commented on HDFS-9943:
-

Patch v004 addressed your comments 1 and 2. I will look into your last one. Thank 
you [~arpiagariu].

> Support reconfiguring namenode replication confs
> 
>
> Key: HDFS-9943
> URL: https://issues.apache.org/jira/browse/HDFS-9943
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9943-HDFS-9000.000.patch, 
> HDFS-9943-HDFS-9000.001.patch, HDFS-9943-HDFS-9000.002.patch, 
> HDFS-9943-HDFS-9000.003.patch, HDFS-9943-HDFS-9000.004.patch
>
>
> The following confs should be re-configurable at runtime.
> - dfs.namenode.replication.work.multiplier.per.iteration
> - dfs.namenode.replication.interval
> - dfs.namenode.replication.max-streams
> - dfs.namenode.replication.max-streams-hard-limit
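
A minimal, self-contained sketch (not the HDFS-9943 patch, and independent of Hadoop's 
reconfiguration framework) of the kind of validation a runtime reconfiguration handler 
needs for a property such as dfs.namenode.replication.max-streams: parse and range-check 
the new value before swapping it in.

{code:java}
// Hedged sketch only; class and method names are illustrative, not HDFS code.
final class ReplicationConfSketch {
  private volatile int maxReplicationStreams = 2;  // illustrative starting value

  void reconfigureMaxStreams(String newVal) {
    final int parsed;
    try {
      parsed = Integer.parseInt(newVal);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "dfs.namenode.replication.max-streams must be an integer: " + newVal, e);
    }
    if (parsed <= 0) {
      throw new IllegalArgumentException(
          "dfs.namenode.replication.max-streams must be positive: " + newVal);
    }
    // volatile write: takes effect on the next replication iteration
    maxReplicationStreams = parsed;
  }

  int getMaxReplicationStreams() {
    return maxReplicationStreams;
  }
}
{code}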



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9943) Support reconfiguring namenode replication confs

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-9943:

Attachment: HDFS-9943-HDFS-9000.004.patch

> Support reconfiguring namenode replication confs
> 
>
> Key: HDFS-9943
> URL: https://issues.apache.org/jira/browse/HDFS-9943
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9943-HDFS-9000.000.patch, 
> HDFS-9943-HDFS-9000.001.patch, HDFS-9943-HDFS-9000.002.patch, 
> HDFS-9943-HDFS-9000.003.patch, HDFS-9943-HDFS-9000.004.patch
>
>
> The following confs should be re-configurable at runtime.
> - dfs.namenode.replication.work.multiplier.per.iteration
> - dfs.namenode.replication.interval
> - dfs.namenode.replication.max-streams
> - dfs.namenode.replication.max-streams-hard-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover

2016-04-26 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258637#comment-15258637
 ] 

Arpit Agarwal commented on HDFS-9543:
-

Hi Anu, thanks for the updated patch. My comments:
# In copyBlocks, the openPoolIters call should be inside the while loop, just 
before you open the try block.
# I didn't understand the new computeDelay calculation. Your original 
calculation (v1 patch) looked correct; all that remained was to subtract 
{{timeUsed}} and check for a negative result (see the sketch after these comments).
# Thanks for the pointer to {{DirectoryScanner#scan}}. The synchronization is 
needed only when making changes to the in-memory block map. 
{{FsDatasetImpl#moveBlock}} and its callees already synchronize on the dataset 
when updating the map so we should remove the {{synchronized}} block in 
{{DiskBalancerMover#copyBlocks}}.
# Nitpick: Line 818: We can also log the maximum error count value here for 
quick reference. Also a typo _cound_ -> _count_.
# bq. I will make this comment clearer in the next update. Balancer supports a 
flag called -blockpools – HDFS-8890 seems to have added this. Just leaving a 
note that we don't do this yet.
I think the -blockpools flag just restricts the balancing to a subset of 
blockpools. We cannot copy blocks across blockpools. Would you consider 
removing the TODO and filing a Jira if you think a similar flag would be useful 
for disk balancer?

Sorry about not catching some of these on the last pass. 
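
To make comment 2 concrete, here is a minimal sketch of the throttling idea (the method 
name and parameters are illustrative, not the actual DiskBalancer code): the delay is the 
time the copy should have taken at the configured bandwidth, minus the time already spent, 
floored at zero.

{code:java}
// Hedged sketch of the computeDelay idea discussed in comment 2 above.
private static long computeDelayMs(long bytesCopied, long timeUsedMs,
    long maxBandwidthMBps) {
  if (maxBandwidthMBps <= 0 || bytesCopied <= 0) {
    return 0;  // throttling disabled, or nothing copied yet
  }
  long bytesPerSec = maxBandwidthMBps * 1024L * 1024L;
  // how long this copy should have taken at the configured bandwidth
  long expectedMs = (bytesCopied * 1000L) / bytesPerSec;
  // subtract the time already spent and never sleep a negative amount
  return Math.max(expectedMs - timeUsedMs, 0);
}
{code}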

> DiskBalancer : Add Data mover 
> --
>
> Key: HDFS-9543
> URL: https://issues.apache.org/jira/browse/HDFS-9543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9543-HDFS-1312.001.patch, 
> HDFS-9543-HDFS-1312.002.patch
>
>
> This patch adds the actual mover logic to the datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-26 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258609#comment-15258609
 ] 

Yongjun Zhang commented on HDFS-10313:
--

Thanks [~linyiqun], +1 on rev 003. Will commit soon.


> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira is to propose adding a check to distcp: when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1; otherwise, abort with an 
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines

2016-04-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258605#comment-15258605
 ] 

Mingliang Liu commented on HDFS-10326:
--

Ping [~mingma] for more input.

> Disable setting tcp socket send/receive buffers for write pipelines
> ---
>
> Key: HDFS-10326
> URL: https://issues.apache.org/jira/browse/HDFS-10326
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> The DataStreamer and the Datanode use a hardcoded 
> DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write 
> pipeline.  Explicitly setting tcp buffer sizes disables tcp stack 
> auto-tuning.  
> The hardcoded value will saturate a 1Gb link at 1ms RTT, 105Mbps at 10ms, and a 
> paltry 11Mbps over a 100ms long haul.  10Gb networks are underutilized.
> There should either be a configuration to completely disable setting the 
> buffers, or the setReceiveBuffer and setSendBuffer calls should be removed 
> entirely.
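
For reference, the throughput figures above follow directly from the bandwidth-delay 
product: with a fixed 128 KB window, throughput is capped at roughly window / RTT. A 
small worked sketch of the arithmetic (illustrative code, nothing HDFS-specific):

{code:java}
// window / RTT arithmetic behind the numbers quoted in the description above.
public class WindowMath {
  public static void main(String[] args) {
    long windowBits = 128L * 1024 * 8;   // 128 KB window expressed in bits
    double[] rttMs = {1, 10, 100};
    for (double rtt : rttMs) {
      double mbps = windowBits / (rtt / 1000.0) / 1e6;
      // prints ~1049, ~105, ~10 Mbit/s: the ~1Gb, 105Mbps and ~11Mbps figures above
      System.out.printf("RTT %.0f ms -> ~%.0f Mbit/s%n", rtt, mbps);
    }
  }
}
{code}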



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258603#comment-15258603
 ] 

Colin Patrick McCabe commented on HDFS-10305:
-

bq. That's interesting then. hdfs dfs -mkdir /dirAlreadyExists returns a 
non-zero return code. I assumed a non-zero error code == a failed operation. 
Obviously I was wrong.

A non-zero error code on the shell does indicate a failed operation.  You can 
see that FsShell explicitly checks to see whether the path exists and exits 
with an error code if so.  The code is in 
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Mkdir.java.
I don't think this has anything to do with what HDFS should put in the audit 
log, since in this case, FsShell doesn't even call mkdir.
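
A simplified, self-contained sketch of the shell-side behavior described above 
(illustrative only, not the actual Mkdir.java): the shell fails locally when the path 
already exists, so no mkdir RPC reaches the NameNode and nothing shows up in the audit log.

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathExistsException;

public class MkdirLikeShell {
  // Hedged sketch, not the real FsShell code path.
  static void mkdir(FileSystem fs, Path dir) throws Exception {
    if (fs.exists(dir)) {
      // the non-zero exit code case: no mkdir ever reaches the NameNode
      throw new PathExistsException(dir.toString());
    }
    fs.mkdirs(dir);
  }
}
{code}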

> Hdfs audit shouldn't log mkdir operation if the directory already exists.
> 
>
> Key: HDFS-10305
> URL: https://issues.apache.org/jira/browse/HDFS-10305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Minor
>
> Currently Hdfs audit logs mkdir operation even if the directory already 
> exists.
> This creates confusion while analyzing audit logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258592#comment-15258592
 ] 

Colin Patrick McCabe commented on HDFS-10326:
-

bq. Some systems may not support auto-tuning, defaulting to a small window size 
(say 64k, which may make the scenario worse).

Can you give a concrete example of a system where Hadoop is actually deployed 
which doesn't support auto-tuning?

bq. I'd suggest we keep the configuration. Or maybe add another one, say 
dfs.socket.detect-auto-tuning. When this is set to true (maybe turned on by 
default), socket buffer behavior depends on whether the OS supports auto-tuning. If 
auto-tuning is not supported, use the configured value automatically.

Hmm.  As far as I know, there is no way to detect auto-tuning.  If there is, 
then we wouldn't need a new configuration... we could just set the appropriate 
value when no configuration was given.

> Disable setting tcp socket send/receive buffers for write pipelines
> ---
>
> Key: HDFS-10326
> URL: https://issues.apache.org/jira/browse/HDFS-10326
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> The DataStreamer and the Datanode use a hardcoded 
> DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write 
> pipeline.  Explicitly setting tcp buffer sizes disables tcp stack 
> auto-tuning.  
> The hardcoded value will saturate a 1Gb link at 1ms RTT, 105Mbps at 10ms, and a 
> paltry 11Mbps over a 100ms long haul.  10Gb networks are underutilized.
> There should either be a configuration to completely disable setting the 
> buffers, or the setReceiveBuffer and setSendBuffer calls should be removed 
> entirely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258577#comment-15258577
 ] 

Hadoop QA commented on HDFS-10224:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} 
| {color:red} Docker failed to build yetus/hadoop:fbe3e86. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800830/HDFS-10224-HDFS-9924.009.patch
 |
| JIRA Issue | HDFS-10224 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15297/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-HDFS-9924.009.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce

2016-04-26 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258567#comment-15258567
 ] 

Chris Nauroth commented on HDFS-10327:
--

It looks like in that example, myfile.csv is a directory, and its contents are 
3 files: _SUCCESS, part-0 and part-1.  Attempting to open myfile.csv 
directly as a file definitely won't work.  If Spark has a feature that lets you 
"open" it directly, then perhaps this is implemented at the application layer 
by Spark?  Maybe it does something equivalent to {{hdfs dfs -cat 
myfile.csv/part*}}?

That last example demonstrates the separation of concerns I'm talking about: 
the Hadoop shell command performs glob expansion to identify all files matching 
a pattern, and then it opens and displays each file separately, using HDFS APIs 
that operate on individual file paths.
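
A hedged sketch of that application-layer approach (the path is illustrative): expand 
the glob yourself, then open and concatenate each part file through the per-file HDFS 
APIs, roughly what {{hdfs dfs -cat myfile.csv/part*}} does.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatParts {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // expand the glob client-side, just like the shell does
    FileStatus[] parts = fs.globStatus(new Path("/data/output/adp/myfile.csv/part-*"));
    if (parts == null) {
      System.err.println("no part files matched");
      return;
    }
    for (FileStatus part : parts) {
      // open and stream each individual file; there is no directory-level "open"
      IOUtils.copyBytes(fs.open(part.getPath()), System.out, conf, false);
    }
  }
}
{code}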

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> 
>
> Key: HDFS-10327
> URL: https://issues.apache.org/jira/browse/HDFS-10327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Thomas Hille
>  Labels: features
>
> When Spark saves a file in HDFS it creates a directory which includes many 
> parts of the file. When you read it with Spark programmatically, you can read 
> this directory as if it were a normal file.
> If you try to read this directory-style file in webhdfs, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
>  is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-26 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated HDFS-10332:
--
Attachment: HDFS-10332.HDFS-8707.001.patch

> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: HDFS-10332-HDFS-8707.001.patch, HDFS-10332.01.patch, 
> HDFS-10332.HDFS-8707.001.patch
>
>
> Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's 
> DIRECTORY component) the native-client won't build.
> Currently RHEL6 & 7 are using an older version of CMake. 
> Error log:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
> [INFO] Executing tasks
> main:
>  [exec] JAVA_HOME=, 
> JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
>  [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
> JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
>  [exec] Located all JNI components successfully.
>  [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
> PROTOBUF_INCLUDE_DIR)
>  [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
>  [exec] -- checking for module 'fuse'
>  [exec] --   package 'fuse' not found
>  [exec] -- Failed to find Linux FUSE libraries or include files.  Will 
> not build FUSE client.
>  [exec] -- Configuring incomplete, errors occurred!
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error: The following variables are used in this project, 
> but they are set to NOTFOUND.
>  [exec] Please set them or make sure they are set and tested correctly in 
> the CMake files:
>  [exec] PROTOBUF_LIBRARY (ADVANCED)
>  [exec] linked by target "hdfspp" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "hdfspp_static" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "protoc-gen-hrpc" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
>  [exec] linked by target "bad_datanode_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfs_builder_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfspp_errors_test" in directory 
> /home/tiborkiss/devel/workspace
>  [exec] 
> /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" 
> in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test
>  [exec] linked by target "logging_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "node_exclusion_test" in directory 
> 

[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Attachment: HDFS-10224-HDFS-9924.009.patch

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-HDFS-9924.009.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Status: Patch Available  (was: Open)

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-HDFS-9924.009.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Status: Open  (was: Patch Available)

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-HDFS-9924.009.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-26 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Attachment: (was: HDFS-10224-HDFS-9924.009.patch)

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-HDFS-9924.005.patch, HDFS-10224-HDFS-9924.006.patch, 
> HDFS-10224-HDFS-9924.007.patch, HDFS-10224-HDFS-9924.008.patch, 
> HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce

2016-04-26 Thread Thomas Hille (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258520#comment-15258520
 ] 

Thomas Hille commented on HDFS-10327:
-

GET 
http://:50070/webhdfs/v1/data/output/adp/myfile.csv?user.name=alice&op=OPEN

returns:

{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
 is not a file: /data/output/adp/adp_perf_7milx.csv\n\tat 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)\n\tat
 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)\n\tat
 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)\n\tat
 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)\n\tat
 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)\n\tat
 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652)\n\tat
 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)\n\tat
 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)\n\tat 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)\n\tat 
java.security.AccessController.doPrivileged(Native Method)\n\tat 
javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)\n\tat
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)\n"}}


and GET 
http://:50070/webhdfs/v1/data/output/adp/myfile.csv?user.name=alice&op=LISTSTATUS
returns

{"FileStatuses":{"FileStatus":[
{"accessTime":1460595235123,"blockSize":134217728,"childrenNum":0,"fileId":169558,"group":"hdfs","length":0,"modificationTime":1460595235193,"owner":"lroot","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1460590333797,"blockSize":134217728,"childrenNum":0,"fileId":155388,"group":"hdfs","length":3529732,"modificationTime":1460590334756,"owner":"lroot","pathSuffix":"part-0","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1460590295008,"blockSize":134217728,"childrenNum":0,"fileId":154918,"group":"hdfs","length":3540006,"modificationTime":1460590296204,"owner":"lroot","pathSuffix":"part-1","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> 
>
> Key: HDFS-10327
> URL: https://issues.apache.org/jira/browse/HDFS-10327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Thomas Hille
>  Labels: features
>
> When Spark saves a file in HDFS it creates a directory which includes many 
> parts of the file. When you read it with Spark programmatically, you can read 
> this directory as if it were a normal file.
> If you try to read this directory-style file in webhdfs, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
>  is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce

2016-04-26 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258517#comment-15258517
 ] 

Thomas Graves commented on HDFS-10327:
--

What command are you using to try to read when you get the error?

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> 
>
> Key: HDFS-10327
> URL: https://issues.apache.org/jira/browse/HDFS-10327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Thomas Hille
>  Labels: features
>
> When Spark saves a file in HDFS it creates a directory which includes many 
> parts of the file. When you read it with Spark programmatically, you can read 
> this directory as if it were a normal file.
> If you try to read this directory-style file in webhdfs, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
>  is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-26 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated HDFS-10332:
--
Attachment: HDFS-10332-HDFS-8707.001.patch

> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: HDFS-10332-HDFS-8707.001.patch, HDFS-10332.01.patch
>
>
> Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's 
> DIRECTORY component) the native-client won't build.
> Currently RHEL6 & 7 are using an older version of CMake. 
> Error log:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
> [INFO] Executing tasks
> main:
>  [exec] JAVA_HOME=, 
> JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
>  [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
> JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
>  [exec] Located all JNI components successfully.
>  [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
> PROTOBUF_INCLUDE_DIR)
>  [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
>  [exec] -- checking for module 'fuse'
>  [exec] --   package 'fuse' not found
>  [exec] -- Failed to find Linux FUSE libraries or include files.  Will 
> not build FUSE client.
>  [exec] -- Configuring incomplete, errors occurred!
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error: The following variables are used in this project, 
> but they are set to NOTFOUND.
>  [exec] Please set them or make sure they are set and tested correctly in 
> the CMake files:
>  [exec] PROTOBUF_LIBRARY (ADVANCED)
>  [exec] linked by target "hdfspp" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "hdfspp_static" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "protoc-gen-hrpc" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
>  [exec] linked by target "bad_datanode_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfs_builder_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfspp_errors_test" in directory 
> /home/tiborkiss/devel/workspace
>  [exec] 
> /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" 
> in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test
>  [exec] linked by target "logging_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "node_exclusion_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests

[jira] [Commented] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce

2016-04-26 Thread Thomas Hille (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258488#comment-15258488
 ] 

Thomas Hille commented on HDFS-10327:
-

Hi guys,
It looks like splitting the file into parts is a MapReduce feature rather than 
Spark specific (https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html -- 
look at the output of $bin/hadoop dfs -cat 
/usr/joe/wordcount/output/part-0).
So it's maybe still something for you guys?

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> 
>
> Key: HDFS-10327
> URL: https://issues.apache.org/jira/browse/HDFS-10327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Thomas Hille
>  Labels: features
>
> When Spark saves a file in HDFS it creates a directory which includes many 
> parts of the file. When you read it with Spark programmatically, you can read 
> this directory as if it were a normal file.
> If you try to read this directory-style file in webhdfs, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
>  is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce

2016-04-26 Thread Thomas Hille (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Hille reopened HDFS-10327:
-

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> 
>
> Key: HDFS-10327
> URL: https://issues.apache.org/jira/browse/HDFS-10327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Thomas Hille
>  Labels: features
>
> When Spark saves a file in HDFS it creates a directory which includes many 
> parts of the file. When you read it with Spark programmatically, you can read 
> this directory as if it were a normal file.
> If you try to read this directory-style file in webhdfs, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
>  is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce

2016-04-26 Thread Thomas Hille (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Hille updated HDFS-10327:

Summary: Open files in WEBHDFS which are stored in folders by 
Spark/Mapreduce  (was: Open files in WEBHDFS which are stored in folders by 
Spark)

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> 
>
> Key: HDFS-10327
> URL: https://issues.apache.org/jira/browse/HDFS-10327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Thomas Hille
>  Labels: features
>
> When Spark saves a file in HDFS it creates a directory which includes many 
> parts of the file. When you read it with Spark programmatically, you can read 
> this directory as if it were a normal file.
> If you try to read this directory-style file in webhdfs, it returns 
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path
>  is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258474#comment-15258474
 ] 

Hadoop QA commented on HDFS-10332:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} HDFS-10332 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800822/HDFS-10332.01.patch |
| JIRA Issue | HDFS-10332 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15295/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: HDFS-10332.01.patch
>
>
> Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's 
> DIRECTORY component) the native-client won't build.
> Currently RHEL6 & 7 are using an older version of CMake. 
> Error log:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
> [INFO] Executing tasks
> main:
>  [exec] JAVA_HOME=, 
> JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
>  [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
> JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
>  [exec] Located all JNI components successfully.
>  [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
> PROTOBUF_INCLUDE_DIR)
>  [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
>  [exec] -- checking for module 'fuse'
>  [exec] --   package 'fuse' not found
>  [exec] -- Failed to find Linux FUSE libraries or include files.  Will 
> not build FUSE client.
>  [exec] -- Configuring incomplete, errors occurred!
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error: The following variables are used in this project, 
> but they are set to NOTFOUND.
>  [exec] Please set them or make sure they are set and tested correctly in 
> the CMake files:
>  [exec] PROTOBUF_LIBRARY (ADVANCED)
>  [exec] linked by target "hdfspp" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "hdfspp_static" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "protoc-gen-hrpc" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
>  [exec] linked by target "bad_datanode_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfs_builder_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec]

[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-26 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated HDFS-10332:
--
Status: Patch Available  (was: Open)

> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: HDFS-10332.01.patch
>
>
> Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's 
> DIRECTORY component) the native-client won't build.
> Currently RHEL6 & 7 are using an older version of CMake. 
> Error log:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
> [INFO] Executing tasks
> main:
>  [exec] JAVA_HOME=, 
> JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
>  [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
> JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
>  [exec] Located all JNI components successfully.
>  [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
> PROTOBUF_INCLUDE_DIR)
>  [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
>  [exec] -- checking for module 'fuse'
>  [exec] --   package 'fuse' not found
>  [exec] -- Failed to find Linux FUSE libraries or include files.  Will 
> not build FUSE client.
>  [exec] -- Configuring incomplete, errors occurred!
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error: The following variables are used in this project, 
> but they are set to NOTFOUND.
>  [exec] Please set them or make sure they are set and tested correctly in 
> the CMake files:
>  [exec] PROTOBUF_LIBRARY (ADVANCED)
>  [exec] linked by target "hdfspp" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "hdfspp_static" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "protoc-gen-hrpc" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
>  [exec] linked by target "bad_datanode_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfs_builder_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfspp_errors_test" in directory 
> /home/tiborkiss/devel/workspace
>  [exec] 
> /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" 
> in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test
>  [exec] linked by target "logging_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "node_exclusion_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
> tiborkiss@eiger ~/d/w/hadoop ❯❯❯ 

[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-26 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated HDFS-10332:
--
Attachment: HDFS-10332.01.patch

> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: HDFS-10332.01.patch
>
>
> Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's 
> DIRECTORY component) the native-client won't build.
> Currently RHEL6 & 7 are using an older version of CMake. 
> Error log:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
> [INFO] Executing tasks
> main:
>  [exec] JAVA_HOME=, 
> JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
>  [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
> JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
>  [exec] Located all JNI components successfully.
>  [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
> PROTOBUF_INCLUDE_DIR)
>  [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
>  [exec] -- checking for module 'fuse'
>  [exec] --   package 'fuse' not found
>  [exec] -- Failed to find Linux FUSE libraries or include files.  Will 
> not build FUSE client.
>  [exec] -- Configuring incomplete, errors occurred!
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error: The following variables are used in this project, 
> but they are set to NOTFOUND.
>  [exec] Please set them or make sure they are set and tested correctly in 
> the CMake files:
>  [exec] PROTOBUF_LIBRARY (ADVANCED)
>  [exec] linked by target "hdfspp" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "hdfspp_static" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "protoc-gen-hrpc" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
>  [exec] linked by target "bad_datanode_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfs_builder_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfspp_errors_test" in directory 
> /home/tiborkiss/devel/workspace
>  [exec] 
> /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" 
> in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test
>  [exec] linked by target "logging_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "node_exclusion_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
> tiborkiss@eiger ~/d/w/hadoop ❯❯❯ cat 

[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-26 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated HDFS-10332:
--
Description: 
Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's 
DIRECTORY component) the native-client won't build.

Currently RHEL6 & 7 are using an older version of CMake. 

Error log:
{noformat}
[INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
[INFO] Executing tasks

main:
 [exec] JAVA_HOME=, 
JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
 [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
 [exec] Located all JNI components successfully.
 [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
PROTOBUF_INCLUDE_DIR)
 [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
 [exec] -- checking for module 'fuse'
 [exec] --   package 'fuse' not found
 [exec] -- Failed to find Linux FUSE libraries or include files.  Will not 
build FUSE client.
 [exec] -- Configuring incomplete, errors occurred!
 [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
(get_filename_component):
 [exec]   get_filename_component unknown component DIRECTORY
 [exec] Call Stack (most recent call first):
 [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
 [exec]
 [exec]
 [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
(get_filename_component):
 [exec]   get_filename_component unknown component DIRECTORY
 [exec] Call Stack (most recent call first):
 [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
 [exec]
 [exec]
 [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
(get_filename_component):
 [exec]   get_filename_component unknown component DIRECTORY
 [exec] Call Stack (most recent call first):
 [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
 [exec]
 [exec]
 [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
(get_filename_component):
 [exec]   get_filename_component unknown component DIRECTORY
 [exec] Call Stack (most recent call first):
 [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
 [exec]
 [exec]
 [exec] CMake Error: The following variables are used in this project, but 
they are set to NOTFOUND.
 [exec] Please set them or make sure they are set and tested correctly in 
the CMake files:
 [exec] PROTOBUF_LIBRARY (ADVANCED)
 [exec] linked by target "hdfspp" in directory 
/home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
 [exec] linked by target "hdfspp_static" in directory 
/home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
 [exec] linked by target "protoc-gen-hrpc" in directory 
/home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
 [exec] linked by target "bad_datanode_test" in directory 
/home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
 [exec] linked by target "hdfs_builder_test" in directory 
/home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
 [exec] linked by target "hdfspp_errors_test" in directory 
/home/tiborkiss/devel/workspace
 [exec] 
/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
 [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" in 
directory 
/home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test
 [exec] linked by target "logging_test" in directory 
/home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
 [exec] linked by target "node_exclusion_test" in directory 
/home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
tiborkiss@eiger ~/d/w/hadoop ❯❯❯ cat ../HADOOP-cmake-error.txt
[INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
[INFO] Executing tasks

main:
 [exec] JAVA_HOME=, 
JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
 [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
 [exec] Located all JNI components successfully.
 [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
PROTOBUF_INCLUDE_DIR)
 [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
 [exec] -- checking for module 'fuse'
 [exec] --   package 'fuse' not found
 [exec] -- 
