[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-21 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Attachment: HDFS-10224-HDFS-9924.004.patch

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-HDFS-9924.004.patch, 
> HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This proposes implementing an asynchronous DistributedFileSystem based on 
> the AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented 
> in this JIRA.
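
For orientation, a minimal caller-side sketch of what such an asynchronous rename could look like, assuming a Future-returning API in the spirit of HADOOP-12910; the interface and method names below are illustrative assumptions, not the actual patch:

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.hadoop.fs.Path;

// Hypothetical caller-side view of the async API; all names are assumptions.
interface AsyncRename {
  Future<Void> rename(Path src, Path dst);
}

class AsyncRenameExample {
  static void renameAndWait(AsyncRename fs)
      throws ExecutionException, InterruptedException {
    // Submit the rename without blocking the caller...
    Future<Void> pending = fs.rename(new Path("/src"), new Path("/dst"));
    // ...and block only when the outcome is actually needed.
    pending.get();
  }
}
{code}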



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253354#comment-15253354
 ] 

Walter Su commented on HDFS-10220:
--

You are right. My only question is whether the default value of 1000 is the 
right choice, and whether throttling the rate is the right approach at all. I'd 
rather it worked out of the box: small companies with small clusters have 
cluster administrators who may not quite understand what the configuration means.

bq. Counting the elapsed time is better in terms of functionality, but I'm 
afraid of adding extra computation time on this check compared to a simple 
count of files. The idea is not to spend more time releasing those leases. 
What is your feeling about it?
I believe the overhead is negligible. Alternatively, we can calculate the 
elapsed time after processing each small batch.

I saw that {{BlockManager.BlockReportProcessingThread}} releases the writeLock 
if it has held it for more than 4ms. Do you think the same idea works here?
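
A minimal sketch of that timed-release idea (not the actual patch), with a plain ReentrantReadWriteLock standing in for the namesystem lock and a hypothetical releaseLease helper:

{code:java}
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ThrottledLeaseRelease {
  // Same 4ms budget that BlockReportProcessingThread uses before yielding.
  private static final long MAX_LOCK_HOLD_MS = 4;
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  void releaseExpiredLeases(Queue<String> expiredLeaseHolders) {
    while (!expiredLeaseHolders.isEmpty()) {
      fsnLock.writeLock().lock();
      try {
        long start = System.currentTimeMillis();
        // Check elapsed time rather than a fixed file count, so the lock
        // is never held longer than the budget.
        while (!expiredLeaseHolders.isEmpty()
            && System.currentTimeMillis() - start < MAX_LOCK_HOLD_MS) {
          releaseLease(expiredLeaseHolders.poll());
        }
      } finally {
        fsnLock.writeLock().unlock();  // yield so other operations proceed
      }
    }
  }

  private void releaseLease(String holder) {
    // Hypothetical placeholder for LeaseManager's real release logic.
  }
}
{code}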

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by the 
> zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked on a 
> lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when 
> some leases must be released. Due to the really big number of leases to be 
> released, the namenode took too long to release them, blocking all 
> other tasks and making the zkfc think that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check for leases, so the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253339#comment-15253339
 ] 

Hadoop QA commented on HDFS-10313:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 47s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 7s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 26s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800165/HDFS-10313.003.patch |
| JIRA Issue | HDFS-10313 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 97089724b0fd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 337bcde |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_77 

[jira] [Commented] (HDFS-8872) Reporting of missing blocks is different in fsck and namenode ui/metasave

2016-04-21 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253325#comment-15253325
 ] 

Haohui Mai commented on HDFS-8872:
--

I think it is fine to call it either way; we just need to make sure things are 
consistent :-)

> Reporting of missing blocks is different in fsck and namenode ui/metasave
> -
>
> Key: HDFS-8872
> URL: https://issues.apache.org/jira/browse/HDFS-8872
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> The namenode UI and metasave will not report a block as missing if the only 
> replica is on a decommissioning/decommissioned node, while fsck will show it as 
> MISSING.
> Since a decommissioned node can be formatted/removed at any time, we can 
> actually lose the block.
> It's better to alert on the namenode UI if the only copy is on a 
> decommissioned/decommissioning node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253309#comment-15253309
 ] 

Hadoop QA commented on HDFS-10301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 
217 unchanged - 0 fixed = 220 total (was 217) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 38s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 50s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 218m 14s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| JDK v1.7.0_95 Failed junit tests | 

[jira] [Commented] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-21 Thread ChenFolin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253284#comment-15253284
 ] 

ChenFolin commented on HDFS-10322:
--

Hello Chris Nauroth,
Thanks for your reply.
I have checked all of those bugs (HADOOP-11333, HADOOP-11604, HADOOP-11648 and 
HDFS-8429), and none of them is the same as my problem.

> DomainSocket error leads to more and more DataNode threads waiting
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error 
> happens, the DataNode produces more and more waiting threads.
> It is similar to bug HADOOP-11802, but I do not think they are the same 
> problem, because the DomainSocket thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for 
> operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on 
> condition [0x7f2d6e4a5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00061c493500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>   at java.lang.Thread.run(Thread.java:745)
> =DomainSocketWatcher
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable 
> [0x7f2dbe4cb000]
>java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>   at java.lang.Thread.run(Thread.java:745)
> ===datanode error log
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
> datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM 
> operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst: 
> java.net.SocketException: write(2) error: Broken pipe
> at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> at 
> org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
> at 
> com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at 
> com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: HDFS-10313.003.patch

Updated the patch for the latest comments; pending Jenkins.

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira proposes adding a check to distcp: when {{-diff s1 s2}} is passed, 
> we need to ensure that s2 is newer than s1; otherwise, abort with an 
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.
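
One hypothetical shape of such a check, assuming (and it is only an assumption) that the modification time of a snapshot root under {{.snapshot}} reflects creation order; the real patch may verify this differently:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class SnapshotOrderCheck {
  static void checkOrder(FileSystem fs, Path dir, String s1, String s2)
      throws IOException {
    FileStatus st1 = fs.getFileStatus(new Path(dir, ".snapshot/" + s1));
    FileStatus st2 = fs.getFileStatus(new Path(dir, ".snapshot/" + s2));
    if (st2.getModificationTime() <= st1.getModificationTime()) {
      // Abort early with an informative message instead of a confusing diff.
      throw new IllegalArgumentException("Snapshot " + s2
          + " must be newer than " + s1);
    }
  }
}
{code}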



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253236#comment-15253236
 ] 

Walter Su commented on HDFS-10301:
--

I like your idea of counting storages with the same reportId, and doing no 
purge if there's any interleaving. I guess {{rpcsSeen}} can be removed or 
replaced by {{storagesSeen}}?

Processing the retransmitted reports is a waste of resources. I think the best 
approach is, as Colin said, "to remove existing DataNode storage report RPCs 
with the old ID from the queue when we receive one with a new block report 
ID." Let's consider that as an optimization in another jira.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This corrupts the blockReportId field, which makes the NameNode think that some 
> storages are zombies. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253226#comment-15253226
 ] 

Hadoop QA commented on HDFS-6515:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 29s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 13s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 47s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 10s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 184m 41s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.ipc.TestRPC |
|   | 

[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253204#comment-15253204
 ] 

Walter Su commented on HDFS-10301:
--

The handler threads will wait either way, whether on the queue monitor or on 
the fsn writeLock. The queue processing thread will still contend for the fsn 
writeLock. In the end, there's no difference.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This corrupts the blockReportId field, which makes the NameNode think that some 
> storages are zombies. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10317) dfs.domain.socket.path is not set in TestShortCircuitLocalRead.testReadWithRemoteBlockReader

2016-04-21 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253201#comment-15253201
 ] 

Li Bo commented on HDFS-10317:
--

I used IntelliJ IDEA to run the test and the problem occurs, but it does not 
happen when running mvn test. I will check it further.
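
For reference, a minimal sketch of the usual remedy in short-circuit tests: set {{dfs.domain.socket.path}} before the MiniDFSCluster is built. The temp-dir layout below is illustrative, not the eventual fix:

{code:java}
import java.io.File;

import org.apache.hadoop.conf.Configuration;

class ShortCircuitConfSketch {
  static Configuration confWithSocketPath() {
    Configuration conf = new Configuration();
    File dir = new File(System.getProperty("java.io.tmpdir"), "dn_socket_dir");
    // Without this key, enabling short-circuit reads fails as in the trace below.
    conf.set("dfs.domain.socket.path",
        new File(dir, "dn._PORT").getAbsolutePath());
    return conf;
  }
}
{code}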

> dfs.domain.socket.path is not set in 
> TestShortCircuitLocalRead.testReadWithRemoteBlockReader
> 
>
> Key: HDFS-10317
> URL: https://issues.apache.org/jira/browse/HDFS-10317
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Li Bo
>
> org.apache.hadoop.HadoopIllegalArgumentException: The short-circuit local 
> reads feature is enabled but dfs.domain.socket.path is not set.
>   at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.<init>(DomainSocketFactory.java:115)
>   at org.apache.hadoop.hdfs.ClientContext.<init>(ClientContext.java:132)
>   at org.apache.hadoop.hdfs.ClientContext.get(ClientContext.java:157)
>   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:358)
>   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:275)
>   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:266)
>   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:258)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2466)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2512)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1632)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:844)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead.doTestShortCircuitReadWithRemoteBlockReader(TestShortCircuitLocalRead.java:608)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead.testReadWithRemoteBlockReader(TestShortCircuitLocalRead.java:590)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10311) libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket

2016-04-21 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-10311:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -
>
> Key: HDFS-10311
> URL: https://issues.apache.org/jira/browse/HDFS-10311
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10311.HDFS-8707.000.patch, 
> HDFS-10311.HDFS-8707.001.patch, HDFS-10311.HDFS-8707.002.patch
>
>
> DataNodeConnectionImpl calls reset on the unique_ptr that references the 
> underlying asio::tcp::socket.  If this happens after the continuation 
> pipeline checks the cancel state but before asio uses the socket, it will 
> segfault because unique_ptr::reset will explicitly change its value to 
> nullptr.
> Cancel should only call shutdown() and close() on the socket but keep the 
> instance of it alive.  The socket can probably also be turned into a member 
> of DataNodeConnectionImpl to get rid of the unique pointer and simplify 
> things a bit.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10311) libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket

2016-04-21 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253186#comment-15253186
 ] 

James Clampffer commented on HDFS-10311:


I've committed this to HDFS-8707, thanks for the reviews Bob and Stephen.

Good catch Stephen; I have another patch for HDFS-10310 I'll be posting 
tomorrow and I'll incorporate your suggestion in that.

> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -
>
> Key: HDFS-10311
> URL: https://issues.apache.org/jira/browse/HDFS-10311
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10311.HDFS-8707.000.patch, 
> HDFS-10311.HDFS-8707.001.patch, HDFS-10311.HDFS-8707.002.patch
>
>
> DataNodeConnectionImpl calls reset on the unique_ptr that references the 
> underlying asio::tcp::socket.  If this happens after the continuation 
> pipeline checks the cancel state but before asio uses the socket, it will 
> segfault because unique_ptr::reset will explicitly change its value to 
> nullptr.
> Cancel should only call shutdown() and close() on the socket but keep the 
> instance of it alive.  The socket can probably also be turned into a member 
> of DataNodeConnectionImpl to get rid of the unique pointer and simplify 
> things a bit.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253181#comment-15253181
 ] 

Walter Su commented on HDFS-10301:
--

bq. Enabling HDFS-9198 will fifo process BRs. It doesn't solve this 
implementation bug but virtually eliminates it from occurring.
bq. This addresses Daryn's comment rather than solving the reported bug, as BTW 
Daryn correctly stated.
That's incorrect. Please run the test in the 001 patch with and without the 
fix; you'll see the difference. It does solve the issue, because:

The bug only exists when the reports are contained in one RPC. If they are 
split into multiple RPCs, it's not a problem, because the {{rpcsSeen}} guard 
prevents it from happening. So my approach is to process the reports contained 
in one RPC contiguously, by putting them into the queue atomically.
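
A bare-bones illustration of that atomic-enqueue approach; the class and method names are mine, not the patch's:

{code:java}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AtomicReportQueue {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

  /** Enqueue all per-storage tasks of one blockReport RPC as one batch. */
  void enqueueReport(List<Runnable> storageTasks) {
    // Every producer synchronizes on the same monitor, so the tasks of a
    // retransmitted report can never interleave with the original's.
    synchronized (queue) {
      queue.addAll(storageTasks);
    }
  }

  Runnable take() throws InterruptedException {
    return queue.take();  // the consumer drains tasks in enqueue order
  }
}
{code}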


> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This corrupts the blockReportId field, which makes the NameNode think that some 
> storages are zombies. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9894) Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives

2016-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253163#comment-15253163
 ] 

Hudson commented on HDFS-9894:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9651 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9651/])
HDFS-9894. Add unsetStoragePolicy API to FileContext/AbstractFileSystem (jing9: 
rev 7149cdb3c2d9dd390cd8668883cbe5db94090e0a)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/AbstractFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/fs/Hdfs.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFs.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ChRootedFs.java


> Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives
> 
>
> Key: HDFS-9894
> URL: https://issues.apache.org/jira/browse/HDFS-9894
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-9894.000.patch
>
>
> This is to augment FileContext/AbstractFileSystem and derivatives with the 
> newly added API unsetStoragePolicy.
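
As a usage note, a small caller-side sketch of the new call (the path is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

class UnsetPolicyExample {
  static void clearPolicy() throws Exception {
    FileContext fc = FileContext.getFileContext(new Configuration());
    // Drop any storage policy set directly on the path so it falls back
    // to the policy inherited from its ancestors (or the default).
    fc.unsetStoragePolicy(new Path("/data/archive"));
  }
}
{code}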



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10300) TestDistCpSystem should share MiniDFSCluster

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253156#comment-15253156
 ] 

Hadoop QA commented on HDFS-10300:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 22s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 36s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 59s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800134/HDFS-10300.001.patch |
| JIRA Issue | HDFS-10300 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d6380f20fbd9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7149cdb |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_77 

[jira] [Updated] (HDFS-9545) DiskBalancer : Add Plan Command

2016-04-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9545:
---
Attachment: HDFS-9545-HDFS-1312.001.patch

> DiskBalancer : Add Plan Command
> ---
>
> Key: HDFS-9545
> URL: https://issues.apache.org/jira/browse/HDFS-9545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9545-HDFS-1312.001.patch
>
>
> Allows a user to create a plan and persist it. This is useful if users want 
> to evaluate the actions of the disk balancer before running the balancing job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9545) DiskBalancer : Add Plan Command

2016-04-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9545:
---
Attachment: (was: HDFS-9545-HDFS-1312.001.patch)

> DiskBalancer : Add Plan Command
> ---
>
> Key: HDFS-9545
> URL: https://issues.apache.org/jira/browse/HDFS-9545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>
> Allows a user to create a plan and persist it. This is useful if users want 
> to evaluate the actions of the disk balancer before running the balancing job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9545) DiskBalancer : Add Plan Command

2016-04-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9545:
---
Attachment: HDFS-9545-HDFS-1312.001.patch

Depends on HDFS-9543. Posting here for early code review.

> DiskBalancer : Add Plan Command
> ---
>
> Key: HDFS-9545
> URL: https://issues.apache.org/jira/browse/HDFS-9545
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Attachments: HDFS-9545-HDFS-1312.001.patch
>
>
> Allows a user to create a plan and persist it. This is useful if users want 
> to evaluate the actions of the disk balancer before running the balancing job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9544) DiskBalancer : Command utilities

2016-04-21 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer resolved HDFS-9544.

Resolution: Duplicate

> DiskBalancer : Command utilities
> -
>
> Key: HDFS-9544
> URL: https://issues.apache.org/jira/browse/HDFS-9544
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>
> Disk Balancer commands that users can execute. These are the base classes 
> used by all other command classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9894) Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives

2016-04-21 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9894:

Labels:   (was: 2.8.0)

> Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives
> 
>
> Key: HDFS-9894
> URL: https://issues.apache.org/jira/browse/HDFS-9894
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-9894.000.patch
>
>
> This is to augment FileContext/AbstractFileSystem and derivatives with the 
> newly added API unsetStoragePolicy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9894) Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives

2016-04-21 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9894:

Component/s: (was: tools)

> Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives
> 
>
> Key: HDFS-9894
> URL: https://issues.apache.org/jira/browse/HDFS-9894
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-9894.000.patch
>
>
> This is to augment FileContext/AbstractFileSystem and derivatives with the 
> newly added API unsetStoragePolicy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9894) Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives

2016-04-21 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9894:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

I've committed the patch into trunk and branch-2. Thanks [~xiaobingo] for the 
contribution!

> Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives
> 
>
> Key: HDFS-9894
> URL: https://issues.apache.org/jira/browse/HDFS-9894
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-9894.000.patch
>
>
> This is to augment FileContext/AbstractFileSystem and derivatives with the 
> newly added API unsetStoragePolicy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253076#comment-15253076
 ] 

Hadoop QA commented on HDFS-10224:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 6s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 54s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s 
{color} | {color:red} root: patch generated 20 new + 143 unchanged - 1 fixed = 
163 total (was 144) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 56s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client generated 1 new + 
0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 56s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 32s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 45s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 33s {color} 
| {color:red} hadoop-hdfs-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | 

[jira] [Updated] (HDFS-10300) TestDistCpSystem should share MiniDFSCluster

2016-04-21 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-10300:
--
Attachment: HDFS-10300.001.patch

Patch 001:
* Share a single MiniDFSCluster across the test cases (see the sketch below)
* Switch to the JUnit 4 annotation style

Passed the {{TestDistCpSystem}} unit test.
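
A minimal sketch of the sharing pattern under JUnit 4 class fixtures; this is illustrative, not the patch itself (the test body is a hypothetical placeholder):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestShareClusterExample {
  private static MiniDFSCluster cluster;

  @BeforeClass
  public static void setUpClass() throws Exception {
    // Start one cluster for the whole class instead of one per test.
    cluster = new MiniDFSCluster.Builder(new Configuration()).build();
    cluster.waitActive();
  }

  @AfterClass
  public static void tearDownClass() {
    if (cluster != null) {
      cluster.shutdown();
    }
  }

  @Test
  public void testSomething() throws Exception {
    // Each test case reuses the shared cluster.
    cluster.getFileSystem().mkdirs(new org.apache.hadoop.fs.Path("/t"));
  }
}
{code}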

> TestDistCpSystem should share MiniDFSCluster
> 
>
> Key: HDFS-10300
> URL: https://issues.apache.org/jira/browse/HDFS-10300
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Trivial
>  Labels: quality, test
> Attachments: HDFS-10300.001.patch
>
>
> The test cases in this class should share MiniDFSCluster if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10300) TestDistCpSystem should share MiniDFSCluster

2016-04-21 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-10300:
--
Labels: quality test  (was: )
Status: Patch Available  (was: In Progress)

> TestDistCpSystem should share MiniDFSCluster
> 
>
> Key: HDFS-10300
> URL: https://issues.apache.org/jira/browse/HDFS-10300
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Trivial
>  Labels: quality, test
> Attachments: HDFS-10300.001.patch
>
>
> The test cases in this class should share MiniDFSCluster if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10301:

Attachment: HDFS-10301.003.patch

Added a unit test.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This corrupts the blockReportId field, which makes the NameNode think that some 
> storages are zombies. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8057) Move BlockReader implementation to the client implementation package

2016-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253008#comment-15253008
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8057:
---

Thanks for the new patch.
- We should not change Apache_Hadoop_HDFS_2.6.0.xml, which is a generated file.
- Let's also move the tests (BlockReaderTestUtil, TestBlockReaderBase, 
TestBlockReaderFactory, TestBlockReaderLocal, TestBlockReaderRemote, 
TestBlockReaderRemote2, TestBlockReaderLocalLegacy) to client.impl in 
hadoop-hdfs-project/hadoop-hdfs (not hadoop-hdfs-project/hadoop-hdfs-client).
- After moving the tests, we don't need to change the BlockReaderLocal methods 
to public.
- We may get RemoteBlockReader2.LOG by 
LoggerFactory.getLogger(BlockReaderRemote2.class) in 
TestClientBlockVerification, as sketched below.  Then we don't need to change 
RemoteBlockReader2.LOG to public.
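
A minimal sketch of that last suggestion (assuming the renamed BlockReaderRemote2 class proposed by this JIRA is on the test's classpath):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TestClientBlockVerification {
  // slf4j loggers are keyed by name, so this returns the same logger that
  // BlockReaderRemote2 writes to, without widening the visibility of its
  // LOG field.
  static final Logger LOG = LoggerFactory.getLogger(BlockReaderRemote2.class);
}
{code}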

> Move BlockReader implementation to the client implementation package
> 
>
> Key: HDFS-8057
> URL: https://issues.apache.org/jira/browse/HDFS-8057
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Takanobu Asanuma
> Attachments: HDFS-8057.1.patch, HDFS-8057.2.patch
>
>
> BlockReaderLocal, RemoteBlockReader, etc should be moved to 
> org.apache.hadoop.hdfs.client.impl.  We may as well rename RemoteBlockReader 
> to BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9943) Support reconfiguring namenode replication confs

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252986#comment-15252986
 ] 

Hadoop QA commented on HDFS-9943:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 6 new + 
290 unchanged - 4 fixed = 296 total (was 294) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 35s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 59s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | hadoop.hdfs.TestHFlush |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800064/HDFS-9943-HDFS-9000.003.patch
 |
| JIRA Issue | HDFS-9943 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8eb2421e2623 

[jira] [Updated] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)

2016-04-21 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6515:
--
Priority: Major  (was: Blocker)

I'm downgrading the priority of this issue, since lack of PPC support is not a 
regression.

> testPageRounder   (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> -
>
> Key: HDFS-6515
> URL: https://issues.apache.org/jira/browse/HDFS-6515
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 2.4.0, 2.4.1
> Environment: Linux on PPC64
> Tested with Hadoop 3.0.0 SNAPSHOT, on RHEL 6.5, on Ubuntu 14.04, on Fedora 
> 19, using mvn -Dtest=TestFsDatasetCache#testPageRounder -X test
>Reporter: Tony Reix
>  Labels: BB2015-05-TBR, hadoop, test
> Attachments: HDFS-6515-1.patch, HDFS-6515-2.patch
>
>
> I have an issue with the test
> testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> on Linux/PowerPC.
> On Linux/Intel, the test runs fine.
> On Linux/PowerPC, I have:
> testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)  
> Time elapsed: 64.037 sec  <<< ERROR!
> java.lang.Exception: test timed out after 60000 milliseconds
> Looking at the details, I see that some "Failed to cache " messages appear in 
> the traces: only 10 on Intel, but 186 on PPC64.
> On PPC64, it looks like some thread is waiting for something that never 
> happens, generating a timeout.
> I'm now using the IBM JVM; however, I've just checked that the issue also 
> appears with OpenJDK.
> I'm now using the latest Hadoop; however, the issue first appeared with 
> Hadoop 2.4.0.
> I need help understanding what the test is doing and what traces are 
> expected, in order to find the root cause.
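
One hedged hypothesis consistent with these symptoms (stated here as an assumption, not a diagnosis from the report): PPC64 Linux commonly uses 64 KiB pages where x86 uses 4 KiB, and the cache accounting in this test rounds each mapping up to the OS page size, so the same data pins much more locked memory on PPC64:

{code:java}
// Illustrative arithmetic only; the constants are typical page sizes, not
// values read from the failing hosts.
static long roundUpToPageSize(long bytes, long pageSize) {
  return (bytes + pageSize - 1) / pageSize * pageSize;
}
// roundUpToPageSize(5000L, 4096L)  == 8192   (x86, 4 KiB pages)
// roundUpToPageSize(5000L, 65536L) == 65536  (PPC64, 64 KiB pages)
{code}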



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252969#comment-15252969
 ] 

Hadoop QA commented on HDFS-9732:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 58s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 17s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 1s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 19s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} 

[jira] [Commented] (HDFS-10309) Balancer doesn't honor dfs.blocksize value defined with suffix k(kilo), m(mega), g(giga)

2016-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252959#comment-15252959
 ] 

Hudson commented on HDFS-10309:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9650 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9650/])
HDFS-10309 Balancer doesn't honor dfs.blocksize value defined with (szetszwo: 
rev 14ab7a81e2519935ff28ad40519649599e204732)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


> Balancer doesn't honor dfs.blocksize value defined with suffix k(kilo), 
> m(mega), g(giga)
> 
>
> Key: HDFS-10309
> URL: https://issues.apache.org/jira/browse/HDFS-10309
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Amit Anand
>Assignee: Amit Anand
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-10309.01.patch, HDFS-10309.02.patch
>
>
> While running the HDFS Balancer I get the error given below when 
> {{dfs.blocksize}} is defined with a suffix {{k(kilo), m(mega), g(giga)}} in 
> {{hdfs-site.xml}}. In my deployment {{dfs.blocksize}} is set to {{128m}}.
> {code}
> hdfs@bcpc-vm1:/home/ubuntu$ hdfs balancer
> 16/04/19 08:49:51 INFO balancer.Balancer: namenodes  = [hdfs://Test-Laptop]
> 16/04/19 08:49:51 INFO balancer.Balancer: parameters = 
> Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle 
> iteration = 5, #excluded nodes = 0, #included nodes = 0, #source 
> nodes = 0, #blockpools = 0, run during upgrade = false]
> 16/04/19 08:49:51 INFO balancer.Balancer: included nodes = []
> 16/04/19 08:49:51 INFO balancer.Balancer: excluded nodes = []
> 16/04/19 08:49:51 INFO balancer.Balancer: source nodes = []
> Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved
> 16/04/19 08:49:52 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 16/04/19 08:49:52 INFO block.BlockTokenSecretManager: Setting block keys
> 16/04/19 08:49:52 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 
> 540 (default=540)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 
> (default=1000)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 
> 200 (default=200)
> 16/04/19 08:49:52 INFO balancer.Balancer: 
> dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 
> 2147483648 (default=2147483648)
> 16/04/19 08:49:52 INFO balancer.Balancer: 
> dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
> 16/04/19 08:49:52 INFO block.BlockTokenSecretManager: Setting block keys
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 
> 10737418240 (default=10737418240)
> Apr 19, 2016 8:49:52 AM  Balancing took 1.408 seconds
> 16/04/19 08:49:52 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.NumberFormatException: For input string: "128m"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:589)
> at java.lang.Long.parseLong(Long.java:631)
> at 
> org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.getLong(Balancer.java:221)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:281)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:660)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:774)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:903)
> {code}
> However, the workaround is to run {{hdfs balancer}} passing a numeric value 
> for {{dfs.blocksize}}, or to change your {{hdfs-site.xml}}.
> {code}
> hdfs balancer -Ddfs.blocksize=134217728
> {code}
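
A hedged illustration of the parsing difference behind this bug, using {{Configuration.getLongBytes}} (which understands the k/m/g suffixes); whether the committed fix uses exactly this call is not shown in this digest:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class BlockSizeParsing {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);  // skip default resources
    conf.set("dfs.blocksize", "128m");
    // getLong() hands the raw string to Long.parseLong and throws
    // NumberFormatException on "128m", as in the stack trace above.
    // getLongBytes() parses the suffix and yields 134217728.
    System.out.println(conf.getLongBytes("dfs.blocksize", 0));
  }
}
{code}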



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10319) Balancer should not try to pair storages with different types

2016-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252960#comment-15252960
 ] 

Hudson commented on HDFS-10319:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9650 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9650/])
HDFS-10319. Balancer should not try to pair storages with different (szetszwo: 
rev bbce1d525e0016c9b8e573b86af3c87aa39582bd)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java


> Balancer should not try to pair storages with different types
> -
>
> Key: HDFS-10319
> URL: https://issues.apache.org/jira/browse/HDFS-10319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.7.3
>
> Attachments: h10319_20160420.patch
>
>
> This is a performance bug – the Balancer may pair a source datanode and a 
> target datanode with different storage types. Fortunately, it will fail to 
> schedule any blocks in such a pair, since it will later find out that the 
> storage types do not match.
> The bug won't lead to incorrect results.
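
A minimal sketch of the missing guard (illustrative only, not the committed patch; the enum stands in for Hadoop's StorageType):

{code:java}
// Only consider a source/target pair when the storage types already match,
// so the dispatcher never schedules a move the later type check would reject.
enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

static boolean canPair(StorageType source, StorageType target) {
  return source == target;
}
{code}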



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10309) Balancer doesn't honor dfs.blocksize value defined with suffix k(kilo), m(mega), g(giga)

2016-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10309:
---
Summary: Balancer doesn't honor dfs.blocksize value defined with suffix 
k(kilo), m(mega), g(giga)  (was: HDFS Balancer doesn't honor dfs.blocksize 
value defined with suffix k(kilo), m(mega), g(giga))

> Balancer doesn't honor dfs.blocksize value defined with suffix k(kilo), 
> m(mega), g(giga)
> 
>
> Key: HDFS-10309
> URL: https://issues.apache.org/jira/browse/HDFS-10309
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Amit Anand
>Assignee: Amit Anand
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-10309.01.patch, HDFS-10309.02.patch
>
>
> While running the HDFS Balancer I get the error given below when 
> {{dfs.blocksize}} is defined with a suffix {{k(kilo), m(mega), g(giga)}} in 
> {{hdfs-site.xml}}. In my deployment {{dfs.blocksize}} is set to {{128m}}.
> {code}
> hdfs@bcpc-vm1:/home/ubuntu$ hdfs balancer
> 16/04/19 08:49:51 INFO balancer.Balancer: namenodes  = [hdfs://Test-Laptop]
> 16/04/19 08:49:51 INFO balancer.Balancer: parameters = 
> Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle 
> iteration = 5, #excluded nodes = 0, #included nodes = 0, #source 
> nodes = 0, #blockpools = 0, run during upgrade = false]
> 16/04/19 08:49:51 INFO balancer.Balancer: included nodes = []
> 16/04/19 08:49:51 INFO balancer.Balancer: excluded nodes = []
> 16/04/19 08:49:51 INFO balancer.Balancer: source nodes = []
> Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved
> 16/04/19 08:49:52 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 16/04/19 08:49:52 INFO block.BlockTokenSecretManager: Setting block keys
> 16/04/19 08:49:52 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 
> 540 (default=540)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 
> (default=1000)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 
> 200 (default=200)
> 16/04/19 08:49:52 INFO balancer.Balancer: 
> dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 
> 2147483648 (default=2147483648)
> 16/04/19 08:49:52 INFO balancer.Balancer: 
> dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
> 16/04/19 08:49:52 INFO block.BlockTokenSecretManager: Setting block keys
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 
> 10737418240 (default=10737418240)
> Apr 19, 2016 8:49:52 AM  Balancing took 1.408 seconds
> 16/04/19 08:49:52 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.NumberFormatException: For input string: "128m"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:589)
> at java.lang.Long.parseLong(Long.java:631)
> at 
> org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.getLong(Balancer.java:221)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:281)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:660)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:774)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:903)
> {code}
> However, the workaround is to run {{hdfs balancer}} passing a numeric value 
> for {{dfs.blocksize}}, or to change your {{hdfs-site.xml}}.
> {code}
> hdfs balancer -Ddfs.blocksize=134217728
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10309) HDFS Balancer doesn't honor dfs.blocksize value defined with suffix k(kilo), m(mega), g(giga)

2016-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10309:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Amit!

> HDFS Balancer doesn't honor dfs.blocksize value defined with suffix k(kilo), 
> m(mega), g(giga)
> -
>
> Key: HDFS-10309
> URL: https://issues.apache.org/jira/browse/HDFS-10309
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Amit Anand
>Assignee: Amit Anand
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-10309.01.patch, HDFS-10309.02.patch
>
>
> While running the HDFS Balancer I get the error given below when 
> {{dfs.blocksize}} is defined with a suffix {{k(kilo), m(mega), g(giga)}} in 
> {{hdfs-site.xml}}. In my deployment {{dfs.blocksize}} is set to {{128m}}.
> {code}
> hdfs@bcpc-vm1:/home/ubuntu$ hdfs balancer
> 16/04/19 08:49:51 INFO balancer.Balancer: namenodes  = [hdfs://Test-Laptop]
> 16/04/19 08:49:51 INFO balancer.Balancer: parameters = 
> Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle 
> iteration = 5, #excluded nodes = 0, #included nodes = 0, #source 
> nodes = 0, #blockpools = 0, run during upgrade = false]
> 16/04/19 08:49:51 INFO balancer.Balancer: included nodes = []
> 16/04/19 08:49:51 INFO balancer.Balancer: excluded nodes = []
> 16/04/19 08:49:51 INFO balancer.Balancer: source nodes = []
> Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move  
> Bytes Being Moved
> 16/04/19 08:49:52 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 16/04/19 08:49:52 INFO block.BlockTokenSecretManager: Setting block keys
> 16/04/19 08:49:52 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 
> 540 (default=540)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 
> (default=1000)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 
> 200 (default=200)
> 16/04/19 08:49:52 INFO balancer.Balancer: 
> dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 
> 2147483648 (default=2147483648)
> 16/04/19 08:49:52 INFO balancer.Balancer: 
> dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
> 16/04/19 08:49:52 INFO block.BlockTokenSecretManager: Setting block keys
> 16/04/19 08:49:52 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 
> 10737418240 (default=10737418240)
> Apr 19, 2016 8:49:52 AM  Balancing took 1.408 seconds
> 16/04/19 08:49:52 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.NumberFormatException: For input string: "128m"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:589)
> at java.lang.Long.parseLong(Long.java:631)
> at 
> org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1311)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.getLong(Balancer.java:221)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:281)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:660)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:774)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:903)
> {code}
> However, the workaround is to run {{hdfs balancer}} passing a numeric value 
> for {{dfs.blocksize}}, or to change your {{hdfs-site.xml}}.
> {code}
> hdfs balancer -Ddfs.blocksize=134217728
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252937#comment-15252937
 ] 

Hadoop QA commented on HDFS-10301:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 
217 unchanged - 0 fixed = 220 total (was 217) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 10s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 18s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 44s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.hdfs.server.datanode.TestFsDatasetCache |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.server.datanode.TestFsDatasetCache |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800061/HDFS-10301.002.patch |
| JIRA Issue | HDFS-10301 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| 

[jira] [Updated] (HDFS-10319) Balancer should not try to pair storages with different types

2016-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10319:
---
   Resolution: Fixed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Thanks Chris for reviewing the patch.

I have committed this.

> Balancer should not try to pair storages with different types
> -
>
> Key: HDFS-10319
> URL: https://issues.apache.org/jira/browse/HDFS-10319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Fix For: 2.7.3
>
> Attachments: h10319_20160420.patch
>
>
> This is a performance bug – the Balancer may pair a source datanode and a 
> target datanode with different storage types. Fortunately, it will fail to 
> schedule any blocks in such a pair, since it will later find out that the 
> storage types do not match.
> The bug won't lead to incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9869) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2]

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252913#comment-15252913
 ] 

Hadoop QA commented on HDFS-9869:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 20 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 8m 9s {color} 
| {color:red} root-jdk1.8.0_77 with JDK v1.8.0_77 generated 2 new + 737 
unchanged - 2 fixed = 739 total (was 739) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 15m 5s {color} 
| {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 2 new + 734 
unchanged - 2 fixed = 736 total (was 736) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 56s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 17s 
{color} | {color:red} root: patch generated 2 new + 774 unchanged - 4 fixed = 
776 total (was 778) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 4s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | 

[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver

2016-04-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252894#comment-15252894
 ] 

Arpit Agarwal commented on HDFS-10264:
--

Thanks for the backport [~shv]. I added back 2.7.3 to 'Fix Version/s' as these 
are parallel release lines (no semantic versioning).

> Logging improvements in FSImageFormatProtobuf.Saver
> ---
>
> Key: HDFS-10264
> URL: https://issues.apache.org/jira/browse/HDFS-10264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Xiaobing Zhou
>  Labels: newbie
> Fix For: 2.7.3, 2.6.5
>
> Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch
>
>
> There are two LOG messages in {{FSImageFormat.Saver}}, marking the start and 
> end of fsimage saving, that are missing in {{FSImageFormatProtobuf.Saver}}. It 
> would be good to have them logged for protobuf images as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver

2016-04-21 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-10264:
-
Fix Version/s: 2.7.3

> Logging improvements in FSImageFormatProtobuf.Saver
> ---
>
> Key: HDFS-10264
> URL: https://issues.apache.org/jira/browse/HDFS-10264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Xiaobing Zhou
>  Labels: newbie
> Fix For: 2.7.3, 2.6.5
>
> Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch
>
>
> There are two LOG messages in {{FSImageFormat.Saver}}, marking the start and 
> end of fsimage saving, that are missing in {{FSImageFormatProtobuf.Saver}}. It 
> would be good to have them logged for protobuf images as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10301:

Summary: BlockReport retransmissions may lead to storages falsely being 
declared zombie if storage report processing happens out of order  (was: Blocks 
removed by thousands due to falsely detected zombie storages)

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.01.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report, so 
> it sends the block report again. The NameNode, while processing these two 
> reports at the same time, can interleave processing of storages from different 
> reports. This screws up the blockReportId field, which makes the NameNode 
> think that some storages are zombie. Replicas from zombie storages are 
> immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver

2016-04-21 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-10264:
---
Fix Version/s: (was: 2.7.3)
   2.6.5

Committed to branch 2.6.

> Logging improvements in FSImageFormatProtobuf.Saver
> ---
>
> Key: HDFS-10264
> URL: https://issues.apache.org/jira/browse/HDFS-10264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Xiaobing Zhou
>  Labels: newbie
> Fix For: 2.6.5
>
> Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch
>
>
> There are two LOG messages in {{FSImageFormat.Saver}}, marking the start and 
> end of fsimage saving, that are missing in {{FSImageFormatProtobuf.Saver}}. It 
> would be good to have them logged for protobuf images as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2016-04-21 Thread Ben Podgursky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Podgursky updated HDFS-10323:
-
Description: 
After switching to using a ViewFileSystem, fs.deleteOnExit calls began failing 
frequently, displaying this error on failure:

16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for path 
/tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84

Since FileSystem eats the error involved, it is difficult to be sure what the 
error is, but I believe what is happening is that the ViewFileSystem’s child 
FileSystems are being close()’d before the ViewFileSystem, due to the random 
order ClientFinalizer closes FileSystems; so then when the ViewFileSystem tries 
to close(), it tries to forward the delete() calls to the appropriate child, 
and fails because the child is already closed.

I’m unsure how to write an actual Hadoop test to reproduce this, since it 
involves testing behavior on actual JVM shutdown.  However, I can verify that 
while

{code:java}
fs.deleteOnExit(randomTemporaryDir);

{code}

regularly (~50% of the time) fails to delete the temporary directory, this code:

{code:java}
ViewFileSystem viewfs = (ViewFileSystem)fs1;

for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
  
  if (fileSystem.exists(randomTemporaryDir)) {

fileSystem.deleteOnExit(randomTemporaryDir);
  
  }

}

{code}

always successfully deletes the temporary directory on JVM shutdown.

I am not very familiar with FileSystem inheritance hierarchies, but at first 
glance I see two ways to fix this behavior:

1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
FileSystem, and not hold onto that path itself.

2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
other FileSystems.  

Would appreciate any thoughts on whether this seems accurate, and thoughts (or 
help) on the fix.

  was:
After switching to using a ViewFileSystem, fs.deleteOnExit calls began failing 
frequently, displaying this error on failure:

16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for path 
/tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84

Since FileSystem eats the error involved, it is difficult to be sure what the 
error is, but I believe what is happening is that the ViewFileSystem’s child 
FileSystems are being close()’d before the ViewFileSystem, due to the random 
order ClientFinalizer closes FileSystems; so then when the ViewFileSystem tries 
to close(), it tries to forward the delete() calls to the appropriate child, 
and fails because the child is already closed.

I’m unsure how to write an actual Hadoop test to reproduce this, since it 
involves testing behavior on actual JVM shutdown.  However, I can verify that 
while

{code:java}
fs.deleteOnExit(randomTemporaryDir);

{code}

regularly (~50% of the time) fails to delete the temporary directory, this code:

{code:java}
ViewFileSystem viewfs = (ViewFileSystem)fs1;
for (FileSystem fileSystem : 
viewfs.getChildFileSystems()) {
  if (fileSystem.exists(randomTemporaryDir)) {
 
   fileSystem.deleteOnExit(randomTemporaryDir);
  }
}

{code}

always successfully deletes the temporary directory on JVM shutdown.

I am not very familiar with FileSystem inheritance hierarchies, but at first 
glance I see two ways to fix this behavior:

1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
FileSystem, and not hold onto that path itself.

2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
other FileSystems.  

Would appreciate any thoughts of whether this seems accurate, and thoughts (or 
help) on the fix.


> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> 
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Ben Podgursky
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the 
> error is, but I believe what is happening is that the ViewFileSystem’s child 
> FileSystems are being close()’d before the ViewFileSystem, due to the random 
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem 
> tries to close(), it tries to forward the delete() calls to the appropriate 
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it 
> involves testing 

[jira] [Created] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2016-04-21 Thread Ben Podgursky (JIRA)
Ben Podgursky created HDFS-10323:


 Summary: transient deleteOnExit failure in ViewFileSystem due to 
close() ordering
 Key: HDFS-10323
 URL: https://issues.apache.org/jira/browse/HDFS-10323
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: federation
Reporter: Ben Podgursky


After switching to using a ViewFileSystem, fs.deleteOnExit calls began failing 
frequently, displaying this error on failure:

16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for path 
/tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84

Since FileSystem eats the error involved, it is difficult to be sure what the 
error is, but I believe what is happening is that the ViewFileSystem’s child 
FileSystems are being close()’d before the ViewFileSystem, due to the random 
order ClientFinalizer closes FileSystems; so then when the ViewFileSystem tries 
to close(), it tries to forward the delete() calls to the appropriate child, 
and fails because the child is already closed.

I’m unsure how to write an actual Hadoop test to reproduce this, since it 
involves testing behavior on actual JVM shutdown.  However, I can verify that 
while

{code:java}
fs.deleteOnExit(randomTemporaryDir);

{code}

regularly (~50% of the time) fails to delete the temporary directory, this code:

{code:java}
ViewFileSystem viewfs = (ViewFileSystem) fs1;
for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
  if (fileSystem.exists(randomTemporaryDir)) {
    fileSystem.deleteOnExit(randomTemporaryDir);
  }
}

{code}

always successfully deletes the temporary directory on JVM shutdown.

I am not very familiar with FileSystem inheritance hierarchies, but at first 
glance I see two ways to fix this behavior:

1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
FileSystem, and not hold onto that path itself.

2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
other FileSystems.  

Would appreciate any thoughts on whether this seems accurate, and thoughts (or 
help) on the fix.
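
A rough sketch of option 1 above, written as a hypothetical helper rather than a ViewFileSystem change ({{resolvePath}} and {{getChildFileSystems}} are existing FileSystem methods; the matching-by-URI logic is an assumption for illustration):

{code:java}
import java.io.IOException;
import java.util.Objects;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.viewfs.ViewFileSystem;

public class DeleteOnExitHelper {
  // Register the path with the child FileSystem that owns it, so the child
  // performs the delete when it is closed, regardless of close() ordering.
  static boolean deleteOnExitViaChild(ViewFileSystem viewfs, Path p)
      throws IOException {
    Path resolved = viewfs.resolvePath(p);  // mount-table lookup to the target FS
    for (FileSystem child : viewfs.getChildFileSystems()) {
      if (Objects.equals(resolved.toUri().getScheme(), child.getUri().getScheme())
          && Objects.equals(resolved.toUri().getAuthority(),
                            child.getUri().getAuthority())) {
        return child.deleteOnExit(resolved);
      }
    }
    return false;  // no owning child found; caller can fall back to viewfs
  }
}
{code}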



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252813#comment-15252813
 ] 

Yongjun Zhang commented on HDFS-10313:
--

Hi [~linyiqun],

Thanks for the new rev. A few more minor things below. I am +1 after they are 
addressed.

1.
{code}
117   throw new InvalidInputException("Snapshot not be found: " + nfe);
{code}
Add the following method in {{CopyListing.java}}
{code}
public InvalidInputException(String message, Throwable cause) {
  super(message, cause);
}
{code}
and change the call to
{code}
117   throw new InvalidInputException("Input snapshot is not found", 
nfe);
{code}

2. In {{DistCp#createAndSubmitJob()}}

{code}
 throw new InvalidInputException(
  "Distcp sync failed, because of invalid options: " + 
inputOptions);
{code}
DistCp sync may have failed for different reasons. Though I initially suggested 
{{InvalidInputException}}, I now think using {{Exception}} here is better. 
Sorry about that.


3. Add spaces by changing
{code}
731 try{
...
735 }catch(HadoopIllegalArgumentException e){
{code}

to

{code}
731 try {
...
735 } catch (HadoopIllegalArgumentException e) {
{code}

Thanks.

--Yongjun

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch
>
>
> This jira is to propose adding a check to distcp: when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1; otherwise, abort with an 
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252804#comment-15252804
 ] 

Tsz Wo Nicholas Sze commented on HDFS-10224:


- Please also update statistics.

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-21 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252797#comment-15252797
 ] 

Mingliang Liu commented on HDFS-10175:
--

Hi [~cmccabe], I think you proposed an innovative idea for supporting the 
per-operation stats. I'm willing to change the current design if your idea 
makes more sense in the long term. I'll prepare a full patch after reviewing 
your sample code.

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> The logic within DfsClient that maps operations to these counters can be 
> confusing; for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.
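
A hedged sketch of the data structure this proposal implies (names are illustrative; no attached patch is reproduced here):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class PerOpStatistics {
  // One cheap, contention-friendly counter per client operation name.
  private final ConcurrentHashMap<String, LongAdder> ops = new ConcurrentHashMap<>();

  void increment(String op) {  // e.g. "mkdirs", "rename", "create"
    ops.computeIfAbsent(op, k -> new LongAdder()).increment();
  }

  long get(String op) {
    LongAdder a = ops.get(op);
    return a == null ? 0L : a.sum();
  }
}
{code}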



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252796#comment-15252796
 ] 

Tsz Wo Nicholas Sze commented on HDFS-10224:


Thanks for working on this.  Some comments on the patch:
- Let's rename getMsgCallback() to getReturnMessageCallback(), 
getValueCallback() to getReturnValueCallback() and getFsStatistics() to 
getStatistics().
- AsyncDistributedFileSystem and its public methods need javadoc.
- Please annotate AsyncDistributedFileSystem as @Unstable (both points are 
sketched below).
- VOID_INSTANCE is not needed.  Just return null.
- Please remove the unused imports added by the patch.
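
For reference, a hedged sketch of the shape these comments describe (the real signatures live in the attached patches; the stub body is illustrative only):

{code:java}
import java.io.IOException;
import java.util.concurrent.Future;

import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

/**
 * Asynchronous counterpart of DistributedFileSystem; the javadoc and the
 * Unstable annotation illustrate the review comments above.
 */
@InterfaceStability.Unstable
public class AsyncDistributedFileSystem {
  /**
   * Renames src to dst without blocking the caller; the returned Future
   * completes once the NameNode acknowledges the rename.
   */
  public Future<Void> rename(Path src, Path dst, Options.Rename... options)
      throws IOException {
    throw new UnsupportedOperationException("illustrative stub only");
  }
}
{code}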


> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252790#comment-15252790
 ] 

Hadoop QA commented on HDFS-9958:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 
130 unchanged - 0 fixed = 132 total (was 130) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 45m 30s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 38s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 114m 9s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.hdfs.TestReadStripedFileWithDecoding |
|   | hadoop.hdfs.TestFileStatus |
|   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
|   | hadoop.hdfs.server.namenode.TestINodeFile |
|   | hadoop.fs.contract.hdfs.TestHDFSContractOpen |
|   | hadoop.hdfs.server.datanode.TestFsDatasetCache |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestDatanodeRestart |
|   | hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands |
|   | hadoop.hdfs.TestFileCreationDelete |
|   | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
|   | 

[jira] [Updated] (HDFS-10224) Implement asynchronous rename for DistributedFileSystem

2016-04-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10224:
---
Description: This is proposed to implement an asynchronous 
DistributedFileSystem based on AsyncFileSystem APIs in HADOOP-12910. In 
addition, rename is implemented in this JIRA.  (was: This is proposed to 
implement an asynchronous DistributedFileSystem based on AsyncFileSystem APIs 
in HADOOP-12910. In addition, rename is implemented as well.)
Summary: Implement asynchronous rename for DistributedFileSystem  (was: 
Implement an asynchronous DistributedFileSystem)

> Implement asynchronous rename for DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement an asynchronous DistributedFileSystem

2016-04-21 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Attachment: HDFS-10224-HDFS-9924.003.patch

> Implement an asynchronous DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement an asynchronous DistributedFileSystem

2016-04-21 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Attachment: (was: HDFS-10224-HDFS-9924.003.patch)

> Implement an asynchronous DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver

2016-04-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252710#comment-15252710
 ] 

Konstantin Shvachko commented on HDFS-10264:


My bad.

> Logging improvements in FSImageFormatProtobuf.Saver
> ---
>
> Key: HDFS-10264
> URL: https://issues.apache.org/jira/browse/HDFS-10264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Xiaobing Zhou
>  Labels: newbie
> Fix For: 2.7.3
>
> Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch
>
>
> There are two LOG messages in {{FSImageFormat.Saver}} that are missing in 
> {{FSImageFormatProtobuf.Saver}}, which mark the start and end of fsimage 
> saving. It would be good to have them logged for protobuf images as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver

2016-04-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252709#comment-15252709
 ] 

Konstantin Shvachko commented on HDFS-10264:


Sorry, please ignore the above: it is in 2.7.3. Merging into 2.6 now.

> Logging improvements in FSImageFormatProtobuf.Saver
> ---
>
> Key: HDFS-10264
> URL: https://issues.apache.org/jira/browse/HDFS-10264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Xiaobing Zhou
>  Labels: newbie
> Fix For: 2.7.3
>
> Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch
>
>
> There are two LOG messages in {{FSImageFormat.Saver}} that are missing in 
> {{FSImageFormatProtobuf.Saver}}, which mark the start and end of fsimage 
> saving. It would be good to have them logged for protobuf images as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver

2016-04-21 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252705#comment-15252705
 ] 

Arpit Agarwal commented on HDFS-10264:
--

That's odd. Are you sure your repo is synced?
branch-2: 
https://git-wip-us.apache.org/repos/asf?p=hadoop.git;a=commit;h=b776db36f73b73097af52f98519b38aef8cc537d
branch-2.7: 
https://git-wip-us.apache.org/repos/asf?p=hadoop.git;a=commit;h=12c6c2c9a6e929b26b6fa1e0acdac24205c922f3
branch-2.8: 
https://git-wip-us.apache.org/repos/asf?p=hadoop.git;a=commit;h=a13628fe4a7ee0bc1d803cfc983cf30a6f6cb665

> Logging improvements in FSImageFormatProtobuf.Saver
> ---
>
> Key: HDFS-10264
> URL: https://issues.apache.org/jira/browse/HDFS-10264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Xiaobing Zhou
>  Labels: newbie
> Fix For: 2.7.3
>
> Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch
>
>
> There are two LOG messages in {{FSImageFormat.Saver}} that are missing in 
> {{FSImageFormatProtobuf.Saver}}, which mark the start and end of fsimage 
> saving. It would be good to have them logged for protobuf images as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10264) Logging improvements in FSImageFormatProtobuf.Saver

2016-04-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252696#comment-15252696
 ] 

Konstantin Shvachko commented on HDFS-10264:


Hey [~arpitagarwal], it looks like you committed this only to trunk, but the 
Fix Version says 2.7.3?

> Logging improvements in FSImageFormatProtobuf.Saver
> ---
>
> Key: HDFS-10264
> URL: https://issues.apache.org/jira/browse/HDFS-10264
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Xiaobing Zhou
>  Labels: newbie
> Fix For: 2.7.3
>
> Attachments: HDFS-10264.000.patch, HDFS-10264.001.patch
>
>
> There are two LOG messages in {{FSImageFormat.Saver}} that are missing in 
> {{FSImageFormatProtobuf.Saver}}, which mark the start and end of fsimage 
> saving. It would be good to have them logged for protobuf images as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10311) libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket

2016-04-21 Thread Stephen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252611#comment-15252611
 ] 

Stephen commented on HDFS-10311:


+1. Though it may be a good idea to release the mutex before logging. Seems 
easy enough to do.

{code}
 void DataNodeConnectionImpl::Cancel() {
-  conn_.reset();
+  mutex_guard state_lock(state_lock_);
+  std::string err = SafeDisconnect(conn_.get());
+  if(!err.empty()) {
+    LOG_WARN(kBlockReader, << "Error disconnecting socket in DataNodeConnectionImpl::Cancel, " << err);
+  }
 }
{code}
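
In Java terms, the suggestion amounts to the following hedged sketch 
(capture the error text under the lock, then log after the lock is 
released; all names are illustrative):

{code}
class CancelSketch {
  private static final org.slf4j.Logger LOG =
      org.slf4j.LoggerFactory.getLogger(CancelSketch.class);
  private final Object stateLock = new Object();

  void cancel() {
    String err;
    synchronized (stateLock) {
      err = safeDisconnect();  // shared state is mutated under the lock
    }
    if (!err.isEmpty()) {
      LOG.warn("Error disconnecting socket in cancel: {}", err);  // lock-free
    }
  }

  private String safeDisconnect() {
    return "";  // placeholder for the real disconnect logic
  }
}
{code}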

> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -
>
> Key: HDFS-10311
> URL: https://issues.apache.org/jira/browse/HDFS-10311
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10311.HDFS-8707.000.patch, 
> HDFS-10311.HDFS-8707.001.patch, HDFS-10311.HDFS-8707.002.patch
>
>
> DataNodeConnectionImpl calls reset on the unique_ptr that references the 
> underlying asio::tcp::socket.  If this happens after the continuation 
> pipeline checks the cancel state but before asio uses the socket it will 
> segfault because unique_ptr::reset will explicitly change its value to 
> nullptr.
> Cancel should only call shutdown() and close() on the socket but keep the 
> instance of it alive.  The socket can probably also be turned into a member 
> of DataNodeConnectionImpl to get rid of the unique pointer and simplify 
> things a bit.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252612#comment-15252612
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

I have posted a new patch as HDFS-10301.002.patch.  The idea here is that we 
know the number of storage reports we expect to see in the block report.  We 
should not remove any storages as zombies unless we have seen this number of 
storages and marked them with the ID of the latest block report.

I feel that this approach is better than the one used in 001.patch, since it 
correctly handles the "interleaved" case.  It is very difficult to prove that 
we can never get interleaved storage reports for the DataNode.  This is because 
of issues like queuing inside the RPCs system, packets getting reordered or 
delayed by the network, and queuing inside the deferred work mechanism added by 
HDFS-9198.  So we should handle this case correctly.

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.01.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. 
> Then it sends the block report again. The NameNode, while processing these 
> two reports at the same time, can interleave processing storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombies. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10260) TestFsDatasetImpl#testCleanShutdownOfVolume often fails

2016-04-21 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252606#comment-15252606
 ] 

Rushabh S Shah commented on HDFS-10260:
---

[~jojochuang]: do you have any additional comments on this patch ?

> TestFsDatasetImpl#testCleanShutdownOfVolume often fails
> ---
>
> Key: HDFS-10260
> URL: https://issues.apache.org/jira/browse/HDFS-10260
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, test
>Reporter: Wei-Chiu Chuang
>Assignee: Rushabh S Shah
> Attachments: HDFS-10260-v1.patch, HDFS-10260.patch
>
>
> This test failure occurs in upstream Jenkins. Looking at the test code, I 
> think it should be improved to capture the root cause of failure:
> E.g. change {{Thread.sleep(1000)}} to {{GenericTestUtils.waitFor}} and use 
> {{GenericTestUtils.assertExceptionContains}} to replace 
> {code}
> Assert.assertTrue(ioe.getMessage().contains(info.toString()));
> {code}
> https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/1062/testReport/junit/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestFsDatasetImpl/testCleanShutdownOfVolume/
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testCleanShutdownOfVolume(TestFsDatasetImpl.java:683)
> Standard Error
> Exception in thread "DataNode: 
> [[[DISK]file:/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Java8/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
>  
> [DISK]file:/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Java8/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
>   heartbeating to localhost/127.0.0.1:35113" java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdownBlockPool(FsDatasetImpl.java:2591)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:1479)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:411)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:494)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:749)
>   at java.lang.Thread.run(Thread.java:744)
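
A hedged sketch of the suggested rewrite (the polled condition and the 
locals are placeholders for the test's real ones; the two GenericTestUtils 
calls are the ones named in the description):

{code}
import java.io.IOException;
import java.util.concurrent.TimeoutException;

import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

class WaitForSketch {
  private volatile boolean volumeRemoved;  // stand-in for the real condition

  void awaitVolumeRemoval(final IOException ioe, final String expected)
      throws TimeoutException, InterruptedException {
    // Poll with a bounded wait instead of a fixed Thread.sleep(1000).
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return volumeRemoved;
      }
    }, 100, 10000);  // check every 100 ms, give up after 10 s

    // Unlike a bare assertTrue(...contains(...)), this fails with a
    // message that includes the actual exception text.
    GenericTestUtils.assertExceptionContains(expected, ioe);
  }
}
{code}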



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9943) Support reconfiguring namenode replication confs

2016-04-21 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252601#comment-15252601
 ] 

Xiaobing Zhou commented on HDFS-9943:
-

Patch v003 is rebased on trunk. Some refactoring is included as well.

> Support reconfiguring namenode replication confs
> 
>
> Key: HDFS-9943
> URL: https://issues.apache.org/jira/browse/HDFS-9943
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9943-HDFS-9000.000.patch, 
> HDFS-9943-HDFS-9000.001.patch, HDFS-9943-HDFS-9000.002.patch, 
> HDFS-9943-HDFS-9000.003.patch
>
>
> The following confs should be reconfigurable at runtime.
> - dfs.namenode.replication.work.multiplier.per.iteration
> - dfs.namenode.replication.interval
> - dfs.namenode.replication.max-streams
> - dfs.namenode.replication.max-streams-hard-limit
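
For illustration, runtime reconfiguration in Hadoop hooks into 
ReconfigurableBase; a hedged sketch of handling one of the properties above 
(the internal state shown is assumed, not the patch's code):

{code}
import org.apache.hadoop.conf.ReconfigurationException;

/** Hypothetical sketch of handling one reconfigurable property. */
class ReplicationReconfigSketch {
  private volatile int maxReplicationStreams = 2;

  String reconfigureProperty(String property, String newVal)
      throws ReconfigurationException {
    if ("dfs.namenode.replication.max-streams".equals(property)) {
      try {
        maxReplicationStreams = Integer.parseInt(newVal);  // applied live
        return newVal;
      } catch (NumberFormatException e) {
        throw new ReconfigurationException(property, newVal, null, e);
      }
    }
    throw new ReconfigurationException(property, newVal, null);
  }
}
{code}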



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9943) Support reconfiguring namenode replication confs

2016-04-21 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-9943:

Attachment: HDFS-9943-HDFS-9000.003.patch

> Support reconfiguring namenode replication confs
> 
>
> Key: HDFS-9943
> URL: https://issues.apache.org/jira/browse/HDFS-9943
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9943-HDFS-9000.000.patch, 
> HDFS-9943-HDFS-9000.001.patch, HDFS-9943-HDFS-9000.002.patch, 
> HDFS-9943-HDFS-9000.003.patch
>
>
> The following confs should be reconfigurable at runtime.
> - dfs.namenode.replication.work.multiplier.per.iteration
> - dfs.namenode.replication.interval
> - dfs.namenode.replication.max-streams
> - dfs.namenode.replication.max-streams-hard-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.

2016-04-21 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah resolved HDFS-10305.
---
Resolution: Won't Fix

Closing this as Won't Fix

> Hdfs audit shouldn't log mkdir operation if the directory already exists.
> 
>
> Key: HDFS-10305
> URL: https://issues.apache.org/jira/browse/HDFS-10305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Minor
>
> Currently Hdfs audit logs mkdir operation even if the directory already 
> exists.
> This creates confusion while analyzing audit logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10301:

Attachment: HDFS-10301.002.patch

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.01.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. 
> Then it sends the block report again. The NameNode, while processing these 
> two reports at the same time, can interleave processing storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombies. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10311) libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252581#comment-15252581
 ] 

Hadoop QA commented on HDFS-10311:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 34s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
39s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 44s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 45s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 39s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 42s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 47s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800017/HDFS-10311.HDFS-8707.002.patch
 |
| JIRA Issue | HDFS-10311 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux bac4cbd3f297 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / d8653c8 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_77 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15240/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15240/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -
>
> Key: HDFS-10311
> URL: https://issues.apache.org/jira/browse/HDFS-10311

[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252531#comment-15252531
 ] 

Hudson commented on HDFS-9670:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9647 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9647/])
HDFS-9670. DistCp throws NPE when source is root. (John Zhuge via (yzhang: rev 
a749ba0ceaa843aa83146b6bea19e031c8dc3296)
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSystem.java


> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch, HDFS-9670.002.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>  DistCpOptions options) throws 
> IOException {
> Path target = options.getTargetPath();
> FileSystem targetFS = target.getFileSystem(getConf());
> final boolean targetPathExists = options.getTargetPathExists();
> boolean solitaryFile = options.getSourcePaths().size() == 1
> && 
> !sourceStatus.isDirectory();
> if (solitaryFile) {
>   if (targetFS.isFile(target) || !targetPathExists) {
> return sourceStatus.getPath();
>   } else {
> return sourceStatus.getPath().getParent();
>   }
> } else {
>   boolean specialHandling = (options.getSourcePaths().size() == 1 && 
> !targetPathExists) ||
>   options.shouldSyncFolder() || options.shouldOverwrite();
>   return specialHandling && sourceStatus.isDirectory() ? 
> sourceStatus.getPath() :
>   sourceStatus.getPath().getParent();
> }
>   }
> {code}
> We can see that it could return NULL at the end when doing 
> {{sourceStatus.getPath().getParent()}}
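
One way to avoid the NPE, as a hedged sketch (not necessarily the committed 
fix): treat the root path, whose getParent() returns null, specially.

{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

/** Hypothetical guard; the committed patch may differ. */
final class SourceRootSketch {
  /** The root path "/" has no parent; fall back to the path itself. */
  static Path parentOrSelf(FileStatus sourceStatus) {
    Path path = sourceStatus.getPath();
    Path parent = path.getParent();
    return parent != null ? parent : path;  // getParent() is null for "/"
  }
}
{code}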



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252525#comment-15252525
 ] 

Yongjun Zhang commented on HDFS-9670:
-

Committed to trunk, branch-2, branch-2.8.

Thanks [~jzhuge] very much for the contribution!


> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch, HDFS-9670.002.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>  DistCpOptions options) throws 
> IOException {
> Path target = options.getTargetPath();
> FileSystem targetFS = target.getFileSystem(getConf());
> final boolean targetPathExists = options.getTargetPathExists();
> boolean solitaryFile = options.getSourcePaths().size() == 1
> && 
> !sourceStatus.isDirectory();
> if (solitaryFile) {
>   if (targetFS.isFile(target) || !targetPathExists) {
> return sourceStatus.getPath();
>   } else {
> return sourceStatus.getPath().getParent();
>   }
> } else {
>   boolean specialHandling = (options.getSourcePaths().size() == 1 && 
> !targetPathExists) ||
>   options.shouldSyncFolder() || options.shouldOverwrite();
>   return specialHandling && sourceStatus.isDirectory() ? 
> sourceStatus.getPath() :
>   sourceStatus.getPath().getParent();
> }
>   }
> {code}
> We can see that it could return NULL at the end when doing 
> {{sourceStatus.getPath().getParent()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9670) DistCp throws NPE when source is root

2016-04-21 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252527#comment-15252527
 ] 

John Zhuge commented on HDFS-9670:
--

Thanks [~yzhangal] for reporting, reviewing, and committing the jira.

> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch, HDFS-9670.002.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>  DistCpOptions options) throws 
> IOException {
> Path target = options.getTargetPath();
> FileSystem targetFS = target.getFileSystem(getConf());
> final boolean targetPathExists = options.getTargetPathExists();
> boolean solitaryFile = options.getSourcePaths().size() == 1
> && 
> !sourceStatus.isDirectory();
> if (solitaryFile) {
>   if (targetFS.isFile(target) || !targetPathExists) {
> return sourceStatus.getPath();
>   } else {
> return sourceStatus.getPath().getParent();
>   }
> } else {
>   boolean specialHandling = (options.getSourcePaths().size() == 1 && 
> !targetPathExists) ||
>   options.shouldSyncFolder() || options.shouldOverwrite();
>   return specialHandling && sourceStatus.isDirectory() ? 
> sourceStatus.getPath() :
>   sourceStatus.getPath().getParent();
> }
>   }
> {code}
> We can see that it could return NULL at the end when doing 
> {{sourceStatus.getPath().getParent()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9670) DistCp throws NPE when source is root

2016-04-21 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9670:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> DistCp throws NPE when source is root
> -
>
> Key: HDFS-9670
> URL: https://issues.apache.org/jira/browse/HDFS-9670
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9670.001.patch, HDFS-9670.002.patch
>
>
> Symptom:
> {quote}
> [root@vb0724 ~]# hadoop distcp hdfs://X:8020/ hdfs://Y:8020/
> 16/01/20 11:33:33 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, 
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
> copyStrategy='uniformsize', sourceFileListing=null, 
> sourcePaths=[hdfs://X:8020/], targetPath=hdfs://Y:8020/, 
> targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
> 16/01/20 11:33:33 INFO client.RMProxy: Connecting to ResourceManager at Z:8032
> 16/01/20 11:33:33 ERROR tools.DistCp: Exception encountered 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:598)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.writeToFileListingRoot(SimpleCopyListing.java:583)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:313)
>   at 
> org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)
> {quote}
> Relevant code:
> {code}
>   private Path computeSourceRootPath(FileStatus sourceStatus,
>  DistCpOptions options) throws 
> IOException {
> Path target = options.getTargetPath();
> FileSystem targetFS = target.getFileSystem(getConf());
> final boolean targetPathExists = options.getTargetPathExists();
> boolean solitaryFile = options.getSourcePaths().size() == 1
> && 
> !sourceStatus.isDirectory();
> if (solitaryFile) {
>   if (targetFS.isFile(target) || !targetPathExists) {
> return sourceStatus.getPath();
>   } else {
> return sourceStatus.getPath().getParent();
>   }
> } else {
>   boolean specialHandling = (options.getSourcePaths().size() == 1 && 
> !targetPathExists) ||
>   options.shouldSyncFolder() || options.shouldOverwrite();
>   return specialHandling && sourceStatus.isDirectory() ? 
> sourceStatus.getPath() :
>   sourceStatus.getPath().getParent();
> }
>   }
> {code}
> We can see that it could return NULL at the end when doing 
> {{sourceStatus.getPath().getParent()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-21 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252521#comment-15252521
 ] 

Ravi Prakash commented on HDFS-10220:
-

Anyway! I am not going to be a stickler on this. In the interest of getting the 
patch committed I'm fine with either way.

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by 
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the threaddump taken by the zkfc there are lots of threads blocked waiting 
> on a lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when some 
> leases must be released. Due to the very large number of leases to be 
> released, the namenode took too long to release them, blocking all other 
> tasks and making the zkfc think that the namenode was unavailable/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check for leases, so the lock won't be held for too long a period.
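
A hedged sketch of the throttling idea (the batch limit and all names are 
illustrative, not the patch's code):

{code}
import java.util.List;

/** Hypothetical sketch of bounding the work done per lock acquisition. */
class LeaseReleaseThrottle {
  static final int MAX_LEASES_PER_CHECK = 1000;  // illustrative limit

  private final Object fsLock = new Object();

  /**
   * Releases at most MAX_LEASES_PER_CHECK leases per lock acquisition,
   * then drops the lock so other NameNode operations (and the ZKFC
   * health check) can make progress. Assumes a single caller thread.
   */
  void checkLeases(List<String> expiredLeaseHolders) {
    while (!expiredLeaseHolders.isEmpty()) {
      synchronized (fsLock) {
        int batch = Math.min(MAX_LEASES_PER_CHECK, expiredLeaseHolders.size());
        for (int i = 0; i < batch; i++) {
          releaseLease(expiredLeaseHolders.remove(expiredLeaseHolders.size() - 1));
        }
      }
      // Lock released here; blocked threads can run between batches.
    }
  }

  private void releaseLease(String holder) {
    // Placeholder for internalReleaseLease(...).
  }
}
{code}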



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-21 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252517#comment-15252517
 ] 

Ravi Prakash commented on HDFS-10220:
-

bq.  It's about implementation detail
I've heard that argument before and I honestly don't know what that means. We 
have a lot of "implementation details" which are tunable via configurations. I 
don't see anything bad about that. The user is not expected to know anything 
about those configurations, cluster administrators are! Without these 
configurations, cluster admins have no knobs to turn to fix issues they see. 

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by 
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the threaddump taken by the zkfc there are lots of threads blocked waiting 
> on a lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when some 
> leases must be released. Due to the very large number of leases to be 
> released, the namenode took too long to release them, blocking all other 
> tasks and making the zkfc think that the namenode was unavailable/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check for leases, so the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9894) Add unsetStoragePolicy API to FileContext/AbstractFileSystem and derivatives

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252493#comment-15252493
 ] 

Hadoop QA commented on HDFS-9894:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 59s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 49s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 41s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 21s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 25s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | 

[jira] [Commented] (HDFS-10224) Implement an asynchronous DistributedFileSystem

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252494#comment-15252494
 ] 

Hadoop QA commented on HDFS-10224:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} HDFS-10224 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800024/HDFS-10224-HDFS-9924.003.patch
 |
| JIRA Issue | HDFS-10224 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15242/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Implement an asynchronous DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented as 
> well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252475#comment-15252475
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

Hmm.  This is a challenging one.  [~walter.k.su], I think I agree that the 
queue added in HDFS-9198 might be part of the problem here.  In CDH, we haven't 
yet backported the deferred queuing stuff implemented in HDFS-9198, which might 
explain why we never saw this.  Since we don't have a queue, and since NN RPCs 
are almost always handled in the order they arrive, CDH5 doesn't implement 
"reordering" of resent storage reports.

Independently of this bug, I do think it's concerning that the DN keeps piling 
on retransmissions of FBRs even before the old ones were processed and 
acknowledged.  This kind of behavior will obviously lead to congestion collapse 
if congestion is what caused the original FBRs to be processed but not 
acknowledged.

{code}
void enqueue(List<Runnable> actions) throws InterruptedException {
  synchronized (queue) {
    for (Runnable action : actions) {
      if (!queue.offer(action)) {
        if (!isAlive() && namesystem.isRunning()) {
          ExitUtil.terminate(1, getName() + " is not running");
        }
        long now = Time.monotonicNow();
        if (now - lastFull > 4000) {
          lastFull = now;
          LOG.info("Block report queue is full");
        }
        queue.put(action);
      }
    }
  }
}
{code}
This is going to be problematic when contention gets high, because threads will 
spend a long time waiting to enter the {{synchronized (queue)}} section.  And 
this will not be logged or reflected back to the admin in any way.  
Unfortunately, the operation that you want here, the ability to atomically add 
a bunch of items to the {{BlockingQueue}}, simply is not provided by 
{{BlockingQueue}}.  The solution also seems somewhat brittle since reordering 
could happen because of network issues in a multi-RPC BlockReport.

I'm thinking about this a little more, and it seems like the root of the 
problem is that in the single-RPC case, we're throwing away the information 
about how many storages were in the original report.  We need to find a way to 
include that information in there...
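
To make the missing-operation point concrete, a hedged sketch of a queue 
that does offer an atomic batch put (purely illustrative, not a proposed 
patch):

{code}
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical bounded queue with the atomic batch insert that
 *  java.util.concurrent.BlockingQueue does not provide. */
class BatchQueue<T> {
  private final ArrayDeque<T> items = new ArrayDeque<>();
  private final int capacity;
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition notFull = lock.newCondition();
  private final Condition notEmpty = lock.newCondition();

  BatchQueue(int capacity) { this.capacity = capacity; }

  /** Blocks until the whole batch fits, then inserts it atomically.
   *  Assumes batch.size() <= capacity. */
  void putAll(Collection<? extends T> batch) throws InterruptedException {
    lock.lock();
    try {
      while (items.size() + batch.size() > capacity) {
        notFull.await();
      }
      items.addAll(batch);
      notEmpty.signalAll();
    } finally {
      lock.unlock();
    }
  }

  T take() throws InterruptedException {
    lock.lock();
    try {
      while (items.isEmpty()) {
        notEmpty.await();
      }
      T item = items.poll();
      notFull.signalAll();
      return item;
    } finally {
      lock.unlock();
    }
  }
}
{code}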

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy a DataNode can time out sending a block report. 
> Then it sends the block report again. The NameNode, while processing these 
> two reports at the same time, can interleave processing storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombies. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252453#comment-15252453
 ] 

Steve Loughran commented on HDFS-9732:
--

I'm happy with this; you've done some great work here.

[~aw], do you have any issues with the patch as is? Notice how there are even 
unit tests to verify that the output is stable, which is something no other 
bit of the code has.

> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch, 
> HDFS-9732.004.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostics info: 
> owner, sequence number. But its superclass, 
> {{AbstractDelegationTokenIdentifier}}, contains a lot more information, 
> including token issue and expiry times.
> Because {{DelegationTokenIdentifier.toString()}} doesn't include this data,
> information that is potentially useful for Kerberos diagnostics is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9869) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2]

2016-04-21 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252438#comment-15252438
 ] 

Rakesh R commented on HDFS-9869:


Attached a new patch, where I've changed the approach to deprecation. [~zhz], 
could you please verify this section? Thanks!

> Erasure Coding: Rename replication-based names in BlockManager to more 
> generic [part-2]
> ---
>
> Key: HDFS-9869
> URL: https://issues.apache.org/jira/browse/HDFS-9869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Rakesh R
>Assignee: Rakesh R
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-9869-001.patch, HDFS-9869-002.patch, 
> HDFS-9869-003.patch, HDFS-9869-004.patch, HDFS-9869-005.patch, 
> HDFS-9869-006.patch, HDFS-9869-007.patch
>
>
> The idea of this JIRA is to rename the following entities in BlockManager as follows:
> - {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}}
> - {{excessReplicateMap}} to {{extraRedundancyMap}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9869) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2]

2016-04-21 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-9869:
---
Attachment: HDFS-9869-007.patch

> Erasure Coding: Rename replication-based names in BlockManager to more 
> generic [part-2]
> ---
>
> Key: HDFS-9869
> URL: https://issues.apache.org/jira/browse/HDFS-9869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Rakesh R
>Assignee: Rakesh R
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-9869-001.patch, HDFS-9869-002.patch, 
> HDFS-9869-003.patch, HDFS-9869-004.patch, HDFS-9869-005.patch, 
> HDFS-9869-006.patch, HDFS-9869-007.patch
>
>
> The idea of this JIRA is to rename the following entities in BlockManager as follows:
> - {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}}
> - {{excessReplicateMap}} to {{extraRedundancyMap}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10224) Implement an asynchronous DistributedFileSystem

2016-04-21 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252424#comment-15252424
 ] 

Xiaobing Zhou commented on HDFS-10224:
--

I posted patch v003:
1. Removed unnecessary usage of Future; used Callable instead (see the sketch 
below).
2. Added some more tests.
3. Did some other refactoring.
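
Not the actual patch (which builds on the async RPC work in HADOOP-12909 and 
HADOOP-12910); just a hedged sketch of the general Callable-based shape of an 
asynchronous rename API, where {{doBlockingRename}} is a hypothetical 
stand-in:
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: the Callable-based shape of an async rename API.
class AsyncRenameSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  Future<Void> rename(final String src, final String dst) {
    return pool.submit(new Callable<Void>() {
      @Override
      public Void call() throws Exception {
        // A real implementation would issue a non-blocking RPC instead of
        // borrowing a thread; this only shows the API shape.
        doBlockingRename(src, dst); // hypothetical helper
        return null;
      }
    });
  }

  private void doBlockingRename(String src, String dst) {
    // stand-in for the synchronous rename call
  }
}
{code}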

> Implement an asynchronous DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This proposes to implement an asynchronous DistributedFileSystem based on 
> the AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented 
> in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10224) Implement an asynchronous DistributedFileSystem

2016-04-21 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-10224:
-
Attachment: HDFS-10224-HDFS-9924.003.patch

> Implement an asynchronous DistributedFileSystem
> ---
>
> Key: HDFS-10224
> URL: https://issues.apache.org/jira/browse/HDFS-10224
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10224-HDFS-9924.000.patch, 
> HDFS-10224-HDFS-9924.001.patch, HDFS-10224-HDFS-9924.002.patch, 
> HDFS-10224-HDFS-9924.003.patch, HDFS-10224-and-HADOOP-12909.000.patch
>
>
> This proposes to implement an asynchronous DistributedFileSystem based on 
> the AsyncFileSystem APIs in HADOOP-12910. In addition, rename is implemented 
> in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252413#comment-15252413
 ] 

Yongjun Zhang commented on HDFS-9732:
-

Hi [~ste...@apache.org],

Thanks a lot for your review; all good comments. I just uploaded rev 004, 
which tries to address all of them. Would you please help take a look when 
you have a chance? Thanks.





> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch, 
> HDFS-9732.004.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostics info: 
> owner, sequence number. But its superclass, 
> {{AbstractDelegationTokenIdentifier}}, contains a lot more information, 
> including token issue and expiry times.
> Because {{DelegationTokenIdentifier.toString()}} doesn't include this data,
> information that is potentially useful for Kerberos diagnostics is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-21 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9732:

Attachment: HDFS-9732.004.patch

> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch, 
> HDFS-9732.004.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostics info: 
> owner, sequence number. But its superclass, 
> {{AbstractDelegationTokenIdentifier}}, contains a lot more information, 
> including token issue and expiry times.
> Because {{DelegationTokenIdentifier.toString()}} doesn't include this data,
> information that is potentially useful for Kerberos diagnostics is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-21 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9732:

Attachment: (was: HDFS-9732.004.patch)

> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostics info: 
> owner, sequence number. But its superclass, 
> {{AbstractDelegationTokenIdentifier}}, contains a lot more information, 
> including token issue and expiry times.
> Because {{DelegationTokenIdentifier.toString()}} doesn't include this data,
> information that is potentially useful for Kerberos diagnostics is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9732) Remove DelegationTokenIdentifier.toString() —for better logging output

2016-04-21 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9732:

Attachment: HDFS-9732.004.patch

> Remove DelegationTokenIdentifier.toString() —for better logging output
> --
>
> Key: HDFS-9732
> URL: https://issues.apache.org/jira/browse/HDFS-9732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
> Attachments: HADOOP-12752-001.patch, HDFS-9732-000.patch, 
> HDFS-9732.001.patch, HDFS-9732.002.patch, HDFS-9732.003.patch, 
> HDFS-9732.004.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> HDFS {{DelegationTokenIdentifier.toString()}} adds some diagnostics info: 
> owner, sequence number. But its superclass, 
> {{AbstractDelegationTokenIdentifier}}, contains a lot more information, 
> including token issue and expiry times.
> Because {{DelegationTokenIdentifier.toString()}} doesn't include this data,
> information that is potentially useful for Kerberos diagnostics is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) Blocks removed by thousands due to falsely detected zombie storages

2016-04-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252377#comment-15252377
 ] 

Konstantin Shvachko commented on HDFS-10301:


Hey Walter, your patch looks good by itself, but it does not address the bug in 
the zombie storage recognition.
It took me some time to review your patch; it would have been easier if you 
had explained your approach.
So your patch is reordering block reports for different storages in such a way 
that storages from the same report are placed as a contiguous segment in the 
block report queue, so that processing of different BRs is not interleaved. 
This addresses Daryn's comment rather than solving the reported bug, as BTW 
Daryn correctly stated.
If you want to go forward with reordering of BRs, you should probably do it 
in another issue. I personally am not a supporter, because
# it introduces an unnecessary restriction on the order of execution of block 
reports, and
# it adds even more complexity to the BR processing logic.

The main problem I see here is that block reports used to be idempotent per 
storage, but HDFS-7960 made execution of a subsequent storage dependent on 
the state produced during execution of the previous ones. I think idempotence 
is good, and we should keep it. I think we can mitigate the problem by one of 
the following:
# Changing the criteria of zombie storage recognition. Why should it depend on 
block report IDs?
# Eliminating the notion of zombie storage altogether. E.g., the NN can ask 
the DN to run {{DirectoryScanner}} if the NN thinks the DN's state is 
outdated.
# Moving {{curBlockReportId}} from {{DatanodeDescriptor}} to 
{{StorageInfo}}, which would eliminate global state shared between storages 
(sketched below).

Also if we cannot come up with a quick solution, then we should probably roll 
back HDFS-7960 for now and revisit it later, because this is a critical bug 
affecting all of our latest releases. And that is a lot of clusters and PBs out 
there.
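
A minimal, hedged sketch of option 3 above; every name here is an assumption 
for illustration, not the actual Hadoop classes:
{code}
// Track the last block report id per storage instead of keeping a single
// curBlockReportId on the DatanodeDescriptor, so interleaved processing of
// a report and its retransmission cannot mark each other's storages zombie.
class StorageInfoSketch {
  private long lastBlockReportId; // per-storage state, nothing shared

  void onBlockReportProcessed(long brId) {
    lastBlockReportId = Math.max(lastBlockReportId, brId);
  }

  boolean possiblyZombie(long newestBrIdForThisDatanode) {
    // suspect only if no recent report mentioned this storage at all
    return lastBlockReportId < newestBrIdForThisDatanode;
  }
}
{code}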

> Blocks removed by thousands due to falsely detected zombie storages
> ---
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. Then NameNode while process these two reports 
> at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10311) libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket

2016-04-21 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-10311:
---
Attachment: HDFS-10311.HDFS-8707.002.patch

New patch addressing [~bobhansen]'s comments:
- got rid of the extra is_open check
- return e.what()
- don't hold the lock before event hooks

> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -
>
> Key: HDFS-10311
> URL: https://issues.apache.org/jira/browse/HDFS-10311
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10311.HDFS-8707.000.patch, 
> HDFS-10311.HDFS-8707.001.patch, HDFS-10311.HDFS-8707.002.patch
>
>
> DataNodeConnectionImpl calls reset on the unique_ptr that references the 
> underlying asio::tcp::socket.  If this happens after the continuation 
> pipeline checks the cancel state but before asio uses the socket it will 
> segfault because unique_ptr::reset will explicitly change its value to 
> nullptr.
> Cancel should only call shutdown() and close() on the socket but keep the 
> instance of it alive.  The socket can probably also be turned into a member 
> of DataNodeConnectionImpl to get rid of the unique pointer and simplify 
> things a bit.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system

2016-04-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252320#comment-15252320
 ] 

Zhe Zhang commented on HDFS-9806:
-

Thanks Chris, I think we were drafting comments at the same time :) Looking forward 
to the design doc!

> Allow HDFS block replicas to be provided by an external storage system
> --
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Chris Douglas
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system

2016-04-21 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252312#comment-15252312
 ] 

Zhe Zhang commented on HDFS-9806:
-

Thanks [~cnauroth] for the discussion! Yes, HADOOP-12077 and my PoC are indeed 
best suited for multi-HDFS scenarios -- BTW I also started reviewing the 
HADOOP-12077 patch but haven't finished yet. The file permissions issue is very 
interesting. I was trying to summarize a list of gaps like this between HDFS 
and {{FileSystem}}.

I think it's possible to stage data at the file level, while providing a true 
HDFS to applications. We essentially want to guarantee that _all operations go 
through the upper HDFS layer_, including reads, writes, and metadata ops. For 
writing, we should always create files in HDFS and keep all UC 
(under-construction) files in HDFS. 
The tricky part is how to handle files that are finalized and "staged out" to 
the external store. 

One option is to follow the Linux VFS-PageCache-ExtFS model, and persist an 
{{INodeFile}} somewhere when it's staged out. E.g. we can serialize the inode 
and put it at the beginning of the external blob. This way, the reading logic 
is to always restore the file in HDFS first, and then read from the HDFS file. 
With this option, directory operations are a little tricky. Since in this 
architecture the role of the external store is to provide mass storage, we 
probably shouldn't stage out any directory. When a directory is accessed, e.g. 
via {{listStatus}}, we need a way to know that some of its files are staged 
out. A dumb method is to always check the same directory on the external store 
and see if there's a gap between the lists of child files -- in case of gaps, 
always restore all child files.

So another option is to keep an {{INodeFile}} in NN memory when the file is 
staged out, but set an {{XAttr}} which serializes original {{INodeFile}} info 
such as permissions. This way for metadata operations such as directory 
listing, the data can be returned from the {{XAttr}} without staging the actual 
blob from external store.

So basically, my 2¢ is that _faking a file in HDFS_ is easier than doing so for 
a block, because of all the complex block tracking and management logic.
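
A hedged sketch of the second option, using the public XAttr API; the XAttr 
key and helper names are assumptions, and a real design would live inside the 
NameNode rather than in a client like this:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustration only: mark a staged-out file by storing its serialized
// original metadata in an XAttr, so directory listings can be answered
// without restoring the blob from the external store.
public final class StageOutMarker {
  private static final String XATTR = "user.stagedout.inode"; // hypothetical key

  private StageOutMarker() {}

  static void markStagedOut(FileSystem fs, Path p, byte[] serializedInode)
      throws IOException {
    fs.setXAttr(p, XATTR, serializedInode);
  }

  static boolean isStagedOut(FileSystem fs, Path p) throws IOException {
    return fs.getXAttrs(p).containsKey(XATTR);
  }
}
{code}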

> Allow HDFS block replicas to be provided by an external storage system
> --
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Chris Douglas
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252292#comment-15252292
 ] 

Hudson commented on HDFS-10207:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9644 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9644/])
HDFS-10207. Support enable Hadoop IPC backoff without namenode restart. (xyao: 
rev b4be288c5d6801988f555a566c2eb793c88a15a4)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/CallQueueManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeReconfigure.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java


> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch, 
> HDFS-10207-HDFS-9000.003.patch, HDFS-10207-HDFS-9000.004.patch, 
> HDFS-10207-HDFS-9000.005.patch, HDFS-10207-HDFS-9000.006.patch, 
> HDFS-10207-HDFS-9000.007.patch, HDFS-10207-HDFS-9000.008.patch
>
>
> It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart, to protect the namenode from being overloaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10319) Balancer should not try to pair storages with different types

2016-04-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-10319:
-
Hadoop Flags: Reviewed

+1 for the patch.  The test failures look unrelated.  Thank you, [~szetszwo].

> Balancer should not try to pair storages with different types
> -
>
> Key: HDFS-10319
> URL: https://issues.apache.org/jira/browse/HDFS-10319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h10319_20160420.patch
>
>
> This is a performance bug – Balancer may pair a source datanode and a target 
> datanode with different storage types. Fortunately, it will fail to schedule 
> any blocks in such a pair, since it will later find out that the storage 
> types do not match.
> The bug won't lead to incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-04-21 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10207:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~xiaobingo] for the contribution. I've committed the patch to trunk and 
branch-2.9.

> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-10207-HDFS-9000.000.patch, 
> HDFS-10207-HDFS-9000.001.patch, HDFS-10207-HDFS-9000.002.patch, 
> HDFS-10207-HDFS-9000.003.patch, HDFS-10207-HDFS-9000.004.patch, 
> HDFS-10207-HDFS-9000.005.patch, HDFS-10207-HDFS-9000.006.patch, 
> HDFS-10207-HDFS-9000.007.patch, HDFS-10207-HDFS-9000.008.patch
>
>
> It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart, to protect the namenode from being overloaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system

2016-04-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252213#comment-15252213
 ] 

Chris Douglas commented on HDFS-9806:
-

We started down that path, citing similar reasoning. [~jghoman] and I even 
implemented a similar prototype. We elected to abandon a client-driven approach 
for one driven by the infrastructure, for a few reasons:
* Like ViewFS, each client maintaining its own mount tables is powerful, but 
that flexibility interferes with sharing. Defining consistency between storage 
tiers when each client has an idiosyncratic mapping is not trivial. For 
example, if two clients map a path in {{smallFS}} to a different location in 
{{bigFS}}, the result of many operations will be undefined. By contrast, if 
that mapping is part of HDFS, then one class of potential conflicts is 
obviated. Conflicts still exist, but these will be between concurrent writers 
to {{bigFS}}.
* Fault-tolerant, client-driven write-through caching is difficult, and in some 
cases impossible, to implement. Unless other clients (that must share the same 
mapping) can recover operations from {{smallFS}} to {{bigFS}}, client failures 
will create inconsistency. For example, if a client appends to {{smallFS}} and 
fails (or is partitioned from {{bigFS}}, credentials expire, etc.), what 
component will recover the operation? If recovery is implemented in the client, 
then another client appending to {{smallFS}} must first replay that operation 
in {{bigFS}}.
* Migrating data, evicting data from the cache, quotas, etc. are all 
expressible using _existing_ HDFS machinery. If external storage complements 
the existing abstractions, then it is not yet another service in Hadoop 
clusters, nor will it need to re-implement (and reconcile with) the 
functionality already written and debugged in HDFS.
* Some storage systems have simpler security models than HDFS, often a single 
key. To avoid giving every client access to the {{bigFS}} storage account 
(breaking HDFS security), an operator can embed the credentials in HDFS.
* Not all external storage systems will be FileSystems. Many sync operations 
are much easier when blocks are mapped to objects, rather than file regions.

You raise a good point about the granularity of caching policies. Files, 
directories, and blocks are all viable. The policy that directs the content of 
the cache need not match the mechanism; even if we serve by block we may keep 
metrics at the file, or even directory level.

We'll add more detail to the design doc on this point (apologies for its delay; 
my fault).

> Allow HDFS block replicas to be provided by an external storage system
> --
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Chris Douglas
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-21 Thread Nicolas Fraison (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252179#comment-15252179
 ] 

Nicolas Fraison commented on HDFS-10220:


[~walter.k.su]
1. I think that it simplifies the code.
2. OK, I will remove this change, as we now only call it once since it is 
removed from the lease-break loop.
3. OK, I will change it to DEBUG.
4. Counting elapsed time is better in terms of functionality, but I'm afraid 
of adding extra computation time to this check compared to a simple count of 
files. The idea is not to spend more time releasing those leases. What is 
your feeling about it? (See the sketch below.)
5. It should solve the failover issue, but it would leave the lock held 
during the lease release, slowing down or blocking other operations on the 
namenode.
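
To make points 4 and 5 concrete, a minimal, hedged sketch (all names are 
assumptions, not the actual patch): hold the lock for a small time budget, 
check the clock only once per batch so the check itself stays cheap, and 
re-acquire the lock between rounds so other namenode operations can run:
{code}
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only; the ReentrantLock stands in for the FSNamesystem
// write lock, and each Runnable stands in for releasing one lease.
class ThrottledLeaseReleaser {
  private static final long MAX_LOCK_HOLD_MS = 4;   // assumed budget
  private static final int CLOCK_CHECK_BATCH = 100; // leases per clock check

  private final ReentrantLock writeLock = new ReentrantLock();

  void releaseAll(Deque<Runnable> leaseReleases) {
    while (!leaseReleases.isEmpty()) {
      writeLock.lock();
      try {
        long start = System.nanoTime();
        int processed = 0;
        while (!leaseReleases.isEmpty()) {
          leaseReleases.poll().run();
          // Check elapsed time only every CLOCK_CHECK_BATCH releases so
          // the clock call itself adds negligible overhead (point 4).
          if (++processed % CLOCK_CHECK_BATCH == 0
              && (System.nanoTime() - start) / 1_000_000L >= MAX_LOCK_HOLD_MS) {
            break; // drop the lock so other operations can run (point 5)
          }
        }
      } finally {
        writeLock.unlock();
      }
    }
  }
}
{code}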

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by 
> the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the threaddump taken by the zkfc there are lots of threads blocked due to 
> a lock.
> Looking at the code, there is a lock taken by the LeaseManager.Monitor when 
> some leases must be released. Due to the really big number of leases to be 
> released, the namenode took too long to release them, blocking all other 
> tasks and making the zkfc think that the namenode was not available/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check for leases, so the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10322) DomainSocket error leads to more and more DataNode threads waiting

2016-04-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252124#comment-15252124
 ] 

Chris Nauroth commented on HDFS-10322:
--

Hello [~chenfolin].  In addition to HADOOP-11802, there have been multiple 
other bug fixes in this area in the past year: HADOOP-11333, HADOOP-11604, 
HADOOP-11648 and HDFS-8429.  The affects version here is listed as 2.5.0.  If 
that's the version you are running, then it doesn't have all of these fixes.  I 
recommend reviewing those to see if they look relevant to what you're seeing.  
HADOOP-11333 in particular looks relevant.  If it turns out this is already 
fixed, then please resolve this as a duplicate.  If you still think there is an 
unfixed bug remaining, can you please provide additional details?  Thank you.

> DomainSocket error leads to more and more DataNode threads waiting 
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error 
> happens, the DataNode will produce more and more waiting threads.
> It is similar to HADOOP-11802, but I do not think they are the same problem, 
> because the DomainSocket thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for 
> operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on 
> condition [0x7f2d6e4a5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00061c493500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>   at java.lang.Thread.run(Thread.java:745)
> =DomainSocketWatcher
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable 
> [0x7f2dbe4cb000]
>java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>   at java.lang.Thread.run(Thread.java:745)
> ===datanode error log
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
> datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM 
> operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst: 
> java.net.SocketException: write(2) error: Broken pipe
> at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> at 
> org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
> at 
> com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at 
> com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252118#comment-15252118
 ] 

Hadoop QA commented on HDFS-9958:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 
130 unchanged - 0 fixed = 132 total (was 130) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 50s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 11s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 27s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 54s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.TestReservedRawPaths |
|   | hadoop.hdfs.server.namenode.snapshot.TestUpdatePipelineWithSnapshots |
|   | hadoop.hdfs.TestModTime |
|   | hadoop.fs.TestUrlStreamHandler |
|   | hadoop.hdfs.security.TestDelegationToken |
|   | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
|   | hadoop.hdfs.server.namenode.TestFileLimit |
|   | hadoop.hdfs.TestParallelShortCircuitRead |
|   | hadoop.hdfs.server.namenode.snapshot.TestFileContextSnapshot |
|   | hadoop.TestRefreshCallQueue |
|   | hadoop.cli.TestCryptoAdminCLI |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | 

[jira] [Commented] (HDFS-9806) Allow HDFS block replicas to be provided by an external storage system

2016-04-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252105#comment-15252105
 ] 

Chris Nauroth commented on HDFS-9806:
-

Hello [~zhz].  Have you seen HADOOP-12077?  It's very similar to what you 
described: extensions on ViewFs for automatic backup replication and/or 
failover of files to alternative file systems.  Twitter has been using 
HADOOP-12077 effectively for a while.  It's very interesting work.  
Unfortunately, I haven't been able to free up personal bandwidth to review and 
commit it.

However, I think the proposal here on HDFS-9806 has the potential to provide 
benefits different from your PoC/HADOOP-12077.  The alternative file systems 
commonly suffer from limitations related to lack of complete implementation of 
HDFS features.  One common example I encounter a lot is file permissions.  With 
S3A and WASB, the authentication model is based on credentials granting access 
to a specific bucket/storage account.  The entire file system tree is persisted 
within that bucket/storage account.  Because of that, Hadoop file permissions 
become meaningless.  It is impossible to enforce different authorization rules 
for different files within the file system tree.  If it's absolutely necessary, 
then it's possible to split the files between multiple file systems backed by 
different buckets/storage accounts with different credentials, but this is 
cumbersome for administrators.  If instead we had a model where the files could 
be presented as true HDFS files, but the blocks could be backed by alternative 
storage systems at the DataNode layer, then we'd have full access to HDFS file 
authorization features, but administrators could still get the benefits of 
offloading to cloud storage.

That's one example of how this proposal is beneficial.  Chris D described 
others too, like quotas.

> Allow HDFS block replicas to be provided by an external storage system
> --
>
> Key: HDFS-9806
> URL: https://issues.apache.org/jira/browse/HDFS-9806
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Chris Douglas
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate/and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10311) libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket

2016-04-21 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252043#comment-15252043
 ] 

James Clampffer commented on HDFS-10311:


bq. The calling code can provide the context of the error, but loses the 
details of what failed. Sometimes (though not as often as we'd like), the 
exception's what() method returns very useful information.
Good point.  I believe it just does a "throw std::exception()" or equivalent in 
most places but it's been a few weeks since I checked that out.  If that's 
passing useful info I'll propagate it out; keeping track of whether it failed 
in shutdown or close is also handy (I wanted to do that but was rushing a 
bit).

bq. By my reading, the SafeDisconnect always checks is_open first thing and 
returns true if the socket is closed or null
You're correct, my mistake. I did intend to have it return false if it's 
already been closed.  Refactoring + lack of sleep = broken assumptions I 
suppose.

bq. If a consumer has the event handler try to shut down the connection in any 
way, we'll get a deadlock on the state_lock_.
Ah, good catch.  I thought you meant asio callbacks.  Would it suffice to just 
push the lock_guard below the callback invocation?  From my perspective I'm 
only concerned about serializing access to the socket.


> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -
>
> Key: HDFS-10311
> URL: https://issues.apache.org/jira/browse/HDFS-10311
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10311.HDFS-8707.000.patch, 
> HDFS-10311.HDFS-8707.001.patch
>
>
> DataNodeConnectionImpl calls reset on the unique_ptr that references the 
> underlying asio::tcp::socket.  If this happens after the continuation 
> pipeline checks the cancel state but before asio uses the socket it will 
> segfault because unique_ptr::reset will explicitly change its value to 
> nullptr.
> Cancel should only call shutdown() and close() on the socket but keep the 
> instance of it alive.  The socket can probably also be turned into a member 
> of DataNodeConnectionImpl to get rid of the unique pointer and simplify 
> things a bit.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10311) libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket

2016-04-21 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252006#comment-15252006
 ] 

Bob Hansen commented on HDFS-10311:
---

Good comments, [~James Clampffer].

bq. I'd rather have the calling code report it so you know if it was a DN or NN 
issue and if it was a cancel or normal disconnect that lead to it.
The calling code can provide the context of the error, but loses the details of 
what failed.  Sometimes (though not as often as we'd like), the exception's 
what() method returns very useful information.  

bq. If it doesn't check and Cancel has already been called on that connection 
you end up with deterministic false positive warning messages for every 
canceled FD.
By my reading, the SafeDisconnect always checks is_open first thing and returns 
true if the socket is closed or null:
{code}
bool SafeDisconnect(asio::ip::tcp::socket *sock) {
  bool good = true;
  if(sock && sock->is_open()) {
 ...
  }
  return good;
}
{code}

It doesn't hurt to have it in the SocketDeleter, but I think it is redundant.

bq.  Which callback are you referring to? Right now it's only guarding the 
asio::async_ calls because the socket objects aren't thread safe. 

Again, by my read, patch 001 includes in DataNodeConnection:
{code}
void async_read_some(const MutableBuffers &buf,
                     std::function<...> handler) {
  mutex_guard state_lock(state_lock_);
  event_handlers_->call("DN_read_req", "", "", buf.end() - buf.begin());
  conn_->async_read_some(buf, handler);
};

void async_write_some(const ConstBuffers &buf,
                      std::function<...> handler) {
  mutex_guard state_lock(state_lock_);
  event_handlers_->call("DN_write_req", "", "", buf.end() - buf.begin());
  ...
};
{code}
If a consumer has the event handler try to shut down the connection in any way, 
we'll get a deadlock on the state_lock_.  

My reading may be wrong there; please correct me if it is.
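
For readers outside the C++ code, a tiny runnable Java model of the hazard (a 
{{Semaphore}} stands in for the non-reentrant {{state_lock_}}; all names are 
invented):
{code}
import java.util.concurrent.Semaphore;

// A non-reentrant lock held across a user callback blocks forever if the
// callback re-enters the same object. Running main() hangs by design.
class ConnectionModel {
  private final Semaphore stateLock = new Semaphore(1); // like std::mutex

  void asyncReadSome(Runnable eventHandler) throws InterruptedException {
    stateLock.acquire();
    try {
      eventHandler.run(); // if the handler calls close(), we never return
    } finally {
      stateLock.release();
    }
  }

  void close() throws InterruptedException {
    stateLock.acquire(); // deadlocks when invoked from inside eventHandler
    stateLock.release();
  }

  public static void main(String[] args) throws Exception {
    ConnectionModel c = new ConnectionModel();
    c.asyncReadSome(() -> {
      try {
        c.close(); // never returns: stateLock is already held
      } catch (InterruptedException ignored) {
      }
    });
    System.out.println("never reached");
  }
}
{code}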

> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -
>
> Key: HDFS-10311
> URL: https://issues.apache.org/jira/browse/HDFS-10311
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10311.HDFS-8707.000.patch, 
> HDFS-10311.HDFS-8707.001.patch
>
>
> DataNodeConnectionImpl calls reset on the unique_ptr that references the 
> underlying asio::tcp::socket.  If this happens after the continuation 
> pipeline checks the cancel state but before asio uses the socket it will 
> segfault because unique_ptr::reset will explicitly change its value to 
> nullptr.
> Cancel should only call shutdown() and close() on the socket but keep the 
> instance of it alive.  The socket can probably also be turned into a member 
> of DataNodeConnectionImpl to get rid of the unique pointer and simplify 
> things a bit.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-04-21 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251992#comment-15251992
 ] 

Weiwei Yang commented on HDFS-6489:
---

The conflict was caused by some changes from HADOOP-12973; I'll consolidate a 
patch based on that. I'd appreciate any comments, thanks.

> DFS Used space is not correct computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0, 2.7.1, 2.7.2
>Reporter: stanley shi
>Assignee: Weiwei Yang
> Attachments: HDFS-6489.001.patch, HDFS-6489.002.patch, 
> HDFS-6489.003.patch, HDFS6489.java
>
>
> The current implementation of the Datanode increases the DFS used space on 
> each block write operation. This is correct in most scenarios (creating a 
> new file), but sometimes it behaves incorrectly (appending small data to a 
> large block).
> For example, I have a file with only one block (say, 60M). Then I append to 
> it very frequently, but each time I append only 10 bytes.
> Then on each append, DFS used is increased by the length of the block (60M), 
> not the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a 
> large number of files (1000+). Assume the block size is 32M (half of the 
> default value); then DFS used will be increased by 1000*32M = 32G on each 
> round of appends, but actually I only wrote 10K bytes. This will cause the 
> datanode to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10296) FileContext.getDelegationTokens() fails to obtain KMS delegation token

2016-04-21 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251966#comment-15251966
 ] 

Harsh J commented on HDFS-10296:


We do special handling in DistributedFileSystem#addDelegationTokens to detect 
TDE features and inject an additional KMS DT; this enhancement is missing in 
FileContext.
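
Until FileContext gets the same handling, a hedged client-side workaround is 
to fetch the kms-dt directly from the KeyProvider; this sketch assumes the 
2.6-era {{KeyProviderDelegationTokenExtension}} API:
{code}
import java.io.IOException;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

// Sketch of a workaround: ask the KMS KeyProvider for its delegation
// token explicitly, since FileContext will not inject it for us.
public final class KmsTokenHelper {
  private KmsTokenHelper() {}

  static Token<?>[] addKmsToken(KeyProvider kmsProvider, String renewer,
      Credentials creds) throws IOException {
    KeyProviderDelegationTokenExtension ext =
        KeyProviderDelegationTokenExtension
            .createKeyProviderDelegationTokenExtension(kmsProvider);
    return ext.addDelegationTokens(renewer, creds);
  }
}
{code}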

> FileContext.getDelegationTokens() fails to obtain KMS delegation token
> --
>
> Key: HDFS-10296
> URL: https://issues.apache.org/jira/browse/HDFS-10296
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.0
> Environment: CDH 5.6 with a Java KMS
>Reporter: Andreas Neumann
>
> This little program demonstrates the problem: With FileSystem, we can get 
> both the HDFS and the kms-dt token, whereas with FileContext, we can only 
> obtain the HDFS delegation token. 
> {code}
> import java.io.IOException;
> import java.util.List;
>
> import org.apache.hadoop.fs.FileContext;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.security.Credentials;
> import org.apache.hadoop.security.token.Token;
> import org.apache.hadoop.yarn.conf.YarnConfiguration;
>
> public class SimpleTest {
>   public static void main(String[] args) throws IOException {
>     YarnConfiguration hConf = new YarnConfiguration();
>     String renewer = "renewer";
>     FileContext fc = FileContext.getFileContext(hConf);
>     List<Token<?>> tokens = fc.getDelegationTokens(new Path("/"), renewer);
>     for (Token<?> token : tokens) {
>       System.out.println("Token from FC: " + token);
>     }
>     FileSystem fs = FileSystem.get(hConf);
>     for (Token<?> token : fs.addDelegationTokens(renewer, new Credentials())) {
>       System.out.println("Token from FS: " + token);
>     }
>   }
> }
> {code}
> Sample output (host/user name x'ed out):
> {noformat}
> Token from FC: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: 
> (HDFS_DELEGATION_TOKEN token 49 for xxx)
> Token from FS: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:xxx, Ident: 
> (HDFS_DELEGATION_TOKEN token 50 for xxx)
> Token from FS: Kind: kms-dt, Service: xx.xx.xx.xx:16000, Ident: 00 04 63 64 
> 61 70 07 72 65 6e 65 77 65 72 00 8a 01 54 16 96 c2 95 8a 01 54 3a a3 46 95 0e 
> 02
> {noformat}
> Apparently FileContext does not return the KMS token. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10311) libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket

2016-04-21 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251959#comment-15251959
 ] 

James Clampffer commented on HDFS-10311:


bq. SafeDisconnect should log some details at DEBUG level if there was an 
exception during disconnect; it's the kind of thing we may want to know.
I'd rather have the calling code report it so you know if it was a DN or NN 
issue and if it was a cancel or normal disconnect that led to it.  I'm going 
to be using this for HDFS-10310 as well so it's important to specialize the 
error message for different contexts to get meaningful log messages. I don't 
see much value in a general "disconnect failed somewhere" message.

bq. SocketDeleter doesn't need to check if the socket is open; SafeDisconnect 
does that for you.
If it doesn't check and Cancel has already been called on that connection you 
end up with deterministic false positive warning messages for every canceled FD.

bq. Holding an internal lock during a callback is generally an anti-pattern 
(unless the lock it solely to serialize callbacks, and even then it's 
dangerous). If the callback consumer attempts to call back into the object or 
destroy it, you're going to have a deadlock or a segfault. We should look at 
the lock in DataNodeConnectionImpl a little more closely.
Which callback are you referring to?  Right now it's only guarding the 
asio::async_ calls because the socket objects aren't thread safe.  
Those asio methods are only queuing operations (where asio isn't getting its 
own lock) so they return nearly instantaneously.  It's not captured by any of 
the callbacks as far as I can tell.


> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -
>
> Key: HDFS-10311
> URL: https://issues.apache.org/jira/browse/HDFS-10311
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-10311.HDFS-8707.000.patch, 
> HDFS-10311.HDFS-8707.001.patch
>
>
> DataNodeConnectionImpl calls reset on the unique_ptr that references the 
> underlying asio::tcp::socket.  If this happens after the continuation 
> pipeline checks the cancel state but before asio uses the socket, it will 
> segfault because unique_ptr::reset will explicitly change its value to 
> nullptr.
> Cancel should only call shutdown() and close() on the socket but keep the 
> instance of it alive.  The socket can probably also be turned into a member 
> of DataNodeConnectionImpl to get rid of the unique pointer and simplify 
> things a bit.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-21 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated HDFS-9958:
--
Attachment: HDFS-9958.003.patch

Thanks a lot for the helpful comments [~walter.k.su].

bq. btw, which is not related to this topic, I think 
findAndMarkBlockAsCorrupt(..) shouldn't support adding blk to the map if the 
storage is not found.

I agree. Fixing that in v3 of the patch fixes the test, in a way that is 
coherent with the blocksMap and corruptReplicaMap being kept in sync. It now 
throws an IOException if such a case is encountered.

bq. I think countNodes(blk) going thru all storages is unnecessary. Also I 
think numMachines should only include NORMAL and READ_ONLY. So 
createLocatedBlock(..) going thru all storages is unnecessary.

After thinking about this, I agree that failed storages should not count towards 
any of the decisions that countNodes() is responsible for (and there are quite a 
few), hence no change there.

Additionally, if the latest change makes sense, then keeping {{numCorruptNodes}} 
seems redundant to me. This patch doesn't remove it just yet.
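
To make the direction concrete, here is a rough, self-contained sketch of the 
filtering idea (types and names are illustrative only, not the actual 
BlockManager code):

{code}
// Illustrative sketch: build the machines list only from storages in usable
// states, so the count used for sizing can never exceed what gets populated
// (the unpopulated-slot NPE described in this issue).
import java.util.ArrayList;
import java.util.List;

class LocatedBlockSketch {
  enum State { NORMAL, READ_ONLY, FAILED }

  static class StorageInfo {
    final String datanode;
    final State state;
    StorageInfo(String datanode, State state) {
      this.datanode = datanode;
      this.state = state;
    }
  }

  // FAILED storages are skipped entirely; only NORMAL and READ_ONLY storages
  // contribute a machine, mirroring what countNodes() considers.
  static List<String> machinesFor(List<StorageInfo> storages) {
    List<String> machines = new ArrayList<>();
    for (StorageInfo s : storages) {
      if (s.state == State.NORMAL || s.state == State.READ_ONLY) {
        machines.add(s.datanode);
      }
    }
    return machines;
  }
}
{code}

With the list sized this way, the machine count and the populated entries cannot 
diverge, no matter how many replicas sit on failed storages.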

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> ----------------------------------------------------------------------------
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes counts all nodes/storages irrespective of the state of the 
> storage, so numMachines will include such (failed) nodes. The assert would 
> fail only if assertions are enabled; otherwise the code goes ahead and tries 
> to create a LocatedBlock object from a {{machines}} array slot that was never 
> populated.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

