[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116636#comment-15116636
 ] 

Hadoop QA commented on HDFS-8999:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} HDFS-8999 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12784330/h8999_20160121c_branch-2.patch
 |
| JIRA Issue | HDFS-8999 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14240/console |


This message was automatically generated.



> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8999_20151228.patch, h8999_20160106.patch, 
> h8999_20160106b.patch, h8999_20160106c.patch, h8999_20160111.patch, 
> h8999_20160113.patch, h8999_20160114.patch, h8999_20160121.patch, 
> h8999_20160121b.patch, h8999_20160121c.patch, h8999_20160121c_branch-2.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let the NameNode wait for all the block_received msgs 
> before announcing that the replica is safe. Looking into the code, we now have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
># When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts the replicas for which the NN has already 
> received the block_received msg, but based on #2 and #3 above, it may be 
> safe to also count the replicas in 
> BlockUnderConstructionFeature#replicas?
> {quote}
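
To make the quoted idea concrete, here is a minimal sketch (hedged; 
{{countLiveReplicas}} and {{minReplication}} are assumed helper names, and this 
is not the actual patch):

{code}
// Hedged sketch of the quoted idea: when completing the last block, count not
// only the replicas already reported via block_received but also the expected
// replicas the NN recorded for the pipeline (per #1 and #2 above).
int reported = countLiveReplicas(lastBlock);          // block_received-based
int expected = lastBlock.getUnderConstructionFeature()
    .getExpectedStorageLocations().length;            // pipeline DataNodes
boolean safeToComplete = Math.max(reported, expected) >= minReplication;
{code}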



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2016-01-25 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116748#comment-15116748
 ] 

GAO Rui commented on HDFS-9494:
---

[~szetszwo], [~rakeshr], thanks a lot for your advice. Regarding 
{{executor.shutdownNow()}}: if not all the tasks have completed, we might 
never reach the end of {{flushAllInternals()}}. But there is no harm in 
making sure the executor shuts down, so I added {{executor.shutdownNow()}} as 
well.

After checking the related code, it seems that we haven't set a timeout for 
{{waitForAckedSeqno()}}. Maybe we could consider setting a timeout for it in 
another new Jira. I have updated the 05 patch. Could you kindly review it? 
Thank you very much.
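
For discussion, a rough sketch of the parallel approach (illustrative names 
such as {{flushAllInternalsParallel}}; not necessarily how the 05 patch does 
it):

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch: trigger flushInternal( ) on all streamers in parallel and
// wait for every ack before flushAllInternals( ) returns.
void flushAllInternalsParallel(final List<StripedDataStreamer> streamers)
    throws IOException {
  ExecutorService executor = Executors.newFixedThreadPool(streamers.size());
  try {
    List<Future<Void>> futures = new ArrayList<Future<Void>>();
    for (final StripedDataStreamer s : streamers) {
      futures.add(executor.submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          s.flushInternal(); // each streamer waits for its own acked seqno
          return null;
        }
      }));
    }
    for (Future<Void> f : futures) {
      f.get(); // propagate any streamer failure
    }
  } catch (InterruptedException e) {
    throw new IOException("flushAllInternals interrupted", e);
  } catch (ExecutionException e) {
    throw new IOException("flushAllInternals failed", e);
  } finally {
    executor.shutdownNow(); // ensure the executor is shut down regardless
  }
}
{code}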

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch, 
> HDFS-9494-origin-trunk.05.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It would be better to trigger flushInternal( ) on all the streamers, wait 
> for all of them to return from waitForAckedSeqno( ), and then return from 
> flushAllInternals( ).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8071) Redundant checkFileProgress() in PART II of getAdditionalBlock()

2016-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116846#comment-15116846
 ] 

Hudson commented on HDFS-8071:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9186 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9186/])
HDFS-9690. ClientProtocol.addBlock is not idempotent after HDFS-8071. 
(szetszwo: rev 45c763ad6171bc7808c2ddcb9099a4215113da2a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSClientRetries.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java


> Redundant checkFileProgress() in PART II of getAdditionalBlock()
> 
>
> Key: HDFS-8071
> URL: https://issues.apache.org/jira/browse/HDFS-8071
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.7.0
>
> Attachments: HDFS-8071-01.patch, HDFS-8071-02.patch, 
> HDFS-8071-branch-2.7.patch
>
>
> {{FSN.getAdditionalBlock()}} consists of two parts I and II. Each part calls 
> {{analyzeFileState()}}, which among other things check replication of the 
> penultimate block via {{checkFileProgress()}}. See details in HDFS-4452.
> Checking file progress in Part II is not necessary, because Part I already 
> assured the penultimate block is complete. It cannot change to incomplete, 
> unless the file is truncated, which is not allowed for files under 
> construction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9690) addBlock is not idempotent

2016-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116847#comment-15116847
 ] 

Hudson commented on HDFS-9690:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9186 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9186/])
HDFS-9690. ClientProtocol.addBlock is not idempotent after HDFS-8071. 
(szetszwo: rev 45c763ad6171bc7808c2ddcb9099a4215113da2a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSClientRetries.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java


> addBlock is not idempotent
> --
>
> Key: HDFS-9690
> URL: https://issues.apache.org/jira/browse/HDFS-9690
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h9690_20160124.patch, h9690_20160124b.patch, 
> h9690_20160124b_branch-2.7.patch
>
>
> TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the 
> bug. It failed in the following builds.
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9704) terminate progress after namenode recover finished

2016-01-25 Thread Liao, Xiaoge (JIRA)
Liao, Xiaoge created HDFS-9704:
--

 Summary: terminate progress after namenode recover finished
 Key: HDFS-9704
 URL: https://issues.apache.org/jira/browse/HDFS-9704
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.3.0
Reporter: Liao, Xiaoge
Priority: Minor


terminate progress after namenode recover finished



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116767#comment-15116767
 ] 

Hadoop QA commented on HDFS-9494:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 18s {color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs-client-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 1 new + 13 unchanged - 1 fixed = 14 total (was 14) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 56s {color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs-client-jdk1.7.0_91 with JDK 
v1.7.0_91 generated 1 new + 13 unchanged - 1 fixed = 14 total (was 14) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 37s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 5s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 

[jira] [Updated] (HDFS-9503) Replace -namenode option with -fs for NNThroughputBenchmark

2016-01-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9503:

Attachment: HDFS-9053.002.patch

> Replace -namenode option with -fs for NNThroughputBenchmark
> ---
>
> Key: HDFS-9503
> URL: https://issues.apache.org/jira/browse/HDFS-9503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Konstantin Shvachko
>Assignee: Mingliang Liu
> Attachments: HDFS-9053.000.patch, HDFS-9053.001.patch, 
> HDFS-9053.002.patch
>
>
> HDFS-7847 introduced a new option {{-namenode}}, which is intended to point 
> the benchmark to a remote NameNode. It should use a standard generic option 
> {{-fs}} instead, which is routinely used to specify NameNode URI in shell 
> commands.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level

2016-01-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116671#comment-15116671
 ] 

Sangjin Lee commented on HDFS-9579:
---

I went over the existing version of the patch.

First, I don't think using a {{HashMap}} for the bytes read per distance is 
thread safe. Note that one thread (the owner) will modify this map in 
{{incrementBytesReadByDistance()}} while any thread can read the values off the 
map via {{getBytesReadByDistance()}} and {{visitAll()}}, all unsynchronized. 
The problems could range from memory visibility issues to 
ConcurrentModificationException and worse. We need to make this thread safe.

Another reservation I have with using a map: I'm a little concerned about 
memory implications. An additional map per {{StatisticsData}} can add up. Can 
we find a way to avoid using a map? I know it may sound ugly, but one other 
option is to use individual long (volatile) variables. That would also address 
the thread safety issue. Thoughts?

Also, in NetworkTopology.java (lines 373-381), {{equals()}} and {{hashCode()}} 
are superfluous, as they do not modify the superclass behavior in any way.

> Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
> -
>
> Key: HDFS-9579
> URL: https://issues.apache.org/jira/browse/HDFS-9579
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9579-2.patch, HDFS-9579-3.patch, HDFS-9579-4.patch, 
> HDFS-9579.patch, MR job counters.png
>
>
> For cross DC distcp or other applications, it becomes useful to have insight 
> as to the traffic volume for each network distance to distinguish cross-DC 
> traffic, local-DC-remote-rack, etc.
> FileSystem's existing {{bytesRead}} metrics tracks all the bytes read. To 
> provide additional metrics for each network distance, we can add additional 
> metrics to FileSystem level and have {{DFSInputStream}} update the value 
> based on the network distance between client and the datanode.
> {{DFSClient}} will resolve client machine's network location as part of its 
> initialization. It doesn't need to resolve datanode's network location for 
> each read as {{DatanodeInfo}} already has the info.
> There are existing HDFS specific metrics such as {{ReadStatistics}} and 
> {{DFSHedgedReadMetrics}}. But these metrics are only accessible via 
> {{DFSClient}} or {{DFSInputStream}}. Not something that application framework 
> such as MR and Tez can get to. That is the benefit of storing these new 
> metrics in FileSystem.Statistics.
> This jira only includes metrics generation by HDFS. The consumption of these 
> metrics at MR and Tez will be tracked by separated jiras.
> We can add similar metrics for HDFS write scenario later if it is necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9701) DN may deadlock when hot-swapping under load

2016-01-25 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116830#comment-15116830
 ] 

Xiao Chen commented on HDFS-9701:
-

Patch 2 fixes the checkstyle and findbugs issues, and adds some javadocs.
Of the failed tests, {{TestFsDatasetImpl}} and {{TestDataNodeHotSwapVolumes}} 
are related; the others seem unrelated.
- {{TestFsDatasetImpl}}: the original test was missing cleanup. Added.
- {{TestDataNodeHotSwapVolumes}}: IIUC, we should hflush first so that the 
{{BlockReceiver}} holds a reference count. Then we can verify that the block 
reference is not removed (because the block is not finalized), even if a 
reconfig task is launched. For this reason, I moved the barrier to fix the 
test. [~eddyxu] please correct me if I'm wrong.
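
A minimal sketch of the test ordering I mean (hedged; not the exact patch):

{code}
// Hedged sketch: hflush before the reconfig so the DN's BlockReceiver holds
// a reference on the volume, then verify the reference survives the hot-swap
// because the block is not yet finalized.
FSDataOutputStream out = fs.create(path);
out.write(data);
out.hflush();            // block is now open on the DN with a held reference
// ... trigger the hot-swap / reconfiguration task here ...
// ... assert the volume reference is still held ...
out.close();             // finalizes the block and releases the reference
{code}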

> DN may deadlock when hot-swapping under load
> 
>
> Key: HDFS-9701
> URL: https://issues.apache.org/jira/browse/HDFS-9701
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9701.01.patch, HDFS-9701.02.patch
>
>
> If the DN is under load (new blocks being written), a hot-swap task by {{hdfs 
> dfsadmin -reconfig}} may cause a dead lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9690) addBlock is not idempotent

2016-01-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9690:
--
Attachment: h9690_20160124b_branch-2.7.patch

Thanks Vinay for reviewing and trying to commit the patch!

I have just committed the patch down to 2.8.  Here is a patch for 2.7.

h9690_20160124b_branch-2.7.patch: for 2.7.

> addBlock is not idempotent
> --
>
> Key: HDFS-9690
> URL: https://issues.apache.org/jira/browse/HDFS-9690
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h9690_20160124.patch, h9690_20160124b.patch, 
> h9690_20160124b_branch-2.7.patch
>
>
> TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the 
> bug. It failed in the following builds.
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-25 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116845#comment-15116845
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8999:
---

I have committed this to trunk.  Will leave this open for committing to 
branch-2.

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8999_20151228.patch, h8999_20160106.patch, 
> h8999_20160106b.patch, h8999_20160106c.patch, h8999_20160111.patch, 
> h8999_20160113.patch, h8999_20160114.patch, h8999_20160121.patch, 
> h8999_20160121b.patch, h8999_20160121c.patch, h8999_20160121c_branch-2.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let the NameNode wait for all the block_received msgs 
> before announcing that the replica is safe. Looking into the code, we now have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
># When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts the replicas for which the NN has already 
> received the block_received msg, but based on #2 and #3 above, it may be 
> safe to also count the replicas in 
> BlockUnderConstructionFeature#replicas?
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9663) Optimize some RPC call using lighter weight construct than DatanodeInfo

2016-01-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116634#comment-15116634
 ] 

Colin Patrick McCabe commented on HDFS-9663:


Is that stuff actually sent over the wire in every case?  These fields are 
optional in the protobuf structures.

{code}
/**
 * The status of a Datanode
 */
message DatanodeInfoProto {
  required DatanodeIDProto id = 1;
  optional uint64 capacity = 2 [default = 0];
  optional uint64 dfsUsed = 3 [default = 0];
  optional uint64 remaining = 4 [default = 0];
  optional uint64 blockPoolUsed = 5 [default = 0];
  optional uint64 lastUpdate = 6 [default = 0];
  optional uint32 xceiverCount = 7 [default = 0];
  optional string location = 8;
  enum AdminState {
NORMAL = 0;
DECOMMISSION_INPROGRESS = 1;
DECOMMISSIONED = 2;
  }
  
  optional AdminState adminState = 10 [default = NORMAL];
  optional uint64 cacheCapacity = 11 [default = 0];
  optional uint64 cacheUsed = 12 [default = 0];
  optional uint64 lastUpdateMonotonic = 13 [default = 0];
  optional string upgradeDomain = 14;
}
{code}

I agree that it's messy that these fields are optional, but it's hard to see 
how to change it compatibly at this point.
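
For what it's worth, a minimal proto2 Java sketch of the point about optional 
fields ({{idProto}} and the stats variables are placeholders):

{code}
// Hedged sketch: with proto2, optional fields are only written to the wire
// when they are explicitly set on the builder, so a message carrying just the
// required DatanodeIDProto serializes far smaller than a fully populated one.
DatanodeInfoProto minimal = DatanodeInfoProto.newBuilder()
    .setId(idProto)                       // required field only
    .build();
DatanodeInfoProto full = DatanodeInfoProto.newBuilder()
    .setId(idProto)
    .setCapacity(capacity)
    .setDfsUsed(dfsUsed)
    // ... remaining optional fields ...
    .build();
assert minimal.getSerializedSize() < full.getSerializedSize();
{code}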

> Optimize some RPC call using lighter weight construct than DatanodeInfo
> ---
>
> Key: HDFS-9663
> URL: https://issues.apache.org/jira/browse/HDFS-9663
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>
> While working on HDFS-8430, when adding an RPC in DataTransferProtocol, it was 
> noticed that the very heavy construct {{DatanodeInfo}} or 
> {{DatanodeInfoWithStorage}} is used to represent a datanode, most of the time 
> just for connection purposes. It is very fat and contains much more 
> information than needed. See how it's defined:
> {code}
> public class DatanodeInfo extends DatanodeID implements Node {
>   private long capacity;
>   private long dfsUsed;
>   private long remaining;
>   private long blockPoolUsed;
>   private long cacheCapacity;
>   private long cacheUsed;
>   private long lastUpdate;
>   private long lastUpdateMonotonic;
>   private int xceiverCount;
>   private String location = NetworkTopology.DEFAULT_RACK;
>   private String softwareVersion;
>   private List<String> dependentHostNames = new LinkedList<>();
>   private String upgradeDomain;
> ...
> {code}
> On the client and datanode sides, for RPC calls like 
> {{DataTransferProtocol#writeBlock}}, it looks like the information contained in 
> {{DatanodeID}} is almost enough.
> I did a quick hack using a lightweight construct like 
> {{SimpleDatanodeInfo}} that simply extends DatanodeID (no other field added, 
> but whatever field is needed can just be added) and changed the 
> DataTransferProtocol#writeBlock call. I manually checked many relevant tests 
> and they worked fine. To see how much network traffic is saved, I did a simple 
> test with the following code in {{Sender}}:
> {code}
>   private static void send(final DataOutputStream out, final Op opcode,
>   final Message proto) throws IOException {
> LOG.trace("Sending DataTransferOp {}: {}",
> proto.getClass().getSimpleName(), proto);
> int before = out.size();
> op(out, opcode);
> proto.writeDelimitedTo(out);
> int after = out.size();
> System.out.println("X sent=" + (after - before));
> out.flush();
>   }
> {code}
> Running the test {{TestWriteRead#testWriteAndRead}}, the change saves about 
> 100 bytes per call in most cases. The saving may not look big because 
> only 3 datanodes are sent, but in situations like 
> {{BlockECRecoveryCommand}}, where there can be 6 + 3 datanodes to send as 
> targets and sources, the saving will be significant.
> Hence, I suggest using a more lightweight construct to represent a datanode in 
> RPC calls when possible, or other ideas to avoid unnecessary wire data size. 
> This may make sense; as noted, there were some discussions in HDFS-8999 about 
> saving datanode bandwidth.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9704) terminate progress after namenode recover finished

2016-01-25 Thread Liao, Xiaoge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liao, Xiaoge updated HDFS-9704:
---
Attachment: HDFS-9704.001.patch

> terminate progress after namenode recover finished
> --
>
> Key: HDFS-9704
> URL: https://issues.apache.org/jira/browse/HDFS-9704
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Liao, Xiaoge
>Priority: Minor
> Attachments: HDFS-9704.001.patch
>
>
> terminate progress after namenode recover finished



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116762#comment-15116762
 ] 

Hadoop QA commented on HDFS-8999:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 38s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s 
{color} | {color:red} hadoop-hdfs-project: patch generated 4 new + 1027 
unchanged - 3 fixed = 1031 total (was 1030) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 11s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 19s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 13s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 103m 46s 
{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
44s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 230m 38s {color} 
| {color:black} {color} |
\\

[jira] [Updated] (HDFS-9655) NN should start JVM pause monitor before loading fsimage

2016-01-25 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9655:

Labels: supportability  (was: )

> NN should start JVM pause monitor before loading fsimage
> 
>
> Key: HDFS-9655
> URL: https://issues.apache.org/jira/browse/HDFS-9655
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Critical
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9655.001.patch
>
>
> We have seen many cases where NameNode startup is either extremely slow or 
> even hangs. Most of them were caused by insufficient heap size with regard to 
> the metadata size. Those cases were resolved by increasing the heap size.
> However, it did take the support team some time to root-cause them. JVM pause 
> warning messages would greatly assist in such diagnosis, but the NN starts the 
> JVM pause monitor only after fsimage/edits loading.
> Propose starting the JVM pause monitor before loading fsimage/edits.
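
A minimal sketch of the proposed ordering (hedged; assuming the service-style 
{{JvmPauseMonitor}} API, not the actual patch):

{code}
// Hedged sketch: start the pause monitor before the expensive startup work so
// that GC pauses during fsimage/edits loading are logged.
JvmPauseMonitor pauseMonitor = new JvmPauseMonitor();
pauseMonitor.init(conf);
pauseMonitor.start();    // previously started only after loading
loadNamesystem(conf);    // fsimage/edits loading can take a long time
{code}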



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2016-01-25 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-9494:
--
Status: In Progress  (was: Patch Available)

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It would be better to trigger flushInternal( ) on all the streamers, wait 
> for all of them to return from waitForAckedSeqno( ), and then return from 
> flushAllInternals( ).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2016-01-25 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-9494:
--
Status: Patch Available  (was: In Progress)

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch, 
> HDFS-9494-origin-trunk.05.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It would be better to trigger flushInternal( ) on all the streamers, wait 
> for all of them to return from waitForAckedSeqno( ), and then return from 
> flushAllInternals( ).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2016-01-25 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-9494:
--
Attachment: HDFS-9494-origin-trunk.05.patch

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch, 
> HDFS-9494-origin-trunk.05.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It would be better to trigger flushInternal( ) on all the streamers, wait 
> for all of them to return from waitForAckedSeqno( ), and then return from 
> flushAllInternals( ).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9690) addBlock is not idempotent

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116851#comment-15116851
 ] 

Hadoop QA commented on HDFS-9690:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} HDFS-9690 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12784370/h9690_20160124b_branch-2.7.patch
 |
| JIRA Issue | HDFS-9690 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14244/console |


This message was automatically generated.



> addBlock is not idempotent
> --
>
> Key: HDFS-9690
> URL: https://issues.apache.org/jira/browse/HDFS-9690
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h9690_20160124.patch, h9690_20160124b.patch, 
> h9690_20160124b_branch-2.7.patch
>
>
> TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the 
> bug. It failed in the following builds.
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7694) FSDataInputStream should support "unbuffer"

2016-01-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116622#comment-15116622
 ] 

Colin Patrick McCabe commented on HDFS-7694:


Hi, [~djp].  This change is compatible, since people are not expected to be 
subclassing {{FSDataInputStream}}.  So it seems fine to backport to 2.6, if the 
maintainers of that branch think it will be useful there.

> FSDataInputStream should support "unbuffer"
> ---
>
> Key: HDFS-7694
> URL: https://issues.apache.org/jira/browse/HDFS-7694
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 2.7.0
>
> Attachments: HDFS-7694.001.patch, HDFS-7694.002.patch, 
> HDFS-7694.003.patch, HDFS-7694.004.patch, HDFS-7694.005.patch
>
>
> For applications that have many open HDFS (or other Hadoop filesystem) files, 
> it would be useful to have an API to clear readahead buffers and sockets.  
> This could be added to the existing APIs as an optional interface, in much 
> the same way as we added setReadahead / setDropBehind / etc.
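
A minimal usage sketch of the API this added (hedged; {{fs}}, {{path}}, and 
{{buf}} are placeholders):

{code}
// Hedged sketch: unbuffer() releases readahead buffers and cached sockets so
// a long-idle open stream stops holding resources; the stream remains usable
// for further reads.
FSDataInputStream in = fs.open(path);
in.read(buf, 0, buf.length);  // normal reads populate buffers/sockets
in.unbuffer();                // drop them between bursts of reads
{code}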



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9663) Optimize some RPC call using lighter weight construct than DatanodeInfo

2016-01-25 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116701#comment-15116701
 ] 

Kai Zheng commented on HDFS-9663:
-

Thanks for your comments, Colin.
bq. Is that stuff actually sent over the wire in every case? These fields are 
optional in the protobuf structures.
I thought of this and checked it; these optional fields appear to be sent 
over the wire even when they're not actually needed. I will check again to 
confirm this.
bq. I agree that it's messy that these fields are optional, but it's hard to 
see how to change it compatibly at this point.
Yes, right. Compatibility has to be considered. For protocols that have already 
been released, avoiding sending unnecessary fields may be the option; for ones 
introduced recently, like the BlockECRecoveryCommand mentioned in the 
description, and for new protocols in the future, I thought we may be able to 
switch to a lightweight structure like DatanodeID when possible. Sounds good?


> Optimize some RPC call using lighter weight construct than DatanodeInfo
> ---
>
> Key: HDFS-9663
> URL: https://issues.apache.org/jira/browse/HDFS-9663
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>
> While working on HDFS-8430, when adding an RPC in DataTransferProtocol, it was 
> noticed that the very heavy construct {{DatanodeInfo}} or 
> {{DatanodeInfoWithStorage}} is used to represent a datanode, most of the time 
> just for connection purposes. It is very fat and contains much more 
> information than needed. See how it's defined:
> {code}
> public class DatanodeInfo extends DatanodeID implements Node {
>   private long capacity;
>   private long dfsUsed;
>   private long remaining;
>   private long blockPoolUsed;
>   private long cacheCapacity;
>   private long cacheUsed;
>   private long lastUpdate;
>   private long lastUpdateMonotonic;
>   private int xceiverCount;
>   private String location = NetworkTopology.DEFAULT_RACK;
>   private String softwareVersion;
>   private List<String> dependentHostNames = new LinkedList<>();
>   private String upgradeDomain;
> ...
> {code}
> On the client and datanode sides, for RPC calls like 
> {{DataTransferProtocol#writeBlock}}, it looks like the information contained in 
> {{DatanodeID}} is almost enough.
> I did a quick hack using a lightweight construct like 
> {{SimpleDatanodeInfo}} that simply extends DatanodeID (no other field added, 
> but whatever field is needed can just be added) and changed the 
> DataTransferProtocol#writeBlock call. I manually checked many relevant tests 
> and they worked fine. To see how much network traffic is saved, I did a simple 
> test with the following code in {{Sender}}:
> {code}
>   private static void send(final DataOutputStream out, final Op opcode,
>   final Message proto) throws IOException {
> LOG.trace("Sending DataTransferOp {}: {}",
> proto.getClass().getSimpleName(), proto);
> int before = out.size();
> op(out, opcode);
> proto.writeDelimitedTo(out);
> int after = out.size();
> System.out.println("X sent=" + (after - before));
> out.flush();
>   }
> {code}
> Running the test {{TestWriteRead#testWriteAndRead}}, the change saves about 
> 100 bytes per call in most cases. The saving may not look big because 
> only 3 datanodes are sent, but in situations like 
> {{BlockECRecoveryCommand}}, where there can be 6 + 3 datanodes to send as 
> targets and sources, the saving will be significant.
> Hence, I suggest using a more lightweight construct to represent a datanode in 
> RPC calls when possible, or other ideas to avoid unnecessary wire data size. 
> This may make sense; as noted, there were some discussions in HDFS-8999 about 
> saving datanode bandwidth.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9701) DN may deadlock when hot-swapping under load

2016-01-25 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9701:

Attachment: HDFS-9701.02.patch

> DN may deadlock when hot-swapping under load
> 
>
> Key: HDFS-9701
> URL: https://issues.apache.org/jira/browse/HDFS-9701
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9701.01.patch, HDFS-9701.02.patch
>
>
> If the DN is under load (new blocks being written), a hot-swap task by {{hdfs 
> dfsadmin -reconfig}} may cause a dead lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level

2016-01-25 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116791#comment-15116791
 ] 

Ming Ma commented on HDFS-9579:
---

Thanks [~sjlee0]! Good point about the thread visibility issue. The reason I 
ended up using a map was to make the code general enough to support any network 
distance value without code change. However, since the available network 
distance values don't change often, using individual long variables seems OK, 
and it addresses the issues you mentioned above.

To use individual long variables, it could be something like below. Note that 
it assumes a tree-based topology, which should cover the common scenarios. If 
we need to track more network distance values, we can update it later. In 
addition, this means bytesReadDistanceOfFour and bytesReadDistanceOfSix won't 
be used for small network topologies.

{noformat}
volatile long bytesReadLocalHost;
volatile long bytesReadDistanceOfTwo; // local rack case.
volatile long bytesReadDistanceOfFour; // first-degree remote rack
volatile long bytesReadDistanceOfSix; // second-degree remote rack
{noformat}

I will update the patch once we agree on the new approach.
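
For illustration, the increment side could then look roughly like this (hedged 
sketch; single writer thread, volatile only for reader visibility):

{code}
// Hedged sketch: only the owner thread writes, so the non-atomic += is fine;
// volatile makes the values visible to reader threads.
void incrementBytesReadByDistance(int distance, long newBytes) {
  switch (distance) {
    case 0:  bytesReadLocalHost += newBytes; break;
    case 2:  bytesReadDistanceOfTwo += newBytes; break;   // local rack
    case 4:  bytesReadDistanceOfFour += newBytes; break;  // first-degree remote
    default: bytesReadDistanceOfSix += newBytes; break;   // second-degree remote
  }
}
{code}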

> Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
> -
>
> Key: HDFS-9579
> URL: https://issues.apache.org/jira/browse/HDFS-9579
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9579-2.patch, HDFS-9579-3.patch, HDFS-9579-4.patch, 
> HDFS-9579.patch, MR job counters.png
>
>
> For cross DC distcp or other applications, it becomes useful to have insight 
> as to the traffic volume for each network distance to distinguish cross-DC 
> traffic, local-DC-remote-rack, etc.
> FileSystem's existing {{bytesRead}} metrics tracks all the bytes read. To 
> provide additional metrics for each network distance, we can add additional 
> metrics to FileSystem level and have {{DFSInputStream}} update the value 
> based on the network distance between client and the datanode.
> {{DFSClient}} will resolve client machine's network location as part of its 
> initialization. It doesn't need to resolve datanode's network location for 
> each read as {{DatanodeInfo}} already has the info.
> There are existing HDFS specific metrics such as {{ReadStatistics}} and 
> {{DFSHedgedReadMetrics}}. But these metrics are only accessible via 
> {{DFSClient}} or {{DFSInputStream}}. Not something that application framework 
> such as MR and Tez can get to. That is the benefit of storing these new 
> metrics in FileSystem.Statistics.
> This jira only includes metrics generation by HDFS. The consumption of these 
> metrics at MR and Tez will be tracked by separated jiras.
> We can add similar metrics for HDFS write scenario later if it is necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-01-25 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated HDFS-6489:
--
Attachment: HDFS6489.java

> DFS Used space is not correct computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: stanley shi
> Attachments: HDFS6489.java
>
>
> The current implementation of the Datanode will increase the DFS used space 
> on each block write operation. This is correct in most scenarios (creating a 
> new file), but sometimes it behaves incorrectly (appending small data to a 
> large block).
> For example, I have a file with only one block (say, 60M). Then I try to 
> append to it very frequently, but each time I append only 10 bytes.
> Then on each append, DFS used is increased by the length of the 
> block (60M), not the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a large 
> number of files (1000+); assume the block size is 32M (half of the default 
> value). Then DFS used is increased by 1000*32M = 32G on each append to 
> the files, but actually I only write 10K bytes; this causes the datanode 
> to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs

2016-01-25 Thread Staffan Friberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Staffan Friberg updated HDFS-9260:
--
Attachment: HDFS-9260.014.patch

> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, performance
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Attachments: FBR processing.png, HDFS Block and Replica Management 
> 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch, 
> HDFS-9260.010.patch, HDFS-9260.011.patch, HDFS-9260.012.patch, 
> HDFS-9260.013.patch, HDFS-9260.014.patch, HDFSBenchmarks.zip, 
> HDFSBenchmarks2.zip
>
>
> This patch changes the data structures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC-friendly handling of full 
> block reports.
> I would like to hear people's feedback on this change.
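
The general idea can be illustrated with a hedged sketch 
({{getSortedReplicaIds}} is an assumed helper; not the actual patch):

{code}
// Hedged illustration: replicas kept in a sorted primitive array allow
// binary-search lookup during full block report processing and avoid
// per-entry wrapper objects that create GC pressure.
long[] sortedBlockIds = getSortedReplicaIds();  // assumed helper
int idx = java.util.Arrays.binarySearch(sortedBlockIds, reportedBlockId);
boolean known = idx >= 0;
{code}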



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-01-25 Thread Bogdan Raducanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated HDFS-6489:
--
Affects Version/s: 2.7.1

> DFS Used space is not correct computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: stanley shi
> Attachments: HDFS6489.java
>
>
> The current implementation of the Datanode will increase the DFS used space 
> on each block write operation. This is correct in most scenarios (creating a 
> new file), but sometimes it behaves incorrectly (appending small data to a 
> large block).
> For example, I have a file with only one block (say, 60M). Then I try to 
> append to it very frequently, but each time I append only 10 bytes.
> Then on each append, DFS used is increased by the length of the 
> block (60M), not the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a large 
> number of files (1000+); assume the block size is 32M (half of the default 
> value). Then DFS used is increased by 1000*32M = 32G on each append to 
> the files, but actually I only write 10K bytes; this causes the datanode 
> to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-01-25 Thread Bogdan Raducanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115206#comment-15115206
 ] 

Bogdan Raducanu commented on HDFS-6489:
---

I've recently hit this bug in 2.7.1. I attached repro code. The repro should 
fail with an 'all datanodes are bad' exception while the datanode log shows 
the "insufficient disk space" exception.
While the program is running, you can see the reported "Block pool used" 
increase by a lot. A minute or two after the failure, the "Block pool used" 
goes back down to normal.
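
For reference, a hedged reconstruction of the shape of the repro (this is not 
the attached HDFS6489.java, just an illustration of the pattern):

{code}
// Hedged repro sketch: one ~60M block, then many tiny appends. Each append
// inflates "DFS Used" by roughly the whole block length rather than 10 bytes,
// eventually producing the "Insufficient space for appending" exception.
FileSystem fs = FileSystem.get(conf);
Path p = new Path("/test/append-file");
try (FSDataOutputStream out = fs.create(p)) {
  out.write(new byte[60 * 1024 * 1024]);
}
for (int i = 0; i < 10000; i++) {
  try (FSDataOutputStream out = fs.append(p)) {
    out.write(new byte[10]);
  }
}
{code}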

> DFS Used space is not correct computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0, 2.7.1
>Reporter: stanley shi
> Attachments: HDFS6489.java
>
>
> The current implementation of the Datanode will increase the DFS used space 
> on each block write operation. This is correct in most scenarios (creating a 
> new file), but sometimes it behaves incorrectly (appending small data to a 
> large block).
> For example, I have a file with only one block (say, 60M). Then I try to 
> append to it very frequently, but each time I append only 10 bytes.
> Then on each append, DFS used is increased by the length of the 
> block (60M), not the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a large 
> number of files (1000+); assume the block size is 32M (half of the default 
> value). Then DFS used is increased by 1000*32M = 32G on each append to 
> the files, but actually I only write 10K bytes; this causes the datanode 
> to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9695) HTTPFS - CHECKACCESS operation missing

2016-01-25 Thread Bert Hekman (JIRA)
Bert Hekman created HDFS-9695:
-

 Summary: HTTPFS - CHECKACCESS operation missing
 Key: HDFS-9695
 URL: https://issues.apache.org/jira/browse/HDFS-9695
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Bert Hekman


Hi,

The CHECKACCESS operation seems to be missing in HTTPFS. I'm getting the 
following error:

{code}
QueryParamException: java.lang.IllegalArgumentException: No enum constant 
org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.CHECKACCESS
{code}

A quick look into the org.apache.hadoop.fs.http.client.HttpFSFileSystem class 
reveals that CHECKACCESS is not defined at all.
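
If it follows the pattern of the other operations in that enum, the fix is 
presumably along these lines (a sketch only; the HTTP method is an assumption 
based on the WebHDFS CHECKACCESS semantics):

{code}
public enum Operation {
  // ... existing entries, e.g. OPEN(HTTP_GET), GETFILESTATUS(HTTP_GET) ...
  CHECKACCESS(HTTP_GET);  // the missing constant; GET would mirror WebHDFS
  // constructor and accessors unchanged
}
{code}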



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9689) Test o.a.h.hdfs.TestRenameWhileOpen fails intermittently

2016-01-25 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115072#comment-15115072
 ] 

Vinayakumar B edited comment on HDFS-9689 at 1/25/16 1:22 PM:
--

One possible solution for the Namenode restart tests is to implement 
{{BPOfferService#refreshNNList()}} in a way that supports changing namenode 
addresses in between, and to issue a DN#refreshNamenodes(Conf) call to all DNs 
after restart in MiniDfsCluster.
This way, all tests which fail intermittently with this kind of problem would 
be fixed.

After analyzing, the current MiniDfsCluster has no option to restart the 
Namenode on another, optional port. 
If this support is to be added, there are some points that need to be checked.
1. The FileSystem's URI will change in the non-HA case, so all FileSystem 
instances should be refreshed in tests after a Namenode restart (a sketch 
follows after this list).

2. The current restartDatanode(..) defaults keepPort to false, but this works 
only if {{setupHostsFile}} is false. Otherwise, the DN will try to restart on 
the same port. This could result in some occasional test failures.

3. In the case of restartDatanodes() or restartNamenodes(), assertions on URIs 
or DN names should be checked, as these may change with the changed port after 
restart.
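
To illustrate point 1 (a sketch only, assuming {{cluster}} is the 
MiniDfsCluster under test):

{code}
// After a non-HA Namenode restart on a new port, cached FileSystem instances
// still point at the old URI, so tests would need to re-create them.
FileSystem.closeAll();                    // drop cached FileSystem instances
FileSystem fs = cluster.getFileSystem();  // re-resolve from the MiniDfsCluster
{code}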

So, completely resolving the issues arising from port-bind problems when 
restarting (Name|Data)nodes needs some effort :)

What do you say, [~liuml07]?


was (Author: vinayrpet):
One possible solution for the Namenode restart tests is to implement 
{{BPOfferService#refreshNNList()}} in a way that supports changing namenode 
addresses in between, and to issue a DN#refreshNamenodes(Conf) call to all DNs 
after restart in MiniDfsCluster.
This way, all tests which fail intermittently with this kind of problem would 
be fixed.

What do you say, [~liuml07]?

> Test o.a.h.hdfs.TestRenameWhileOpen fails intermittently 
> -
>
> Key: HDFS-9689
> URL: https://issues.apache.org/jira/browse/HDFS-9689
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9689.000.patch
>
>
> The test fails in recent builds, e.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/14063/testReport/org.apache.hadoop.hdfs/TestRenameWhileOpen/
> and
> https://builds.apache.org/job/PreCommit-HDFS-Build/14212/testReport/org.apache.hadoop.hdfs/TestRenameWhileOpen/testWhileOpenRenameToNonExistentDirectory/
> The *Error Message* is like:
> {code}
> Problem binding to [localhost:60690] java.net.BindException: Address already 
> in use; For more details see:  http://wiki.apache.org/hadoop/BindException
> {code}
> and *Stacktrace* is:
> {code}
> java.net.BindException: Problem binding to [localhost:60690] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:469)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:695)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2464)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:958)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:535)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:392)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:743)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:685)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:884)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:863)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1581)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)

[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs

2016-01-25 Thread Staffan Friberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115356#comment-15115356
 ] 

Staffan Friberg commented on HDFS-9260:
---

Fixed checkstyle on TreeSet.

Should I convert the storages field to private? (The triplets field was 
protected.)

> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, performance
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Attachments: FBR processing.png, HDFS Block and Replica Management 
> 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch, 
> HDFS-9260.010.patch, HDFS-9260.011.patch, HDFS-9260.012.patch, 
> HDFS-9260.013.patch, HDFS-9260.014.patch, HDFSBenchmarks.zip, 
> HDFSBenchmarks2.zip
>
>
> This patch changes the data structures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC-friendly handling of full 
> block reports.
> I would like to hear people's feedback on this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9694) Make existing DFSClient#getFileChecksum() work for striped blocks

2016-01-25 Thread Kai Zheng (JIRA)
Kai Zheng created HDFS-9694:
---

 Summary: Make existing DFSClient#getFileChecksum() work for 
striped blocks
 Key: HDFS-9694
 URL: https://issues.apache.org/jira/browse/HDFS-9694
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: 3.0.0


This is a sub-task of HDFS-8430 and will make the existing API 
{{FileSystem#getFileChecksum(path)}} work for striped files. It will also 
refactor the existing code and lay out basic work for subsequent tasks, like 
support for the new API proposed there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9693) Trim the user config of `dfs.ha.namenode.id`

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115244#comment-15115244
 ] 

Hadoop QA commented on HDFS-9693:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 4s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 46s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
36s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 235m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestMissingBlocksAlert |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
|   | hadoop.hdfs.TestFileCreationDelete |
|   | hadoop.hdfs.server.namenode.ha.TestHAAppend |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| JDK v1.7.0_91 Failed junit 

[jira] [Commented] (HDFS-8430) Erasure coding: compute file checksum for stripe files

2016-01-25 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115135#comment-15115135
 ] 

Kai Zheng commented on HDFS-8430:
-

To break this down, I opened HDFS-9694 to make the existing API also work for 
striped files, along with code refactoring. 

> Erasure coding: compute file checksum for stripe files
> --
>
> Key: HDFS-8430
> URL: https://issues.apache.org/jira/browse/HDFS-8430
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7285
>Reporter: Walter Su
>Assignee: Kai Zheng
> Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduces a distributed file checksum algorithm. It's designed 
> for replicated blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so it can work for striped 
> block groups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9525) hadoop utilities need to support provided delegation tokens

2016-01-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115355#comment-15115355
 ] 

Kihwal Lee commented on HDFS-9525:
--

Is anyone reverting it or reworking the fix?

> hadoop utilities need to support provided delegation tokens
> ---
>
> Key: HDFS-9525
> URL: https://issues.apache.org/jira/browse/HDFS-9525
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, 
> HDFS-7984.003.patch, HDFS-7984.004.patch, HDFS-7984.005.patch, 
> HDFS-7984.006.patch, HDFS-7984.007.patch, HDFS-7984.patch, 
> HDFS-9525.008.patch, HDFS-9525.009.patch, HDFS-9525.009.patch, 
> HDFS-9525.branch-2.008.patch, HDFS-9525.branch-2.009.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9406) FSImage corruption after taking snapshot

2016-01-25 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115780#comment-15115780
 ] 

Jing Zhao commented on HDFS-9406:
-

Thanks for reporting the issue, [~stanislav.an...@gmail.com]. The corrupted 
fsimage should also be useful for debugging. Could you please share the image 
if possible?

> FSImage corruption after taking snapshot
> 
>
> Key: HDFS-9406
> URL: https://issues.apache.org/jira/browse/HDFS-9406
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CentOS 6 amd64, CDH 5.4.4-1
> 2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3
> Memory: 32GB
> Namenode blocks: ~700_000 blocks, no HA setup
>Reporter: Stanislav Antic
>Assignee: Yongjun Zhang
>
> FSImage corruption happened after HDFS snapshots were taken. The cluster was 
> not used at that time.
> When the namenode restarted, it reported a NullPointerException:
> {code}
> 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized 
> segments in /tmp/fsimage_checker_5857/fsimage/current
> 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected.
> 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes.
> 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:810)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:794)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
> 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1
> {code}
> Corruption happened after "07.11.2015 00:15", and after that time ~9300 
> blocks were invalidated that shouldn't have been.
> After recovering the FSImage I discovered that around 9300 blocks were missing.
> -I also attached the log of the namenode before and after the corruption 
> happened.-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9262) Support reconfiguring dfs.datanode.lazywriter.interval.sec without DN restart

2016-01-25 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-9262:

Attachment: HDFS-9262-HDFS-9000.004.patch

V004 fixed a large number of unit test failures caused by the Reconfigurable 
implementation (which originally threw UnsupportedOperationException) in 
SimulatedFSDataset and ExternalDatasetImpl.

> Support reconfiguring dfs.datanode.lazywriter.interval.sec without DN restart
> -
>
> Key: HDFS-9262
> URL: https://issues.apache.org/jira/browse/HDFS-9262
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9262-HDFS-9000.002.patch, 
> HDFS-9262-HDFS-9000.003.patch, HDFS-9262-HDFS-9000.004.patch, 
> HDFS-9262.001.patch
>
>
> This is to reconfigure
> {code}
> dfs.datanode.lazywriter.interval.sec
> {code}
> without restarting DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9699) libhdfs++: Add appropriate catch blocks for ASIO operations that throw

2016-01-25 Thread James Clampffer (JIRA)
James Clampffer created HDFS-9699:
-

 Summary: libhdfs++: Add appropriate catch blocks for ASIO 
operations that throw
 Key: HDFS-9699
 URL: https://issues.apache.org/jira/browse/HDFS-9699
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: James Clampffer
Assignee: James Clampffer


libhdfs++ doesn't create exceptions of its own, but it should be able to 
gracefully handle exceptions thrown by the libraries it uses, particularly asio.

libhdfs++ should be able to catch most exceptions within reason, either at the 
call site or in the code that spins up asio worker threads.  Certain system 
exceptions like std::bad_alloc don't need to be caught because by that point 
the process is likely in an unrecoverable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9672) o.a.h.hdfs.TestLeaseRecovery2 fails intermittently

2016-01-25 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115862#comment-15115862
 ] 

Jitendra Nath Pandey commented on HDFS-9672:


+1

> o.a.h.hdfs.TestLeaseRecovery2 fails intermittently
> --
>
> Key: HDFS-9672
> URL: https://issues.apache.org/jira/browse/HDFS-9672
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9672.000.patch, HDFS-9672.001.patch
>
>
> It fails in recent builds, see:
> https://builds.apache.org/job/PreCommit-HDFS-Build/14177/testReport/org.apache.hadoop.hdfs/
> https://builds.apache.org/job/PreCommit-HDFS-Build/14147/testReport/org.apache.hadoop.hdfs/
> Failing test methods include:
> * 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart
> * org.apache.hadoop.hdfs.TestLeaseRecovery2.testLeaseRecoverByAnotherUser
> * org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecovery
> * 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.org.apache.hadoop.hdfs.TestLeaseRecovery2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2016-01-25 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115745#comment-15115745
 ] 

Rakesh R commented on HDFS-9494:


Thanks [~demongaorui] for the patch. I have a minor comment; please also 
consider this when preparing the next patch.

Every flushAllInternals() call creates {{ExecutorService executor = 
Executors.newFixedThreadPool(numAllBlocks);}}. Please call 
{{executor.shutdownNow();}} at the end of the flushAllInternals() function. 
Otherwise there could be a chance of unnecessary {{Thread (pool-1-thread-1) 
(Running)}} references being left behind, right?
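
Something along these lines (a sketch only; {{flushInternal()}} stands for the 
per-streamer flush in the patch):

{code}
ExecutorService executor = Executors.newFixedThreadPool(numAllBlocks);
try {
  List<Future<Void>> futures = new ArrayList<>();
  for (final StripedDataStreamer s : streamers) {
    futures.add(executor.submit(new Callable<Void>() {
      @Override
      public Void call() throws Exception {
        s.flushInternal();   // trigger each streamer's flush in parallel
        return null;
      }
    }));
  }
  for (Future<Void> f : futures) {
    f.get();                 // wait for every streamer's ack
  }
} finally {
  executor.shutdownNow();    // ensure no idle pool threads are left behind
}
{code}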

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9698) Long running Balancer should renew TGT

2016-01-25 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-9698:
---

 Summary: Long running Balancer should renew TGT
 Key: HDFS-9698
 URL: https://issues.apache.org/jira/browse/HDFS-9698
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover, security
Affects Versions: 2.6.3
Reporter: Zhe Zhang
Assignee: Zhe Zhang


When the {{Balancer}} runs beyond the configured TGT lifetime, the current 
logic won't renew TGT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9406) FSImage corruption after taking snapshot

2016-01-25 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115866#comment-15115866
 ] 

Yongjun Zhang commented on HDFS-9406:
-

Thanks [~kihwal] and [~jingzhao].

Hi Jing, 

I have got a set of data from [~stanislav.an...@gmail.com] through our private 
channel, and the issue can be reproduced with this set of data (thanks a 
million for that, Stanislav!). I have been debugging and have a good 
understanding now. I will talk with you and Stanislav privately about the data.

While I tried to create a small testcase to reproduce the symptom here, I was 
not quite successful. However, I was able to create HDFS-9697 and have a 
proposed solution (not published yet).  My study showed that HDFS-9406 has a 
similar cause to HDFS-9697, but not exactly the same. I'm digging into it a 
bit further; I might need help from you guys at some point.

Thanks much.


 



> FSImage corruption after taking snapshot
> 
>
> Key: HDFS-9406
> URL: https://issues.apache.org/jira/browse/HDFS-9406
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CentOS 6 amd64, CDH 5.4.4-1
> 2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3
> Memory: 32GB
> Namenode blocks: ~700_000 blocks, no HA setup
>Reporter: Stanislav Antic
>Assignee: Yongjun Zhang
>
> FSImage corruption happened after HDFS snapshots were taken. The cluster was 
> not used at that time.
> When the namenode restarted, it reported a NullPointerException:
> {code}
> 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized 
> segments in /tmp/fsimage_checker_5857/fsimage/current
> 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected.
> 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes.
> 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:810)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:794)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
> 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1
> {code}
> Corruption happened after "07.11.2015 00:15", and after that time ~9300 
> blocks were invalidated that shouldn't have been.
> After recovering the FSImage I discovered that around 9300 blocks were missing.
> -I also attached the log of the namenode before and after the corruption 
> happened.-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115818#comment-15115818
 ] 

Jing Zhao edited comment on HDFS-9696 at 1/25/16 7:32 PM:
--

Currently I think HDFS-9406 and HDFS-9697 may both be caused by some lingering 
INode in the diff list. Both failed when loading an INode from the inode map. 
Compared with the logic for removing inodes from the inode map, cleaning the 
diff list is more complicated and thus has a higher chance of having a bug.


was (Author: jingzhao):
Currently I think HDFS-9406 and HDFS-9697 may both be caused by some lingering 
INode in the diff list. Both failed when loading an INode from the inode map. 
Compared with the logic for removing inodes from the inode map, cleaning the 
diff list is more complicated and thus has a higher chance of failing.

> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Yongjun Zhang
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS does not have any snapshot now, but I see filediff records 
> persisted in its fsimage.  Since it has been restarted many times and 
> checkpointed over 100 times since then, the records must have been persisted 
> and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115818#comment-15115818
 ] 

Jing Zhao commented on HDFS-9696:
-

Currently I think HDFS-9406 and HDFS-9697 may both be caused by some lingering 
INode in the diff list. Both failed when loading an INode from the inode map. 
Compared with the logic for removing inodes from the inode map, cleaning the 
diff list is more complicated and thus has a higher chance of failing.

> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Yongjun Zhang
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS does not have any snapshot now, but I see filediff records 
> persisted in its fsimage.  Since it has been restarted many times and 
> checkpointed over 100 times since then, the records must have been persisted 
> and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9330) Support reconfiguring dfs.datanode.duplicate.replica.deletion without DN restart

2016-01-25 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HDFS-9330:

Attachment: HDFS-9330-HDFS-9000.003.patch

Similarly, V003 fixed a large number of unit test failures caused by the 
Reconfigurable implementation (which originally threw 
UnsupportedOperationException) in SimulatedFSDataset and ExternalDatasetImpl.

> Support reconfiguring dfs.datanode.duplicate.replica.deletion without DN 
> restart 
> -
>
> Key: HDFS-9330
> URL: https://issues.apache.org/jira/browse/HDFS-9330
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9330-HDFS-9000.002.patch, 
> HDFS-9330-HDFS-9000.003.patch, HDFS-9330.001.patch
>
>
> This is to reconfigure
> {code}
> dfs.datanode.duplicate.replica.deletion
> {code}
> without restarting DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9691) o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode#testCheckSafeMode fails intermittently

2016-01-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115776#comment-15115776
 ] 

Mingliang Liu commented on HDFS-9691:
-

The failing test is not related; it seems flaky and is tracked by [HDFS-9476].

> o.a.h.hdfs.server.blockmanagement.TestBlockManagerSafeMode#testCheckSafeMode 
> fails intermittently
> -
>
> Key: HDFS-9691
> URL: https://issues.apache.org/jira/browse/HDFS-9691
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9691.000.patch
>
>
> It's a flaky test method and can rarely be reproduced locally. We can see it 
> happen in recent builds, e.g. 
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14225/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14139/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlockManagerSafeMode/testCheckSafeMode/
> {code}
> Error Message
> expected: but was:
> Stacktrace
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode.testCheckSafeMode(TestBlockManagerSafeMode.java:165)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9698) Long running Balancer should renew TGT

2016-01-25 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9698:

Attachment: HDFS-9698.00.patch

A similar fix to HADOOP-12559, but in the {{Balancer}}. The renewal logic is 
added before each {{Balancer}} iteration because the dispatcher runs multiple 
operations against the NN within an iteration.
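
In other words, something like this at the top of the iteration loop (a 
minimal sketch, assuming a keytab-based login; {{runOneIteration()}} is a 
stand-in name for the dispatch work):

{code}
while (shouldRun) {
  // Re-login from the keytab if the TGT is close to expiring, so a
  // long-running Balancer survives the ticket lifetime.
  UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
  runOneIteration();
}
{code}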

> Long running Balancer should renew TGT
> --
>
> Key: HDFS-9698
> URL: https://issues.apache.org/jira/browse/HDFS-9698
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, security
>Affects Versions: 2.6.3
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9698.00.patch
>
>
> When the {{Balancer}} runs beyond the configured TGT lifetime, the current 
> logic won't renew TGT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9406) FSImage corruption after taking snapshot

2016-01-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115795#comment-15115795
 ] 

Kihwal Lee commented on HDFS-9406:
--

HDFS-9696 might be related.

> FSImage corruption after taking snapshot
> 
>
> Key: HDFS-9406
> URL: https://issues.apache.org/jira/browse/HDFS-9406
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: CentOS 6 amd64, CDH 5.4.4-1
> 2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3
> Memory: 32GB
> Namenode blocks: ~700_000 blocks, no HA setup
>Reporter: Stanislav Antic
>Assignee: Yongjun Zhang
>
> FSImage corruption happened after HDFS snapshots were taken. The cluster was 
> not used at that time.
> When the namenode restarted, it reported a NullPointerException:
> {code}
> 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized 
> segments in /tmp/fsimage_checker_5857/fsimage/current
> 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected.
> 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes.
> 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:810)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:794)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
> 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1
> {code}
> Corruption happened after "07.11.2015 00:15", and after that time ~9300 
> blocks were invalidated that shouldn't have been.
> After recovering the FSImage I discovered that around 9300 blocks were missing.
> -I also attached the log of the namenode before and after the corruption 
> happened.-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115796#comment-15115796
 ] 

Yongjun Zhang commented on HDFS-9696:
-

Thanks Kihwal. Yes, agreed. While I have been investigating, I indeed planned 
to ask the snapshot developers for help at some point.


> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Yongjun Zhang
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS does not have any snapshot now, but I see filediff records 
> persisted in its fsimage.  Since it has been restarted many times and 
> checkpointed over 100 times since then, the records must have been persisted 
> and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115884#comment-15115884
 ] 

Yongjun Zhang commented on HDFS-9696:
-

And I have a solution for HDFS-9697, for the case I created. It is yet to be 
proven that it works in all situations.



> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Yongjun Zhang
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS does not have any snapshot now, but I see filediff records 
> persisted in its fsimage.  Since it has been restarted many times and 
> checkpointed over 100 times since then, the records must have been persisted 
> and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9094) Add command line option to ask NameNode reload configuration.

2016-01-25 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9094:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0
   Status: Resolved  (was: Patch Available)

+1 for the v009 patch. I committed this for 2.9.0. Thanks for the contribution 
[~xiaobingo].

> Add command line option to ask NameNode reload configuration.
> -
>
> Key: HDFS-9094
> URL: https://issues.apache.org/jira/browse/HDFS-9094
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-9094-HDFS-9000.002.patch, 
> HDFS-9094-HDFS-9000.003.patch, HDFS-9094-HDFS-9000.004.patch, 
> HDFS-9094-HDFS-9000.005.patch, HDFS-9094-HDFS-9000.006.patch, 
> HDFS-9094-HDFS-9000.007.patch, HDFS-9094-HDFS-9000.008.patch, 
> HDFS-9094-HDFS-9000.009.patch, HDFS-9094.001.patch
>
>
> This work is going to add a DFS admin command that allows reloading the 
> NameNode configuration. This is sibling work related to HDFS-6808.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9094) Add command line option to ask NameNode reload configuration.

2016-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115906#comment-15115906
 ] 

Hudson commented on HDFS-9094:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9180 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9180/])
HDFS-9094. Add command line option to ask NameNode reload configuration. (arp: 
rev d62b4a4de75edb840df6634f49cb4beb74e3fb07)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ReconfigurationProtocolServerSideUtils.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ReconfigurationProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamenodeProtocols.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdmin.java


> Add command line option to ask NameNode reload configuration.
> -
>
> Key: HDFS-9094
> URL: https://issues.apache.org/jira/browse/HDFS-9094
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-9094-HDFS-9000.002.patch, 
> HDFS-9094-HDFS-9000.003.patch, HDFS-9094-HDFS-9000.004.patch, 
> HDFS-9094-HDFS-9000.005.patch, HDFS-9094-HDFS-9000.006.patch, 
> HDFS-9094-HDFS-9000.007.patch, HDFS-9094-HDFS-9000.008.patch, 
> HDFS-9094-HDFS-9000.009.patch, HDFS-9094.001.patch
>
>
> This work is going to add a DFS admin command that allows reloading the 
> NameNode configuration. This is sibling work related to HDFS-6808.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9494) Parallel optimization of DFSStripedOutputStream#flushAllInternals( )

2016-01-25 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115531#comment-15115531
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9494:
---

In the last for-loop, the finally-block will be executed multiple times (once 
per healthy streamer, i.e. healthyStreamerCount times).  That may not be 
intended.

I think it is better to wait until all tasks have been completed, and then 
process the exceptions if the map is non-empty.
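
A sketch of that flow (the names here are assumptions, not from the patch):

{code}
// Wait for every flush task first, collecting failures per streamer,
// and only afterwards process the collected exceptions once.
Map<StripedDataStreamer, Exception> failures = new HashMap<>();
for (Map.Entry<StripedDataStreamer, Future<Void>> e : flushes.entrySet()) {
  try {
    e.getValue().get();
  } catch (Exception ex) {
    failures.put(e.getKey(), ex);  // record, don't throw yet
  }
}
if (!failures.isEmpty()) {
  handleFlushFailures(failures);   // hypothetical single failure handler
}
{code}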

> Parallel optimization of DFSStripedOutputStream#flushAllInternals( )
> 
>
> Key: HDFS-9494
> URL: https://issues.apache.org/jira/browse/HDFS-9494
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: GAO Rui
>Assignee: GAO Rui
>Priority: Minor
> Attachments: HDFS-9494-origin-trunk.00.patch, 
> HDFS-9494-origin-trunk.01.patch, HDFS-9494-origin-trunk.02.patch, 
> HDFS-9494-origin-trunk.03.patch, HDFS-9494-origin-trunk.04.patch
>
>
> Currently, in DFSStripedOutputStream#flushAllInternals( ), we trigger and 
> wait for flushInternal( ) in sequence. So the runtime flow is like:
> {code}
> Streamer0#flushInternal( )
> Streamer0#waitForAckedSeqno( )
> Streamer1#flushInternal( )
> Streamer1#waitForAckedSeqno( )
> …
> Streamer8#flushInternal( )
> Streamer8#waitForAckedSeqno( )
> {code}
> It could be better to trigger all the streamers to flushInternal( ) and
> wait for all of them to return from waitForAckedSeqno( ),  and then 
> flushAllInternals( ) returns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115578#comment-15115578
 ] 

Hadoop QA commented on HDFS-9260:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 
704 unchanged - 12 fixed = 706 total (was 716) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 34s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 34s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 41s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.hdfs.server.datanode.TestFsDatasetCache |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-9118) Add logging system for libhdfs++

2016-01-25 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115421#comment-15115421
 ] 

James Clampffer commented on HDFS-9118:
---

I'm going to take a shot at this.  Things I'm planning on picking up implicitly 
in addition to the log message and level:
- id of the thread doing the logging
- stack address of the logging function (add a local variable and grab its 
address)
- line number, file name, function

> Add logging system for libhdfs++
> 
>
> Key: HDFS-9118
> URL: https://issues.apache.org/jira/browse/HDFS-9118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: James Clampffer
>
> With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
> library are going to have their own logging infrastructure that we're going 
> to want to provide data to.  
> libhdfs++ should have a logging library that:
> * Is overridable and can provide sufficient information to work well with 
> common C++ logging frameworks
> * Has a rational default implementation 
> * Is performant



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9118) Add logging system for libhdfs++

2016-01-25 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115434#comment-15115434
 ] 

Bob Hansen commented on HDFS-9118:
--

Grabbing the stack address is a fairly expensive operation.  I would make it 
opt-in for rare circumstances, and perhaps enable it when logging an error.

> Add logging system for libhdfs++
> 
>
> Key: HDFS-9118
> URL: https://issues.apache.org/jira/browse/HDFS-9118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: James Clampffer
>
> With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
> library are going to have their own logging infrastructure that we're going 
> to want to provide data to.  
> libhdfs++ should have a logging library that:
> * Is overridable and can provide sufficient information to work well with 
> common C++ logging frameworks
> * Has a rational default implementation 
> * Is performant



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)

2016-01-25 Thread Dinesh S. Atreya (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115548#comment-15115548
 ] 

Dinesh S. Atreya commented on HDFS-9607:


Please see below the updated version of the API from above:
{code:title=FSWriteInPlaceStream.java|borderStyle=solid}
// (alternate names welcome) extends FSDataOutputStream 
long    getPos() // Get the current position; note FSDataOutputStream already 
has it.

void    seek(long desiredWritePos)
// Seek to the given position in file

int     write(long position, byte[] writeBuffer, int readLength) throws 
IOException
// Write/Update bytes from writeBuffer up to previously read length 
// at given position in file

int     write(long position, byte[] writeBuffer, int offset, int readLength) 
throws IOException
// Write/Update bytes from writeBuffer up to previously read length 
// after seek in file starting at offset.

boolean  canWrite(long position, byte[] writeBuffer, int readLength)
// Check whether Write/Update of bytes from writeBuffer up to 
// previously read length at given position is possible inside file

boolean  canWrite(long position, byte[] writeBuffer, int offset, int readLength)
// Check whether Write/Update of bytes from writeBuffer up to 
// previously read length after seek is possible inside file starting at offset.
{code}
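
Hypothetical usage of the proposal, for illustration (how the stream is 
obtained is left open, since that part of the API is not specified here):

{code}
byte[] buf = "Hello HDFS!".getBytes();     // same length as "Hello World"
if (out.canWrite(pos, buf, buf.length)) {  // length-preserving update only
  out.write(pos, buf, buf.length);
}
{code}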


> Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)
> 
>
> Key: HDFS-9607
> URL: https://issues.apache.org/jira/browse/HDFS-9607
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Dinesh S. Atreya
>
> Link to Umbrella JIRA
> https://issues.apache.org/jira/browse/HADOOP-12620 
> Provide capability to carry out in-place writes/updates. Only writes in-place 
> are supported where the existing length does not change.
> For example, "Hello World" can be replaced by "Hello HDFS!"
> See 
> https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300
>  for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)

2016-01-25 Thread Dinesh S. Atreya (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115577#comment-15115577
 ] 

Dinesh S. Atreya commented on HDFS-9607:


Alternatively, assuming {{getPos()}} and {{seek()}} as given above are included:
{code:title=FSWriteInPlaceStream.java|borderStyle=solid}
// (alternate names welcome) extends FSDataOutputStream 
void setReadLength(int length)  // Set the length that had been read earlier.

int getReadLength()   // Get the read length that has been set.

int write(long position, byte[] writeBuffer) throws IOException
// Write/Update bytes from writeBuffer up to previously read length 
// at given position in file

int write(long position, byte[] writeBuffer, int offset) throws IOException
// Write/Update bytes from writeBuffer up to previously read length 
// after seek in file starting at offset.

boolean  canWrite(long position, byte[] writeBuffer)
// Check whether Write/Update of bytes from writeBuffer up to 
// previously read length at given position is possible inside file

boolean  canWrite(long position, byte[] writeBuffer, int offset)
// Check whether Write/Update of bytes from writeBuffer up to 
// previously read length after seek is possible inside file starting at offset.
{code}
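
Purely as a sketch of how this alternative shape would be used (again, the 
class and its methods are hypothetical), the caller sets the read length once 
and the per-call length parameter disappears:
{code:title=WriteInPlaceAltUsageSketch.java|borderStyle=solid}
import java.io.IOException;

// Hypothetical usage: setReadLength() is carried as stream state, so
// write()/canWrite() drop the explicit length parameter.
public class WriteInPlaceAltUsageSketch {
  static void replaceInPlace(FSWriteInPlaceStream out, long pos, byte[] newBytes)
      throws IOException {
    out.setReadLength(newBytes.length);  // length of the earlier read
    if (out.canWrite(pos, newBytes)) {
      out.write(pos, newBytes);          // overwrite in place at pos
    }
  }
}
{code}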

> Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)
> 
>
> Key: HDFS-9607
> URL: https://issues.apache.org/jira/browse/HDFS-9607
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Dinesh S. Atreya
>
> Link to Umbrella JIRA
> https://issues.apache.org/jira/browse/HADOOP-12620 
> Provide capability to carry out in-place writes/updates. Only writes in-place 
> are supported where the existing length does not change.
> For example, "Hello World" can be replaced by "Hello HDFS!"
> See 
> https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300
>  for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-9696:


 Summary: Garbage snapshot records lingering forever
 Key: HDFS-9696
 URL: https://issues.apache.org/jira/browse/HDFS-9696
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Kihwal Lee
Priority: Critical


We have a cluster where the snapshot feature might have been tested years ago. 
The HDFS currently does not have any snapshot, but I see filediff records 
persisted in its fsimage.  Since the namenode has been restarted many times and 
has checkpointed over 100 times since then, these records must have been 
persisted and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9118) Add logging system for libdhfs++

2016-01-25 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer reassigned HDFS-9118:
-

Assignee: James Clampffer

> Add logging system for libdhfs++
> 
>
> Key: HDFS-9118
> URL: https://issues.apache.org/jira/browse/HDFS-9118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: James Clampffer
>
> With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
> library are going to have their own logging infrastructure that we're going 
> to want to provide data to.  
> libhdfs++ should have a logging library that:
> * Is overridable and can provide sufficient information to work well with 
> common C++ logging frameworks
> * Has a rational default implementation 
> * Is performant



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9607) Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)

2016-01-25 Thread Dinesh S. Atreya (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115572#comment-15115572
 ] 

Dinesh S. Atreya commented on HDFS-9607:


Correction:
{code}
int write(long position, byte[] writeBuffer, int offset, int readLength) 
throws IOException
// Write/Update bytes from writeBuffer up to previously read length 
// after seek in file starting at offset.
{code}
should replace
{code}
int write(long position, int readLength, byte[] writeBuffer, int offset, 
int readLength) throws IOException
// Write/Update bytes from writeBuffer up to previously read length 
// after seek in file starting at offset.
{code}

> Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)
> 
>
> Key: HDFS-9607
> URL: https://issues.apache.org/jira/browse/HDFS-9607
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Dinesh S. Atreya
>
> Link to Umbrella JIRA
> https://issues.apache.org/jira/browse/HADOOP-12620 
> Provide capability to carry out in-place writes/updates. Only writes in-place 
> are supported where the existing length does not change.
> For example, "Hello World" can be replaced by "Hello HDFS!"
> See 
> https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300
>  for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9690) addBlock is not idempotent

2016-01-25 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115568#comment-15115568
 ] 

Vinayakumar B commented on HDFS-9690:
-

Tried committing, but the patch doesn't apply to branch-2.7 since there is no 
FSDirWriteFileOp.java in branch-2.7.

> addBlock is not idempotent
> --
>
> Key: HDFS-9690
> URL: https://issues.apache.org/jira/browse/HDFS-9690
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h9690_20160124.patch, h9690_20160124b.patch
>
>
> TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the 
> bug. It failed in the following builds.
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9684) DataNode stopped sending heartbeat after getting OutOfMemoryError form DataTransfer thread.

2016-01-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115523#comment-15115523
 ] 

Kihwal Lee commented on HDFS-9684:
--

That usually means the ulimit is reached. What is the max Xceiver limit in the 
datanode config? And what is the datanode user's limit on fork/clone, i.e. 
{{ulimit -u}}?  On rare occasions, the system can run out of PIDs. I think the 
default on most Linux distros is 32K. You can raise it if that's causing the 
problem.
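
For what it's worth, a defensive pattern along these lines could keep the 
actor loop alive when thread creation fails. This is only a hypothetical 
sketch, not the HDFS-9684 patch:
{code:title=TransferGuardSketch.java|borderStyle=solid}
// Hypothetical sketch: catch the OutOfMemoryError thrown by
// Thread.start() when the process hits its ulimit -u, so a failed
// transfer does not take down the heartbeat loop.
public class TransferGuardSketch {
  static void startTransfer(Runnable transfer) {
    Thread t = new Thread(transfer, "DataTransfer");
    t.setDaemon(true);
    try {
      t.start();
    } catch (OutOfMemoryError e) {
      // "unable to create new native thread": log and skip this
      // transfer instead of letting the error propagate upward
      System.err.println("Skipping transfer, cannot create thread: " + e);
    }
  }
}
{code}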

> DataNode stopped sending heartbeat after getting OutOfMemoryError form 
> DataTransfer thread.
> ---
>
> Key: HDFS-9684
> URL: https://issues.apache.org/jira/browse/HDFS-9684
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Blocker
> Attachments: HDFS-9684.01.patch
>
>
> {noformat}
> java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1999)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:2008)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:857)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:671)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9690) addBlock is not idempotent

2016-01-25 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9690:
-
Target Version/s: 2.7.3

> addBlock is not idempotent
> --
>
> Key: HDFS-9690
> URL: https://issues.apache.org/jira/browse/HDFS-9690
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h9690_20160124.patch, h9690_20160124b.patch
>
>
> TestDFSClientRetries#testIdempotentAllocateBlockAndClose can illustrate the 
> bug. It failed in the following builds.
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14188/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14201/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/
> - 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14202/testReport/org.apache.hadoop.hdfs/TestDFSClientRetries/testIdempotentAllocateBlockAndClose/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang reassigned HDFS-9696:
---

Assignee: Yongjun Zhang

Hi Kihwal,

Since I am working on it, do you mind if I assign it to myself?

Thanks.



> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Yongjun Zhang
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS currently does not have any snapshot, but I see filediff 
> records persisted in its fsimage.  Since the namenode has been restarted 
> many times and has checkpointed over 100 times since then, these records 
> must have been persisted and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115754#comment-15115754
 ] 

Kihwal Lee commented on HDFS-9696:
--

bq.  do you mind if I assign it to myself?
I don't. But I noticed that none of the original snapshot feature developers 
are watching HDFS-9406. At some point, we should call them out.

> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Yongjun Zhang
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS currently does not have any snapshot, but I see filediff 
> records persisted in its fsimage.  Since the namenode has been restarted 
> many times and has checkpointed over 100 times since then, these records 
> must have been persisted and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9697) NN fails to restart due to corrupt fsimage caused by snapshot handling

2016-01-25 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-9697:

Summary: NN fails to restart due to corrupt fsimage caused by snapshot 
handling  (was: NN fails to restart due to corrupt fsimage)

> NN fails to restart due to corrupt fsimage caused by snapshot handling
> --
>
> Key: HDFS-9697
> URL: https://issues.apache.org/jira/browse/HDFS-9697
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>
> This is related to HDFS-9406, but not quite the same symptom.
> {quote}
> ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReference(FSImageFormatPBSnapshot.java:114)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReferenceSection(FSImageFormatPBSnapshot.java:105)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:258)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:766)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:589)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:818)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:797)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1561)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115595#comment-15115595
 ] 

Kihwal Lee commented on HDFS-9696:
--

{code:xml}
0
...
16385 43008443 -1 0 action-data.seq
43108392 -1 302 some_random_file
...
{code}

The file with inode number 43008443 exists. As shown, there is no snapshot 
that the SnapshotManager is aware of, and the snapshot ID of every filediff 
entry is -1.

> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS currently does not have any snapshot, but I see filediff 
> records persisted in its fsimage.  Since the namenode has been restarted 
> many times and has checkpointed over 100 times since then, these records 
> must have been persisted and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9697) NN fails to restart due to corrupt fsimage

2016-01-25 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-9697:
---

 Summary: NN fails to restart due to corrupt fsimage
 Key: HDFS-9697
 URL: https://issues.apache.org/jira/browse/HDFS-9697
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang


This is related to HDFS-9406, but not quite the same symptom.

{quote}
ERROR namenode.NameNode: Failed to start namenode.
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReference(FSImageFormatPBSnapshot.java:114)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReferenceSection(FSImageFormatPBSnapshot.java:105)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:258)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1062)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:766)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:589)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:818)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:797)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1561)
{quote}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115690#comment-15115690
 ] 

Yongjun Zhang commented on HDFS-9696:
-

Hi [~kihwal],

Thanks much for reporting this issue. I have been looking into HDFS-9406 and 
observed the same. I have made progress on HDFS-9406 and am still working on it.


> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS currently does not have any snapshot, but I see filediff 
> records persisted in its fsimage.  Since the namenode has been restarted 
> many times and has checkpointed over 100 times since then, these records 
> must have been persisted and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

2016-01-25 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115710#comment-15115710
 ] 

Jing Zhao commented on HDFS-8999:
-

Thanks for verifying the test failures, Nicholas! +1 on committing the latest 
patch to trunk. Please also consider whether to commit it to branch-2.

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h8999_20151228.patch, h8999_20160106.patch, 
> h8999_20160106b.patch, h8999_20160106c.patch, h8999_20160111.patch, 
> h8999_20160113.patch, h8999_20160114.patch, h8999_20160121.patch, 
> h8999_20160121b.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to 
> announce the replica is safe. Looking into the code, now we have
># NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
># If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
># When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts in all the replicas that NN has already 
> received the block_received msg, but based on the above #2 and #3, it may be 
> safe to also count in all the replicas in the 
> BlockUnderConstructionFeature#replicas?
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots

2016-01-25 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HDFS-9700:

Attachment: HDFS-9700_branch-2.7.patch

The attached patch is against branch-2.7.  For an HBase deployment on secure 
Hadoop, this reliably lowers our P95 write latencies from 40ms+ to ~2ms.

I'm still working out how/if the same changes apply to trunk.
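
For readers following along, the shape of the change described in this issue 
is roughly the following. This is an illustrative sketch (the method name, 
timeout, and the config default here are assumptions), not the attached patch:
{code:title=TcpNoDelaySketch.java|borderStyle=solid}
import java.net.InetSocketAddress;
import java.net.Socket;
import org.apache.hadoop.conf.Configuration;

// Illustrative only: honor ipc.client.tcpnodelay when creating the
// socket to a DataNode, instead of leaving Nagle's algorithm on.
public class TcpNoDelaySketch {
  static Socket connectToDN(Configuration conf, InetSocketAddress dnAddr)
      throws Exception {
    Socket sock = new Socket();
    // The fallback value here is an assumption; check your release.
    boolean noDelay = conf.getBoolean("ipc.client.tcpnodelay", false);
    sock.setTcpNoDelay(noDelay);
    sock.connect(dnAddr, 60_000);  // 60s connect timeout, arbitrary
    return sock;
  }
}
{code}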

> DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots
> 
>
> Key: HDFS-9700
> URL: https://issues.apache.org/jira/browse/HDFS-9700
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1, 2.6.3
>Reporter: Gary Helmling
> Attachments: HDFS-9700_branch-2.7.patch
>
>
> In {{DFSClient.connectToDN()}} and 
> {{DFSOutputStream.createSocketForPipeline()}}, we never call 
> {{setTcpNoDelay()}} on the constructed socket before sending.  In both cases, 
> we should respect the value of ipc.client.tcpnodelay in the configuration.
> While this applies whether security is enabled or not, it seems to have a 
> bigger impact on latency when security is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9669) TcpPeerServer should respect ipc.server.listen.queue.size

2016-01-25 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116012#comment-15116012
 ] 

Elliott Clark commented on HDFS-9669:
-

Ping?

This is running in production and removes thousands of TCP resets.
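
For context, the fix is conceptually small: bind the listening socket with an 
explicit backlog instead of the JDK default. A minimal sketch (the helper and 
the 128 fallback are assumptions, not the attached patch):
{code:title=ListenBacklogSketch.java|borderStyle=solid}
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: bind the TCP peer server with a backlog taken
// from ipc.server.listen.queue.size rather than the JDK default of 50.
public class ListenBacklogSketch {
  static ServerSocket bind(Configuration conf, InetSocketAddress addr)
      throws Exception {
    int backlog = conf.getInt("ipc.server.listen.queue.size", 128);
    ServerSocket ss = new ServerSocket();
    // new ServerSocket(port) without an explicit backlog would
    // silently fall back to a backlog of 50.
    ss.bind(addr, backlog);
    return ss;
  }
}
{code}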

> TcpPeerServer should respect ipc.server.listen.queue.size
> -
>
> Key: HDFS-9669
> URL: https://issues.apache.org/jira/browse/HDFS-9669
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-9669.0.patch, HDFS-9669.1.patch, HDFS-9669.1.patch
>
>
> During periods of high traffic we are seeing:
> {code}
> 16/01/19 23:40:40 WARN hdfs.DFSClient: Connection failure: Failed to connect 
> to /10.138.178.47:50010 for file /MYPATH/MYFILE for block 
> BP-1935559084-10.138.112.27-1449689748174:blk_1080898601_7375294:java.io.IOException:
>  Connection reset by peer
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>   at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>   at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:109)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
> {code}
> At the time this happens there are far fewer xceivers than configured.
> On most JDKs this leaves the total backlog at 50. This effectively means 
> that any GC pause plus busy time will result in TCP resets.
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/net/ServerSocket.java#l370



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9118) Add logging system for libdhfs++

2016-01-25 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116065#comment-15116065
 ] 

James Clampffer commented on HDFS-9118:
---

Shouldn't getting the stack address of a local variable boil down (at least on 
x86) to just reading what's in ESP minus a constant offset?  That should be 
doable in a few cycles, superscalar complications aside, unless I'm missing 
something.

Either way, good idea on allowing things to opt in based on logging levels.  
I'll add that.

> Add logging system for libdhfs++
> 
>
> Key: HDFS-9118
> URL: https://issues.apache.org/jira/browse/HDFS-9118
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: HDFS-8707
>Reporter: Bob Hansen
>Assignee: James Clampffer
>
> With HDFS-9505, we've started logging data from libhdfs++.  Consumers of the 
> library are going to have their own logging infrastructure that we're going 
> to want to provide data to.  
> libhdfs++ should have a logging library that:
> * Is overridable and can provide sufficient information to work well with 
> common C++ logging frameworks
> * Has a rational default implementation 
> * Is performant



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots

2016-01-25 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HDFS-9700:

Attachment: HDFS-9700-v1.patch

Attaching a patch for the same changes against trunk.

> DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots
> 
>
> Key: HDFS-9700
> URL: https://issues.apache.org/jira/browse/HDFS-9700
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1, 2.6.3
>Reporter: Gary Helmling
> Attachments: HDFS-9700-v1.patch, HDFS-9700_branch-2.7.patch
>
>
> In {{DFSClient.connectToDN()}} and 
> {{DFSOutputStream.createSocketForPipeline()}}, we never call 
> {{setTcpNoDelay()}} on the constructed socket before sending.  In both cases, 
> we should respect the value of ipc.client.tcpnodelay in the configuration.
> While this applies whether security is enabled or not, it seems to have a 
> bigger impact on latency when security is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots

2016-01-25 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HDFS-9700:

Status: Patch Available  (was: Open)

> DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots
> 
>
> Key: HDFS-9700
> URL: https://issues.apache.org/jira/browse/HDFS-9700
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.6.3, 2.7.1
>Reporter: Gary Helmling
> Attachments: HDFS-9700-v1.patch, HDFS-9700_branch-2.7.patch
>
>
> In {{DFSClient.connectToDN()}} and 
> {{DFSOutputStream.createSocketForPipeline()}}, we never call 
> {{setTcpNoDelay()}} on the constructed socket before sending.  In both cases, 
> we should respect the value of ipc.client.tcpnodelay in the configuration.
> While this applies whether security is enabled or not, it seems to have a 
> bigger impact on latency when security is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9701) DN may deadlock when hot-swapping under load

2016-01-25 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-9701:
---

 Summary: DN may deadlock when hot-swapping under load
 Key: HDFS-9701
 URL: https://issues.apache.org/jira/browse/HDFS-9701
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiao Chen
Assignee: Xiao Chen


If the DN is under load (new blocks being written), a hot-swap task by {{hdfs 
dfsadmin -reconfig}} may cause a deadlock.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116161#comment-15116161
 ] 

Hadoop QA commented on HDFS-9700:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 33s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12784241/HDFS-9700-v1.patch |
| JIRA Issue | HDFS-9700 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4f1abf5f155b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 

[jira] [Commented] (HDFS-9701) DN may deadlock when hot-swapping under load

2016-01-25 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116149#comment-15116149
 ] 

Xiao Chen commented on HDFS-9701:
-

Most notable jstacks:
Reconfigure task:
{noformat}
"Reconfiguration Task" #459 daemon prio=5 os_prio=0 tid=0x7fc6913a6000 
nid=0x5219 waiting on condition [0x7fc663cde000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.closeAndWait(FsVolumeImpl.java:251)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:322)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:363)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:472)
- locked <0xd6057410> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:718)
- locked <0xd55a5950> (a 
org.apache.hadoop.hdfs.server.datanode.DataNode)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:684)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.refreshVolumes(DataNode.java:648)
- locked <0xd55a5950> (a 
org.apache.hadoop.hdfs.server.datanode.DataNode)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.reconfigurePropertyImpl(DataNode.java:485)
at 
org.apache.hadoop.conf.ReconfigurableBase$ReconfigurationThread.run(ReconfigurableBase.java:133)
{noformat}

PacketResponder thread for the block being written:
{noformat}
"PacketResponder: BP-284727513-10.64.40.36-1450767058747:blk_1073785044_44298, 
type=HAS_DOWNSTREAM_IN_PIPELINE" #462 daemon prio=5 os_prio=0 
tid=0x7fc67c5c8000 nid=0x5268 waiting for monitor entry [0x7fc662ed2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1487)
- waiting to lock <0xd6057410> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1300)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1257)
at java.lang.Thread.run(Thread.java:745)
{noformat}
The deadlock happens between a lock and a wait on a reference count (a minimal 
sketch follows the list):
# In {{BlockReceiver$PacketResponder#finalizeBlock}}, the reference count is 
increased after {{claimReplicaHandler}}. (Code 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java#L1426])
# The reconfigure task locks the {{FsDatasetImpl}} object.
# The reconfigure task calls all the way into {{FsVolumeImpl#closeAndWait}}, 
which loops indefinitely waiting for the reference count to drop. (Code 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java#L249])
# {{BlockReceiver$PacketResponder#finalizeBlock}} waits on the 
{{FsDatasetImpl}} lock taken in step #2. Oops.
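
The pattern is easy to reproduce in isolation. Here is a minimal, 
self-contained illustration (toy code, not HDFS internals) that hangs forever 
when run:
{code:title=LockVsRefCountDeadlockSketch.java|borderStyle=solid}
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the deadlock above: one thread holds a lock while
// spinning until a reference count drops, but the only thread that can
// decrement the count must first acquire that same lock.
public class LockVsRefCountDeadlockSketch {
  static final Object dataset = new Object();     // stands in for FsDatasetImpl
  static final AtomicInteger refs = new AtomicInteger(1);

  public static void main(String[] args) throws InterruptedException {
    Thread responder = new Thread(() -> {
      synchronized (dataset) {                    // finalizeBlock needs the lock...
        refs.decrementAndGet();                   // ...before releasing its ref
      }
    });
    synchronized (dataset) {                      // reconfigure task takes the lock
      responder.start();
      while (refs.get() > 0) {                    // closeAndWait-style spin
        Thread.sleep(100);                        // never ends: responder is blocked
      }
    }
  }
}
{code}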

> DN may deadlock when hot-swapping under load
> 
>
> Key: HDFS-9701
> URL: https://issues.apache.org/jira/browse/HDFS-9701
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> If the DN is under load (new blocks being written), a hot-swap task by {{hdfs 
> dfsadmin -reconfig}} may cause a deadlock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9330) Support reconfiguring dfs.datanode.duplicate.replica.deletion without DN restart

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116162#comment-15116162
 ] 

Hadoop QA commented on HDFS-9330:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 
284 unchanged - 10 fixed = 285 total (was 294) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 13s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 28s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 28s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12784218/HDFS-9330-HDFS-9000.003.patch
 |
| JIRA Issue | HDFS-9330 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 53bba4c69937 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 

[jira] [Created] (HDFS-9702) DiskBalancer : getVolumeMap implementation

2016-01-25 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-9702:
--

 Summary: DiskBalancer : getVolumeMap implementation
 Key: HDFS-9702
 URL: https://issues.apache.org/jira/browse/HDFS-9702
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Affects Versions: HDFS-1312
Reporter: Anu Engineer
Assignee: Anu Engineer
 Fix For: HDFS-1312


Add get volume map 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9702) DiskBalancer : getVolumeMap implementation

2016-01-25 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9702:
---
Attachment: HDFS-9702-HDFS-1312.001.patch

Adding patch for code review. This is dependent on HDFS-9683

> DiskBalancer : getVolumeMap implementation
> --
>
> Key: HDFS-9702
> URL: https://issues.apache.org/jira/browse/HDFS-9702
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: HDFS-1312
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-1312
>
> Attachments: HDFS-9702-HDFS-1312.001.patch
>
>
> Add get volume map 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9703) DiskBalancer : getBandwidth implementation

2016-01-25 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-9703:
---
Attachment: HDFS-9703-HDFS-1312.001.patch

Adding patch for code review. This is dependent on HDFS-9702

> DiskBalancer : getBandwidth implementation
> --
>
> Key: HDFS-9703
> URL: https://issues.apache.org/jira/browse/HDFS-9703
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: HDFS-1312
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-1312
>
> Attachments: HDFS-9703-HDFS-1312.001.patch
>
>
> Add getBandwidth call



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9262) Support reconfiguring dfs.datanode.lazywriter.interval.sec without DN restart

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116176#comment-15116176
 ] 

Hadoop QA commented on HDFS-9262:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 
251 unchanged - 11 fixed = 252 total (was 262) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 17s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 57s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 35s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.tools.TestDFSAdmin |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.TestRenameWhileOpen |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation |
|   | hadoop.hdfs.tools.TestDFSAdminWithHA |
|   | hadoop.hdfs.tools.TestDFSAdmin |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-9525) hadoop utilities need to support provided delegation tokens

2016-01-25 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116289#comment-15116289
 ] 

Owen O'Malley commented on HDFS-9525:
-

[~daryn] I'm sorry, but I don't see what problem the patch introduced. It lets 
your webhdfs have a token even if security is turned off, as long as the token 
was already in the UGI. Where is the problem?

> hadoop utilities need to support provided delegation tokens
> ---
>
> Key: HDFS-9525
> URL: https://issues.apache.org/jira/browse/HDFS-9525
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, 
> HDFS-7984.003.patch, HDFS-7984.004.patch, HDFS-7984.005.patch, 
> HDFS-7984.006.patch, HDFS-7984.007.patch, HDFS-7984.patch, 
> HDFS-9525.008.patch, HDFS-9525.009.patch, HDFS-9525.009.patch, 
> HDFS-9525.branch-2.008.patch, HDFS-9525.branch-2.009.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9701) DN may deadlock when hot-swapping under load

2016-01-25 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9701:

Attachment: HDFS-9701.01.patch

> DN may deadlock when hot-swapping under load
> 
>
> Key: HDFS-9701
> URL: https://issues.apache.org/jira/browse/HDFS-9701
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9701.01.patch
>
>
> If the DN is under load (new blocks being written), a hot-swap task by {{hdfs 
> dfsadmin -reconfig}} may cause a deadlock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9701) DN may deadlock when hot-swapping under load

2016-01-25 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9701:

Status: Patch Available  (was: Open)

> DN may deadlock when hot-swapping under load
> 
>
> Key: HDFS-9701
> URL: https://issues.apache.org/jira/browse/HDFS-9701
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9701.01.patch
>
>
> If the DN is under load (new blocks being written), a hot-swap task by {{hdfs 
> dfsadmin -reconfig}} may cause a deadlock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9701) DN may deadlock when hot-swapping under load

2016-01-25 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116165#comment-15116165
 ] 

Xiao Chen commented on HDFS-9701:
-

Thanks a lot to [~eddyxu] for the offline discussion about the general ideas!

To fix the problem, we have a couple of options:
1. Don't wait indefinitely inside {{FsVolumeImpl#closeAndWait}}; wait outside 
of the lock scope instead.
2. Use finer-grained locks on the volumes in {{FsDatasetImpl}}.

I think option 1 is better since the change is smaller, and the infinite wait 
inside the lock seems a bit scary to me.
Patch 1 attempts to solve the problem along the lines of option 1 (a 
simplified sketch follows this list):
- Moved the wait-for-close logic out of {{FsDatasetImpl}} and into 
{{DataNode}}.
- Had to add a new interface method to {{FsDatasetSpi}}.
- Added methods along the call stack to allow the above.
- Added a new unit test in {{TestFsDatasetImpl}} that deadlocks before the 
patch and passes after it.
- Had to modify {{TestFsVolumeList}} to accommodate the change.
- Added more info to the log in {{BlockReceiver}}, which I found useful when 
root-causing the problem.
- Added a missing {{@Override}} in {{FsDatasetImpl}}.
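
Continuing the toy model from the earlier comment, option 1 boils down to 
releasing the dataset lock before waiting for the reference count. A 
simplified sketch (not the actual patch):
{code:title=WaitOutsideLockSketch.java|borderStyle=solid}
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of option 1: detach the volume under the lock,
// then wait for in-flight references outside the lock scope so the
// PacketResponder thread can still enter the lock and finish.
public class WaitOutsideLockSketch {
  static void removeVolume(Object dataset, AtomicInteger refs)
      throws InterruptedException {
    synchronized (dataset) {
      // remove the volume from the dataset's maps while locked
    }
    while (refs.get() > 0) {   // wait-for-close, now outside the lock
      Thread.sleep(100);
    }
    // safe to close the volume here
  }
}
{code}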

> DN may deadlock when hot-swapping under load
> 
>
> Key: HDFS-9701
> URL: https://issues.apache.org/jira/browse/HDFS-9701
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9701.01.patch
>
>
> If the DN is under load (new blocks being written), a hot-swap task by {{hdfs 
> dfsadmin -reconfig}} may cause a deadlock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9698) Long running Balancer should renew TGT

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116184#comment-15116184
 ] 

Hadoop QA commented on HDFS-9698:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 
38 unchanged - 0 fixed = 39 total (was 38) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 10s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 7s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 182m 15s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.shortcircuit.TestShortCircuitCache |
|   | hadoop.hdfs.TestEncryptionZones |
|   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.datanode.TestBlockScanner |
|   | 

[jira] [Created] (HDFS-9703) DiskBalancer : getBandwidth implementation

2016-01-25 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-9703:
--

 Summary: DiskBalancer : getBandwidth implementation
 Key: HDFS-9703
 URL: https://issues.apache.org/jira/browse/HDFS-9703
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Affects Versions: HDFS-1312
Reporter: Anu Engineer
Assignee: Anu Engineer
 Fix For: HDFS-1312


Add getBandwidth call



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9476) TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail

2016-01-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115775#comment-15115775
 ] 

Mingliang Liu commented on HDFS-9476:
-

It happens in a recent build as well; see the UT log at: 
https://builds.apache.org/job/PreCommit-HDFS-Build/14230/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt

> TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail
> -
>
> Key: HDFS-9476
> URL: https://issues.apache.org/jira/browse/HDFS-9476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>
> This test occasionally fails. For example, the most recent one is:
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2587/
> Error Message
> {noformat}
> Cannot obtain block length for 
> LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020;
>  getBlockSize()=1024; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]}
> {noformat}
> Stacktrace
> {noformat}
> java.io.IOException: Cannot obtain block length for 
> LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020;
>  getBlockSize()=1024; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]}
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:399)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:343)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275)
>   at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:265)
>   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1046)
>   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1011)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.dfsOpenFileWithRetries(TestDFSUpgradeFromImage.java:177)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyDir(TestDFSUpgradeFromImage.java:213)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyFileSystem(TestDFSUpgradeFromImage.java:228)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:600)
>   at 
> org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel1BBWImage(TestDFSUpgradeFromImage.java:622)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9698) Long running Balancer should renew TGT

2016-01-25 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9698:

Status: Patch Available  (was: Open)

> Long running Balancer should renew TGT
> --
>
> Key: HDFS-9698
> URL: https://issues.apache.org/jira/browse/HDFS-9698
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, security
>Affects Versions: 2.6.3
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>
> When the {{Balancer}} runs beyond the configured TGT lifetime, the current 
> logic won't renew TGT.
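
A minimal sketch of the kind of renewal being proposed (not the attached 
patch; it assumes the Balancer was logged in from a keytab via 
{{UserGroupInformation}}):

{code:java}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

// Sketch only, not the actual HDFS-9698 patch: a long-running daemon can
// re-login from its keytab on each pass. checkTGTAndReloginFromKeytab() is
// a no-op while the ticket is still fresh, so calling it per iteration is cheap.
public class RenewingLoop {
  void run() throws IOException, InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
      doOneIteration();                // hypothetical per-pass balancing work
      Thread.sleep(10_000L);
    }
  }

  void doOneIteration() { /* move blocks, etc. */ }
}
{code}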



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115802#comment-15115802
 ] 

Yongjun Zhang commented on HDFS-9696:
-

Ah, I intended to write the request message in my prior comment before 
reassigning, but just found that I had accidentally reassigned together with 
the request message. Sorry about that.


> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Yongjun Zhang
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS does not have any snapshot now, but I see filediff records 
> persisted in its fsimage. Since it has been restarted many times and 
> checkpointed over 100 times since then, the records must have been persisted 
> and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9689) Test o.a.h.hdfs.TestRenameWhileOpen fails intermittently

2016-01-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115838#comment-15115838
 ] 

Mingliang Liu commented on HDFS-9689:
-

Thanks [~iwasakims] and [~vinayrpet] for your insightful comments.

{quote}
Is it still possible for other test processes to bind the NN port between 
shutdownNameNode and createNameNode in MiniDFSCluster#restartNameNode?
{quote}
Yes. This is why I cancelled the patch. If the port changes in the process of 
restarting the NN, the NN will never leave safe mode, leading to a timeout 
exception.

{quote}
So, to completely resolve issues raising due to Port Bind issues, from restart 
(Name|Data)nodes needs some effort
{quote}
I totally agree with you. The to-do list you proposed seems to fix these kinds 
of errors fundamentally. Shall we address this in a separate jira?


> Test o.a.h.hdfs.TestRenameWhileOpen fails intermittently 
> -
>
> Key: HDFS-9689
> URL: https://issues.apache.org/jira/browse/HDFS-9689
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9689.000.patch
>
>
> The test fails in recent builds, e.g.
> https://builds.apache.org/job/PreCommit-HDFS-Build/14063/testReport/org.apache.hadoop.hdfs/TestRenameWhileOpen/
> and
> https://builds.apache.org/job/PreCommit-HDFS-Build/14212/testReport/org.apache.hadoop.hdfs/TestRenameWhileOpen/testWhileOpenRenameToNonExistentDirectory/
> The *Error Message* is like:
> {code}
> Problem binding to [localhost:60690] java.net.BindException: Address already 
> in use; For more details see:  http://wiki.apache.org/hadoop/BindException
> {code}
> and *Stacktrace* is:
> {code}
> java.net.BindException: Problem binding to [localhost:60690] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:463)
>   at sun.nio.ch.Net.bind(Net.java:455)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:469)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:695)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2464)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:958)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:535)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:392)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:743)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:685)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:884)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:863)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1581)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
>   at 
> org.apache.hadoop.hdfs.TestRenameWhileOpen.testWhileOpenRenameToNonExistentDirectory(TestRenameWhileOpen.java:332)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever

2016-01-25 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115879#comment-15115879
 ] 

Yongjun Zhang commented on HDFS-9696:
-

Yes [~jingzhao], your analysis looks correct to me, per my study in HDFS-9406. 
Thanks.



> Garbage snapshot records lingering forever
> --
>
> Key: HDFS-9696
> URL: https://issues.apache.org/jira/browse/HDFS-9696
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kihwal Lee
>Assignee: Yongjun Zhang
>Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years 
> ago. The HDFS does not have any snapshot now, but I see filediff records 
> persisted in its fsimage. Since it has been restarted many times and 
> checkpointed over 100 times since then, the records must have been persisted 
> and carried over all along.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9700) DFSClient and DFSOutputStream do not respect TCP_NODELAY config in two spots

2016-01-25 Thread Gary Helmling (JIRA)
Gary Helmling created HDFS-9700:
---

 Summary: DFSClient and DFSOutputStream do not respect TCP_NODELAY 
config in two spots
 Key: HDFS-9700
 URL: https://issues.apache.org/jira/browse/HDFS-9700
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.3, 2.7.1
Reporter: Gary Helmling


In {{DFSClient.connectToDN()}} and 
{{DFSOutputStream.createSocketForPipeline()}}, we never call 
{{setTcpNoDelay()}} on the constructed socket before sending.  In both cases, 
we should respect the value of ipc.client.tcpnodelay in the configuration.

While this applies whether security is enabled or not, it seems to have a 
bigger impact on latency when security is enabled.
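
A minimal sketch of the kind of fix being described (it assumes the standard 
config keys in {{CommonConfigurationKeysPublic}}; the eventual patch may wire 
this differently):

{code:java}
import java.net.Socket;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeysPublic;

// Sketch of the suggested fix, not the committed patch: read
// ipc.client.tcpnodelay from the configuration and apply it to the
// data-transfer socket before any data is sent.
public class NoDelaySocketFactory {
  static Socket createSocket(Configuration conf) throws java.io.IOException {
    Socket sock = new Socket();
    boolean noDelay = conf.getBoolean(
        CommonConfigurationKeysPublic.IPC_CLIENT_TCPNODELAY_KEY,
        CommonConfigurationKeysPublic.IPC_CLIENT_TCPNODELAY_DEFAULT);
    sock.setTcpNoDelay(noDelay);  // disables Nagle's algorithm when true
    return sock;
  }
}
{code}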



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9672) o.a.h.hdfs.TestLeaseRecovery2 fails intermittently

2016-01-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116453#comment-15116453
 ] 

Mingliang Liu commented on HDFS-9672:
-

Thanks for the discussion, review and commit, [~jnp]!

> o.a.h.hdfs.TestLeaseRecovery2 fails intermittently
> --
>
> Key: HDFS-9672
> URL: https://issues.apache.org/jira/browse/HDFS-9672
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9672.000.patch, HDFS-9672.001.patch
>
>
> It fails in recent builds, see:
> https://builds.apache.org/job/PreCommit-HDFS-Build/14177/testReport/org.apache.hadoop.hdfs/
> https://builds.apache.org/job/PreCommit-HDFS-Build/14147/testReport/org.apache.hadoop.hdfs/
> Failing test methods include:
> * 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart
> * org.apache.hadoop.hdfs.TestLeaseRecovery2.testLeaseRecoverByAnotherUser
> * org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecovery
> * 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.org.apache.hadoop.hdfs.TestLeaseRecovery2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9701) DN may deadlock when hot-swapping under load

2016-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116466#comment-15116466
 ] 

Hadoop QA commented on HDFS-9701:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 39s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 
335 unchanged - 1 fixed = 337 total (was 336) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 49s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 18s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 53s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 199m 57s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(Set, 
boolean) calls Thread.sleep() with a lock held  At DataNode.java:a lock held  
At DataNode.java:[line 805] |
| JDK v1.8.0_66 Failed junit tests | 
hadoop.hdfs.qjournal.client.TestQuorumJournalManager |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.TestRecoverStripedFile |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | 

[jira] [Commented] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level

2016-01-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116490#comment-15116490
 ] 

Sangjin Lee commented on HDFS-9579:
---

[~mingma], the patch no longer applies cleanly. Do you mind updating the patch? 
Thanks!

> Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
> -
>
> Key: HDFS-9579
> URL: https://issues.apache.org/jira/browse/HDFS-9579
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9579-2.patch, HDFS-9579-3.patch, HDFS-9579-4.patch, 
> HDFS-9579.patch, MR job counters.png
>
>
> For cross-DC distcp or other applications, it becomes useful to have insight 
> into the traffic volume for each network distance, to distinguish cross-DC 
> traffic, local-DC-remote-rack traffic, etc.
> FileSystem's existing {{bytesRead}} metric tracks all the bytes read. To 
> provide additional metrics for each network distance, we can add additional 
> metrics at the FileSystem level and have {{DFSInputStream}} update the values 
> based on the network distance between the client and the datanode.
> {{DFSClient}} will resolve the client machine's network location as part of 
> its initialization. It doesn't need to resolve the datanode's network 
> location for each read as {{DatanodeInfo}} already has the info.
> There are existing HDFS-specific metrics such as {{ReadStatistics}} and 
> {{DFSHedgedReadMetrics}}, but these metrics are only accessible via 
> {{DFSClient}} or {{DFSInputStream}}, not something that application 
> frameworks such as MR and Tez can get to. That is the benefit of storing 
> these new metrics in FileSystem.Statistics.
> This jira only includes metrics generation by HDFS. The consumption of these 
> metrics in MR and Tez will be tracked by separate jiras.
> We can add similar metrics for the HDFS write scenario later if necessary.
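
A rough sketch of the proposed accounting (the per-distance counter method is 
hypothetical here; see the attached patches for the real API):

{code:java}
import org.apache.hadoop.fs.FileSystem;

// Hypothetical sketch, not the attached patch: after each read from a
// datanode, record the bytes both in the existing aggregate counter and
// under the network distance between the client and that datanode.
public class DistanceAwareStats {
  static void update(FileSystem.Statistics stats,
                     int networkDistance, long bytesRead) {
    stats.incrementBytesRead(bytesRead);  // existing total-bytes counter
    // incrementBytesReadByDistance is a hypothetical per-distance counter:
    stats.incrementBytesReadByDistance(networkDistance, bytesRead);
  }
}
{code}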



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9664) TestRollingUpgrade.testRollback failed frequently

2016-01-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116320#comment-15116320
 ] 

Mingliang Liu commented on HDFS-9664:
-

Perhaps there is a fundamental problem in the rolling upgrade. Otherwise,
* If we don't really need to check the exit exception when shutting down the 
cluster, we may ignore it by building the cluster with the 
{{checkExitOnShutdown(false)}} option (see the sketch below).
* Meanwhile, restarting the NN/DN does not keep the RPC port. If the port is 
different, the NN may never leave safe mode because not enough DNs register 
(see [HDFS-9689]). Will this break the test? 
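
A minimal sketch of the first option (test-side only, not a fix for the 
underlying problem):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Sketch of the checkExitOnShutdown(false) option: the "Test resulted in
// an unexpected exit" assertion in MiniDFSCluster#shutdown is skipped when
// the builder is configured not to check for exits.
public class ClusterWithoutExitCheck {
  static MiniDFSCluster build() throws java.io.IOException {
    return new MiniDFSCluster.Builder(new Configuration())
        .numDataNodes(1)
        .checkExitOnShutdown(false)  // tolerate ExitUtil.terminate() calls
        .build();
  }
}
{code}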


> TestRollingUpgrade.testRollback failed frequently
> -
>
> Key: HDFS-9664
> URL: https://issues.apache.org/jira/browse/HDFS-9664
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>
> Seen the following failure in the following jenkins test runs:
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/2743/testReport 
> (2016-01-18 22:14:23)
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/2742/testReport 
> (2016-01-18 17:52:58)
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/2739/testReport 
> (2016-01-18 01:51:26)
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/2738/testReport 
> (2016-01-17 21:56:17)
> Failed test: org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback
> {quote}
> Error Message
> Test resulted in an unexpected exit
> Stacktrace
> java.lang.AssertionError: Test resulted in an unexpected exit
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1895)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1882)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1875)
>   at 
> org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(TestRollingUpgrade.java:350)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9677) Rename generationStampV1/generationStampV2 to legacyGenerationStamp/generationStamp

2016-01-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116366#comment-15116366
 ] 

Mingliang Liu commented on HDFS-9677:
-

The v1 patch basically renames {{generationStampV1}} => 
{{legacyGenerationStamp}} and {{generationStampV2}} => {{generationStamp}}. I 
think the renaming is reasonable as I don't see any loss of readability. 
Meanwhile, there are existing comments on the usages of the variables that 
elaborate the difference. We have other cases using "legacy" in class/variable 
names, e.g. {{BlockReaderLocalLegacy}}, {{legacyBlock}}.
Another option is to change {{generationStampV1}} to {{generationStampRandom}} 
and {{generationStampV2}} to {{generationStampSequential}}. These are more 
descriptive names. However, they describe how each stamp is generated, not 
what it is for, because they are implementation specific. Suppose a new user 
plays with generation stamps for the very first time: she would need to know 
implementation details before she could tell which one is legacy or 
deprecated. Even with the current V1/V2 naming, we should not blame a new user 
who wonders whether a {{generationStampV3}} version exists.

I'd be happy to refine the patch based on further input.

> Rename generationStampV1/generationStampV2 to 
> legacyGenerationStamp/generationStamp
> ---
>
> Key: HDFS-9677
> URL: https://issues.apache.org/jira/browse/HDFS-9677
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Jing Zhao
>Assignee: Mingliang Liu
> Attachments: HDFS-9677.000.patch, HDFS-9677.001.patch
>
>
> [comment|https://issues.apache.org/jira/browse/HDFS-9542?focusedCommentId=15110531=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15110531]
>  from [~drankye] in HDFS-9542:
> {quote}
> Just wonder if it's a good idea to rename: generationStampV1 => 
> legacyGenerationStamp; generationStampV2 => generationStamp, similar for 
> other variables, as we have legacy block and block.
> {quote}
> This jira plans to do this rename.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

