[jira] [Commented] (HDFS-8625) count with -h option displays namespace quota in human readable format

2015-06-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596074#comment-14596074
 ] 

Allen Wittenauer commented on HDFS-8625:


Right, counts should be in billions, byte sizes should be in gigabytes.  So 
that's pretty much the only bug here. But it's:

a) relatively minor
b) it causes a cascade of other, incompatible changes (e.g., the need to honor both b 
and not g for setting quotas)

 count with -h option displays namespace quota in human readable format
 --

 Key: HDFS-8625
 URL: https://issues.apache.org/jira/browse/HDFS-8625
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Archana T
Assignee: Surendra Singh Lilhore
 Attachments: HDFS-8625.patch


 When the 'count' command is executed with the '-h' option, the namespace quota is 
 displayed in human readable format --
 Example:
 hdfs dfsadmin -setQuota {color:red}1048576{color} /test
 hdfs dfs -count -q -h -v /test
 {color:red}QUOTA  REM_QUOTA{color}  SPACE_QUOTA  REM_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
 {color:red}1 M    1.0 M{color}      none         inf              1          0           0             /test
 QUOTA and REM_QUOTA show 1 M (human readable format), but they should show the 
 raw count value 1048576.
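 For illustration only (not the patch), a minimal sketch of the distinction being 
 requested: namespace quotas are object counts and should be printed raw even under 
 -h, while only byte-based values get a human-readable suffix. The class and method 
 names below are hypothetical.
 {code}
 // Hypothetical helper, illustrating the expected -h behavior only.
 public class QuotaFormatterSketch {
   /** Object counts (namespace quota) stay raw: 1048576 stays "1048576". */
   public static String formatCount(long namespaceQuota) {
     return Long.toString(namespaceQuota);
   }

   /** Byte sizes get the human-readable suffix: 1073741824 becomes "1.0 G". */
   public static String formatBytes(long byteSize) {
     final String[] units = {"", " K", " M", " G", " T"};
     double value = byteSize;
     int unit = 0;
     while (value >= 1024 && unit < units.length - 1) {
       value /= 1024;
       unit++;
     }
     return unit == 0 ? Long.toString(byteSize)
                      : String.format("%.1f%s", value, units[unit]);
   }
 }
 {code}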



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8637) OzoneHandler : Add Error Table

2015-06-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596261#comment-14596261
 ] 

Arpit Agarwal commented on HDFS-8637:
-

+1 for the patch. Jenkins failures are unrelated to the patch.

I will commit it shortly.

 OzoneHandler : Add Error Table
 --

 Key: HDFS-8637
 URL: https://issues.apache.org/jira/browse/HDFS-8637
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anu Engineer
Assignee: Anu Engineer
 Attachments: hdfs-8637-HDFS-7240.001.patch


 Define all errors coming out of REST protocol. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8625) count with -h option displays namespace quota in human readable format

2015-06-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596074#comment-14596074
 ] 

Allen Wittenauer edited comment on HDFS-8625 at 6/22/15 3:23 PM:
-

Right, counts should be in billions, byte sizes should be in gigabytes.  So 
that's pretty much the only bug here. But it's:

a) relatively minor
b) it causes a cascade of other, incompatible changes (e.g., the need to honor b and 
not g for setting size-based quotas)


was (Author: aw):
Right, counts should be in billions, byte sizes should be in gigabytes.  So 
that's pretty much the only bug here. But it's:

a) relatively minor
b) it causes a cascade of other, incompatible changes (e.g., the need to honor both b 
and not g for setting quotas)

 count with -h option displays namespace quota in human readable format
 --

 Key: HDFS-8625
 URL: https://issues.apache.org/jira/browse/HDFS-8625
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Archana T
Assignee: Surendra Singh Lilhore
 Attachments: HDFS-8625.patch


 When the 'count' command is executed with the '-h' option, the namespace quota is 
 displayed in human readable format --
 Example:
 hdfs dfsadmin -setQuota {color:red}1048576{color} /test
 hdfs dfs -count -q -h -v /test
 {color:red}QUOTA  REM_QUOTA{color}  SPACE_QUOTA  REM_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
 {color:red}1 M    1.0 M{color}      none         inf              1          0           0             /test
 QUOTA and REM_QUOTA show 1 M (human readable format), but they should show the 
 raw count value 1048576.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596293#comment-14596293
 ] 

Haohui Mai commented on HDFS-8515:
--

The patch looks good. I'm wondering whether it is possible to inherit from 
{{AbstractChannel}} for the stream class, similar to what the {{ChildChannel}} 
patch does in https://github.com/netty/netty/issues/3667. This would make the 
abstraction closer to the ones that netty provides, simplifying the effort of 
building applications at the upper layer.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8462) Implement GETXATTRS and LISTXATTRS operation for WebImageViewer

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596246#comment-14596246
 ] 

Hadoop QA commented on HDFS-8462:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 43s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 58s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m  1s | Site still builds. |
| {color:green}+1{color} | checkstyle |   0m 44s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 19s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 18s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 159m 29s | Tests passed in hadoop-hdfs. 
|
| | | 208m 56s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741031/HDFS-8462-04.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 445b132 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11434/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11434/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11434/console |


This message was automatically generated.

 Implement GETXATTRS and LISTXATTRS operation for WebImageViewer
 ---

 Key: HDFS-8462
 URL: https://issues.apache.org/jira/browse/HDFS-8462
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Akira AJISAKA
Assignee: Jagadesh Kiran N
 Attachments: HDFS-8462-00.patch, HDFS-8462-01.patch, 
 HDFS-8462-02.patch, HDFS-8462-03.patch, HDFS-8462-04.patch


 In Hadoop 2.7.0, WebImageViewer supports the following operations:
 * {{GETFILESTATUS}}
 * {{LISTSTATUS}}
 * {{GETACLSTATUS}}
 I'm thinking it would be better for administrators if {{GETXATTRS}} and 
 {{LISTXATTRS}} are supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8637) OzoneHandler : Add Error Table

2015-06-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-8637:

   Resolution: Fixed
Fix Version/s: HDFS-7240
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to the feature branch.

Thanks [~anu] for the contribution!

 OzoneHandler : Add Error Table
 --

 Key: HDFS-8637
 URL: https://issues.apache.org/jira/browse/HDFS-8637
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anu Engineer
Assignee: Anu Engineer
 Fix For: HDFS-7240

 Attachments: hdfs-8637-HDFS-7240.001.patch


 Define all errors coming out of REST protocol. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class

2015-06-22 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596397#comment-14596397
 ] 

Rakesh R commented on HDFS-8493:


The following functions perform the resolution 
{{fsd.resolvePath(pc, src, pathComponents);}} while acquiring only the fsn lock 
and not the fsd lock. Could you please take a look at them?
# FsDirAclOp.java
- getAclStatus()
- modifyAclEntries()
- removeAcl()
- removeDefaultAcl()
- setAcl()
- getAclStatus()
# FsDirDeleteOp.java
- delete(fsn, src, recursive, logRetryCache)
# FsDirRenameOp.java
- renameToInt(fsd, srcArg, dstArg, logRetryCache)
- renameToInt(fsd, srcArg, dstArg, logRetryCache, options)
# FsDirStatAndListingOp.java
- getContentSummary(fsd, src)
- getFileInfo(fsd, srcArg, resolveLink)
- isFileClosed(fsd, src)
- getListingInt(fsd, srcArg, startAfter, needLocation)
# FsDirWriteFileOp.java
- abandonBlock()
- completeFile(fsn, pc, srcArg, holder, last, fileId)
- getEncryptionKeyInfo(fsn, pc, src, supportedVersions)
- startFile()
- validateAddBlock()
# FsDirXAttrOp.java
- getXAttrs(fsd, srcArg, xAttrs)
- listXAttrs(fsd, src)
- setXAttr(fsd, src, xAttr, flag, logRetryCache)
# FSNamesystem.java
- createEncryptionZoneInt()
- getEZForPath()

 Consolidate truncate() related implementation in a single class
 ---

 Key: HDFS-8493
 URL: https://issues.apache.org/jira/browse/HDFS-8493
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Rakesh R
 Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, 
 HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, 
 HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch


 This jira proposes to consolidate truncate() related methods into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8644) OzoneHandler : Add volume handler

2015-06-22 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-8644:
--

 Summary: OzoneHandler : Add volume handler
 Key: HDFS-8644
 URL: https://issues.apache.org/jira/browse/HDFS-8644
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anu Engineer


Add volume handler logic that dispatches volume related calls to the right 
interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-7214:
--
Attachment: HDFS-7214.v3.patch

 Display the time when NN became active on the webUI
 ---

 Key: HDFS-7214
 URL: https://issues.apache.org/jira/browse/HDFS-7214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, 
 HDFS-7214.v3.patch


 Currently the NN webUI displays the JVM start-up time. It would be useful to 
 show when the NN became active. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.

2015-06-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596712#comment-14596712
 ] 

Haohui Mai commented on HDFS-8617:
--

bq. You can read my related SoCC paper here: 
http://umbrant.com/papers/socc12-cake.pdf . I experimented with ioprio about 3 
years ago as part of this work, and didn't get positive results. We needed 
application-level throttling.

As you mentioned in the evaluation, there are adverse effects on throughputs.

I agree that application-level throttling can be useful. The proposed solution, 
however, relies on magic numbers to work. My concern is how to choose those 
magic numbers. Is it repeatable to achieve good performance? Is it 
generalizable to other configurations? It looks to me that currently the answer 
to both questions is no. The proposed solution looks like lowering the 
utilization of the cluster (at the cost of making {{checkDir()}} really slow) 
to meet the SLOs.

bq. The key issue though, as both Colin and I have mentioned, is that there is 
queuing both in the OS and on disk. ioprio only affects OS-level queuing, and 
disk-level queuing can be quite substantial. Not sure how much more needs to be 
said.

Point taken. Unfortunately, without performance benchmarks and numbers the 
statements are purely speculative. For example, what do you mean by 
substantial? The size of the NCQ is 32, whereas the size of the OS-level I/O 
queue can be in the hundreds or thousands. I would really appreciate some 
performance benchmarks and shared numbers.

My concern with the proposal is that the parameter cannot be automatically 
tuned w.r.t. cluster configurations and loads; it has to be dynamic. In the 
longer term it makes a lot of sense to tune these parameters based on the length 
of the I/O queue, avg. processing time, etc. As a first step I think it can 
be very helpful to simply correlate these parameters with simple metrics like 
the number of transceiver threads.


 Throttle DiskChecker#checkDirs() speed.
 ---

 Key: HDFS-8617
 URL: https://issues.apache.org/jira/browse/HDFS-8617
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.7.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-8617.000.patch


 As described in HDFS-8564,  {{DiskChecker.checkDirs(finalizedDir)}} is 
 causing excessive I/Os because {{finalizedDirs}} might have up to 64K 
 sub-directories (HDFS-6482).
 This patch proposes to limit the rate of IO operations in 
 {{DiskChecker.checkDirs()}}. 
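 As a rough illustration of the idea only (not the attached HDFS-8617.000.patch), 
 a directory walk can be throttled with Guava's RateLimiter, which Hadoop already 
 bundles; the permits-per-second value below is arbitrary.
 {code}
 // Illustrative sketch only -- not the attached patch. Throttle the
 // per-directory checks to a fixed number of operations per second.
 import java.io.File;
 import com.google.common.util.concurrent.RateLimiter;

 public class ThrottledDirCheckSketch {
   // 100 directory checks per second; the value is arbitrary.
   private static final RateLimiter LIMITER = RateLimiter.create(100.0);

   public static void checkDirs(File dir) {
     File[] children = dir.listFiles();
     if (children == null) {
       return;
     }
     for (File child : children) {
       if (child.isDirectory()) {
         LIMITER.acquire();   // block until a permit is available
         checkDirs(child);    // recurse into the sub-directory
       }
     }
   }
 }
 {code}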



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them

2015-06-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-8480:
---
   Resolution: Fixed
Fix Version/s: 2.7.1
   Status: Resolved  (was: Patch Available)

 Fix performance and timeout issues in HDFS-7929 by using hard-links to 
 preserve old edit logs instead of copying them
 -

 Key: HDFS-8480
 URL: https://issues.apache.org/jira/browse/HDFS-8480
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, 
 HDFS-8480.02.patch, HDFS-8480.03.patch


 HDFS-7929 copies existing edit logs to the storage directory of the upgraded 
 {{NameNode}}. This slows down the upgrade process. This JIRA aims to use 
 hard-linking instead of per-op copying to achieve the same goal.
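 For context, a minimal sketch of the underlying idea (not the patch itself): a 
 hard link replaces a byte-for-byte copy of an edit log segment with a single 
 metadata operation. The paths below are placeholders.
 {code}
 // Illustrative only: preserve an old edit log segment with a hard link
 // instead of copying its bytes. Paths are placeholders.
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.Paths;

 public class HardLinkSketch {
   public static void main(String[] args) throws Exception {
     Path existing = Paths.get("/data/nn/current/edits_0000001-0000100");
     Path link = Paths.get("/data/nn/previous/edits_0000001-0000100");
     // Creates a second directory entry for the same inode; no data is copied.
     Files.createLink(link, existing);
   }
 }
 {code}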



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596652#comment-14596652
 ] 

Hadoop QA commented on HDFS-7214:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m  4s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 10s | The applied patch generated  2 
new checkstyle issues (total was 183, now 185). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 18s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 160m  3s | Tests passed in hadoop-hdfs. 
|
| | | 206m 29s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741081/HDFS-7214.v3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11436/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11436/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11436/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11436/console |


This message was automatically generated.

 Display the time when NN became active on the webUI
 ---

 Key: HDFS-7214
 URL: https://issues.apache.org/jira/browse/HDFS-7214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, 
 HDFS-7214.v3.patch, HDFS-7214.v4.patch


 Currently the NN webUI displays the JVM start-up time. It would be useful to 
 show when the NN became active. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8608) Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks)

2015-06-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596649#comment-14596649
 ] 

Andrew Wang commented on HDFS-8608:
---

+1 LGTM. I'm going to commit this to branch-2.

 Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in 
 UnderReplicatedBlocks and PendingReplicationBlocks)
 --

 Key: HDFS-8608
 URL: https://issues.apache.org/jira/browse/HDFS-8608
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 3.0.0

 Attachments: HDFS-4366-branch-2.00.patch, 
 HDFS-4366-branch-2.01.patch, HDFS-8608.00.patch, HDFS-8608.01.patch, 
 HDFS-8608.02.patch


 This JIRA aims to merge HDFS-7912 into trunk to minimize the final patch when 
 merging the HDFS-7285 (erasure coding) branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-4366) Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks

2015-06-22 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4366:
--
Fix Version/s: (was: 3.0.0)
   2.8.0

[~zhz] did up a branch-2 patch which I backported, changing the fix version to 
reflect this.

 Block Replication Policy Implementation May Skip Higher-Priority Blocks for 
 Lower-Priority Blocks
 -

 Key: HDFS-4366
 URL: https://issues.apache.org/jira/browse/HDFS-4366
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Derek Dagit
Assignee: Derek Dagit
 Fix For: 2.8.0

 Attachments: HDFS-4366-branch-2.patch, HDFS-4366.patch, 
 HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, 
 HDFS-4366.patch, hdfs-4366-unittest.patch


 In certain cases, higher-priority under-replicated blocks can be skipped by 
 the replication policy implementation.  The current implementation maintains, 
 for each priority level, an index into a list of blocks that are 
 under-replicated.  Together, the lists compose a priority queue (see note 
 later about branch-0.23).  In some cases when blocks are removed from a list, 
 the caller (BlockManager) properly handles the index into the list from which 
 it removed a block.  In some other cases, the index remains stationary while 
 the list changes.  Whenever this happens, and the removed block happened to 
 be at or before the index, the implementation will skip over a block when 
 selecting blocks for replication work.
 In situations when entire racks are decommissioned, leading to many 
 under-replicated blocks, loss of blocks can occur.
 Background: HDFS-1765
 This patch to trunk greatly improved the state of the replication policy 
 implementation.  Prior to the patch, the following details were true:
   * The block priority queue was no such thing: It was really a set of 
 trees that held blocks in natural ordering, that being by the block's ID, 
 which resulted in iterator walks over the blocks in pseudo-random order.
   * There was only a single index into an iteration over all of the 
 blocks...
   * ... meaning the implementation was only successful in respecting 
 priority levels on the first pass.  Overall, the behavior was a 
 round-robin-type scheduling of blocks.
 After the patch
   * A proper priority queue is implemented, preserving log n operations 
 while iterating over blocks in the order added.
   * A separate index for each priority level is kept...
   * ... allowing for processing of the highest priority blocks first 
 regardless of which priority had last been processed.
 The change was suggested for branch-0.23 as well as trunk, but it does not 
 appear to have been pulled in.
 The problem:
 Although the indices are now tracked in a better way, there is a 
 synchronization issue since the indices are managed outside of methods to 
 modify the contents of the queue.
 Removal of a block from a priority level without adjusting the index can mean 
 that the index then points to the block after the block it originally pointed 
 to.  In the next round of scheduling for that priority level, the block 
 originally pointed to by the index is skipped.
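 To make the skip concrete, here is a toy illustration (not the BlockManager 
 code): removing an element at or before a saved index, without adjusting that 
 index, makes the next pass jump over one element.
 {code}
 // Toy illustration of the index-skip problem; not BlockManager code.
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;

 public class IndexSkipDemo {
   public static void main(String[] args) {
     List<String> queue = new ArrayList<>(Arrays.asList("b1", "b2", "b3", "b4"));
     int index = 2;        // the next scheduling pass should resume at "b3"

     queue.remove("b1");   // removed without adjusting the saved index
     // queue is now [b2, b3, b4]; index 2 now points at "b4",
     // so "b3" is silently skipped on the next pass.
     System.out.println("next pass resumes at: " + queue.get(index));
   }
 }
 {code}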



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596698#comment-14596698
 ] 

Hadoop QA commented on HDFS-8542:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741136/HDFS-8542-branch-2.7.002.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7b424f9 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11439/console |


This message was automatically generated.

 WebHDFS getHomeDirectory behavior does not match specification
 --

 Key: HDFS-8542
 URL: https://issues.apache.org/jira/browse/HDFS-8542
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Jakob Homan
Assignee: kanaka kumar avvaru
 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, 
 HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch


 Per the 
 [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory],
  WebHDFS provides a REST endpoint for getting the user's home directory:
 {noformat}Submit a HTTP GET request.
 curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat}
 However, WebHDFSFileSystem.java does not use this, instead building the home 
 [directory 
 locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]:
 {code}
   /** @return the home directory. */
   public static String getHomeDirectoryString(final UserGroupInformation ugi) {
     return "/user/" + ugi.getShortUserName();
   }

   @Override
   public Path getHomeDirectory() {
     return makeQualified(new Path(getHomeDirectoryString(ugi)));
   }
 {code}
 The WebHDFSFileSystem client should call to the REST service to determine the 
 home directory.
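 As an illustration of the requested direction only (not the committed patch), a 
 client could issue the documented GETHOMEDIRECTORY call and read the returned 
 JSON; the host and port below are placeholders and the JSON handling is 
 deliberately simplified.
 {code}
 // Illustrative sketch only -- not the committed patch. Ask the server for
 // the home directory instead of constructing it on the client side.
 import java.io.BufferedReader;
 import java.io.InputStreamReader;
 import java.net.HttpURLConnection;
 import java.net.URL;

 public class GetHomeDirectorySketch {
   public static void main(String[] args) throws Exception {
     // Host and port are placeholders.
     URL url = new URL("http://namenode:50070/webhdfs/v1/?op=GETHOMEDIRECTORY");
     HttpURLConnection conn = (HttpURLConnection) url.openConnection();
     conn.setRequestMethod("GET");
     StringBuilder body = new StringBuilder();
     try (BufferedReader in = new BufferedReader(
         new InputStreamReader(conn.getInputStream()))) {
       String line;
       while ((line = in.readLine()) != null) {
         body.append(line);
       }
     }
     // Per the WebHDFS docs the response looks like {"Path":"/user/<name>"};
     // a real client would parse it with a JSON library.
     System.out.println(body);
   }
 }
 {code}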



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification

2015-06-22 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-8542:
--
Status: Patch Available  (was: Open)

 WebHDFS getHomeDirectory behavior does not match specification
 --

 Key: HDFS-8542
 URL: https://issues.apache.org/jira/browse/HDFS-8542
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Jakob Homan
Assignee: kanaka kumar avvaru
 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, 
 HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch


 Per the 
 [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory],
  WebHDFS provides a REST endpoint for getting the user's home directory:
 {noformat}Submit a HTTP GET request.
 curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat}
 However, WebHDFSFileSystem.java does not use this, instead building the home 
 [directory 
 locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]:
 {code}
   /** @return the home directory. */
   public static String getHomeDirectoryString(final UserGroupInformation ugi) {
     return "/user/" + ugi.getShortUserName();
   }

   @Override
   public Path getHomeDirectory() {
     return makeQualified(new Path(getHomeDirectoryString(ugi)));
   }
 {code}
 The WebHDFSFileSystem client should call to the REST service to determine the 
 home directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596608#comment-14596608
 ] 

Siqi Li commented on HDFS-7214:
---

It looks like HDFS-7257 has already checked in a feature similar to this 
jira. I feel like we could resolve this one by marking it as a duplicate.

 Display the time when NN became active on the webUI
 ---

 Key: HDFS-7214
 URL: https://issues.apache.org/jira/browse/HDFS-7214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, 
 HDFS-7214.v3.patch, HDFS-7214.v4.patch


 Currently the NN webUI displays the JVM start-up time. It would be useful to 
 show when the NN became active. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them

2015-06-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-8480:
---
Summary: Fix performance and timeout issues in HDFS-7929 by using 
hard-links to preserve old edit logs instead of copying them  (was: Fix 
performance and timeout issues in HDFS-7929: use hard-links instead of copying 
edit logs)

 Fix performance and timeout issues in HDFS-7929 by using hard-links to 
 preserve old edit logs instead of copying them
 -

 Key: HDFS-8480
 URL: https://issues.apache.org/jira/browse/HDFS-8480
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Critical
 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, 
 HDFS-8480.02.patch, HDFS-8480.03.patch


 HDFS-7929 copies existing edit logs to the storage directory of the upgraded 
 {{NameNode}}. This slows down the upgrade process. This JIRA aims to use 
 hard-linking instead of per-op copying to achieve the same goal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7390) Provide JMX metrics per storage type

2015-06-22 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-7390:
---
Attachment: HDFS-7390-005.patch

 Provide JMX metrics per storage type
 

 Key: HDFS-7390
 URL: https://issues.apache.org/jira/browse/HDFS-7390
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.5.2
Reporter: Benoy Antony
Assignee: Benoy Antony
  Labels: BB2015-05-TBR
 Attachments: HDFS-7390-003.patch, HDFS-7390-004.patch, 
 HDFS-7390-005.patch, HDFS-7390-005.patch, HDFS-7390.patch, HDFS-7390.patch


 HDFS-2832  added heterogeneous support. In a cluster with different storage 
 types, it is useful to have metrics per storage type. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification

2015-06-22 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-8542:
--
Status: Open  (was: Patch Available)

 WebHDFS getHomeDirectory behavior does not match specification
 --

 Key: HDFS-8542
 URL: https://issues.apache.org/jira/browse/HDFS-8542
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Jakob Homan
Assignee: kanaka kumar avvaru
 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, 
 HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch


 Per the 
 [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory],
  WebHDFS provides a REST endpoint for getting the user's home directory:
 {noformat}Submit a HTTP GET request.
 curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat}
 However, WebHDFSFileSystem.java does not use this, instead building the home 
 [directory 
 locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]:
 {code}
   /** @return the home directory. */
   public static String getHomeDirectoryString(final UserGroupInformation ugi) {
     return "/user/" + ugi.getShortUserName();
   }

   @Override
   public Path getHomeDirectory() {
     return makeQualified(new Path(getHomeDirectoryString(ugi)));
   }
 {code}
 The WebHDFSFileSystem client should call to the REST service to determine the 
 home directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.

2015-06-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596627#comment-14596627
 ] 

Andrew Wang commented on HDFS-8617:
---

You can read my related SoCC paper here: 
http://umbrant.com/papers/socc12-cake.pdf . I experimented with ioprio about 3 
years ago as part of this work, and didn't get positive results. We needed 
application-level throttling.

The key issue though, as both Colin and I have mentioned, is that there is 
queuing both in the OS and on disk. ioprio only affects OS-level queuing, and 
disk-level queuing can be quite substantial. Not sure how much more needs to be 
said.

Also as Colin (and I) mentioned, deadline and noop IO schedulers are often used 
for latency sensitive workloads like HBase, and ioprio only works with CFQ. 
Thus ioprio is not going to work in this situation.

 Throttle DiskChecker#checkDirs() speed.
 ---

 Key: HDFS-8617
 URL: https://issues.apache.org/jira/browse/HDFS-8617
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.7.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-8617.000.patch


 As described in HDFS-8564,  {{DiskChecker.checkDirs(finalizedDir)}} is 
 causing excessive I/Os because {{finalizedDirs}} might have up to 64K 
 sub-directories (HDFS-6482).
 This patch proposes to limit the rate of IO operations in 
 {{DiskChecker.checkDirs()}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8608) Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks)

2015-06-22 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-8608:
--
   Resolution: Fixed
Fix Version/s: (was: 3.0.0)
   2.8.0
   Status: Resolved  (was: Patch Available)

Thanks again Zhe, I committed both HDFS-4366 and HDFS-8608 to branch-2.

The HDFS-8608 backport was a little unclean though, and I noticed we still have 
some changes between branch-2 and trunk in TestReplicationPolicy we should 
probably resolve. Mind taking this on too as a follow-on?

 Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in 
 UnderReplicatedBlocks and PendingReplicationBlocks)
 --

 Key: HDFS-8608
 URL: https://issues.apache.org/jira/browse/HDFS-8608
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.8.0

 Attachments: HDFS-4366-branch-2.00.patch, 
 HDFS-4366-branch-2.01.patch, HDFS-8608.00.patch, HDFS-8608.01.patch, 
 HDFS-8608.02.patch


 This JIRA aims to merge HDFS-7912 into trunk to minimize the final patch when 
 merging the HDFS-7285 (erasure coding) branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929: use hard-links instead of copying edit logs

2015-06-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596671#comment-14596671
 ] 

Colin Patrick McCabe commented on HDFS-8480:


Thanks,  [~zhz].  +1.

 Fix performance and timeout issues in HDFS-7929: use hard-links instead of 
 copying edit logs
 

 Key: HDFS-8480
 URL: https://issues.apache.org/jira/browse/HDFS-8480
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Critical
 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, 
 HDFS-8480.02.patch, HDFS-8480.03.patch


 HDFS-7929 copies existing edit logs to the storage directory of the upgraded 
 {{NameNode}}. This slows down the upgrade process. This JIRA aims to use 
 hard-linking instead of per-op copying to achieve the same goal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification

2015-06-22 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-8542:
--
Attachment: HDFS-8542-branch-2.7.002.patch

I'm still not wild about caching the result since again, (a) the value is never 
discarded, so it's not a cache and (b) backing systems could choose to change 
this value on a subsequent call.  However, both FileSystem and 
DistributedFileSystem are doing some questionable things with this API, so I'll 
worry about those issues later, if we run into them.

+1 on the current patch.  Failed tests are spurious.  Attaching a version for 2.7 
(same except for the location of JsonUtils).  Will commit both after Jenkins has 
had a pass over the backport.

 WebHDFS getHomeDirectory behavior does not match specification
 --

 Key: HDFS-8542
 URL: https://issues.apache.org/jira/browse/HDFS-8542
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Jakob Homan
Assignee: kanaka kumar avvaru
 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, 
 HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch


 Per the 
 [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory],
  WebHDFS provides a REST endpoint for getting the user's home directory:
 {noformat}Submit a HTTP GET request.
 curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat}
 However, WebHDFSFileSystem.java does not use this, instead building the home 
 [directory 
 locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]:
 {code}
   /** @return the home directory. */
   public static String getHomeDirectoryString(final UserGroupInformation ugi) {
     return "/user/" + ugi.getShortUserName();
   }

   @Override
   public Path getHomeDirectory() {
     return makeQualified(new Path(getHomeDirectoryString(ugi)));
   }
 {code}
 The WebHDFSFileSystem client should call to the REST service to determine the 
 home directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7390) Provide JMX metrics per storage type

2015-06-22 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated HDFS-7390:
---
Attachment: (was: HDFS-7390-005.patch)

 Provide JMX metrics per storage type
 

 Key: HDFS-7390
 URL: https://issues.apache.org/jira/browse/HDFS-7390
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.5.2
Reporter: Benoy Antony
Assignee: Benoy Antony
  Labels: BB2015-05-TBR
 Attachments: HDFS-7390-003.patch, HDFS-7390-004.patch, 
 HDFS-7390-005.patch, HDFS-7390.patch, HDFS-7390.patch


 HDFS-2832  added heterogeneous support. In a cluster with different storage 
 types, it is useful to have metrics per storage type. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-3620) WebHdfsFileSystem getHomeDirectory() should not resolve locally

2015-06-22 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved HDFS-3620.
---
Resolution: Duplicate

This issue was duplicated and dealt with in HDFS-8542.

 WebHdfsFileSystem getHomeDirectory() should not resolve locally
 ---

 Key: HDFS-3620
 URL: https://issues.apache.org/jira/browse/HDFS-3620
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 1.0.3, 2.0.0-alpha
Reporter: Alejandro Abdelnur
Priority: Critical

 The WebHdfsFileSystem getHomeDirectory() method is hardcoded to return 
 '/user/' + UGI#shortname. Instead, it should make an HTTP REST call with 
 op=GETHOMEDIRECTORY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7929) inotify unable fetch pre-upgrade edit log segments once upgrade starts

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596702#comment-14596702
 ] 

Hudson commented on HDFS-7929:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8046 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8046/])
HDFS-8480. Fix performance and timeout issues in HDFS-7929 by using hard-links 
to preserve old edit logs, instead of copying them. (Zhe Zhang via Colin P. 
McCabe) (cmccabe: rev 7b424f938c3c306795d574792b086d84e4f06425)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNUpgradeUtil.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java


 inotify unable fetch pre-upgrade edit log segments once upgrade starts
 --

 Key: HDFS-7929
 URL: https://issues.apache.org/jira/browse/HDFS-7929
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.7.0

 Attachments: HDFS-7929-000.patch, HDFS-7929-001.patch, 
 HDFS-7929-002.patch, HDFS-7929-003.patch


 inotify is often used to periodically poll HDFS events. However, once an HDFS 
 upgrade has started, edit logs are moved to /previous on the NN, which is not 
 accessible. Moreover, once the upgrade is finalized /previous is currently 
 lost forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596703#comment-14596703
 ] 

Hudson commented on HDFS-8480:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8046 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8046/])
HDFS-8480. Fix performance and timeout issues in HDFS-7929 by using hard-links 
to preserve old edit logs, instead of copying them. (Zhe Zhang via Colin P. 
McCabe) (cmccabe: rev 7b424f938c3c306795d574792b086d84e4f06425)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNUpgradeUtil.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java


 Fix performance and timeout issues in HDFS-7929 by using hard-links to 
 preserve old edit logs instead of copying them
 -

 Key: HDFS-8480
 URL: https://issues.apache.org/jira/browse/HDFS-8480
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Critical
 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, 
 HDFS-8480.02.patch, HDFS-8480.03.patch


 HDFS-7929 copies existing edit logs to the storage directory of the upgraded 
 {{NameNode}}. This slows down the upgrade process. This JIRA aims to use 
 hard-linking instead of per-op copying to achieve the same goal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.

2015-06-22 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596748#comment-14596748
 ] 

Andrew Wang commented on HDFS-8617:
---

bq. As you mentioned in the evaluation, there are adverse effects on 
throughputs...The proposed solution looks like lowering the utilization of the 
cluster (at the cost of making checkDir() really slow) to meet the SLOs.

I'd like to turn this question around and ask: is there a downside to 
throttling checkDisk throughput? We might end up taking longer to detect a bad 
disk, but this is not a performance-critical workload.

Here's also another idea for a throttle: spend at most x% of time doing 
checkDisk work. Maybe we say it can only run for 250ms of every 1000ms 
interval. Timeslicing like this automatically tunes for faster vs. slower IO 
rates.
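A rough sketch of that timeslicing idea, using the made-up 250 ms / 1000 ms 
numbers above; this is not a concrete proposal for the patch, just an 
illustration of the mechanism.
{code}
// Rough sketch of the timeslicing idea: do checkDisk work for at most
// 250 ms out of every 1000 ms window. The numbers are the made-up ones above.
public class TimeSlicedThrottle {
  private static final long WINDOW_MS = 1000;
  private static final long BUDGET_MS = 250;

  private long windowStart = System.currentTimeMillis();
  private long usedInWindow = 0;

  /** Call around each unit of checkDisk work; sleeps once the budget is spent. */
  public void charge(long workMillis) throws InterruptedException {
    long now = System.currentTimeMillis();
    if (now - windowStart >= WINDOW_MS) {
      windowStart = now;          // new window, reset the budget
      usedInWindow = 0;
    }
    usedInWindow += workMillis;
    if (usedInWindow >= BUDGET_MS) {
      // Sleep until the current window ends, then start a fresh one.
      Thread.sleep(WINDOW_MS - (now - windowStart));
      windowStart = System.currentTimeMillis();
      usedInWindow = 0;
    }
  }
}
{code}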

 Throttle DiskChecker#checkDirs() speed.
 ---

 Key: HDFS-8617
 URL: https://issues.apache.org/jira/browse/HDFS-8617
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.7.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-8617.000.patch


 As described in HDFS-8564,  {{DiskChecker.checkDirs(finalizedDir)}} is 
 causing excessive I/Os because {{finalizedDirs}} might have up to 64K 
 sub-directories (HDFS-6482).
 This patch proposes to limit the rate of IO operations in 
 {{DiskChecker.checkDirs()}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times

2015-06-22 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7645:
--
Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)

This change is incompatible since we expose RollingUpgradeInfo in the NN's JMX 
(a public API). As discussed above, rather than being null on finalization, it 
now sets the finalization time.

Have we thought about other ways of solving this issue? Else we can change the 
JMX method to still return null on finalization.

 Rolling upgrade is restoring blocks from trash multiple times
 -

 Key: HDFS-7645
 URL: https://issues.apache.org/jira/browse/HDFS-7645
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Nathan Roberts
Assignee: Keisuke Ogiwara
 Fix For: 2.8.0

 Attachments: HDFS-7645.01.patch, HDFS-7645.02.patch, 
 HDFS-7645.03.patch, HDFS-7645.04.patch, HDFS-7645.05.patch, 
 HDFS-7645.06.patch, HDFS-7645.07.patch


 When performing an HDFS rolling upgrade, the trash directory is getting 
 restored twice when under normal circumstances it shouldn't need to be 
 restored at all. iiuc, the only time these blocks should be restored is if we 
 need to rollback a rolling upgrade. 
 On a busy cluster, this can cause significant and unnecessary block churn 
 both on the datanodes, and more importantly in the namenode.
 The two times this happens are:
 1) restart of DN onto new software
  {code}
    private void doTransition(DataNode datanode, StorageDirectory sd,
        NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
      if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
        Preconditions.checkState(!getTrashRootDir(sd).exists(),
            sd.getPreviousDir() + " and " + getTrashRootDir(sd) +
            " should not both be present.");
        doRollback(sd, nsInfo); // rollback if applicable
      } else {
        // Restore all the files in the trash. The restored files are retained
        // during rolling upgrade rollback. They are deleted during rolling
        // upgrade downgrade.
        int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
        LOG.info("Restored " + restored + " block files from trash.");
      }
  {code}
 2) When heartbeat response no longer indicates a rollingupgrade is in progress
 {code}
    /**
     * Signal the current rolling upgrade status as indicated by the NN.
     * @param inProgress true if a rolling upgrade is in progress
     */
    void signalRollingUpgrade(boolean inProgress) throws IOException {
      String bpid = getBlockPoolId();
      if (inProgress) {
        dn.getFSDataset().enableTrash(bpid);
        dn.getFSDataset().setRollingUpgradeMarker(bpid);
      } else {
        dn.getFSDataset().restoreTrash(bpid);
        dn.getFSDataset().clearRollingUpgradeMarker(bpid);
      }
    }
 {code}
 HDFS-6800 and HDFS-6981 modified this behavior, making it not completely 
 clear whether this is somehow intentional. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification

2015-06-22 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-8542:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Jenkins isn't running against minor versions.  I've committed this to trunk and 
branch-2.  Thanks, Kanaka.  Resolving.

 WebHDFS getHomeDirectory behavior does not match specification
 --

 Key: HDFS-8542
 URL: https://issues.apache.org/jira/browse/HDFS-8542
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Jakob Homan
Assignee: kanaka kumar avvaru
 Fix For: 2.8.0

 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, 
 HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch


 Per the 
 [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory],
  WebHDFS provides a REST endpoint for getting the user's home directory:
 {noformat}Submit a HTTP GET request.
 curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat}
 However, WebHDFSFileSystem.java does not use this, instead building the home 
 [directory 
 locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]:
 {code}
   /** @return the home directory. */
   public static String getHomeDirectoryString(final UserGroupInformation ugi) {
     return "/user/" + ugi.getShortUserName();
   }

   @Override
   public Path getHomeDirectory() {
     return makeQualified(new Path(getHomeDirectoryString(ugi)));
   }
 {code}
 The WebHDFSFileSystem client should call to the REST service to determine the 
 home directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596869#comment-14596869
 ] 

Hudson commented on HDFS-8542:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8049 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8049/])
HDFS-8542. WebHDFS getHomeDirectory behavior does not match specification. 
Contributed by  Kanaka Kumar Avvaru. (jghoman: rev 
fac4e04dd359a7ff31f286d664fb06f019ec0b58)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/JsonUtilClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHDFS.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 WebHDFS getHomeDirectory behavior does not match specification
 --

 Key: HDFS-8542
 URL: https://issues.apache.org/jira/browse/HDFS-8542
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Jakob Homan
Assignee: kanaka kumar avvaru
 Fix For: 2.8.0

 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, 
 HDFS-8542-02.patch, HDFS-8542-branch-2.7.002.patch


 Per the 
 [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory],
  WebHDFS provides a REST endpoint for getting the user's home directory:
 {noformat}Submit a HTTP GET request.
 curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat}
 However, WebHDFSFileSystem.java does not use this, instead building the home 
 [directory 
 locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]:
 {code}
   /** @return the home directory. */
   public static String getHomeDirectoryString(final UserGroupInformation ugi) {
     return "/user/" + ugi.getShortUserName();
   }

   @Override
   public Path getHomeDirectory() {
     return makeQualified(new Path(getHomeDirectoryString(ugi)));
   }
 {code}
 The WebHDFSFileSystem client should call to the REST service to determine the 
 home directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-06-22 Thread Raju Bairishetti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raju Bairishetti updated HDFS-8578:
---
Fix Version/s: 2.7.1

 On upgrade, Datanode should process all storage/data dirs in parallel
 -

 Key: HDFS-8578
 URL: https://issues.apache.org/jira/browse/HDFS-8578
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Raju Bairishetti
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch


 Right now, during upgrades the datanode processes all the storage dirs 
 sequentially. Assuming it takes ~20 minutes to process a single storage dir, a 
 datanode with ~10 disks will take around 3 hours to come up.
 *BlockPoolSliceStorage.java*
  {code}
  for (int idx = 0; idx < getNumStorageDirs(); idx++) {
    doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
    assert getCTime() == nsInfo.getCTime()
        : "Data-node and name-node CTimes must be the same.";
  }
  {code}
 It would save a lot of time during major upgrades if the datanode processed 
 all storage dirs/disks in parallel. Can we make the datanode process all 
 storage dirs in parallel?
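 A hedged sketch of what parallelizing that loop could look like, using a fixed 
 thread pool; doTransition and the surrounding members are stand-ins for the real 
 BlockPoolSliceStorage code, not the actual patch.
 {code}
 // Hedged sketch only: run the per-storage-dir transitions in parallel
 // instead of the sequential loop above. doTransition(...) stands in for
 // the real BlockPoolSliceStorage method.
 import java.util.ArrayList;
 import java.util.List;
 import java.util.concurrent.Callable;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.Future;

 public class ParallelUpgradeSketch {
   void doTransitionsInParallel(int numStorageDirs) throws Exception {
     ExecutorService pool =
         Executors.newFixedThreadPool(Math.min(numStorageDirs, 8));
     try {
       List<Future<Void>> results = new ArrayList<>();
       for (int idx = 0; idx < numStorageDirs; idx++) {
         final int dir = idx;
         results.add(pool.submit(new Callable<Void>() {
           @Override
           public Void call() throws Exception {
             doTransition(dir);   // upgrade one storage dir
             return null;
           }
         }));
       }
       for (Future<Void> f : results) {
         f.get();                 // propagate any per-dir failure
       }
     } finally {
       pool.shutdown();
     }
   }

   private void doTransition(int storageDirIndex) {
     // placeholder for the real per-directory upgrade work
   }
 }
 {code}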



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8645) Resolve inconsistent code in TestReplicationPolicy between trunk and branch-2

2015-06-22 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-8645:
---

 Summary: Resolve inconsistent code in TestReplicationPolicy 
between trunk and branch-2
 Key: HDFS-8645
 URL: https://issues.apache.org/jira/browse/HDFS-8645
 Project: Hadoop HDFS
  Issue Type: Test
  Components: namenode
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang


Per [discussion | 
https://issues.apache.org/jira/browse/HDFS-8608?focusedCommentId=14596665page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14596665]
 under HDFS-8608.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596846#comment-14596846
 ] 

Zhe Zhang commented on HDFS-8619:
-

A quick comment: maybe we should consider targeting this for trunk? I haven't 
finished reviewing the entire patch, and I see the following changes besides 
the main change mentioned above:
# A new {{hasNoDataNodes}} logic.
# A {{Block}} -> {{BlockInfo}} refactor for {{postponedMisreplicatedBlocks}}.
# Refactor of {{invalidateBlock}} to take the counted nodes as input instead of 
counting again.
# General code cleanups.
All changes LGTM overall, and all look applicable against trunk (except for the 
tests).



 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block, we may have a more complicated scenario, e.g., we have multiple 
 replicas of the first internal block while we miss some other internal 
 blocks. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.
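
 Purely as an illustrative sketch (not the attached patch), the intended counting for a striped block group could look like the following; {{isLive()}} and {{internalBlockIndex()}} are hypothetical helpers:

 {code}
// Hedged sketch only: count distinct live internal-block indices instead of
// raw storage locations, so duplicated copies of one internal block are not
// over-counted. isLive() and internalBlockIndex() are hypothetical helpers.
BitSet liveIndices = new BitSet(dataBlockNum + parityBlockNum);
for (DatanodeStorageInfo storage : blocksMap.getStorages(stripedBlock)) {
  if (isLive(storage)) {
    liveIndices.set(internalBlockIndex(stripedBlock, storage));
  }
}
int liveInternalBlocks = liveIndices.cardinality();  // the count that matters here
 {code}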



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596905#comment-14596905
 ] 

Jing Zhao commented on HDFS-8619:
-

Thanks for the review, Zhe! Sure, I will separate some refactoring out for 
trunk.

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block, we may have a more complicated scenario, e.g., we have multiple 
 replicas of the first internal block while we miss some other internal 
 blocks. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596967#comment-14596967
 ] 

Jing Zhao commented on HDFS-8619:
-

Besides, we have now merged quite a few changes to trunk; any plan for merging 
trunk changes to the HDFS-7285 feature branch?

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block, we may have a more complicated scenario, e.g., we have multiple 
 replicas of the first internal block while we miss some other internal 
 blocks. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596932#comment-14596932
 ] 

Zhe Zhang commented on HDFS-8619:
-

Thanks Jing! I meant to say that all changes, including the main 
{{CorruptReplicasMap}} change, LGTM overall and look applicable to trunk. 
Should we just retarget the JIRA to trunk?

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block, we may have a more complicated scenario, e.g., we have multiple 
 replicas of the first internal block while we miss some other internal 
 blocks. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596933#comment-14596933
 ] 

Zhe Zhang commented on HDFS-8619:
-

Thanks Jing! I meant to say that all changes, including the main 
{{CorruptReplicasMap}} change, LGTM overall and look applicable to trunk. 
Should we just retarget the JIRA to trunk?

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block, we may have a more complicated scenario, e.g., we have multiple 
 replicas of the first internal block while we miss some other internal 
 blocks. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8186) Erasure coding: Make block placement policy for EC file configurable

2015-06-22 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596973#comment-14596973
 ] 

Walter Su commented on HDFS-8186:
-

comparison of HDFS-7068 and HDFS-8186 is 
[here|https://issues.apache.org/jira/browse/HDFS-7068?focusedCommentId=14596964&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14596964]

 Erasure coding: Make block placement policy for EC file configurable
 

 Key: HDFS-8186
 URL: https://issues.apache.org/jira/browse/HDFS-8186
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Walter Su
Assignee: Walter Su
 Fix For: HDFS-7285

 Attachments: HDFS-8186-HDFS-7285.002.txt, 
 HDFS-8186-HDFS-7285.003.patch, HDFS-8186.001.txt


 This includes:
 1. User can config block placement policy for EC file in xml configuration 
 file.
 2. EC policy works for EC file, replication policy works for non-EC file. 
 They are coexistent.
 Not includes:
 1. Details of block placement policy for EC. Discussion and implementation 
 goes to HDFS-7613.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596822#comment-14596822
 ] 

Zhe Zhang commented on HDFS-7068:
-

[~walter.k.su] Since we are preparing to merge the HDFS-7285 branch to trunk, 
we should probably revisit this JIRA. I suggest we split the HDFS-8186 patch 
and separate the multi-policy part out for this JIRA 
({{BlockPlacementPolicies}} etc.). That part needs to be reviewed against trunk 
anyway as part of the merge. And logically it is orthogonal to EC logic. 
Separating it out will reduce the consolidated EC patch and make merge-review 
easier.

 Support multiple block placement policies
 -

 Key: HDFS-7068
 URL: https://issues.apache.org/jira/browse/HDFS-7068
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.5.1
Reporter: Zesheng Wu
Assignee: Walter Su

 According to the code, the current implementation of HDFS only supports one 
 specific type of block placement policy, which is BlockPlacementPolicyDefault 
 by default.
 The default policy is enough for most circumstances, but under some 
 special circumstances it does not work so well.
 For example, on a shared cluster, we want to erasure encode all the files 
 under some specified directories. So the files under these directories need 
 to use a new placement policy.
 But at the same time, other files still use the default placement policy. 
 Here we need to support multiple placement policies in HDFS.
 One plain thought is that the default placement policy is still configured 
 as the default. On the other hand, HDFS can let the user specify a customized 
 placement policy through the extended attributes (xattr). When HDFS chooses 
 the replica targets, it first checks the customized placement policy; if none is 
 specified, it falls back to the default one. 
 Any thoughts?
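
 A minimal, hedged sketch of the xattr idea described above (illustrative only; the xattr key and the {{getXAttr()}} lookup helper are hypothetical, not an agreed design):

 {code}
// Hedged sketch only. "user.block.placement.policy" and getXAttr() are
// hypothetical; the point is the lookup-with-fallback shape described above.
BlockPlacementPolicy resolvePolicy(INodeFile file) {
  byte[] value = getXAttr(file, "user.block.placement.policy");
  if (value == null) {
    return defaultPolicy;  // BlockPlacementPolicyDefault
  }
  String className = new String(value, StandardCharsets.UTF_8);
  return policiesByClassName.get(className);  // pre-registered policy instances
}
 {code}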



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-06-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596853#comment-14596853
 ] 

Arpit Agarwal commented on HDFS-8277:
-

Hi [~surendrasingh], while putting the safe mode status in the NN persistent 
state is the right solution I agree with [~vinayrpet] that it would be an 
incompatible change for 2.x. 

If we cannot make the change for 2.x I prefer not changing the current behavior 
of failing 'safemode enter' when SBN is down.

 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: Surendra Singh Lilhore
Priority: Minor
 Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
 HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch


 HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it's not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1 which is down. I verified normal hadoop fs writes 
 and reads via cli did work at this time, using nn2. I happened to run this 
 command as the hdfs user on nn2 which was the surviving Active NameNode.
 After I re-bootstrapped the Standby NN to fix it the command worked as 
 expected again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8634) OzoneHandler: Add userAuth Interface and Simple userAuth handler

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596856#comment-14596856
 ] 

Hadoop QA commented on HDFS-8634:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 15s | Pre-patch HDFS-7240 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 20s | The applied patch generated  2 
new checkstyle issues (total was 1, now 3). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 19s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 161m 42s | Tests failed in hadoop-hdfs. |
| | | 209m  9s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.namenode.TestNameEditsConfigs |
|   | hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741113/hdfs-8634-HDFS-7240.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7240 / 1e75142 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11438/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11438/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11438/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11438/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11438/console |


This message was automatically generated.

 OzoneHandler: Add userAuth Interface and Simple userAuth handler
 

 Key: HDFS-8634
 URL: https://issues.apache.org/jira/browse/HDFS-8634
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anu Engineer
Assignee: Anu Engineer
 Attachments: hdfs-8634-HDFS-7240.001.patch


 Add user authentication interface and also the first concrete implementation 
 for that interface called simple. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-06-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596853#comment-14596853
 ] 

Arpit Agarwal edited comment on HDFS-8277 at 6/22/15 11:39 PM:
---

Hi [~surendrasingh], while putting the safe mode status in the NN persistent 
state is the right solution I agree with [~vinayrpet] that it would be an 
incompatible change for 2.x. 

If we cannot make the change for 2.x I prefer not changing the current behavior 
of failing 'safemode enter' when SBN is down.

[~vinayrpet] - what do you think?


was (Author: arpitagarwal):
Hi [~surendrasingh], while putting the safe mode status in the NN persistent 
state is the right solution I agree with [~vinayrpet] that it would be an 
incompatible change for 2.x. 

If we cannot make the change for 2.x I prefer not changing the current behavior 
of failing 'safemode enter' when SBN is down.

 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: Surendra Singh Lilhore
Priority: Minor
 Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
 HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch


 HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it's not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1 which is down. I verified normal hadoop fs writes 
 and reads via cli did work at this time, using nn2. I happened to run this 
 command as the hdfs user on nn2 which was the surviving Active NameNode.
 After I re-bootstrapped the Standby NN to fix it the command worked as 
 expected again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-06-22 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-8578:
---
Fix Version/s: (was: 2.7.1)

 On upgrade, Datanode should process all storage/data dirs in parallel
 -

 Key: HDFS-8578
 URL: https://issues.apache.org/jira/browse/HDFS-8578
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Raju Bairishetti
Priority: Critical
 Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch


 Right now, during upgrades the datanode processes all the storage dirs 
 sequentially. Assume it takes ~20 mins to process a single storage dir; a 
 datanode which has ~10 disks will then take around 3 hours to come up.
 *BlockPoolSliceStorage.java*
 {code}
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
   assert getCTime() == nsInfo.getCTime()
       : "Data-node and name-node CTimes must be the same.";
 }
 {code}
 It would save a lot of time during major upgrades if the datanode processed all 
 storage dirs/disks in parallel.
 Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7068) Support multiple block placement policies

2015-06-22 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596964#comment-14596964
 ] 

Walter Su commented on HDFS-7068:
-

comparison of HDFS-7068 and HDFS-8186 ({{BlockPlacementPolicies}}):
*strategy*
HDFS-7068: policy given by the user.
HDFS-8186: policy determined from context (file status).
*extensibility*
HDFS-7068: better.
HDFS-8186: BlockPlacementPolicies accepts a {{boolean}} argument and returns an 
ec/non-ec policy. In the future, we can extend the argument list.
*code complexity*
HDFS-7068: complicated
HDFS-8186: simple
*memory usage*
HDFS-7068: xattr or inode header
HDFS-8186: none

bq. I'm wondering if we could do it in lighter way. In my understanding, if the 
file is in replication mode as by default, then we'll go to the current block 
placement policy as it goes currently in trunk; otherwise, if stripping and/or 
ec is involved, then we have a new single customized placement policy to cover 
all the related cases.
Hi, [~drankye]! Thanks for your advice. HDFS-8186 did that.

bq. I'm also +1 for #1.
Hi, [~jingzhao]! I think we can revisit the HDFS-7068 and #3 design? HDFS-8186 
works for the EC branch. I'm not sure it's acceptable for trunk. I can ask for 
everybody's opinion on the mailing list.
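
For readers comparing the two, a hedged sketch of the HDFS-8186 shape described above (illustrative only, not the actual HDFS-8186 patch):

{code}
// Hedged sketch only: a thin wrapper that owns both policies and picks one
// from a boolean derived from the file's status (striped vs. replicated).
public class BlockPlacementPolicies {
  private final BlockPlacementPolicy replicationPolicy;
  private final BlockPlacementPolicy ecPolicy;

  public BlockPlacementPolicies(BlockPlacementPolicy replicationPolicy,
                                BlockPlacementPolicy ecPolicy) {
    this.replicationPolicy = replicationPolicy;
    this.ecPolicy = ecPolicy;
  }

  /** @param isStriped true for EC (striped) files, false for replicated files. */
  public BlockPlacementPolicy getPolicy(boolean isStriped) {
    return isStriped ? ecPolicy : replicationPolicy;
  }
}
{code}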

 Support multiple block placement policies
 -

 Key: HDFS-7068
 URL: https://issues.apache.org/jira/browse/HDFS-7068
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.5.1
Reporter: Zesheng Wu
Assignee: Walter Su

 According to the code, the current implementation of HDFS only supports one 
 specific type of block placement policy, which is BlockPlacementPolicyDefault 
 by default.
 The default policy is enough for most circumstances, but under some 
 special circumstances it does not work so well.
 For example, on a shared cluster, we want to erasure encode all the files 
 under some specified directories. So the files under these directories need 
 to use a new placement policy.
 But at the same time, other files still use the default placement policy. 
 Here we need to support multiple placement policies in HDFS.
 One plain thought is that the default placement policy is still configured 
 as the default. On the other hand, HDFS can let the user specify a customized 
 placement policy through the extended attributes (xattr). When HDFS chooses 
 the replica targets, it first checks the customized placement policy; if none is 
 specified, it falls back to the default one. 
 Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8480) Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596786#comment-14596786
 ] 

Zhe Zhang commented on HDFS-8480:
-

Thanks Colin for the review! And helpful comments from Vinod, Andrew, and Arpit.

 Fix performance and timeout issues in HDFS-7929 by using hard-links to 
 preserve old edit logs instead of copying them
 -

 Key: HDFS-8480
 URL: https://issues.apache.org/jira/browse/HDFS-8480
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-8480.00.patch, HDFS-8480.01.patch, 
 HDFS-8480.02.patch, HDFS-8480.03.patch


 HDFS-7929 copies existing edit logs to the storage directory of the upgraded 
 {{NameNode}}. This slows down the upgrade process. This JIRA aims to use 
 hard-linking instead of per-op copying to achieve the same goal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-8619:

Attachment: HDFS-8619.000.patch

Initial patch to fix the above issue.

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block, we may have a more complicated scenario, e.g., we have multiple 
 replicas of the first internal block while we miss some other internal 
 blocks. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7390) Provide JMX metrics per storage type

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596961#comment-14596961
 ] 

Hadoop QA commented on HDFS-7390:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 45s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 26s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 31s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 14s | The applied patch generated  3 
new checkstyle issues (total was 228, now 230). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 17s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 13s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 160m  1s | Tests passed in hadoop-hdfs. 
|
| | | 206m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741135/HDFS-7390-005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 11ac848 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11440/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11440/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11440/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11440/console |


This message was automatically generated.

 Provide JMX metrics per storage type
 

 Key: HDFS-7390
 URL: https://issues.apache.org/jira/browse/HDFS-7390
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.5.2
Reporter: Benoy Antony
Assignee: Benoy Antony
  Labels: BB2015-05-TBR
 Attachments: HDFS-7390-003.patch, HDFS-7390-004.patch, 
 HDFS-7390-005.patch, HDFS-7390.patch, HDFS-7390.patch


 HDFS-2832  added heterogeneous support. In a cluster with different storage 
 types, it is useful to have metrics per storage type. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596797#comment-14596797
 ] 

Hadoop QA commented on HDFS-7214:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 46s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 43s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 15s | The applied patch generated  2 
new checkstyle issues (total was 183, now 185). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 18s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 147m 27s | Tests failed in hadoop-hdfs. |
| | | 193m 43s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM |
|   | hadoop.hdfs.TestDFSInotifyEventInputStream |
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.TestDFSClientFailover |
|   | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits |
|   | hadoop.hdfs.server.namenode.ha.TestFailureOfSharedDir |
|   | hadoop.hdfs.server.namenode.ha.TestQuotasWithHA |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
|   | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencing |
|   | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
|   | hadoop.hdfs.server.namenode.ha.TestHAMetrics |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogsDuringFailover |
|   | hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA |
|   | hadoop.tools.TestJMXGet |
|   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
|   | hadoop.hdfs.server.namenode.ha.TestHAFsck |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages |
|   | hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler |
|   | hadoop.hdfs.tools.TestDFSHAAdminMiniCluster |
|   | hadoop.hdfs.server.namenode.ha.TestHAAppend |
|   | hadoop.hdfs.server.namenode.TestEditLogAutoroll |
|   | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
|   | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
|   | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot |
|   | hadoop.hdfs.TestRollingUpgradeRollback |
|   | hadoop.hdfs.server.namenode.ha.TestHAStateTransitions |
|   | hadoop.hdfs.web.TestWebHDFSForHA |
|   | hadoop.hdfs.TestEncryptionZonesWithHA |
|   | hadoop.hdfs.server.namenode.ha.TestHarFileSystemWithHA |
|   | hadoop.hdfs.TestRollingUpgradeDowngrade |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741103/HDFS-7214.v4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/11437/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11437/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11437/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux 

[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596799#comment-14596799
 ] 

Zhe Zhang commented on HDFS-8619:
-

Thanks Jing for the work! I think the analysis makes sense. I will review the 
patch later today.

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block, we may have a more complicated scenario, e.g., we have multiple 
 replicas of the first internal block while we miss some other internal 
 blocks. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8608) Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in UnderReplicatedBlocks and PendingReplicationBlocks)

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596810#comment-14596810
 ] 

Zhe Zhang commented on HDFS-8608:
-

Thanks Jing and Andrew for reviewing the patch!

I filed HDFS-8645 to address the {{TestReplicationPolicy}} issue.

 Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead of Block in 
 UnderReplicatedBlocks and PendingReplicationBlocks)
 --

 Key: HDFS-8608
 URL: https://issues.apache.org/jira/browse/HDFS-8608
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 2.7.0
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Fix For: 2.8.0

 Attachments: HDFS-4366-branch-2.00.patch, 
 HDFS-4366-branch-2.01.patch, HDFS-8608.00.patch, HDFS-8608.01.patch, 
 HDFS-8608.02.patch


 This JIRA aims to merge HDFS-7912 into trunk to minimize the final patch when 
 merging the HDFS-7285 (erasure coding) branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-06-22 Thread Raju Bairishetti (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596862#comment-14596862
 ] 

Raju Bairishetti commented on HDFS-8578:


[~vinayrpet] Have you done any performance benchmarking with this approach? If 
yes, could you please post the results here?

 On upgrade, Datanode should process all storage/data dirs in parallel
 -

 Key: HDFS-8578
 URL: https://issues.apache.org/jira/browse/HDFS-8578
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Raju Bairishetti
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch


 Right now, during upgrades the datanode processes all the storage dirs 
 sequentially. Assume it takes ~20 mins to process a single storage dir; a 
 datanode which has ~10 disks will then take around 3 hours to come up.
 *BlockPoolSliceStorage.java*
 {code}
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
   assert getCTime() == nsInfo.getCTime()
       : "Data-node and name-node CTimes must be the same.";
 }
 {code}
 It would save a lot of time during major upgrades if the datanode processed all 
 storage dirs/disks in parallel.
 Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596959#comment-14596959
 ] 

Jing Zhao commented on HDFS-8619:
-

No. I guess we still need this jira for adding the striped block logic and the 
tests.

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block, we may have a more complicated scenario, e.g., we have multiple 
 replicas of the first internal block while we miss some other internal 
 blocks. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots

2015-06-22 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8642:
--

 Summary: Improve TestFileTruncate#setup by deleting the snapshots
 Key: HDFS-8642
 URL: https://issues.apache.org/jira/browse/HDFS-8642
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor


I've observed that the {{TestFileTruncate#setup()}} function has to be improved by 
making it more independent. Presently, a failure in any of the snapshot-related 
tests will affect all the subsequent unit test cases. One such error has 
been observed in 
[Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart]

{code}
https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/

org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted 
since /test is snapshottable and already has snapshots
at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166)

at org.apache.hadoop.ipc.Client.call(Client.java:1440)
at org.apache.hadoop.ipc.Client.call(Client.java:1371)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy22.delete(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at com.sun.proxy.$Proxy23.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1711)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:718)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
at 
org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setup(TestFileTruncate.java:119)
{code}
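
A hedged sketch of the kind of hardening being proposed (illustrative only, not the attached HDFS-8642-00.patch): clean out any leftover snapshots before deleting the test directory, so a failed snapshot test cannot poison the following cases.

{code}
// Hedged sketch only. Assumes the usual TestFileTruncate fields
// (DistributedFileSystem fs, Path parent = /test).
@Before
public void setUp() throws IOException {
  if (fs.exists(parent)) {
    // Remove snapshots left behind by a previously failed test case.
    Path snapshotRoot = new Path(parent, ".snapshot");
    if (fs.exists(snapshotRoot)) {
      for (FileStatus s : fs.listStatus(snapshotRoot)) {
        fs.deleteSnapshot(parent, s.getPath().getName());
      }
      fs.disallowSnapshot(parent);
    }
    fs.delete(parent, true);  // the delete can no longer hit the snapshot check
  }
  fs.mkdirs(parent);
}
{code}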



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class

2015-06-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595735#comment-14595735
 ] 

Vinayakumar B commented on HDFS-8493:
-

bq. The resolution should be in the lock of FSDirectory.
IMO, I think this is okay, especially for write ops, provided the fsn write lock is 
held. And I can see many places where this resolution is done with the fsn lock 
held, but not the fsd lock.

This triggered a thought: why two separate locks, the fsdir lock and the fsnamesystem 
lock? Almost all ops go through fsn with the lock (read/write) held, and then 
go on to acquire the fsdir lock.

Any thoughts?

 Consolidate truncate() related implementation in a single class
 ---

 Key: HDFS-8493
 URL: https://issues.apache.org/jira/browse/HDFS-8493
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Rakesh R
 Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, 
 HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, 
 HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch


 This jira proposes to consolidate truncate() related methods into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595795#comment-14595795
 ] 

Hadoop QA commented on HDFS-8642:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   8m 12s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 27s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 13s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   1m 19s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 159m 29s | Tests passed in hadoop-hdfs. 
|
| | | 184m 22s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12740975/HDFS-8642-00.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11432/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11432/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11432/console |


This message was automatically generated.

 Improve TestFileTruncate#setup by deleting the snapshots
 

 Key: HDFS-8642
 URL: https://issues.apache.org/jira/browse/HDFS-8642
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor
 Attachments: HDFS-8642-00.patch


 I've observed that the {{TestFileTruncate#setup()}} function has to be improved by 
 making it more independent. Presently, a failure in any of the snapshot-related 
 tests will affect all the subsequent unit test cases. One such error has 
 been observed in 
 [Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart]
 {code}
 https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/
 org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted 
 since /test is snapshottable and already has snapshots
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166)
   at 

[jira] [Updated] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots

2015-06-22 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8642:
---
Attachment: HDFS-8642-00.patch

 Improve TestFileTruncate#setup by deleting the snapshots
 

 Key: HDFS-8642
 URL: https://issues.apache.org/jira/browse/HDFS-8642
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor
 Attachments: HDFS-8642-00.patch


 I've observed that the {{TestFileTruncate#setup()}} function has to be improved by 
 making it more independent. Presently, a failure in any of the snapshot-related 
 tests will affect all the subsequent unit test cases. One such error has 
 been observed in 
 [Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart]
 {code}
 https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/
 org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted 
 since /test is snapshottable and already has snapshots
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166)
   at org.apache.hadoop.ipc.Client.call(Client.java:1440)
   at org.apache.hadoop.ipc.Client.call(Client.java:1371)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
   at com.sun.proxy.$Proxy22.delete(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
   at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
   at com.sun.proxy.$Proxy23.delete(Unknown Source)
   at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1711)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:718)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714)
   at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setup(TestFileTruncate.java:119)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8643) Add snapshot names list to SnapshottableDirectoryStatus

2015-06-22 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8643:
--

 Summary: Add snapshot names list to SnapshottableDirectoryStatus
 Key: HDFS-8643
 URL: https://issues.apache.org/jira/browse/HDFS-8643
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Rakesh R
Assignee: Rakesh R


The idea of this jira is to enhance {{SnapshottableDirectoryStatus}} by adding a 
{{snapshotNames}} attribute to it; presently it only has the {{snapshotNumber}}. 
IMHO this would help users get the list of snapshot names created. Also, 
the snapshot names can be used while renaming or deleting snapshots.

{code}
org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus.java

  /**
   * @return Snapshot names for the directory.
   */
  public List<String> getSnapshotNames() {
    return snapshotNames;
  }
{code}
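
A hedged usage sketch, assuming the proposed {{getSnapshotNames()}} accessor were added (illustrative only):

{code}
// Hedged sketch only: list snapshot names per snapshottable directory and,
// if desired, delete them by name. getSnapshotNames() is the proposed API.
SnapshottableDirectoryStatus[] dirs = dfs.getSnapshottableDirListing();
for (SnapshottableDirectoryStatus status : dirs) {
  Path dir = status.getFullPath();
  for (String snapshotName : status.getSnapshotNames()) {
    System.out.println(dir + " has snapshot: " + snapshotName);
    // e.g. dfs.deleteSnapshot(dir, snapshotName);
  }
}
{code}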



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8642) Improve TestFileTruncate#setup by deleting the snapshots

2015-06-22 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-8642:
---
Target Version/s: 2.8.0
  Status: Patch Available  (was: Open)

 Improve TestFileTruncate#setup by deleting the snapshots
 

 Key: HDFS-8642
 URL: https://issues.apache.org/jira/browse/HDFS-8642
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Minor
 Attachments: HDFS-8642-00.patch


 I've observed that the {{TestFileTruncate#setup()}} function has to be improved by 
 making it more independent. Presently, a failure in any of the snapshot-related 
 tests will affect all the subsequent unit test cases. One such error has 
 been observed in 
 [Hadoop-Hdfs-trunk-2163|https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart]
 {code}
 https://builds.apache.org/job/Hadoop-Hdfs-trunk/2163/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestart/
 org.apache.hadoop.ipc.RemoteException: The directory /test cannot be deleted 
 since /test is snapshottable and already has snapshots
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.checkSnapshot(FSDirSnapshotOp.java:226)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:54)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.deleteInternal(FSDirDeleteOp.java:177)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:104)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3046)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:939)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:608)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2172)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2166)
   at org.apache.hadoop.ipc.Client.call(Client.java:1440)
   at org.apache.hadoop.ipc.Client.call(Client.java:1371)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
   at com.sun.proxy.$Proxy22.delete(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
   at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
   at com.sun.proxy.$Proxy23.delete(Unknown Source)
   at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1711)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:718)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:714)
   at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
   at 
 org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setup(TestFileTruncate.java:119)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8462) Implement GETXATTRS and LISTXATTRS operation for WebImageViewer

2015-06-22 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595516#comment-14595516
 ] 

Akira AJISAKA commented on HDFS-8462:
-

Thanks [~jagadesh.kiran] for updating the patch. Minor comment:
{code:title=TestOfflineImageViewerForXAttr.java}
WebImageViewer viewer = new WebImageViewer(
NetUtils.createSocketAddr("localhost:0"));
try {
  viewer.initServer(originalFsimage.getAbsolutePath());
...
} finally {
  // shutdown the viewer
  viewer.close();
}
{code}
Would you use try-with-resources instead of try-finally? I'm +1 if that is 
addressed.
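
A hedged sketch of the suggested form, assuming {{WebImageViewer}} implements {{AutoCloseable}} (or is made to) so it can be used in try-with-resources:

{code:title=TestOfflineImageViewerForXAttr.java}
// Hedged sketch only: same setup as above, but close() is invoked
// automatically when the try block exits.
try (WebImageViewer viewer = new WebImageViewer(
    NetUtils.createSocketAddr("localhost:0"))) {
  viewer.initServer(originalFsimage.getAbsolutePath());
  ...
}
{code}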

 Implement GETXATTRS and LISTXATTRS operation for WebImageViewer
 ---

 Key: HDFS-8462
 URL: https://issues.apache.org/jira/browse/HDFS-8462
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Akira AJISAKA
Assignee: Jagadesh Kiran N
 Attachments: HDFS-8462-00.patch, HDFS-8462-01.patch, 
 HDFS-8462-02.patch, HDFS-8462-03.patch


 In Hadoop 2.7.0, WebImageViewer supports the following operations:
 * {{GETFILESTATUS}}
 * {{LISTSTATUS}}
 * {{GETACLSTATUS}}
 I'm thinking it would be better for administrators if {{GETXATTRS}} and 
 {{LISTXATTRS}} were supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8609) Dead Code in DFS Util for DFSUtil#substituteForWildcardAddress

2015-06-22 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved HDFS-8609.

Resolution: Invalid

Sorry, my mistake. Closing the issue as invalid.

 Dead Code in DFS Util for DFSUtil#substituteForWildcardAddress
 --

 Key: HDFS-8609
 URL: https://issues.apache.org/jira/browse/HDFS-8609
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Bibin A Chundatt
Assignee: Surendra Singh Lilhore
Priority: Minor

 Dead code after JDK 1.4
 {code}
 otherHttpAddr = DFSUtil.getInfoServerWithDefaultHost(
 otherIpcAddr.getHostName(), otherNode, scheme).toURL();
 {code}
 In {{DFSUtil#substituteForWildcardAddress}} 
 {code}
  if (addr != null && addr.isAnyLocalAddress()) {
 ...
 }
 {code}
 addr.isAnyLocalAddress() will always return false.
 The URL will therefore always be formed with the address configured in 
 hdfs-site.xml. The same will affect bootStrap from the NN and the ssl certificate check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class

2015-06-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596306#comment-14596306
 ] 

Haohui Mai commented on HDFS-8493:
--

bq. IMO, I think this is okay, especially for write ops, provided fsn writelock 
is held. And I can see many places where this resolution is done under fsn lock 
held, but not fsd lock.

Can you list these places and file jiras? They are critical bugs and should be 
fixed.

bq. This triggered thoughts, Why two separate locks, fsdir lock and 
fsnamesystem locks.? Almost all ops are go through fsn with lock (read/write) 
held, and then go on-to get fsdir locks.

Though most of the time the fsd lock is acquired within the fsn lock, 
BlockManager and LeaseManager only require the fsn lock, not the fsd lock. 
We're in the process of cleaning up the locks of both fsn and fsd locks. At the 
end of the day the NN should be able to process block reports w/o blocking 
requests to the namespace.

 Consolidate truncate() related implementation in a single class
 ---

 Key: HDFS-8493
 URL: https://issues.apache.org/jira/browse/HDFS-8493
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Rakesh R
 Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, 
 HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, 
 HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch


 This jira proposes to consolidate truncate() related methods into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8542) WebHDFS getHomeDirectory behavior does not match specification

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596331#comment-14596331
 ] 

Hadoop QA commented on HDFS-8542:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 22s | Findbugs (version 3.0.0) 
appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 52s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  7s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m  8s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 21s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 159m 54s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 16s | Tests passed in 
hadoop-hdfs-client. |
| | | 205m 21s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.TestEncryptionZonesWithKMS |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12741037/HDFS-8542-02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11435/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11435/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11435/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11435/console |


This message was automatically generated.

 WebHDFS getHomeDirectory behavior does not match specification
 --

 Key: HDFS-8542
 URL: https://issues.apache.org/jira/browse/HDFS-8542
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Jakob Homan
Assignee: kanaka kumar avvaru
 Attachments: HDFS-8542-00.patch, HDFS-8542-01.patch, 
 HDFS-8542-02.patch


 Per the 
 [spec|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Home_Directory],
  WebHDFS provides a REST endpoint for getting the user's home directory:
 {noformat}Submit a HTTP GET request.
 curl -i http://HOST:PORT/webhdfs/v1/?op=GETHOMEDIRECTORY{noformat}
 However, WebHDFSFileSystem.java does not use this, instead building the home 
 [directory 
 locally|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java#L271]:
 {code}  /** @return the home directory. */
   public static String getHomeDirectoryString(final UserGroupInformation ugi) 
 {
  return "/user/" + ugi.getShortUserName();
   }
   @Override
   public Path getHomeDirectory() {
 return makeQualified(new Path(getHomeDirectoryString(ugi)));
   }{code}
 The WebHDFSFileSystem client should call to the REST service to determine the 
 home directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8643) Add snapshot names list to SnapshottableDirectoryStatus

2015-06-22 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596432#comment-14596432
 ] 

Rakesh R commented on HDFS-8643:


The following warnings are not related to this patch:
- Whitespace: the reported problem is not in the scope of the patch; line no. 
58 is the line immediately after the changes proposed in the patch
{code}
./hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshottableDirectoryStatus.java:58:
{code}
- checkstyle: the reported problem already exists in the present code.
{code}
./hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshottableDirectoryStatus.java:62:10:
 More than 7 parameters (found 12).
{code}
- Test case failure: it does not look related to the patch.
{code}
https://builds.apache.org/job/PreCommit-HDFS-Build/11433/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestBlocksWithNotEnoughRacks/testSufficientlyReplBlocksUsesNewRack/
java.lang.RuntimeException: java.util.zip.ZipException: invalid code lengths set
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:122)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown 
Source)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
Source)
at 
org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2546)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2534)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2605)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2558)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2469)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1205)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1177)
at org.apache.hadoop.conf.Configuration.setLong(Configuration.java:1422)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks.getConf(TestBlocksWithNotEnoughRacks.java:63)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks.testSufficientlyReplBlocksUsesNewRack(TestBlocksWithNotEnoughRacks.java:88)
{code}

 Add snapshot names list to SnapshottableDirectoryStatus
 ---

 Key: HDFS-8643
 URL: https://issues.apache.org/jira/browse/HDFS-8643
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-8643-00.patch


 The idea of this jira to enhance {{SnapshottableDirectoryStatus}} by adding 
 {{snapshotNames}} attribute into it, presently it has the {{snapshotNumber}}. 
 IMHO this would help the users to get the list of snapshot names created. 
 Also, the snapshot names can be used while renaming or deleting the snapshots.
 {code}
 org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus.java
   /**
* @return Snapshot names for the directory.
*/
   public List<String> getSnapshotNames() {
 return snapshotNames;
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.

2015-06-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596430#comment-14596430
 ] 

Haohui Mai commented on HDFS-8617:
--

bq. Andrew and I actually benchmarked setting ioprio in order to implement 
quality of service on the DataNode. It didn't have very much effect.

Can you please share your code and results? Thanks

 Throttle DiskChecker#checkDirs() speed.
 ---

 Key: HDFS-8617
 URL: https://issues.apache.org/jira/browse/HDFS-8617
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.7.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-8617.000.patch


 As described in HDFS-8564,  {{DiskChecker.checkDirs(finalizedDir)}} is 
 causing excessive I/Os because {{finalizedDirs}} might have up to 64K 
 sub-directories (HDFS-6482).
 This patch proposes to limit the rate of IO operations in 
 {{DiskChecker.checkDirs()}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated HDFS-7214:
--
Attachment: HDFS-7214.v4.patch

 Display the time when NN became active on the webUI
 ---

 Key: HDFS-7214
 URL: https://issues.apache.org/jira/browse/HDFS-7214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, 
 HDFS-7214.v3.patch, HDFS-7214.v4.patch


 Currently the NN webUI displays the JVM start-up time. It would be useful to show when 
 the NN became active. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4366) Block Replication Policy Implementation May Skip Higher-Priority Blocks for Lower-Priority Blocks

2015-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596659#comment-14596659
 ] 

Hudson commented on HDFS-4366:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8045 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8045/])
Move HDFS-4366 to 2.8.0 in CHANGES.txt (wang: rev 
5590e914f5889413da9eda047f64842c4b67fe85)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Block Replication Policy Implementation May Skip Higher-Priority Blocks for 
 Lower-Priority Blocks
 -

 Key: HDFS-4366
 URL: https://issues.apache.org/jira/browse/HDFS-4366
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Derek Dagit
Assignee: Derek Dagit
 Fix For: 3.0.0

 Attachments: HDFS-4366-branch-2.patch, HDFS-4366.patch, 
 HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, 
 HDFS-4366.patch, hdfs-4366-unittest.patch


 In certain cases, higher-priority under-replicated blocks can be skipped by 
 the replication policy implementation.  The current implementation maintains, 
 for each priority level, an index into a list of blocks that are 
 under-replicated.  Together, the lists compose a priority queue (see note 
 later about branch-0.23).  In some cases when blocks are removed from a list, 
 the caller (BlockManager) properly handles the index into the list from which 
 it removed a block.  In some other cases, the index remains stationary while 
 the list changes.  Whenever this happens, and the removed block happened to 
 be at or before the index, the implementation will skip over a block when 
 selecting blocks for replication work.
 In situations when entire racks are decommissioned, leading to many 
 under-replicated blocks, loss of blocks can occur.
 Background: HDFS-1765
 This patch to trunk greatly improved the state of the replication policy 
 implementation.  Prior to the patch, the following details were true:
   * The block priority queue was no such thing: it was really a set of 
 trees that held blocks in natural ordering, that being by the block's ID, 
 which resulted in iterator walks over the blocks in pseudo-random order.
   * There was only a single index into an iteration over all of the 
 blocks...
   * ... meaning the implementation was only successful in respecting 
 priority levels on the first pass.  Overall, the behavior was a 
 round-robin-type scheduling of blocks.
 After the patch
   * A proper priority queue is implemented, preserving log n operations 
 while iterating over blocks in the order added.
   * A separate index for each priority level is kept...
   * ... allowing for processing of the highest priority blocks first 
 regardless of which priority had last been processed.
 The change was suggested for branch-0.23 as well as trunk, but it does not 
 appear to have been pulled in.
 The problem:
 Although the indices are now tracked in a better way, there is a 
 synchronization issue since the indices are managed outside of methods to 
 modify the contents of the queue.
 Removal of a block from a priority level without adjusting the index can mean 
 that the index then points to the block after the block it originally pointed 
 to.  In the next round of scheduling for that priority level, the block 
 originally pointed to by the index is skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.

2015-06-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596369#comment-14596369
 ] 

Colin Patrick McCabe commented on HDFS-8617:


Andrew and I actually benchmarked setting {{ioprio}} in order to implement 
quality of service on the DataNode.  It didn't have very much effect.

In general, more and more I/O scheduling is moving out of the operating system 
and into the storage device.  Back in the old days, operating systems would 
feed requests to disks one at a time.  Disks took a long time to process 
requests in those days so it was easy for the CPU to stay well ahead of the 
disk and basically lead it around by the nose.  Nowadays, hard disks have huge 
on-disk write buffers (several megabytes in size) and internal software that 
handles draining them.  The hard drive doesn't necessarily process requests in 
the order it gets them.  The situation with SSDs is even worse... SSDs have a 
huge internal layer of firmware that handles servicing any request.  In general 
with SSDs the role of the OS is just to forward requests as quickly as possible 
to try to keep up with the very fast speed of the SSD.  This is why Linux 
tuning guides tell you to set your I/O scheduler to either {{noop}} or 
{{deadline}} for best performance on SSDs.

Of course, when disks fail, they usually don't fail all at once.  Instead, more 
and more operations start to time out and produce I/O errors.  This is 
problematic for systems like HBase which strive for low latency.  That's why we 
developed workarounds like hedged reads.  However, HDFS's checkDirs behavior 
here is making the situation much worse.  For a disk that returns I/O errors 
every so often, each error may trigger a new full scan of every block file on 
the datanode.  While it's true that these scans just look at the metadata, not 
the data, they still can put a heavy load on the system.

It's pointless to keep rescanning the filesystem continuously when a disk 
starts returning errors.  At the very most, we should rescan only the drive 
that's failing.  And we should not do it continuously, but maybe once every 
hour or half hour.  An HBase sysadmin asked me how to configure this behavior 
and I had to tell him that we have absolutely no way to do it.

bq. I'm unsure whether \[andrew's IOPs calculation\] is the right math. I just 
checked the code. It looks like checkDir() mostly performs read-only operations 
on the metadata of the underlying filesystem. The metadata can be fully cached 
thus the parameter can be way off (and for SSD the parameter needs to be 
recalculated). That comes back to the point that it is difficult to determine 
the right parameter for various configuration. The difficulties of finding the 
parameter leads me to believe that using throttling here is flawed.

When your application is latency-sensitive (such as HBase), it makes sense to 
do a worst-case calculation of how many IOPS the workload may 
generate.  While it's true that sometimes this may be overly pessimistic if 
things are cached in memory, it is the right math to do when latency is 
critical.
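
As a back-of-the-envelope illustration of that worst-case math (the numbers below are assumptions for illustration, not measurements):
{code}
// Hypothetical worst case: a cold checkDirs() pass over ~64K finalized
// sub-directories (HDFS-6482) on a disk sustaining ~200 random IOPS
// keeps that disk saturated for roughly 64000 / 200 = 320 seconds.
long subDirs = 64000L;        // assumed sub-directory count per volume
long randomIops = 200L;       // assumed random-read IOPS of a single HDD
long worstCaseSeconds = subDirs / randomIops;   // ~320s per full scan
{code}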

 Throttle DiskChecker#checkDirs() speed.
 ---

 Key: HDFS-8617
 URL: https://issues.apache.org/jira/browse/HDFS-8617
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS
Affects Versions: 2.7.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-8617.000.patch


 As described in HDFS-8564,  {{DiskChecker.checkDirs(finalizedDir)}} is 
 causing excessive I/Os because {{finalizedDirs}} might have up to 64K 
 sub-directories (HDFS-6482).
 This patch proposes to limit the rate of IO operations in 
 {{DiskChecker.checkDirs()}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596419#comment-14596419
 ] 

Haohui Mai commented on HDFS-7214:
--

bq. 1. for suggestion of changing long to AtomicLong, I am not quite sure what 
the improvement would be.

http://psy-lob-saw.blogspot.com/2012/12/atomiclazyset-is-performance-win-for.html

bq. 2. I am not sure returning the timestamp is a good idea. Since getNNStarted 
method returns a Date object and the UI just displays it. I am just simply 
following that pattern.

We cannot change {{getNNStarted}} due to backward compatibility issues. New 
code should return the timestamp.

 Display the time when NN became active on the webUI
 ---

 Key: HDFS-7214
 URL: https://issues.apache.org/jira/browse/HDFS-7214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, 
 HDFS-7214.v3.patch


 Currently the NN webUI displays the JVM start-up time. It would be useful to show when 
 the NN became active. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596326#comment-14596326
 ] 

Haohui Mai commented on HDFS-7214:
--

Thanks for working on this.

{code}
-  
+  /** the time when the namenode became active */
+  private volatile long activeStateStartTime;
{code}

This is a read-dominant workload. It makes more sense to use {{AtomicLong}} 
here. You can call {{lazySet()}} to update the timestamp.

{code} 
+  @Override // NameNodeStatusMXBean
+  public String getNNTransitToActiveTime() {
+if (activeStateStartTime == 0) {
+  return "N/A";
+}
+return new Date(activeStateStartTime).toString();
+  }
+
{code}

It might be better to return the timestamp and let the UI format the date, 
considering locales and timezone issues.
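
A minimal sketch of the shape being suggested, with illustrative field and method names (not the actual patch):
{code}
// Sketch only: illustrative names, not the HDFS-7214 patch itself.
private final AtomicLong activeStateStartTime = new AtomicLong(0);

// Writer side, on transition to active: lazySet() avoids a full memory
// barrier, which is cheap for a write-rarely, read-often field.
void markTransitionToActive() {
  activeStateStartTime.lazySet(Time.now());   // org.apache.hadoop.util.Time
}

// Reader side: expose the raw timestamp and let the web UI format the
// date according to the browser's locale and timezone.
public long getNNTransitToActiveTimestamp() {
  return activeStateStartTime.get();
}
{code}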

 Display the time when NN became active on the webUI
 ---

 Key: HDFS-7214
 URL: https://issues.apache.org/jira/browse/HDFS-7214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, 
 HDFS-7214.v3.patch


 Currently the NN webUI displays the JVM start-up time. It would be useful to show when 
 the NN became active. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596372#comment-14596372
 ] 

Siqi Li commented on HDFS-7214:
---

[~wheat9], Thanks for your feedback.

1. for suggestion of changing long to AtomicLong, I am not quite sure what the 
improvement would be.

2. I am not sure returning the timestamp is a good idea. Since the getNNStarted 
method returns a Date object and the UI just displays it, I am simply 
following that pattern.

 Display the time when NN became active on the webUI
 ---

 Key: HDFS-7214
 URL: https://issues.apache.org/jira/browse/HDFS-7214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, 
 HDFS-7214.v3.patch


 Currently the NN webUI displays the JVM start-up time. It would be useful to show when 
 the NN became active. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8644) OzoneHandler : Add volume handler

2015-06-22 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer reassigned HDFS-8644:
--

Assignee: Anu Engineer

 OzoneHandler : Add volume handler
 -

 Key: HDFS-8644
 URL: https://issues.apache.org/jira/browse/HDFS-8644
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anu Engineer
Assignee: Anu Engineer

 Add volume handler logic that dispatches volume related calls to the right 
 interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8634) OzoneHandler: Add userAuth Interface and Simple userAuth handler

2015-06-22 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-8634:
---
Attachment: hdfs-8634-HDFS-7240.001.patch

UserAuth Interface and simple Auth handler

 OzoneHandler: Add userAuth Interface and Simple userAuth handler
 

 Key: HDFS-8634
 URL: https://issues.apache.org/jira/browse/HDFS-8634
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anu Engineer
Assignee: Anu Engineer
 Attachments: hdfs-8634-HDFS-7240.001.patch


 Add user authentication interface and also the first concrete implementation 
 for that interface called simple. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8634) OzoneHandler: Add userAuth Interface and Simple userAuth handler

2015-06-22 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-8634:
---
Status: Patch Available  (was: Open)

 OzoneHandler: Add userAuth Interface and Simple userAuth handler
 

 Key: HDFS-8634
 URL: https://issues.apache.org/jira/browse/HDFS-8634
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anu Engineer
Assignee: Anu Engineer
 Attachments: hdfs-8634-HDFS-7240.001.patch


 Add user authentication interface and also the first concrete implementation 
 for that interface called simple. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597131#comment-14597131
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8619:
---

Patch looks good in general.  I agree that we should do most of the changes in 
trunk.

- Just a question: why remove the if-condition below?  Is the condition 
always true?
{code}
//BlockManager.invalidateBlock(..)
-} else if (nr.liveReplicas() >= 1) {
+} else {
{code}

- Let's move numCorruptReplicas from BlockManager to BlockManagerTestUtil.

- See also if we could move getCorruptReplicaBlockIds from CorruptReplicasMap 
to BlockManagerTestUtil or some other class in test.

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block we may have more complicated scenarios, e.g., multiple 
 replicas of the first internal block while some other internal blocks are 
 missing. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1431) Balancer should work with the logic of BlockPlacementPolicy

2015-06-22 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597133#comment-14597133
 ] 

Ming Ma commented on HDFS-1431:
---

When I met with [~andrew.wang], [~ctrezzo], [~atm], [~cmccabe] the other day, 
we had a brief discussion about the balancer. To make the balancer use 
BlockPlacementPolicy, we could alternatively run the balancer inside the namenode. 
The namenode already has the necessary information; it would need to provide balancer 
throttling with some refactoring, but overall it seems it shouldn't create much 
overhead on the namenode. It would be great to hear from others about this approach 
and potential issues such as scale and performance.

 Balancer should work with the logic of BlockPlacementPolicy
 ---

 Key: HDFS-1431
 URL: https://issues.apache.org/jira/browse/HDFS-1431
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover
Affects Versions: 0.22.0
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: HDFS-1431.txt


 Currently Balancer does not obtain information from BlockPlacementPolicy so 
 it can transfer the blocks without checking with BlockPlacementPolicy.
 This causes the policy to break after balancing the cluster.
 There are some new policies proposed in HDFS-1094 and MAPREDUCE-1831 in which 
 the block placement follows some pattern.
 The pattern can be broken by Balancer.
 I propose that we add the following method in BlockPlacementPolicy:
 {code}
   abstract public boolean canBeMoved(String fileName, Block block,
 DatanodeInfo source, DatanodeInfo destination);
 {code}
 And make Balancer use it in
 {code}
   private boolean isGoodBlockCandidate(Source source,
   BalancerDatanode target, BalancerBlock block)
 {code}
 What do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-06-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597132#comment-14597132
 ] 

Vinayakumar B commented on HDFS-8578:
-

bq. Vinayakumar B Have you done any performance benchmarking with this 
approach? If yes, Could you please post the results here?
Nope, I don't have any results. 

 On upgrade, Datanode should process all storage/data dirs in parallel
 -

 Key: HDFS-8578
 URL: https://issues.apache.org/jira/browse/HDFS-8578
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Raju Bairishetti
Priority: Critical
 Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch


 Right now, during upgrades the datanode processes all the storage dirs 
 sequentially. Assuming it takes ~20 mins to process a single storage dir, a 
 datanode which has ~10 disks will take around 3 hours to come up.
 *BlockPoolSliceStorage.java*
 {code}
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
   assert getCTime() == nsInfo.getCTime()
   : "Data-node and name-node CTimes must be the same.";
 }
 {code}
 It would save lots of time during major upgrades if the datanode processed all 
 storage dirs/disks in parallel.
 Can we make the datanode process all storage dirs in parallel?
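
 A rough sketch of the parallel form using a plain ExecutorService (names, pool sizing and error handling are illustrative only, not a proposed patch):
 {code}
// Sketch only: run doTransition for each storage dir in its own task,
// then wait for all of them before continuing startup.
ExecutorService pool = Executors.newFixedThreadPool(getNumStorageDirs());
List<Future<Void>> futures = new ArrayList<>();
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  final StorageDirectory sd = getStorageDir(idx);
  futures.add(pool.submit(() -> {
    doTransition(datanode, sd, nsInfo, startOpt);
    assert getCTime() == nsInfo.getCTime()
        : "Data-node and name-node CTimes must be the same.";
    return null;
  }));
}
for (Future<Void> f : futures) {
  try {
    f.get();   // surfaces any failure from a worker thread
  } catch (ExecutionException | InterruptedException e) {
    throw new IOException("Parallel storage dir transition failed", e);
  }
}
pool.shutdown();
 {code}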



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7214) Display the time when NN became active on the webUI

2015-06-22 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596981#comment-14596981
 ] 

Ming Ma commented on HDFS-7214:
---

[~l201514] you are right that HDFS-7257 has added the metrics for it. Having 
this information on the webUI is useful. Maybe we can still update webUI based 
on jmx?

 Display the time when NN became active on the webUI
 ---

 Key: HDFS-7214
 URL: https://issues.apache.org/jira/browse/HDFS-7214
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Siqi Li
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Attachments: HDFS-7214.v1.patch, HDFS-7214.v2.patch, 
 HDFS-7214.v3.patch, HDFS-7214.v4.patch


 Currently the NN webUI displays the JVM start-up time. It would be useful to show when 
 the NN became active. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8646) Prune cached replicas from DatanodeDescriptor state on replica invalidation

2015-06-22 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-8646:
--
Attachment: hdfs-8646.001.patch

Patch attached. We now prune in BlockManager#removeStoredBlock, which should be 
pretty failsafe. New test exercises this logic, and I also added a failsafe 
prune in CacheManager in case we missed some other similar case.

 Prune cached replicas from DatanodeDescriptor state on replica invalidation
 ---

 Key: HDFS-8646
 URL: https://issues.apache.org/jira/browse/HDFS-8646
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-8646.001.patch


 Currently we remove blocks from the DD's CachedBlockLists on node failure and 
 on cache report, but not on replica invalidation. This can lead to an invalid 
 situation where we return a LocatedBlock with cached locations that are not 
 backed by an on-disk replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6564) Use slf4j instead of common-logging in hdfs-client

2015-06-22 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-6564:
---
Release Note: Users may need to pay special attention to this change while 
upgrading to this version. Previously the hdfs client used commons-logging as 
the logging framework. With this change it will use the slf4j framework. For more 
details about slf4j, please see: http://www.slf4j.org/manual.html. Also, the 
org.apache.hadoop.hdfs.protocol.CachePoolInfo#LOG public static member variable 
has been removed as it is not used anywhere. Users who have a reference to this 
variable need to correct their code. One can retrieve the named logger 
via the logging framework of their choice directly, like org.slf4j.Logger LOG = 
org.slf4j.LoggerFactory.getLogger(org.apache.hadoop.hdfs.protocol.CachePoolInfo.class);
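
A minimal before/after sketch of the client-side change described in the release note (the class below is just an example):
{code}
// Before (commons-logging):
//   private static final org.apache.commons.logging.Log LOG =
//       org.apache.commons.logging.LogFactory.getLog(CachePoolInfo.class);

// After (slf4j): retrieve the named logger directly.
private static final org.slf4j.Logger LOG =
    org.slf4j.LoggerFactory.getLogger(
        org.apache.hadoop.hdfs.protocol.CachePoolInfo.class);
{code}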

 Use slf4j instead of common-logging in hdfs-client
 --

 Key: HDFS-6564
 URL: https://issues.apache.org/jira/browse/HDFS-6564
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Rakesh R
 Attachments: HDFS-6564-01.patch, HDFS-6564-02.patch, 
 HDFS-6564-03.patch


 hdfs-client should depend on slf4j instead of common-logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-22 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597016#comment-14597016
 ] 

Jesse Yates commented on HDFS-6440:
---

Rebased on trunk, tests pass locally for me.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2015-06-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597130#comment-14597130
 ] 

Vinayakumar B commented on HDFS-8277:
-

bq. If we cannot make the change for 2.x I prefer not changing the current 
behavior of failing 'safemode enter' when SBN is down.
In the case where the SNN is down, maybe for maintenance, but still present in the 
configuration, going ahead to the next namenode on a ConnectException seems 
reasonable.

To avoid unexpected behavior, maybe we can add an active/standby check for the next 
namenode before changing the safemode status, and change it only if the next 
namenode is active?

Though this is kind of a workaround instead of breaking compatibility, IMO the 
proposal as in the v1 patch seems reasonable. 

Any thoughts?


 Safemode enter fails when Standby NameNode is down
 --

 Key: HDFS-8277
 URL: https://issues.apache.org/jira/browse/HDFS-8277
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, HDFS, namenode
Affects Versions: 2.6.0
 Environment: HDP 2.2.0
Reporter: Hari Sekhon
Assignee: Surendra Singh Lilhore
Priority: Minor
 Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
 HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch


 HDFS fails to enter safemode when the Standby NameNode is down (eg. due to 
 AMBARI-10536).
 {code}hdfs dfsadmin -safemode enter
 safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused{code}
 This appears to be a bug in that it's not trying both NameNodes like the 
 standard hdfs client code does, and is instead stopping after getting a 
 connection refused from nn1 which is down. I verified normal hadoop fs writes 
 and reads via cli did work at this time, using nn2. I happened to run this 
 command as the hdfs user on nn2 which was the surviving Active NameNode.
 After I re-bootstrapped the Standby NN to fix it the command worked as 
 expected again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times

2015-06-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597138#comment-14597138
 ] 

Vinayakumar B commented on HDFS-7645:
-

bq. This change is incompatible since we expose RollingUpgradeInfo in the NN's 
JMX (a public API). As discussed above, rather than being null on finalization, 
it now sets the finalization time.
Oh! Thanks [~andrew.wang] for pointing that out. That was a miss.

bq. Have we thought about other ways of solving this issue? Else we can change 
the JMX method to still return null on finalization.
Since the DN side needs to differentiate between the FINALIZED rolling-upgrade status 
and the rolled-back status, the finalize time is set on finalization.

bq. Else we can change the JMX method to still return null on finalization.
We can do this if this fix is backported to stable branches. Currently it's only 
available in branch-2.
If not so critical to change it back, then we can add a release note indicating 
the change.

Note that, {{ClientProtocol#rollingUpgrade(..)}} also changed to return 
non-null finalized status as well.

 Rolling upgrade is restoring blocks from trash multiple times
 -

 Key: HDFS-7645
 URL: https://issues.apache.org/jira/browse/HDFS-7645
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.0
Reporter: Nathan Roberts
Assignee: Keisuke Ogiwara
 Fix For: 2.8.0

 Attachments: HDFS-7645.01.patch, HDFS-7645.02.patch, 
 HDFS-7645.03.patch, HDFS-7645.04.patch, HDFS-7645.05.patch, 
 HDFS-7645.06.patch, HDFS-7645.07.patch


 When performing an HDFS rolling upgrade, the trash directory is getting 
 restored twice when under normal circumstances it shouldn't need to be 
 restored at all. iiuc, the only time these blocks should be restored is if we 
 need to rollback a rolling upgrade. 
 On a busy cluster, this can cause significant and unnecessary block churn 
 both on the datanodes, and more importantly in the namenode.
 The two times this happens are:
 1) restart of DN onto new software
 {code}
   private void doTransition(DataNode datanode, StorageDirectory sd,
   NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
 if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
   Preconditions.checkState(!getTrashRootDir(sd).exists(),
   sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not " +
   "both be present.");
   doRollback(sd, nsInfo); // rollback if applicable
 } else {
   // Restore all the files in the trash. The restored files are retained
   // during rolling upgrade rollback. They are deleted during rolling
   // upgrade downgrade.
   int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
   LOG.info("Restored " + restored + " block files from trash.");
 }
 {code}
 2) When heartbeat response no longer indicates a rollingupgrade is in progress
 {code}
   /**
* Signal the current rolling upgrade status as indicated by the NN.
* @param inProgress true if a rolling upgrade is in progress
*/
   void signalRollingUpgrade(boolean inProgress) throws IOException {
 String bpid = getBlockPoolId();
 if (inProgress) {
   dn.getFSDataset().enableTrash(bpid);
   dn.getFSDataset().setRollingUpgradeMarker(bpid);
 } else {
   dn.getFSDataset().restoreTrash(bpid);
   dn.getFSDataset().clearRollingUpgradeMarker(bpid);
 }
   }
 {code}
 HDFS-6800 and HDFS-6981 modified this behavior, making it not completely 
 clear whether this is somehow intentional. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks

2015-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597142#comment-14597142
 ] 

Zhe Zhang commented on HDFS-8619:
-

bq. I guess we still need this jira for adding the striped block logic and the 
tests.
Sure. I see at least the tests are specific to striped blocks.

bq. Besides, we now have merged quite a few changes to trunk, any plan for 
merging trunk changes to the HDFS-7285 feature branch?
Thanks for bringing up the question Jing. These days I'm mostly focused on this 
front. Since the merged changes are quite big, I'm rebasing the entire 
consolidated HDFS-7285 patch instead of individual patches. Late last week I 
finished a first round of rebasing which is quite rough. I posted a [comment 
https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14593827page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14593827]
 but I guess it is hidden in between Hudson messages (could you help delete 
them?). I'm working on a second round of rebase to include all changes. 
Hopefully I can also split it into functional pieces like "Support EC zones", 
"Allocate and persist striped blocks in NameNode", and "Add striped block support 
in INodeFile". I'll post a new rebased patch soon.

 Erasure Coding: revisit replica counting for striped blocks
 ---

 Key: HDFS-8619
 URL: https://issues.apache.org/jira/browse/HDFS-8619
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-8619.000.patch


 Currently we use the same {{BlockManager#countNodes}} method for striped 
 blocks, which simply treats each internal block as a replica. However, for a 
 striped block we may have more complicated scenarios, e.g., multiple 
 replicas of the first internal block while some other internal blocks are 
 missing. Using the current {{countNodes}} method can lead to wrong decisions 
 in these scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode

2015-06-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597156#comment-14597156
 ] 

Vinayakumar B commented on HDFS-8586:
-

Thanks [~brahmareddy] for reporting this.
This happens when the NameNode has the node in its dead-node list and the block 
allocation request comes from the same machine as the dead node; the dead node 
is then chosen as the local node irrespective of whether it is part of the cluster or 
not. Adding a check in 
{{BlockPlacementPolicyDefault.java#chooseLocalStorage(..)}} will be the fix for 
this.

Regarding the test proposed above, it will not always fail, since it is a 
MiniDFSCluster test and all datanodes will be on the same machine, and the 
probability of the dead node being chosen as local storage is not guaranteed.
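
A rough sketch of the kind of guard meant above, inside {{chooseLocalStorage(..)}} (illustrative only, not the final patch):
{code}
// Sketch only: do not treat the writer's node as local when it is no
// longer in the cluster map (e.g. it was removed after being declared dead).
if (localMachine != null && !clusterMap.contains(localMachine)) {
  localMachine = null;   // fall back to rack-local / random selection
}
{code}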

 Dead Datanode is allocated for write when client is  from deadnode
 --

 Key: HDFS-8586
 URL: https://issues.apache.org/jira/browse/HDFS-8586
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical

  *{color:blue}DataNode marked as Dead{color}* 
 2015-06-11 19:39:00,862 | INFO  | 
 org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
  | BLOCK*  *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009*  | 
 org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584)
 2015-06-11 19:39:00,863 | INFO  | 
 org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
  | Removing a node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488)
   *{color:blue}Deadnode got Allocated{color}* 
 2015-06-11 19:39:45,148 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | INFO  | IPC Server handler 26 on 25000 | BLOCK*  
 *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, 
 truncateBlock=null, primaryNodeIndex=-1, 
 replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW],
   
 ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW],
  ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL: 
 *XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)
 2015-06-11 19:39:45,191 | INFO  | IPC Server handler 35 on 25000 | BLOCK* 
 allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
 primaryNodeIndex=-1, 
 replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW],
  
 ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW],
  ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85: 
 {{NORMAL:XX.XX.37.33:25009}}   |RBW]]} for /t1._COPYING_ | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596984#comment-14596984
 ] 

Hadoop QA commented on HDFS-6440:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 24 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m  3s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   4m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m  0s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 32s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests | 142m 30s | Tests failed in hadoop-hdfs. |
| {color:red}-1{color} | hdfs tests |   0m 16s | Tests failed in bkjournal. |
| | | 219m 10s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead |
|   | hadoop.hdfs.server.namenode.TestCheckpoint |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestNameNodeAcl |
| Failed build | bkjournal |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12740539/hdfs-6440-trunk-v8.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 077250d |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11441/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11441/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| bkjournal test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11441/artifact/patchprocess/testrun_bkjournal.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11441/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/11441/console |


This message was automatically generated.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8646) Prune cached replicas from DatanodeDescriptor state on replica invalidation

2015-06-22 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-8646:
-

 Summary: Prune cached replicas from DatanodeDescriptor state on 
replica invalidation
 Key: HDFS-8646
 URL: https://issues.apache.org/jira/browse/HDFS-8646
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang


Currently we remove blocks from the DD's CachedBlockLists on node failure and 
on cache report, but not on replica invalidation. This can lead to an invalid 
situation where we return a LocatedBlock with cached locations that are not 
backed by an on-disk replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6564) Use slf4j instead of common-logging in hdfs-client

2015-06-22 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597023#comment-14597023
 ] 

Rakesh R commented on HDFS-6564:


Thanks for the reviews. I've updated the {{Release note}} section in the jira. 
Is anything else required for this change?

 Use slf4j instead of common-logging in hdfs-client
 --

 Key: HDFS-6564
 URL: https://issues.apache.org/jira/browse/HDFS-6564
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Rakesh R
 Attachments: HDFS-6564-01.patch, HDFS-6564-02.patch, 
 HDFS-6564-03.patch


 hdfs-client should depend on slf4j instead of common-logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8447) Decouple information of files in GetLocatedBlocks

2015-06-22 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-8447:
-
Attachment: HDFS-8447.004.patch

 Decouple information of files in GetLocatedBlocks
 -

 Key: HDFS-8447
 URL: https://issues.apache.org/jira/browse/HDFS-8447
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8447.000.patch, HDFS-8447.001.patch, 
 HDFS-8447.002.patch, HDFS-8447.003.patch, HDFS-8447.004.patch


 The current implementation of {{BlockManager.getLocatedBlocks()}} requires 
 file information to be passed as parameters. This information does 
 not affect the results of getting the physical locations of blocks.
 This jira proposes to refactor the call so that 
 {{BlockManager.getLocatedBlocks()}} depends only on the block information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-22 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597011#comment-14597011
 ] 

Duo Zhang commented on HDFS-8515:
-

I've introduced a Http2StreamChannel on the POC branch.

https://github.com/Apache9/hadoop/tree/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/http2

Let me extract a patch for it, thanks.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class

2015-06-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597140#comment-14597140
 ] 

Vinayakumar B commented on HDFS-8493:
-

bq. Though most of the time the fsd lock is acquired within the fsn lock. 
BlockManager and LeaseManager only require the fsn lock but not the fsd lock. 
We're in the process of cleaning up both the fsn and fsd locks. At the 
end of the day the NN should be able to process block reports w/o blocking 
requests to the namespace.
Okay.

bq. Following are the functions where the resolution 
fsd.resolvePath(pc, src, pathComponents); is done while holding only the fsn lock 
and not the fsd lock. Could you please take a look at it.
Thanks [~rakeshr] for listing out those methods. Can you file a follow-up jira 
to handle those?

 Consolidate truncate() related implementation in a single class
 ---

 Key: HDFS-8493
 URL: https://issues.apache.org/jira/browse/HDFS-8493
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Rakesh R
 Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, 
 HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, 
 HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch


 This jira proposes to consolidate truncate() related methods into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-06-22 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597150#comment-14597150
 ] 

Vinayakumar B commented on HDFS-8578:
-

[~raju.bairishetti], would you mind testing this patch with your workloads? 

 On upgrade, Datanode should process all storage/data dirs in parallel
 -

 Key: HDFS-8578
 URL: https://issues.apache.org/jira/browse/HDFS-8578
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Raju Bairishetti
Priority: Critical
 Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch


 Right now, during upgrades the datanode processes all the storage dirs 
 sequentially. Assume it takes ~20 mins to process a single storage dir; a 
 datanode with ~10 disks will then take around 3 hours to come up.
 *BlockPoolSliceStorage.java*
 {code}
 for (int idx = 0; idx < getNumStorageDirs(); idx++) {
   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
   assert getCTime() == nsInfo.getCTime()
       : "Data-node and name-node CTimes must be the same.";
 }
 {code}
 It would save a lot of time during major upgrades if the datanode processed 
 all storage dirs/disks in parallel.
 Can we make the datanode process all storage dirs in parallel?
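
A rough, self-contained sketch of the idea (not the attached patch): submit each storage dir's transition to a thread pool and wait for all of them, so the total upgrade time is bounded by the slowest disk rather than the sum over all disks.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStorageDirUpgrade {
  // Stand-in for the real per-storage-dir doTransition(...) work.
  static void doTransition(String storageDir) {
    System.out.println("Upgrading " + storageDir);
  }

  public static void main(String[] args) throws Exception {
    List<String> storageDirs = Arrays.asList("/data1", "/data2", "/data3");
    ExecutorService pool = Executors.newFixedThreadPool(storageDirs.size());
    List<Future<?>> results = new ArrayList<>();
    for (String dir : storageDirs) {
      results.add(pool.submit(() -> doTransition(dir)));
    }
    // Propagate any per-directory failure and wait for all dirs to finish.
    for (Future<?> result : results) {
      result.get();
    }
    pool.shutdown();
  }
}
{code}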



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-22 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Attachment: HDFS-8515-v4.patch

A solution based on AbstractChannel.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515-v4.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode

2015-06-22 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-8586:
---
Status: Patch Available  (was: Open)

 Dead Datanode is allocated for write when client is  from deadnode
 --

 Key: HDFS-8586
 URL: https://issues.apache.org/jira/browse/HDFS-8586
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
 Attachments: HDFS-8586.patch


  *{color:blue}DataNode marked as Dead{color}* 
 2015-06-11 19:39:00,862 | INFO  | 
 org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
  | BLOCK*  *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009*  | 
 org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584)
 2015-06-11 19:39:00,863 | INFO  | 
 org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
  | Removing a node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488)
   *{color:blue}Deadnode got Allocated{color}* 
 2015-06-11 19:39:45,148 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | INFO  | IPC Server handler 26 on 25000 | BLOCK*  
 *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, 
 truncateBlock=null, primaryNodeIndex=-1, 
 replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW],
   
 ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW],
  ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL: 
 *XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)
 2015-06-11 19:39:45,191 | INFO  | IPC Server handler 35 on 25000 | BLOCK* 
 allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
 primaryNodeIndex=-1, 
 replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW],
  
 ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW],
  ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85: 
 {{NORMAL:XX.XX.37.33:25009}}   |RBW]]} for /t1._COPYING_ | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode

2015-06-22 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-8586:
---
Attachment: HDFS-8586.patch

 Dead Datanode is allocated for write when client is  from deadnode
 --

 Key: HDFS-8586
 URL: https://issues.apache.org/jira/browse/HDFS-8586
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
 Attachments: HDFS-8586.patch


  *{color:blue}DataNode marked as Dead{color}* 
 2015-06-11 19:39:00,862 | INFO  | 
 org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
  | BLOCK*  *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009*  | 
 org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584)
 2015-06-11 19:39:00,863 | INFO  | 
 org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
  | Removing a node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488)
   *{color:blue}Deadnode got Allocated{color}* 
 2015-06-11 19:39:45,148 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | INFO  | IPC Server handler 26 on 25000 | BLOCK*  
 *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, 
 truncateBlock=null, primaryNodeIndex=-1, 
 replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW],
   
 ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW],
  ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL: 
 *XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)
 2015-06-11 19:39:45,191 | INFO  | IPC Server handler 35 on 25000 | BLOCK* 
 allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
 primaryNodeIndex=-1, 
 replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW],
  
 ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW],
  ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85: 
 {{NORMAL:XX.XX.37.33:25009}}   |RBW]]} for /t1._COPYING_ | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8647) Abstract BlockManager's rack policy into BlockPlacementPolicy

2015-06-22 Thread Ming Ma (JIRA)
Ming Ma created HDFS-8647:
-

 Summary: Abstract BlockManager's rack policy into 
BlockPlacementPolicy
 Key: HDFS-8647
 URL: https://issues.apache.org/jira/browse/HDFS-8647
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma


Sometimes we want the namenode to use an alternative block placement policy, such 
as the upgrade domains proposed in HDFS-7541.

BlockManager has built-in assumptions about the rack policy in functions such as 
useDelHint and blockHasEnoughRacks. That means whenever we introduce a new block 
placement policy, we need to modify BlockManager to account for it. Ideally 
BlockManager should ask the BlockPlacementPolicy object instead, which would allow 
us to provide a new BlockPlacementPolicy without changing BlockManager.
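
A hypothetical sketch of the direction (the names below are illustrative, not the actual BlockPlacementPolicy API): move the rack-diversity check behind the policy object so BlockManager only asks the question and never encodes the answer itself.

{code}
import java.util.List;

// Illustrative replica placeholder; the real code would use DatanodeStorageInfo.
class ReplicaLocation {
  final String rack;
  ReplicaLocation(String rack) { this.rack = rack; }
}

// BlockManager would call this instead of its own blockHasEnoughRacks logic.
interface PlacementPolicySketch {
  boolean isPlacementSufficient(List<ReplicaLocation> replicas, int replication);
}

class DefaultRackPolicySketch implements PlacementPolicySketch {
  @Override
  public boolean isPlacementSufficient(List<ReplicaLocation> replicas,
                                       int replication) {
    long racks = replicas.stream().map(r -> r.rack).distinct().count();
    // Default rule: once replication > 1, replicas must span at least two racks.
    return replication <= 1 || racks >= 2;
  }
}
{code}

An alternative policy (e.g. one based on upgrade domains) would then only need to supply its own implementation of the check.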





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8493) Consolidate truncate() related implementation in a single class

2015-06-22 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597178#comment-14597178
 ] 

Rakesh R commented on HDFS-8493:


Yeah, I have raised the HDFS-8648 sub-task to revisit these cases and make the 
proper corrections.

 Consolidate truncate() related implementation in a single class
 ---

 Key: HDFS-8493
 URL: https://issues.apache.org/jira/browse/HDFS-8493
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Rakesh R
 Attachments: HDFS-8493-001.patch, HDFS-8493-002.patch, 
 HDFS-8493-003.patch, HDFS-8493-004.patch, HDFS-8493-005.patch, 
 HDFS-8493-006.patch, HDFS-8493-007.patch, HDFS-8493-007.patch


 This jira proposes to consolidate truncate() related methods into a single 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8648) Revisit FsDirectory#resolvePath() function usage to check the call is made under proper lock

2015-06-22 Thread Rakesh R (JIRA)
Rakesh R created HDFS-8648:
--

 Summary: Revisit FsDirectory#resolvePath() function usage to check 
the call is made under proper lock
 Key: HDFS-8648
 URL: https://issues.apache.org/jira/browse/HDFS-8648
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


As per the 
[discussion|https://issues.apache.org/jira/browse/HDFS-8493?focusedCommentId=14595735page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14595735]
 in HDFS-8493, the usage of the {{FsDirectory#resolvePath}} function needs to be 
reviewed. It seems there are many places where the resolution 
{{fsd.resolvePath(pc, src, pathComponents);}} is done while holding only the fsn 
lock and not the fsd lock. As per the initial analysis, the following are such 
cases; they probably need to be filtered to identify and fix the wrong usages. A 
sketch of the intended lock nesting appears after the list.
# FsDirAclOp.java
- getAclStatus()
- modifyAclEntries()
- removeAcl()
- removeDefaultAcl()
- setAcl()
- getAclStatus()
# FsDirDeleteOp.java
- delete(fsn, src, recursive, logRetryCache)
# FsDirRenameOp.java
- renameToInt(fsd, srcArg, dstArg, logRetryCache)
- renameToInt(fsd, srcArg, dstArg, logRetryCache, options)
# FsDirStatAndListingOp.java
- getContentSummary(fsd, src)
- getFileInfo(fsd, srcArg, resolveLink)
- isFileClosed(fsd, src)
- getListingInt(fsd, srcArg, startAfter, needLocation)
# FsDirWriteFileOp.java
- abandonBlock()
- completeFile(fsn, pc, srcArg, holder, last, fileId)
- getEncryptionKeyInfo(fsn, pc, src, supportedVersions)
- startFile()
- validateAddBlock()
# FsDirXAttrOp.java
- getXAttrs(fsd, srcArg, xAttrs)
- listXAttrs(fsd, src)
- setXAttr(fsd, src, xAttr, flag, logRetryCache)
# FSNamesystem.java
- createEncryptionZoneInt()
- getEZForPath()

Thanks [~wheat9], [~vinayrpet] for the advice.
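
A minimal sketch of the intended nesting, assuming the usual ordering of taking the fsn lock first and the fsd lock inside it; the lock fields below are stand-ins, not the actual FSNamesystem/FSDirectory members.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ResolvePathLockSketch {
  // Stand-ins for the namesystem (fsn) and directory (fsd) locks.
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock fsdLock = new ReentrantReadWriteLock();

  // Placeholder for FSDirectory#resolvePath; should only run under the fsd lock.
  private String resolvePath(String src) {
    return src;
  }

  public String getFileInfo(String src) {
    fsnLock.readLock().lock();
    try {
      fsdLock.readLock().lock();   // the step this jira says is sometimes missing
      try {
        return resolvePath(src);
      } finally {
        fsdLock.readLock().unlock();
      }
    } finally {
      fsnLock.readLock().unlock();
    }
  }
}
{code}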



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode

2015-06-22 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597196#comment-14597196
 ] 

Brahma Reddy Battula commented on HDFS-8586:


[~vinayrpet] thanks a lot for taking a look at this issue. Added a check 
in {{BlockPlacementPolicyDefault.java#choseLocalStorage(..)}} and 
corrected the test case. Kindly review.
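
A purely illustrative sketch of the kind of guard described above (the actual change is in the attached patch and may differ): skip the writer's node when it is no longer part of the cluster topology, e.g. because it was declared dead.

{code}
import java.util.Set;

// Hypothetical sketch only; not the code from HDFS-8586.patch.
public class LocalStorageGuardSketch {
  // Stand-in for the NetworkTopology view of nodes currently in the cluster.
  private final Set<String> clusterMap;

  LocalStorageGuardSketch(Set<String> clusterMap) {
    this.clusterMap = clusterMap;
  }

  // Prefer the writer's own node, but only if it is still a live cluster member.
  String chooseLocalStorage(String writerNode, String fallbackNode) {
    if (writerNode == null || !clusterMap.contains(writerNode)) {
      return fallbackNode;
    }
    return writerNode;
  }
}
{code}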

 Dead Datanode is allocated for write when client is  from deadnode
 --

 Key: HDFS-8586
 URL: https://issues.apache.org/jira/browse/HDFS-8586
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Priority: Critical
 Attachments: HDFS-8586.patch


  *{color:blue}DataNode marked as Dead{color}* 
 2015-06-11 19:39:00,862 | INFO  | 
 org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
  | BLOCK*  *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009*  | 
 org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584)
 2015-06-11 19:39:00,863 | INFO  | 
 org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
  | Removing a node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488)
   *{color:blue}Deadnode got Allocated{color}* 
 2015-06-11 19:39:45,148 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The 
 cluster does not contain node: /default/rack3/XX.XX.39.33:25009 | 
 org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
 2015-06-11 19:39:45,149 | INFO  | IPC Server handler 26 on 25000 | BLOCK*  
 *allocate blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, 
 truncateBlock=null, primaryNodeIndex=-1, 
 replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW],
   
 ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW],
  ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL: 
 *XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)
 2015-06-11 19:39:45,191 | INFO  | IPC Server handler 35 on 25000 | BLOCK* 
 allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION, truncateBlock=null, 
 primaryNodeIndex=-1, 
 replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW],
  
 ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW],
  ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85: 
 {{NORMAL:XX.XX.37.33:25009}}   |RBW]]} for /t1._COPYING_ | 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

