[jira] [Commented] (HDFS-12684) Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean

2017-10-19 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212196#comment-16212196
 ] 

Eric Yang commented on HDFS-12684:
--

[~cheersyang] Thank you for the clarification.

> Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean
> 
>
> Key: HDFS-12684
> URL: https://issues.apache.org/jira/browse/HDFS-12684
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12684-HDFS-7240.001.patch
>
>
> I found this issue while reviewing HDFS-11468: from http://scm_host:9876/jmx, 
> both SCM and SCMNodeManager have a {{NodeCount}} metric
> {noformat}
>  {
> "name" : 
> "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
> "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
> "ClientRpcPort" : "9860",
> "DatanodeRpcPort" : "9861",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "CompileInfo" : "2017-10-17T06:47Z xxx",
> "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
> "SoftwareVersion" : "3.1.0-SNAPSHOT",
> "StartedTimeInMillis" : 1508393551065
>   }, {
> "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
> "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "OutOfChillMode" : false,
> "MinimumChillModeNodes" : 1,
> "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 
> 0 nodes reported, minimal 1 nodes required."
>   }
> {noformat}
> Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.
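The overlap above is easy to spot programmatically. Below is a minimal sketch 
(not part of Hadoop; it only assumes the JSON shape shown in the {noformat} 
block, trimmed to one key per bean) that lists which MBeans in a /jmx response 
expose the same attribute name:

```python
import json

# Trimmed-down version of the /jmx response quoted above; only the fields
# needed to demonstrate the overlap are kept.
jmx_response = json.loads("""
{ "beans": [
  { "name": "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
    "NodeCount": [ { "key": "HEALTHY", "value": 0 } ] },
  { "name": "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
    "NodeCount": [ { "key": "HEALTHY", "value": 0 } ] }
] }
""")

def beans_with_attribute(jmx, attr):
    # Return the names of every MBean that exposes the given attribute.
    return [bean["name"] for bean in jmx["beans"] if attr in bean]

duplicated = beans_with_attribute(jmx_response, "NodeCount")
# Both the StorageContainerManager bean and the SCMNodeManager bean report
# NodeCount, which is the duplication this JIRA proposes to remove.
```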



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11468:
-
Status: Patch Available  (was: Reopened)

Re-trigger Jenkins.

> Ozone: SCM: Add Node Metrics for SCM
> 
>
> Key: HDFS-11468
> URL: https://issues.apache.org/jira/browse/HDFS-11468
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Xiaoyu Yao
>Assignee: Yiqun Lin
>Priority: Critical
>  Labels: OzonePostMerge
> Attachments: HDFS-11468-HDFS-7240.001.patch, 
> HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch, 
> HDFS-11468-HDFS-7240.004.patch
>
>
> This ticket is opened to add node metrics in SCM based on heartbeat, node 
> report and container report from datanodes. 






[jira] [Reopened] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin reopened HDFS-11468:
--







[jira] [Updated] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11468:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)







[jira] [Commented] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212125#comment-16212125
 ] 

Hadoop QA commented on HDFS-11468:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
15s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11468 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893167/HDFS-11468-HDFS-7240.004.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21757/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212120#comment-16212120
 ] 

Hadoop QA commented on HDFS-12544:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  3s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 468 unchanged - 5 fixed = 470 total (was 473) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 35s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 52s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | HDFS-12544 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893147/HDFS-12544.04.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux fd96f11bdc8a 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 
11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ce7cf66 |
| Default Java | 1.8.0_131 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21756/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21756/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21756/testReport/ |
| modules | C: 

[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212118#comment-16212118
 ] 

Hadoop QA commented on HDFS-12544:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 466 unchanged - 5 fixed = 468 total (was 471) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 29s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 35s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReplication |
|   | hadoop.hdfs.server.federation.router.TestNamenodeHeartbeat |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | HDFS-12544 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893147/HDFS-12544.04.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux f3dba8225884 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 0f1c037 |
| Default Java | 1.8.0_131 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21755/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21755/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21755/testReport/ 

[jira] [Updated] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11468:
-
Attachment: HDFS-11468-HDFS-7240.004.patch

Attached a new patch that removes the node-metrics-related change. Thanks 
[~cheersyang].







[jira] [Commented] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery

2017-10-19 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212087#comment-16212087
 ] 

Xiao Chen commented on HDFS-12482:
--

Thanks [~eddyxu] for revving. Looks good in general; a few comments:
- Let's add input validation for {{xmitWeight}}. Although {{Math.max}} will fall 
back to 1, negative values and 0 look invalid. Any thoughts on whether we should 
have an upper bound?
- In the .md documentation, I suggest adding an example so people can know what 
to set without checking the code. Maybe something like: 'For example, if there 
are 2 read streams and 1 write stream, an xmits weight of 0.5 means recovery 
will be scheduled as 1 EC block and 1 replicated block; an xmits weight of 1.0 
means xxx' (or better wording).
- The test looks great. One trivial suggestion: bump the timeout to at least 3 
minutes, since it took 20 seconds on my local machine.

> Provide a configuration to adjust the weight of EC recovery tasks to adjust 
> the speed of recovery
> -
>
> Key: HDFS-12482
> URL: https://issues.apache.org/jira/browse/HDFS-12482
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch
>
>
> The relative speed of EC recovery compared to 3x replica recovery is a 
> function of the EC codec, the number of sources, NIC speed, CPU speed, etc. 
> Currently EC recovery has a fixed {{xmitsInProgress}} of {{max(# of 
> sources, # of targets)}}, compared to {{1}} for 3x replica recovery, and the 
> NN uses {{xmitsInProgress}} to decide how many recovery tasks to schedule to a 
> DataNode, so we can add a coefficient for users to tune the weight of EC 
> recovery tasks.
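The xmits accounting described above reduces to simple arithmetic. The sketch 
below is a hedged illustration of the proposal, not the actual patch: the 
function and parameter names ({{ec_recovery_xmits}}, {{xmit_weight}}) are 
assumptions, as is the floor of 1.

```python
import math

def ec_recovery_xmits(num_sources, num_targets, xmit_weight):
    # Today the EC recovery cost is fixed at max(# sources, # targets);
    # the proposal scales that by a user-tunable coefficient, keeping a
    # floor of 1 so a task never counts for less than a replica copy.
    return max(1, int(math.ceil(max(num_sources, num_targets) * xmit_weight)))

# The example from the review comment: 2 read streams and 1 write stream.
half = ec_recovery_xmits(2, 1, 0.5)  # weight 0.5 -> task counts as 1 xmit
full = ec_recovery_xmits(2, 1, 1.0)  # weight 1.0 -> task counts as 2 xmits
```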






[jira] [Commented] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow

2017-10-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212085#comment-16212085
 ] 

Hudson commented on HDFS-12448:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13114 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13114/])
HDFS-12448. Make sure user defined erasure coding policy ID will not 
(kai.zheng: rev ce7cf66e5ed74c124afdb5a6902fbf297211cc04)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/ErasureCodeConstants.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestErasureCodingPolicies.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ErasureCodingPolicyManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSErasureCoding.md


> Make sure user defined erasure coding policy ID will not overflow
> -
>
> Key: HDFS-12448
> URL: https://issues.apache.org/jira/browse/HDFS-12448
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: SammiChen
>Assignee: Huafeng Wang
>  Labels: hdfs-ec-3.0-nice-to-have
> Fix For: 3.0.0
>
> Attachments: HDFS-12448.001.patch, HDFS-12448.002.patch
>
>
> The current policy ID is of type "byte". IDs 1~63 are reserved for built-in 
> erasure coding policies; 64 and above are for user-defined erasure coding 
> policies. Make sure the user policy ID will not overflow when the 
> addErasureCodingPolicy API is called.
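The invariant the description asks for can be sketched as follows. The constant 
and function names here are illustrative, not the actual Hadoop identifiers: 
built-in IDs occupy 1~63, user-defined IDs start at 64, and a Java byte tops 
out at 127, so allocation must fail before crossing that bound.

```python
MAX_BUILTIN_POLICY_ID = 63  # IDs 1..63 reserved for built-in EC policies
MAX_BYTE = 127              # Java Byte.MAX_VALUE; the policy ID is a byte

def next_user_policy_id(current_max_id):
    # Allocate the next user-defined policy ID, refusing to overflow a byte.
    candidate = max(current_max_id, MAX_BUILTIN_POLICY_ID) + 1
    if candidate > MAX_BYTE:
        raise ValueError("user-defined EC policy ID would overflow a byte")
    return candidate
```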






[jira] [Commented] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message

2017-10-19 Thread Shriya Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212035#comment-16212035
 ] 

Shriya Gupta commented on HDFS-12688:
-

I suspected the same asynchronous behaviour when this issue was brought to my 
attention by a user. However, at no point am I running two copies of the job at 
the same time; each time, I launched my script only after the previous 
execution had ended.

I suggested running it multiple times only because the error occurs randomly -- 
it has also shown up on the very first run of the script sometimes.
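One defensive workaround for the race described here is to poll until the 
output path has actually disappeared before submitting the job, rather than 
trusting the "Moved to Trash" message alone. A hedged sketch follows; the 
{{path_exists}} callable stands in for a real check such as 
{{hdfs dfs -test -e <path>}}, and here it is simulated with a fake.

```python
import time

def wait_until_deleted(path_exists, timeout_s=30.0, interval_s=0.5):
    # Poll until path_exists() reports the path gone, or give up at timeout.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not path_exists():
            return True
        time.sleep(interval_s)
    return not path_exists()

# Simulated namespace: the path "disappears" after two checks, mimicking a
# delete that takes a moment to become visible.
state = {"checks": 0}
def fake_exists():
    state["checks"] += 1
    return state["checks"] <= 2

gone = wait_until_deleted(fake_exists, timeout_s=5.0, interval_s=0.01)
```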

> HDFS File Not Removed Despite Successful "Moved to .Trash" Message
> --
>
> Key: HDFS-12688
> URL: https://issues.apache.org/jira/browse/HDFS-12688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.0
>Reporter: Shriya Gupta
>Priority: Critical
>
> Wrote a simple script to delete and create a file and ran it multiple times. 
> However, some executions of the script randomly threw a FileAlreadyExists 
> error while others succeeded, despite a successful hdfs dfs -rm command. The 
> script is below; I have reproduced the issue in two different environments -- 
> hdfs dfs -ls  /user/shriya/shell_test/
> echo "starting hdfs remove **" 
> hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput
>  echo "hdfs compeleted!"
> hdfs dfs -ls  /user/shriya/shell_test/
> echo "starting mapReduce***"
> mapred job -libjars 
> /data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar 
> -submit /data/home/shriya/shell_test/wordcountJob.xml
> The message confirming successful move -- 
> 17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: 
> hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728
> (The contents of a subsequent -ls after -rm also showed that the file still 
> existed.)
> The error I got when my MapReduce job tried to create the file -- 
> 17/10/19 14:50:00 WARN security.UserGroupInformation: 
> PriviledgedActionException as: (auth:KERBEROS) 
> cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> Exception in thread "main" 
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> at 
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)






[jira] [Updated] (HDFS-12684) Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean

2017-10-19 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12684:
---
Summary: Ozone: SCMMXBean NodeCount is overlapping with NodeManagerMXBean  
(was: Ozone: SCM metrics NodeCount is overlapping with node manager metrics)







[jira] [Updated] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow

2017-10-19 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-12448:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and 3.0 branch. Thanks [~HuafengWang] for the contribution!







[jira] [Commented] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212033#comment-16212033
 ] 

Weiwei Yang commented on HDFS-12684:


Hi [~eyang]

Sorry, the title was not clear. This is actually the SCMNodeManager in the 
Ozone branch; it has a similar name to the YARN NodeManager, but they are two 
different services. I just updated the JIRA title to avoid confusion.

> Ozone: SCM metrics NodeCount is overlapping with node manager metrics
> -
>
> Key: HDFS-12684
> URL: https://issues.apache.org/jira/browse/HDFS-12684
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12684-HDFS-7240.001.patch
>
>
> I found this issue while reviewing HDFS-11468, from http://scm_host:9876/jmx, 
> both SCM and SCMNodeManager has {{NodeCount}} metrics
> {noformat}
>  {
> "name" : 
> "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
> "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
> "ClientRpcPort" : "9860",
> "DatanodeRpcPort" : "9861",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "CompileInfo" : "2017-10-17T06:47Z xxx",
> "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
> "SoftwareVersion" : "3.1.0-SNAPSHOT",
> "StartedTimeInMillis" : 1508393551065
>   }, {
> "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
> "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "OutOfChillMode" : false,
> "MinimumChillModeNodes" : 1,
> "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 
> 0 nodes reported, minimal 1 nodes required."
>   }
> {noformat}
> Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.
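To picture the proposed fix, here is a minimal sketch (interface and class names are illustrative, not the real Ozone types): JMX derives the {{NodeCount}} attribute from a {{getNodeCount()}} getter, so removing the getter from the SCM bean leaves the attribute published once, by the node-manager bean.

```java
import java.util.Collections;
import java.util.Map;

// Illustrative sketch only (not the real Ozone interfaces): JMX derives the
// NodeCount attribute from a getNodeCount() getter, so declaring the getter
// on both MXBean interfaces publishes the attribute twice under /jmx.
interface NodeManagerMXBeanSketch {
    Map<String, Integer> getNodeCount(); // authoritative copy stays here
}

// SCM bean after the proposed change: the duplicated getter is gone, so
// NodeCount is reported only by the node-manager bean.
interface ScmMXBeanSketch {
    String getClientRpcPort();
}

public class NodeCountOverlapSketch
        implements NodeManagerMXBeanSketch, ScmMXBeanSketch {
    @Override
    public Map<String, Integer> getNodeCount() {
        return Collections.singletonMap("HEALTHY", 0);
    }

    @Override
    public String getClientRpcPort() {
        return "9860";
    }

    public static void main(String[] args) {
        NodeCountOverlapSketch scm = new NodeCountOverlapSketch();
        System.out.println(scm.getNodeCount()); // {HEALTHY=0}
    }
}
```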



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow

2017-10-19 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212021#comment-16212021
 ] 

Kai Zheng commented on HDFS-12448:
--

+1 on the latest patch. 

> Make sure user defined erasure coding policy ID will not overflow
> -
>
> Key: HDFS-12448
> URL: https://issues.apache.org/jira/browse/HDFS-12448
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: SammiChen
>Assignee: Huafeng Wang
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-12448.001.patch, HDFS-12448.002.patch
>
>
> The current policy ID is of type "byte". IDs 1~63 are reserved for built-in 
> erasure coding policies; 64 and above are for user-defined policies. Make sure 
> a user policy ID will not overflow when the addErasureCodingPolicy API is called. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-12620:
---
Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-10467

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, 
> HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, 
> HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, 
> HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, 
> HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, 
> HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch, 
> HDFS-12620.000.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}
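The third point can be illustrated with a minimal, hypothetical example of the rewrite such a backport requires:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative only: the kind of rewrite needed when backporting a Java 8
// method reference from trunk to the Java 7 based branch-2.
public class MethodReferenceBackport {

    // trunk (Java 8) could write: paths.sort(String::compareTo);
    // branch-2 (Java 7) needs an anonymous Comparator instead:
    static List<String> sortJava7(List<String> paths) {
        Collections.sort(paths, new Comparator<String>() {
            @Override
            public int compare(String left, String right) {
                return left.compareTo(right);
            }
        });
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(sortJava7(Arrays.asList("b", "a", "c"))); // [a, b, c]
    }
}
```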



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-12620:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
Target Version/s: 2.9.0, 3.0.0
  Status: Resolved  (was: Patch Available)

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, 
> HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, 
> HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, 
> HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, 
> HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, 
> HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch, 
> HDFS-12620.000.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212000#comment-16212000
 ] 

Íñigo Goiri edited comment on HDFS-12620 at 10/20/17 1:11 AM:
--

Thanks [~subru], [~asuresh], and [~chris.douglas] for the review.
I did the cherry pick to branch-2 and pushed the fixes.

[^HDFS-10467-branch-2.004.patch] contains the full HDFS-10467 backporting to 
branch-2 including the fixes required for branch-2.
[^HDFS-12620-branch-2.012.patch] is just the changes required for branch-2.
Finally, one of the fixes needed to be ported to trunk, so I added 
[^HDFS-12620.000.patch] to trunk and branch-3.0.

I ran the unit tests before pushing and everything passed so it should be fine.


was (Author: elgoiri):
Thanks [~subru], [~asuresh], and [~chris.douglas] for the review.
I did the cherry pick to branch-2 and pushed the fixes.

[^HDFS-10467-branch-2.004.patch] contains the full HDFS-10467 backporting to 
branch-2 including the fixes required for branch-2.
[^HDFS-12620-branch-2.012.patch] is just the changes required for branch-2.
Finally, one of the fixes needed to be ported to trunk, so I added 
[^HDFS-12620.000.patch] to trunk and branch-3.

I ran the unit tests before pushing and everything passed so it should be fine.

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, 
> HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, 
> HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, 
> HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, 
> HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, 
> HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch, 
> HDFS-12620.000.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-12620:
---
Attachment: HDFS-12620.000.patch

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, 
> HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, 
> HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, 
> HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, 
> HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, 
> HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch, 
> HDFS-12620.000.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-12620:
---
Attachment: HDFS-12620-branch-2.012.patch
HDFS-10467-branch-2.004.patch

Thanks [~subru], [~asuresh], and [~chris.douglas] for the review.
I did the cherry pick to branch-2 and pushed the fixes.

[^HDFS-10467-branch-2.004.patch] contains the full HDFS-10467 backporting to 
branch-2 including the fixes required for branch-2.
[^HDFS-12620-branch-2.012.patch] is just the changes required for branch-2.
Finally, one of the fixes needed to be ported to trunk, so I added 
[^HDFS-12620.000.patch] to trunk and branch-3.

I ran the unit tests before pushing and everything passed so it should be fine.

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.004.patch, HDFS-10467-branch-2.patch, 
> HDFS-12620-branch-2.000.patch, HDFS-12620-branch-2.004.patch, 
> HDFS-12620-branch-2.005.patch, HDFS-12620-branch-2.006.patch, 
> HDFS-12620-branch-2.007.patch, HDFS-12620-branch-2.008.patch, 
> HDFS-12620-branch-2.009.patch, HDFS-12620-branch-2.010.patch, 
> HDFS-12620-branch-2.011.patch, HDFS-12620-branch-2.012.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory

2017-10-19 Thread Manoj Govindassamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HDFS-12544:
--
Attachment: HDFS-12544.04.patch

Attached v04 patch to address the following:
1. Handled the file rename/move case for the snapshot scope directory.
2. Added a new unit test for the file rename.
3. Added more comments in the test and the snapshot manager.
4. Fixed typos pointed out by Yongjun in the previous comment.
[~yzhangal], could you please take a look at the patch?


> SnapshotDiff - support diff generation on any snapshot root descendant 
> directory
> 
>
> Key: HDFS-12544
> URL: https://issues.apache.org/jira/browse/HDFS-12544
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, 
> HDFS-12544.03.patch, HDFS-12544.04.patch
>
>
> {noformat}
> # hdfs snapshotDiff   
> 
> {noformat}
> Using the snapshot diff command, we can generate a diff report between any two 
> given snapshots under a snapshot root directory. The command today only 
> accepts a path that is a snapshot root. There are many deployments where 
> the snapshot root is configured at a higher-level directory but the diff 
> report is needed only for a specific directory under the snapshot root. In 
> these cases, the diff report can be filtered for changes pertaining to the 
> directory we are interested in. But when the snapshot root directory is very 
> large, snapshot diff report generation can take minutes even if we are 
> interested only in the changes in a small directory. So it would greatly 
> improve performance if the diff report calculation could be limited to the 
> sub-directory of interest instead of the whole snapshot root.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-12620:
---
Attachment: (was: HDFS-12620-branch-2.fixes.patch)

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, 
> HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, 
> HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, 
> HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, 
> HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12671) [READ] Test NameNode restarts when PROVIDED is configured

2017-10-19 Thread Virajith Jalaparti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-12671:
--
Summary: [READ] Test NameNode restarts when PROVIDED is configured  (was: 
[READ] Test NameNode restarts)

> [READ] Test NameNode restarts when PROVIDED is configured
> -
>
> Key: HDFS-12671
> URL: https://issues.apache.org/jira/browse/HDFS-12671
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Virajith Jalaparti
> Attachments: HDFS-12671-HDFS-9806.001.patch
>
>
> Add a test case to ensure NameNode restarts can be handled with provided 
> storage.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory

2017-10-19 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211876#comment-16211876
 ] 

Manoj Govindassamy commented on HDFS-12544:
---

Thanks for the review comments, [~yzhangal]. Good discussion on the file rename 
behavior w.r.t. snapshot diff for a descendant directory. That's right, the renamed 
files still show up in the diff report as "R" entries even though they are moved 
out of the scope (descendant) directory. To get the same behavior as the normal 
snapshot diff report, renamed files whose target is not under the scoped 
directory should be shown as "D" (deleted) entries in the report. I will post a new 
patch to handle this case.
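A minimal sketch of that filtering rule (method and path names are illustrative, not the actual SnapshotDiffReport API):

```java
// Hypothetical post-processing of a snapshot diff entry: a rename whose
// target falls outside the scoped directory is reported as a delete ("D")
// instead of a rename ("R"). Illustrative only, not the real HDFS code.
public class ScopedDiffSketch {

    // Classify a rename entry relative to the scoped (descendant) directory.
    static String entryType(String sourcePath, String targetPath, String scopeDir) {
        boolean targetInScope = targetPath.startsWith(scopeDir + "/");
        return targetInScope ? "R" : "D";
    }

    public static void main(String[] args) {
        // mv /root/scope/x /root/scope/y -> still a rename within scope
        System.out.println(entryType("/root/scope/x", "/root/scope/y", "/root/scope")); // R
        // mv /root/scope/x /root/other/y -> delete from the scoped view
        System.out.println(entryType("/root/scope/x", "/root/other/y", "/root/scope")); // D
    }
}
```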

> SnapshotDiff - support diff generation on any snapshot root descendant 
> directory
> 
>
> Key: HDFS-12544
> URL: https://issues.apache.org/jira/browse/HDFS-12544
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, 
> HDFS-12544.03.patch
>
>
> {noformat}
> # hdfs snapshotDiff   
> 
> {noformat}
> Using the snapshot diff command, we can generate a diff report between any two 
> given snapshots under a snapshot root directory. The command today only 
> accepts a path that is a snapshot root. There are many deployments where 
> the snapshot root is configured at a higher-level directory but the diff 
> report is needed only for a specific directory under the snapshot root. In 
> these cases, the diff report can be filtered for changes pertaining to the 
> directory we are interested in. But when the snapshot root directory is very 
> large, snapshot diff report generation can take minutes even if we are 
> interested only in the changes in a small directory. So it would greatly 
> improve performance if the diff report calculation could be limited to the 
> sub-directory of interest instead of the whole snapshot root.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED

2017-10-19 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-12682:
---
Priority: Blocker  (was: Major)

> ECAdmin -listPolicies will always show policy state as DISABLED
> ---
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12682.01.patch
>
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is first checked, and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass, because that static instance that the 
> client (e.g. ECAdmin) reads in unit test is updated by NN. :)
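The caching pitfall described above can be modeled with a minimal, self-contained sketch (illustrative names; not the real PBHelperClient or SystemErasureCodingPolicies code):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the caching pitfall: deserialization consults a static
// cache first, so the state carried on the wire is ignored and every client
// sees the cache's default DISABLED state. Illustrative only.
public class CachedPolicyStateSketch {
    enum State { DISABLED, ENABLED }

    static class Policy {
        final int id;
        State state;
        Policy(int id, State state) { this.id = id; this.state = state; }
    }

    // Static system cache, created with state=DISABLED by default.
    static final Map<Integer, Policy> SYSTEM_CACHE = new HashMap<>();
    static {
        SYSTEM_CACHE.put(1, new Policy(1, State.DISABLED));
    }

    // Buggy deserialization: returns the cached object, dropping wireState.
    static Policy fromProtoBuggy(int id, State wireState) {
        Policy cached = SYSTEM_CACHE.get(id);
        return cached != null ? cached : new Policy(id, wireState);
    }

    public static void main(String[] args) {
        // The NameNode says the policy is ENABLED, but the client still sees
        // DISABLED because the cached instance wins.
        Policy p = fromProtoBuggy(1, State.ENABLED);
        System.out.println("client sees: " + p.state); // client sees: DISABLED
    }
}
```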



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12544) SnapshotDiff - support diff generation on any snapshot root descendant directory

2017-10-19 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211833#comment-16211833
 ] 

Yongjun Zhang commented on HDFS-12544:
--

Hi [~manojg],

One thought about snapshotDiff on a scoped dir: if we have a move operation "mv 
x y", where x is in the scoped dir and y is not, but both are in the 
snapshot root hierarchy, there are two issues:

# If we do snapshotDiff on the snapshot root, we get a "rename" operation, 
which helps distcp do a "sync" without copying the renamed file. With the change 
in this jira, we will lose that optimization. If the different sub-dirs are very 
independent, then we are fine. We can document this, though.
# With this change, it's expected that we get a "delete" entry when doing 
snapshotDiff at the source dir and a "new" entry when doing snapshotDiff at the 
target dir. Would you please confirm that this is the case?
#  If the result of 2 is not as expected, we need a new patch revision to fix 
it. 

Thanks.





> SnapshotDiff - support diff generation on any snapshot root descendant 
> directory
> 
>
> Key: HDFS-12544
> URL: https://issues.apache.org/jira/browse/HDFS-12544
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Attachments: HDFS-12544.01.patch, HDFS-12544.02.patch, 
> HDFS-12544.03.patch
>
>
> {noformat}
> # hdfs snapshotDiff   
> 
> {noformat}
> Using the snapshot diff command, we can generate a diff report between any two 
> given snapshots under a snapshot root directory. The command today only 
> accepts a path that is a snapshot root. There are many deployments where 
> the snapshot root is configured at a higher-level directory but the diff 
> report is needed only for a specific directory under the snapshot root. In 
> these cases, the diff report can be filtered for changes pertaining to the 
> directory we are interested in. But when the snapshot root directory is very 
> large, snapshot diff report generation can take minutes even if we are 
> interested only in the changes in a small directory. So it would greatly 
> improve performance if the diff report calculation could be limited to the 
> sub-directory of interest instead of the whole snapshot root.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11821) BlockManager.getMissingReplOneBlocksCount() does not report correct value if corrupt file with replication factor of 1 gets deleted

2017-10-19 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211803#comment-16211803
 ] 

Ravi Prakash commented on HDFS-11821:
-

Hi Wellington!
Thanks for your explanation. I'm sorry I've been tardy on this issue. Thank you 
for the ping. I took some time to step through the debugger and understand 
what's going on. If you set a breakpoint 
[here|https://github.com/apache/hadoop/blob/4ab0c8f96a41c573cc1f1e71c18871d243f952b9/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java#L378],
 you'll see that remove is being called with priority level 5. Hence the check 
on [line 
385|https://github.com/apache/hadoop/blob/4ab0c8f96a41c573cc1f1e71c18871d243f952b9/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java#L385]
 is failing; that check is what would correctly allow corruptReplicationOneBlocks 
to be decremented. I just tested this small fix, which doesn't increase the time 
taken to delete the blocks:
{code}
$ git diff
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java
 
b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java
index 347d606a04e..e3f228d2947 100644
--- 
a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java
+++ 
b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/LowRedundancyBlocks.java
@@ -365,7 +365,7 @@ boolean remove(BlockInfo block, int priLevel, int 
oldExpectedReplicas) {
   NameNode.blockStateChangeLog.debug(
   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
   " {} from priority queue {}", block, i);
-  decrementBlockStat(block, priLevel, oldExpectedReplicas);
+  decrementBlockStat(block, i, oldExpectedReplicas);
   return true;
 }
   }
{code}
Could you please test this too?
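For readers following along, the essence of the one-line fix can be modeled with a simplified sketch (not the real LowRedundancyBlocks class): the stats must be decremented using the queue index where the block was actually found ({{i}}), not the caller's priority hint ({{priLevel}}):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the fix: when the caller's priority hint does not
// match the queue the block is actually found in, the stat counters must
// be updated with the actual queue index (i), not the hint (priLevel).
// Illustrative only, not the real LowRedundancyBlocks class.
public class PriorityQueueStatsSketch {
    static final int LEVELS = 5;           // last level models the corrupt queue
    final List<List<String>> queues = new ArrayList<>();
    int corruptReplOneCount = 0;

    PriorityQueueStatsSketch() {
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new ArrayList<String>());
        }
    }

    boolean remove(String block, int priLevel) {
        for (int i = 0; i < LEVELS; i++) {
            if (queues.get(i).remove(block)) {
                // The fix: decrement with i (actual queue), not priLevel.
                decrementStats(block, i);
                return true;
            }
        }
        return false;
    }

    void decrementStats(String block, int level) {
        if (level == LEVELS - 1) {         // e.g. QUEUE_WITH_CORRUPT_BLOCKS
            corruptReplOneCount--;
        }
    }

    public static void main(String[] args) {
        PriorityQueueStatsSketch q = new PriorityQueueStatsSketch();
        q.queues.get(LEVELS - 1).add("blk_1"); // block sits in the corrupt queue
        q.corruptReplOneCount = 1;
        q.remove("blk_1", 2);                  // caller passes a stale level
        System.out.println(q.corruptReplOneCount); // 0: stat still decremented
    }
}
```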

> BlockManager.getMissingReplOneBlocksCount() does not report correct value if 
> corrupt file with replication factor of 1 gets deleted
> ---
>
> Key: HDFS-11821
> URL: https://issues.apache.org/jira/browse/HDFS-11821
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.0, 3.0.0-alpha2
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HDFS-11821-1.patch, HDFS-11821-2.patch
>
>
> *BlockManager* keeps a separate metric for the number of missing blocks with 
> a replication factor of 1. This is currently returned by the 
> *BlockManager.getMissingReplOneBlocksCount()* method, and that's 
> what is displayed in the attribute below for *dfsadmin -report* (in the 
> example below, there is one corrupt block that relates to a file with a 
> replication factor of 1):
> {noformat}
> ...
> Missing blocks (with replication factor 1): 1
> ...
> {noformat}
> However, if the related file gets deleted (for instance, using the hdfs fsck 
> -delete option), this metric never gets updated, and *dfsadmin -report* will 
> keep reporting a missing block even though the file no longer exists. 
> The only workaround available is to restart the NN, so that this metric is 
> cleared.
> This can be easily reproduced by forcing a replication factor 1 file 
> corruption such as follows:
> 1) Put a file into hdfs with replication factor 1:
> {noformat}
> $ hdfs dfs -Ddfs.replication=1 -put test_corrupt /
> $ hdfs dfs -ls /
> -rw-r--r--   1 hdfs supergroup 19 2017-05-10 09:21 /test_corrupt
> {noformat}
> 2) Find related block for the file and delete it from DN:
> {noformat}
> $ hdfs fsck /test_corrupt -files -blocks -locations
> ...
> /test_corrupt 19 bytes, 1 block(s):  OK
> 0. BP-782213640-172.31.113.82-1494420317936:blk_1073742742_1918 len=19 
> Live_repl=1 
> [DatanodeInfoWithStorage[172.31.112.178:20002,DS-a0dc0b30-a323-4087-8c36-26ffdfe44f46,DISK]]
> Status: HEALTHY
> ...
> $ find /dfs/dn/ -name blk_1073742742*
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742_1918.meta
> $ rm -rf 
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742
> $ rm -rf 
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742_1918.meta
> {noformat}
> 3) Running fsck will report the corruption as expected:
> {noformat}
> $ hdfs fsck 

[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED

2017-10-19 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12682:
-
Attachment: HDFS-12682.01.patch

Uploading patch 1 to show the idea. It's not easy to split the metadata part into 
HDFS-12686, because the protobuf definition is changed.

Some considerations in this patch:
- Modifying the protobuf definition and the Public class {{ErasureCodingPolicy}} is 
not compatible. But this fixes the previous bug, so I think it is the right thing 
to do here.
- For the get-{{ErasureCodingPolicy}} methods, I went without state for all of 
them. File attributes only care about the policy, not its state; at the file 
system level (e.g. {{hdfs ec -getPolicy -path }}), since we always require -path, 
the focus is also on the policy rather than its state. If there is a requirement 
to get the state of a policy without using {{-listPolicies}} in the future, we 
could add new APIs for that.

Will check tests and think about coverage. Early reviews / feedback on the patch 
are appreciated.

> ECAdmin -listPolicies will always show policy state as DISABLED
> ---
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12682.01.patch
>
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is first checked, and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass, because the static instance that the 
> client (e.g. ECAdmin) reads in unit tests is updated by the NN. :)
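The stale-cache pattern described in this report can be sketched as follows. This is an illustrative model only, with invented names ({{StalePolicyCache}}, {{convert}}); the real code lives in PBHelperClient and SystemErasureCodingPolicies (linked above):

```java
import java.util.HashMap;
import java.util.Map;

public class StalePolicyCache {
    enum State { DISABLED, ENABLED }

    static final class Policy {
        final String name;
        State state = State.DISABLED; // every cached policy starts out DISABLED
        Policy(String name) { this.name = name; }
    }

    // Static system-policy cache (stand-in for SystemErasureCodingPolicies).
    static final Map<String, Policy> CACHE = new HashMap<>();
    static {
        CACHE.put("RS-6-3-1024k", new Policy("RS-6-3-1024k"));
    }

    // Deserialization path (stand-in for the PBHelperClient conversion): the
    // cache is consulted first, and the state carried on the wire is dropped.
    static Policy convert(String name, State wireState) {
        Policy cached = CACHE.get(name);
        if (cached != null) {
            return cached;        // BUG: wireState is ignored for system policies
        }
        Policy p = new Policy(name);
        p.state = wireState;
        return p;
    }

    public static void main(String[] args) {
        // The server says ENABLED, but the client sees the cached DISABLED object.
        Policy seen = convert("RS-6-3-1024k", State.ENABLED);
        System.out.println(seen.name + " state=" + seen.state);
        // prints: RS-6-3-1024k state=DISABLED
    }
}
```

In a unit test, client and NN share the same JVM, so the shared static instance is updated by the NN and the bug stays hidden, exactly as noted above.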



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211784#comment-16211784
 ] 

Hadoop QA commented on HDFS-12683:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  2m 
39s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12683 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893125/HDFS-12683.04.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21754/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DFSZKFailOverController re-order logic for logging Exception
> 
>
> Key: HDFS-12683
> URL: https://issues.apache.org/jira/browse/HDFS-12683
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
> Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch, 
> HDFS-12683.03.patch, HDFS-12683.04.patch
>
>
> The ZKFC should log fatal exceptions before closing the connections and 
> terminating server.
> Occasionally we have seen DFSZKFailOver shutdown, but no exception or no 
> error being logged.
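The intended ordering can be sketched as below. This is a minimal illustrative model, assuming the fix is simply to emit the fatal log before teardown; the class and method names ({{FatalErrorOrdering}}, {{fatalError}}) are invented, not actual ZKFailoverController code:

```java
import java.util.ArrayList;
import java.util.List;

public class FatalErrorOrdering {
    final List<String> actions = new ArrayList<>();

    void log(String msg)       { actions.add("LOG: " + msg); }
    void closeConnections()    { actions.add("CLOSE"); }
    void terminate(int status) { actions.add("EXIT " + status); }

    // Fixed ordering: record the cause first, then tear down.
    void fatalError(String msg) {
        log(msg);            // 1. make sure the cause reaches the log
        closeConnections();  // 2. then release resources
        terminate(1);        // 3. finally exit
    }

    public static void main(String[] args) {
        FatalErrorOrdering zkfc = new FatalErrorOrdering();
        zkfc.fatalError("Received fatal error from ZK session");
        System.out.println(zkfc.actions);
        // prints: [LOG: Received fatal error from ZK session, CLOSE, EXIT 1]
    }
}
```

If the teardown runs first and hangs or kills the process, the log line is never written, which matches the "shutdown with no error logged" symptom.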






[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception

2017-10-19 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-12683:
--
Attachment: HDFS-12683.04.patch

> DFSZKFailOverController re-order logic for logging Exception
> 
>
> Key: HDFS-12683
> URL: https://issues.apache.org/jira/browse/HDFS-12683
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
> Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch, 
> HDFS-12683.03.patch, HDFS-12683.04.patch
>
>
> The ZKFC should log fatal exceptions before closing the connections and 
> terminating server.
> Occasionally we have seen DFSZKFailOver shutdown, but no exception or no 
> error being logged.






[jira] [Commented] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211747#comment-16211747
 ] 

Hadoop QA commented on HDFS-12683:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
11s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12683 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893119/HDFS-12683.02.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21753/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DFSZKFailOverController re-order logic for logging Exception
> 
>
> Key: HDFS-12683
> URL: https://issues.apache.org/jira/browse/HDFS-12683
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
> Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch, 
> HDFS-12683.03.patch
>
>
> The ZKFC should log fatal exceptions before closing the connections and 
> terminating server.
> Occasionally we have seen DFSZKFailOver shutdown, but no exception or no 
> error being logged.






[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception

2017-10-19 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-12683:
--
Attachment: HDFS-12683.03.patch

> DFSZKFailOverController re-order logic for logging Exception
> 
>
> Key: HDFS-12683
> URL: https://issues.apache.org/jira/browse/HDFS-12683
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
> Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch, 
> HDFS-12683.03.patch
>
>
> The ZKFC should log fatal exceptions before closing the connections and 
> terminating server.
> Occasionally we have seen DFSZKFailOver shutdown, but no exception or no 
> error being logged.






[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception

2017-10-19 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-12683:
--
Attachment: HDFS-12683.02.patch

> DFSZKFailOverController re-order logic for logging Exception
> 
>
> Key: HDFS-12683
> URL: https://issues.apache.org/jira/browse/HDFS-12683
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
> Attachments: HDFS-12683.01.patch, HDFS-12683.02.patch
>
>
> The ZKFC should log fatal exceptions before closing the connections and 
> terminating server.
> Occasionally we have seen DFSZKFailOver shutdown, but no exception or no 
> error being logged.






[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211736#comment-16211736
 ] 

Subru Krishnan commented on HDFS-12620:
---

[~elgoiri], thanks for working through this. Please go ahead with the 
cherry-picks to branch-2 and post the diff patch afterwards.

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, 
> HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, 
> HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, 
> HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, 
> HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch, 
> HDFS-12620-branch-2.fixes.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}
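For the Java 7 point above, the backport boils down to rewriting Java 8 method references as anonymous classes. A hypothetical before/after, not taken from the actual patch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Java7Backport {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("dn2", "dn10", "dn1");

        // trunk (Java 8): names.sort(String::compareTo);
        // branch-2 (Java 7): spell out the anonymous Comparator instead.
        Collections.sort(names, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        });
        System.out.println(names);  // [dn1, dn10, dn2]
    }
}
```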






[jira] [Commented] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message

2017-10-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211729#comment-16211729
 ] 

Jason Lowe commented on HDFS-12688:
---

Have you checked the HDFS audit logs?  They should give you clues about what is 
happening and who is re-creating the directory.  I suspect the job is executing 
asynchronously, so you are actually running multiple copies of the job at the 
same time when you run the script multiple times.  If a previous job is still 
running, it will re-create the output directory when its tasks need to write 
output.


> HDFS File Not Removed Despite Successful "Moved to .Trash" Message
> --
>
> Key: HDFS-12688
> URL: https://issues.apache.org/jira/browse/HDFS-12688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.0
>Reporter: Shriya Gupta
>Priority: Critical
>
> Wrote a simple script to delete and create a file and ran it multiple times. 
> However, some executions of the script randomly threw a FileAlreadyExists 
> error while the others succeeded despite successful hdfs dfs -rm command. The 
> script is as below, I have reproduced it in two different environments -- 
> hdfs dfs -ls  /user/shriya/shell_test/
> echo "starting hdfs remove **" 
> hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput
>  echo "hdfs completed!"
> hdfs dfs -ls  /user/shriya/shell_test/
> echo "starting mapReduce***"
> mapred job -libjars 
> /data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar 
> -submit /data/home/shriya/shell_test/wordcountJob.xml
> The message confirming successful move -- 
> 17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: 
> hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728
> The contents of the subsequent -ls after -rm also showed that the file still 
> existed.
> The error I got when my MapReduce job tried to create the file -- 
> 17/10/19 14:50:00 WARN security.UserGroupInformation: 
> PriviledgedActionException as: (auth:KERBEROS) 
> cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> Exception in thread "main" 
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> at 
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)






[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211724#comment-16211724
 ] 

Hadoop QA commented on HDFS-12620:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HDFS-12620 does not apply to branch-2. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12620 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893115/HDFS-12620-branch-2.fixes.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21752/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, 
> HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, 
> HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, 
> HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, 
> HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch, 
> HDFS-12620-branch-2.fixes.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}






[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-12620:
---
Attachment: HDFS-12620-branch-2.fixes.patch

HDFS-12620-branch-2.fixes.patch contains the fixes after the cherry-pick is 
done. This would actually be the new commit.

I can upload it again after the cherry-pick to get a clean report.

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, 
> HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, 
> HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, 
> HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, 
> HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch, 
> HDFS-12620-branch-2.fixes.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}






[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception

2017-10-19 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-12683:
--
Attachment: HDFS-12683.01.patch

> DFSZKFailOverController re-order logic for logging Exception
> 
>
> Key: HDFS-12683
> URL: https://issues.apache.org/jira/browse/HDFS-12683
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
> Attachments: HDFS-12683.01.patch
>
>
> The ZKFC should log fatal exceptions before closing the connections and 
> terminating server.
> Occasionally we have seen DFSZKFailOver shutdown, but no exception or no 
> error being logged.






[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception

2017-10-19 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDFS-12683:
--
Status: Patch Available  (was: In Progress)

> DFSZKFailOverController re-order logic for logging Exception
> 
>
> Key: HDFS-12683
> URL: https://issues.apache.org/jira/browse/HDFS-12683
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
> Attachments: HDFS-12683.01.patch
>
>
> The ZKFC should log fatal exceptions before closing the connections and 
> terminating server.
> Occasionally we have seen DFSZKFailOver shutdown, but no exception or no 
> error being logged.






[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211677#comment-16211677
 ] 

Arun Suresh commented on HDFS-12620:


bq. I'm trying one last time with jenkins with version 011 but if it doesn't go 
through, I would stick to this.
Agreed. [~subru] / [~chris.douglas] - are you OK with it?

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, 
> HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, 
> HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, 
> HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, 
> HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}






[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211660#comment-16211660
 ] 

Íñigo Goiri commented on HDFS-12620:


I got the following:
{code}
Total Elapsed time: 925m 35s

-1 overall
 _ _ __
|  ___|_ _(_) |_   _ _ __ ___| |
| |_ / _` | | | | | | '__/ _ \ |
|  _| (_| | | | |_| | | |  __/_|
|_|  \__,_|_|_|\__,_|_|  \___(_)



| Vote |  Subsystem |  Runtime   | Comment

|   0  |shellcheck  |  0m 0s | Shellcheck was not available.
|   0  |  findbugs  |  0m 0s | Findbugs executables are not available.
|  +1  |   @author  |  0m 0s | The patch does not contain any @author
|  ||| tags.
|  +1  |test4tests  |  0m 0s | The patch appears to include 25 new or
|  ||| modified test files.
|  +1  |mvninstall  |  5m 53s| branch-2 passed
|  +1  |   compile  |  0m 37s| branch-2 passed
|  +1  |checkstyle  |  0m 28s| branch-2 passed
|  +1  |   mvnsite  |  0m 45s| branch-2 passed
|  +1  |mvneclipse  |  0m 14s| branch-2 passed
|  +1  |   javadoc  |  0m 50s| branch-2 passed
|  +1  |mvninstall  |  0m 38s| the patch passed
|  +1  |   compile  |  0m 36s| the patch passed
|  +1  |cc  |  0m 36s| the patch passed
|  +1  | javac  |  0m 36s| the patch passed
|  -1  |checkstyle  |  0m 27s| hadoop-hdfs-project/hadoop-hdfs: The
|  ||| patch generated 11 new + 624 unchanged -
|  ||| 0 fixed = 635 total (was 624)
|  +1  |   mvnsite  |  0m 44s| the patch passed
|  +1  |mvneclipse  |  0m 11s| the patch passed
|  +1  | shelldocs  |  0m 3s | There were no new shelldocs issues.
|  +1  |whitespace  |  0m 0s | The patch has no whitespace issues.
|  +1  |   xml  |  0m 1s | The patch has no ill-formed XML file.
|  +1  |   javadoc  |  0m 54s| the patch passed
|  -1  |  unit  |  891m 25s  | hadoop-hdfs in the patch failed.
|  +1  |asflicense  |  20m 29s   | The patch does not generate ASF License
|  ||| warnings.
|  ||  925m 35s  |


 Reason | Tests
Failed junit tests  |  hadoop.hdfs.server.datanode.TestDataNodePeerMetrics
|  hadoop.hdfs.server.datanode.TestDataNodeUUID
|  
hadoop.hdfs.server.namenode.startupprogress.TestStartupProgress
|  
hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement
 Timed out junit tests  |  
org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
|  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy
|  
org.apache.hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler
|  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
|  org.apache.hadoop.hdfs.TestRestartDFS
|  
org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
|  
org.apache.hadoop.hdfs.server.datanode.TestBlockCountersInPendingIBR
|  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
|  
org.apache.hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
|  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter
|  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetricsLogger
|  
org.apache.hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA
|  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery
|  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeFaultInjector
|  
org.apache.hadoop.hdfs.server.namenode.TestNestedEncryptionZones
|  
org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport
|  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory
|  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeMXBean
|  
org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant
|  
org.apache.hadoop.hdfs.server.datanode.TestDnRespectsBlockReportSplitThreshold
|  
org.apache.hadoop.hdfs.server.datanode.TestIncrementalBlockReports
|  
org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache
|  
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing
|  

[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12665:
--
Parent Issue: HDFS-9806  (was: HDFS-12090)

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).
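The key/value layout described above can be sketched as follows. Everything here is illustrative: a plain HashMap stands in for LevelDB, and string concatenation stands in for the protobuf encoding:

```java
import java.util.HashMap;
import java.util.Map;

public class AliasMapSketch {
    // Key: (blockpool, blockid, genstamp), as described in the proposal.
    static String key(String blockPool, long blockId, long genStamp) {
        return blockPool + "/" + blockId + "/" + genStamp;
    }

    // Value: (url, offset, length, nonce) pointing into remote storage.
    static String value(String url, long offset, long length, String nonce) {
        return url + "|" + offset + "|" + length + "|" + nonce;
    }

    public static void main(String[] args) {
        Map<String, String> aliasMap = new HashMap<>();
        // Map a local block to a byte range of a remote object
        // (the s3a URL below is a made-up example).
        aliasMap.put(key("BP-1", 1073741825L, 1001L),
                     value("s3a://bucket/data.bin", 0L, 134217728L, "n0"));
        System.out.println(aliasMap.get(key("BP-1", 1073741825L, 1001L)));
    }
}
```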






[jira] [Commented] (HDFS-12680) Ozone: SCM: Lease support for container creation

2017-10-19 Thread Nanda kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211639#comment-16211639
 ] 

Nanda kumar commented on HDFS-12680:


HDFS-12689 is created to track the deletion of containers in {{DELETING}} state 
in {{ContainerStateManager}}.

> Ozone: SCM: Lease support for container creation
> 
>
> Key: HDFS-12680
> URL: https://issues.apache.org/jira/browse/HDFS-12680
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>  Labels: ozoneMerge
> Attachments: HDFS-12680-HDFS-7240.000.patch, 
> HDFS-12680-HDFS-7240.001.patch
>
>
> This brings in lease support for container creation.
> A lease should be given to a container that is moved to {{CREATING}} state 
> when the {{BEGIN_CREATE}} event happens; {{LeaseException}} should be thrown 
> if the container already holds a lease. The lease must be released during the 
> {{COMPLETE_CREATE}} event. If the lease times out, the container should be 
> moved to {{DELETING}} state, and an exception should be thrown if a 
> {{COMPLETE_CREATE}} event is received on that container.
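The lease rules above can be modeled as a small state machine. This sketch uses invented names ({{ContainerLeaseSketch}}, {{onLeaseTimeout}}) and is not the ContainerStateManager API:

```java
import java.util.HashMap;
import java.util.Map;

public class ContainerLeaseSketch {
    enum State { ALLOCATED, CREATING, OPEN, DELETING }

    static class LeaseException extends RuntimeException {
        LeaseException(String m) { super(m); }
    }

    private final Map<String, State> states = new HashMap<>();
    private final Map<String, Boolean> leases = new HashMap<>();

    // BEGIN_CREATE: move to CREATING and grant a lease; refuse a second lease.
    void beginCreate(String name) {
        if (Boolean.TRUE.equals(leases.get(name))) {
            throw new LeaseException("container already holds a lease: " + name);
        }
        states.put(name, State.CREATING);
        leases.put(name, true);
    }

    // COMPLETE_CREATE: release the lease, unless the container already timed out.
    void completeCreate(String name) {
        if (states.get(name) == State.DELETING) {
            throw new LeaseException("lease timed out for: " + name);
        }
        leases.put(name, false);
        states.put(name, State.OPEN);
    }

    // Invoked by the lease manager when the creation lease expires.
    void onLeaseTimeout(String name) {
        states.put(name, State.DELETING);
        leases.put(name, false);
    }

    public static void main(String[] args) {
        ContainerLeaseSketch scm = new ContainerLeaseSketch();
        scm.beginCreate("c1");
        scm.onLeaseTimeout("c1");
        try {
            scm.completeCreate("c1");
        } catch (LeaseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```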






[jira] [Created] (HDFS-12689) Ozone: SCM: Clean up of container in DELETING state

2017-10-19 Thread Nanda kumar (JIRA)
Nanda kumar created HDFS-12689:
--

 Summary: Ozone: SCM: Clean up of container in DELETING state
 Key: HDFS-12689
 URL: https://issues.apache.org/jira/browse/HDFS-12689
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Nanda kumar


When container creation times out, the container is moved to {{DELETING}} 
state. Once a container is in {{DELETING}} state, {{ContainerStateManager}} 
should clean it up and delete it.






[jira] [Commented] (HDFS-12663) Ozone: OzoneClient: Remove protobuf classes exposed to clients through OzoneBucket

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211630#comment-16211630
 ] 

Hadoop QA commented on HDFS-12663:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
11s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12663 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893101/HDFS-12663-HDFS-7240.002.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21750/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: OzoneClient: Remove protobuf classes exposed to clients through 
> OzoneBucket
> --
>
> Key: HDFS-12663
> URL: https://issues.apache.org/jira/browse/HDFS-12663
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>  Labels: ozoneMerge
> Attachments: HDFS-12663-HDFS-7240.000.patch, 
> HDFS-12663-HDFS-7240.001.patch, HDFS-12663-HDFS-7240.002.patch
>
>
> As part of {{OzoneBucket#createKey}} we are currently exposing protobuf enums 
> {{OzoneProtos.ReplicationType}} and {{OzoneProtos.ReplicationFactor}} through 
> client, this can be replaced with client side enums and conversion can be 
> done internally.






[jira] [Updated] (HDFS-12663) Ozone: OzoneClient: Remove protobuf classes exposed to clients through OzoneBucket

2017-10-19 Thread Nanda kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDFS-12663:
---
Attachment: HDFS-12663-HDFS-7240.002.patch

> Ozone: OzoneClient: Remove protobuf classes exposed to clients through 
> OzoneBucket
> --
>
> Key: HDFS-12663
> URL: https://issues.apache.org/jira/browse/HDFS-12663
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>  Labels: ozoneMerge
> Attachments: HDFS-12663-HDFS-7240.000.patch, 
> HDFS-12663-HDFS-7240.001.patch, HDFS-12663-HDFS-7240.002.patch
>
>
> As part of {{OzoneBucket#createKey}} we are currently exposing protobuf enums 
> {{OzoneProtos.ReplicationType}} and {{OzoneProtos.ReplicationFactor}} through 
> client, this can be replaced with client side enums and conversion can be 
> done internally.






[jira] [Resolved] (HDFS-9808) Combine READ_ONLY_SHARED DatanodeStorages with the same ID

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs resolved HDFS-9808.
--
Resolution: Won't Fix

This was part of HDFS-11190.

> Combine READ_ONLY_SHARED DatanodeStorages with the same ID
> --
>
> Key: HDFS-9808
> URL: https://issues.apache.org/jira/browse/HDFS-9808
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chris Douglas
>
> In HDFS-5318, each datanode that can reach a (read only) block reports itself 
> as a valid location for the block. While accurate, this increases (redundant) 
> block report traffic and- without partitioning on the backend- may return an 
> overwhelming number of replica locations for each block.
> Instead, a DN could report only that the shared storage is reachable. The 
> contents of the storage could be reported separately/synthetically to the 
> block manager, which can collapse all instances into a single storage. A 
> subset of locations- closest to the client, etc.- can be returned, rather 
> than all possible locations.






[jira] [Created] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message

2017-10-19 Thread Shriya Gupta (JIRA)
Shriya Gupta created HDFS-12688:
---

 Summary: HDFS File Not Removed Despite Successful "Moved to 
.Trash" Message
 Key: HDFS-12688
 URL: https://issues.apache.org/jira/browse/HDFS-12688
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.6.0
Reporter: Shriya Gupta
Priority: Critical


Wrote a simple script to delete and create a file, and ran it multiple times. 
Some executions randomly threw a FileAlreadyExists error while others 
succeeded, despite the hdfs dfs -rm command reporting success. The script is 
below; I have reproduced the issue in two different environments -- 

hdfs dfs -ls  /user/shriya/shell_test/
echo "starting hdfs remove **" 
hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput
echo "hdfs remove completed!"
hdfs dfs -ls  /user/shriya/shell_test/
echo "starting mapReduce***"
mapred job -libjars 
/data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar 
-submit /data/home/shriya/shell_test/wordcountJob.xml

The message confirming successful move -- 

17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: 
'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: 
hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728

The output of a subsequent -ls after -rm also showed that the file still 
existed.

The error I got when my MapReduce job tried to create the file -- 

17/10/19 14:50:00 WARN security.UserGroupInformation: 
PriviledgedActionException as: (auth:KERBEROS) 
cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: 
Output directory hdfs://nameservice1/user/shriya/shell_test/wordcountOutput 
already exists
at 
org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
at 
org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)






[jira] [Commented] (HDFS-12675) Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails

2017-10-19 Thread Nanda kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211564#comment-16211564
 ] 

Nanda kumar commented on HDFS-12675:


Thanks for the contribution, [~linyiqun]. I have committed the code to the 
feature branch.

> Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails 
> --
>
> Key: HDFS-12675
> URL: https://issues.apache.org/jira/browse/HDFS-12675
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12675-HDFS-7240.001.patch
>
>
> Caught one UT failure 
> {{TestLeaseManager#testLeaseCallbackWithMultipleLeases}}:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.ozone.lease.TestLeaseManager.testLeaseCallbackWithMultipleLeases(TestLeaseManager.java:293)
> {noformat}
> The reason for this error is that lease {{leaseFive}} didn't expire in the test.






[jira] [Updated] (HDFS-12675) Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails

2017-10-19 Thread Nanda kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDFS-12675:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails 
> --
>
> Key: HDFS-12675
> URL: https://issues.apache.org/jira/browse/HDFS-12675
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12675-HDFS-7240.001.patch
>
>
> Caught one UT failure 
> {{TestLeaseManager#testLeaseCallbackWithMultipleLeases}}:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.ozone.lease.TestLeaseManager.testLeaseCallbackWithMultipleLeases(TestLeaseManager.java:293)
> {noformat}
> The reason for this error is that lease {{leaseFive}} didn't expire in the test.






[jira] [Commented] (HDFS-12675) Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails

2017-10-19 Thread Nanda kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211551#comment-16211551
 ] 

Nanda kumar commented on HDFS-12675:


Thanks [~linyiqun] for working on this. +1 the patch looks good to me, will 
commit it shortly.

> Ozone: TestLeaseManager#testLeaseCallbackWithMultipleLeases fails 
> --
>
> Key: HDFS-12675
> URL: https://issues.apache.org/jira/browse/HDFS-12675
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12675-HDFS-7240.001.patch
>
>
> Caught one UT failure 
> {{TestLeaseManager#testLeaseCallbackWithMultipleLeases}}:
> {noformat}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.ozone.lease.TestLeaseManager.testLeaseCallbackWithMultipleLeases(TestLeaseManager.java:293)
> {noformat}
> The reason for this error is that lease {{leaseFive}} didn't expire in the test.






[jira] [Commented] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211538#comment-16211538
 ] 

Hadoop QA commented on HDFS-12482:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  2m 
51s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12482 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893089/HDFS-12482.01.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21749/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Provide a configuration to adjust the weight of EC recovery tasks to adjust 
> the speed of recovery
> -
>
> Key: HDFS-12482
> URL: https://issues.apache.org/jira/browse/HDFS-12482
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch
>
>
> The relative speed of EC recovery compared to 3x replica recovery is a 
> function of the EC codec, number of sources, NIC speed, CPU speed, etc. 
> Currently EC recovery has a fixed {{xmitsInProgress}} of {{max(# of 
> sources, # of targets)}}, compared to {{1}} for 3x replica recovery, and the 
> NN uses {{xmitsInProgress}} to decide how many recovery tasks to schedule to 
> a DataNode. Thus we can add a coefficient for users to tune the weight of EC 
> recovery tasks.
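The proposed weighting can be sketched as follows. This is a conceptual Python sketch; the function name is illustrative, and the 0.5 default comes from the discussion in this thread, not from the actual DataNode code:

```python
import math

def ec_recovery_xmits(num_sources, num_targets, weight=0.5):
    """Weighted xmits for one EC reconstruction task (illustrative).

    weight=1.0 reproduces the current fixed max(# sources, # targets);
    smaller weights make EC recovery count more like 3x replication,
    whose tasks each count as 1 xmit. The floor of 1 keeps every
    running task visible to the NameNode's scheduler.
    """
    return max(1, int(math.ceil(weight * max(num_sources, num_targets))))
```

For an RS-6-3 recovery with 6 sources and 3 targets, weight 1.0 yields 6 xmits, weight 0.5 yields 3, and very small weights floor at 1.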






[jira] [Updated] (HDFS-12683) DFSZKFailOverController re-order logic for logging Exception

2017-10-19 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12683:
-
Description: 
The ZKFC should log fatal exceptions before closing the connections and 
terminating the server.

Occasionally we have seen DFSZKFailOver shut down, but no exception or error 
was logged.

  was:
Log the exception before closing the connections and terminating server.

As some times occasionally we have seen DFSZKFailOver shutdown, but no 
exception or no error being logged.


> DFSZKFailOverController re-order logic for logging Exception
> 
>
> Key: HDFS-12683
> URL: https://issues.apache.org/jira/browse/HDFS-12683
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>
> The ZKFC should log fatal exceptions before closing the connections and 
> terminating the server.
> Occasionally we have seen DFSZKFailOver shut down, but no exception or 
> error was logged.






[jira] [Comment Edited] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery

2017-10-19 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211500#comment-16211500
 ] 

Lei (Eddy) Xu edited comment on HDFS-12482 at 10/19/17 6:25 PM:


Thanks for the reviews, [~andrew.wang]

Updated the patch to add documentation. Empirically, a value in {{(0, 1.0]}} 
seems to achieve similar recovery speed and network / CPU overhead between 
replicated-block and EC-block recovery _on my testing cluster_, but it highly 
depends on the hardware. I will set this value to {{0.5}} initially in this 
patch.

[~xiaochen], [~manojg] mind to give a review?


was (Author: eddyxu):
Thanks for the reviews, [~andrew.wang]

Updated the patch to add documents.  Empirically, an value between {{(0, 1.0]}} 
seems can achieve similar recovery speed and network / cpu overhead between 
replicated blocks and ec block recovery _on my testing cluster_. But it highly 
depends on HW.  I will set this value to {{0.5}} initially in this patch.


> Provide a configuration to adjust the weight of EC recovery tasks to adjust 
> the speed of recovery
> -
>
> Key: HDFS-12482
> URL: https://issues.apache.org/jira/browse/HDFS-12482
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch
>
>
> The relative speed of EC recovery comparing to 3x replica recovery is a 
> function of (EC codec, number of sources, NIC speed, and CPU speed, and etc). 
> Currently the EC recovery has a fixed {{xmitsInProgress}} of {{max(# of 
> sources, # of targets)}} comparing to {{1}} for 3x replica recovery, and NN 
> uses {{xmitsInProgress}} to decide how much recovery tasks to schedule to the 
> DataNode this we can add a coefficient for user to tune the weight of EC 
> recovery tasks.






[jira] [Updated] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery

2017-10-19 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-12482:
-
Attachment: HDFS-12482.01.patch

Thanks for the reviews, [~andrew.wang]

Updated the patch to add documentation. Empirically, a value in {{(0, 1.0]}} 
seems to achieve similar recovery speed and network / CPU overhead between 
replicated-block and EC-block recovery _on my testing cluster_, but it highly 
depends on the hardware. I will set this value to {{0.5}} initially in this 
patch.


> Provide a configuration to adjust the weight of EC recovery tasks to adjust 
> the speed of recovery
> -
>
> Key: HDFS-12482
> URL: https://issues.apache.org/jira/browse/HDFS-12482
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch
>
>
> The relative speed of EC recovery comparing to 3x replica recovery is a 
> function of (EC codec, number of sources, NIC speed, and CPU speed, and etc). 
> Currently the EC recovery has a fixed {{xmitsInProgress}} of {{max(# of 
> sources, # of targets)}} comparing to {{1}} for 3x replica recovery, and NN 
> uses {{xmitsInProgress}} to decide how much recovery tasks to schedule to the 
> DataNode this we can add a coefficient for user to tune the weight of EC 
> recovery tasks.






[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.

2017-10-19 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211484#comment-16211484
 ] 

Xiao Chen commented on HDFS-12667:
--

bq. will lock individual queue.
Sounds good to me. Thanks for the heads up guys. :)

> KMSClientProvider#ValueQueue does synchronous fetch of edeks in background 
> async thread.
> 
>
> Key: HDFS-12667
> URL: https://issues.apache.org/jira/browse/HDFS-12667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms
>Affects Versions: 3.0.0-alpha4
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> There are a couple of issues in KMSClientProvider#ValueQueue.
> 1.
>  {code:title=ValueQueue.java|borderStyle=solid}
>   private final LoadingCache keyQueues;
>   // Stripped rwlocks based on key name to synchronize the queue from
>   // the sync'ed rw-thread and the background async refill thread.
>   private final List lockArray =
>   new ArrayList<>(LOCK_ARRAY_SIZE);
> {code}
> It hashes the key name into 16 buckets.
> In the code chunk below,
>  {code:title=ValueQueue.java|borderStyle=solid}
> public List getAtMost(String keyName, int num) throws IOException,
>   ExecutionException {
>  ...
>  ...
>  readLock(keyName);
> E val = keyQueue.poll();
> readUnlock(keyName);
>  ...
>   }
>   private void submitRefillTask(final String keyName,
>   final Queue keyQueue) throws InterruptedException {
>   ...
>   ...
>   writeLock(keyName); // It holds the write lock while the key is 
> being asynchronously fetched. So the read requests for all the keys that 
> hashes to this bucket will essentially be blocked.
>   try {
> if (keyQueue.size() < threshold && !isCanceled()) {
>   refiller.fillQueueForKey(name, keyQueue,
>   cacheSize - keyQueue.size());
> }
>  ...
>   } finally {
> writeUnlock(keyName);
>   }
> }
>   }
> {code}
> According to the above code chunk, if two keys (let's say key1 and key2) hash 
> to the same bucket (between 1 and 16), then while key1 is asynchronously 
> being refetched, all getKey calls for key2 will be blocked.
> 2. Due to the striped rw locks, the asynchronous behavior of key refills is 
> now synchronous with respect to other handler threads.
> I understand that locks were added so that we don't kick off multiple 
> asynchronous refilling threads for the same key.
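The stripe collision described in point 1 can be demonstrated with a small conceptual sketch. This is illustrative Python, not the actual KMS ValueQueue code, and the class and method names are invented for the example:

```python
import threading
import zlib

NUM_STRIPES = 16  # mirrors the 16 buckets described above

class StripedLocks:
    """Toy model of striped locking: keys hashing to the same stripe
    share one lock, so a slow refill holding key1's stripe also blocks
    readers of an unrelated key2 that lands in that stripe."""

    def __init__(self, n=NUM_STRIPES):
        self.locks = [threading.Lock() for _ in range(n)]
        self.n = n

    def stripe(self, key):
        # Deterministic hash so collisions are reproducible across runs.
        return zlib.crc32(key.encode("utf-8")) % self.n

    def lock_for(self, key):
        return self.locks[self.stripe(key)]
```

By the pigeonhole principle, any 17 distinct keys must include two that share a stripe; locking each key's own queue instead, as proposed later in this thread, removes that coupling.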






[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED

2017-10-19 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211473#comment-16211473
 ] 

Xiao Chen commented on HDFS-12682:
--

Thanks for the response Sammi, good find on HDFS-12686! 

I propose we fix the problem by:
- Remove the state from {{ErasureCodingPolicy}}. The motivation is, 
{{ErasureCodingPolicy}} is returned with {{HdfsFileStatus}}, which impacts all 
clients listing hdfs. We want to make it as lightweight as possible, and keep 
Andrew's work on HDFS-11565 for performance.
- Add a new class {{ErasureCodingPolicyInfo}} (or whatever name people find 
intuitive) that contains the policy and its state. This will be used by the 
ECAdmin-facing APIs, as well as internal HDFS persistence.

Will prepare a patch toward this direction for demonstration. If you or any 
watchers have concerns, please feel free to speak up.

> ECAdmin -listPolicies will always show policy state as DISABLED
> ---
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>  Labels: hdfs-ec-3.0-must-do
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is first checked, and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass, because the static instance that the 
> client (e.g. ECAdmin) reads in unit tests is updated by the NN. :)
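A minimal model of the deserialization bug described above (a Python sketch with illustrative names; the real code lives in PBHelperClient and SystemErasureCodingPolicies):

```python
class Policy:
    def __init__(self, name, state="DISABLED"):
        self.name = name
        self.state = state  # system policies are created DISABLED by default

# Static cache of system policies, mirroring SystemErasureCodingPolicies.
SYSTEM_POLICIES = {"RS-6-3-1024k": Policy("RS-6-3-1024k")}

def policy_from_proto(name, wire_state):
    """Mimics the bug: if the name matches a cached system policy, the
    cached object is returned as-is and the state carried on the wire
    is silently dropped."""
    cached = SYSTEM_POLICIES.get(name)
    if cached is not None:
        return cached  # wire_state ignored -> always reads DISABLED
    return Policy(name, wire_state)
```

In-process unit tests don't catch this because the NN mutates the very same cached objects the client reads; over the wire, the client's stale cache wins.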



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.

2017-10-19 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211463#comment-16211463
 ] 

Rushabh S Shah edited comment on HDFS-12667 at 10/19/17 6:06 PM:
-

bq. Sorry for breaking this, I'll investigate about the locking as well.
hi [~xiaochen], thanks for commenting.
We (Daryn and I) have an implementation idea which will remove the striped 
locking and lock each individual queue.
Give me a couple of days to post a 1st draft.
Just to clarify, I am only working on the 1st point, not the second.


was (Author: shahrs87):
bq. Sorry for breaking this, I'll investigate about the locking as well.
hi [~xiaochen], thanks for commenting.
We (I and Daryn) have an implementation idea which will remove the stripped 
locking and will lock individual queue.
Give me couple of days to post 1st draft.

> KMSClientProvider#ValueQueue does synchronous fetch of edeks in background 
> async thread.
> 
>
> Key: HDFS-12667
> URL: https://issues.apache.org/jira/browse/HDFS-12667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms
>Affects Versions: 3.0.0-alpha4
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> There are a couple of issues in KMSClientProvider#ValueQueue.
> 1.
>  {code:title=ValueQueue.java|borderStyle=solid}
>   private final LoadingCache keyQueues;
>   // Stripped rwlocks based on key name to synchronize the queue from
>   // the sync'ed rw-thread and the background async refill thread.
>   private final List lockArray =
>   new ArrayList<>(LOCK_ARRAY_SIZE);
> {code}
> It hashes the key name into 16 buckets.
> In the code chunk below,
>  {code:title=ValueQueue.java|borderStyle=solid}
> public List getAtMost(String keyName, int num) throws IOException,
>   ExecutionException {
>  ...
>  ...
>  readLock(keyName);
> E val = keyQueue.poll();
> readUnlock(keyName);
>  ...
>   }
>   private void submitRefillTask(final String keyName,
>   final Queue keyQueue) throws InterruptedException {
>   ...
>   ...
>   writeLock(keyName); // It holds the write lock while the key is 
> being asynchronously fetched. So the read requests for all the keys that 
> hashes to this bucket will essentially be blocked.
>   try {
> if (keyQueue.size() < threshold && !isCanceled()) {
>   refiller.fillQueueForKey(name, keyQueue,
>   cacheSize - keyQueue.size());
> }
>  ...
>   } finally {
> writeUnlock(keyName);
>   }
> }
>   }
> {code}
> According to the above code chunk, if two keys (let's say key1 and key2) hash 
> to the same bucket (between 1 and 16), then while key1 is asynchronously 
> being refetched, all getKey calls for key2 will be blocked.
> 2. Due to the striped rw locks, the asynchronous behavior of key refills is 
> now synchronous with respect to other handler threads.
> I understand that locks were added so that we don't kick off multiple 
> asynchronous refilling threads for the same key.






[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.

2017-10-19 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211464#comment-16211464
 ] 

Daryn Sharp commented on HDFS-12667:


I was originally going to comment on another key roll occurring during the 
re-encrypt but deleted it because I already wrote too much. :).  The inability 
to numerically compare is indeed unfortunate because I too thought we could 
take advantage of a version check.

> KMSClientProvider#ValueQueue does synchronous fetch of edeks in background 
> async thread.
> 
>
> Key: HDFS-12667
> URL: https://issues.apache.org/jira/browse/HDFS-12667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms
>Affects Versions: 3.0.0-alpha4
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> There are a couple of issues in KMSClientProvider#ValueQueue.
> 1.
>  {code:title=ValueQueue.java|borderStyle=solid}
>   private final LoadingCache keyQueues;
>   // Stripped rwlocks based on key name to synchronize the queue from
>   // the sync'ed rw-thread and the background async refill thread.
>   private final List lockArray =
>   new ArrayList<>(LOCK_ARRAY_SIZE);
> {code}
> It hashes the key name into 16 buckets.
> In the code chunk below,
>  {code:title=ValueQueue.java|borderStyle=solid}
> public List getAtMost(String keyName, int num) throws IOException,
>   ExecutionException {
>  ...
>  ...
>  readLock(keyName);
> E val = keyQueue.poll();
> readUnlock(keyName);
>  ...
>   }
>   private void submitRefillTask(final String keyName,
>   final Queue keyQueue) throws InterruptedException {
>   ...
>   ...
>   writeLock(keyName); // It holds the write lock while the key is 
> being asynchronously fetched. So the read requests for all the keys that 
> hashes to this bucket will essentially be blocked.
>   try {
> if (keyQueue.size() < threshold && !isCanceled()) {
>   refiller.fillQueueForKey(name, keyQueue,
>   cacheSize - keyQueue.size());
> }
>  ...
>   } finally {
> writeUnlock(keyName);
>   }
> }
>   }
> {code}
> According to the above code chunk, if two keys (let's say key1 and key2) hash 
> to the same bucket (between 1 and 16), then while key1 is asynchronously 
> being refetched, all getKey calls for key2 will be blocked.
> 2. Due to the striped rw locks, the asynchronous behavior of key refills is 
> now synchronous with respect to other handler threads.
> I understand that locks were added so that we don't kick off multiple 
> asynchronous refilling threads for the same key.






[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.

2017-10-19 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211463#comment-16211463
 ] 

Rushabh S Shah commented on HDFS-12667:
---

bq. Sorry for breaking this, I'll investigate about the locking as well.
Hi [~xiaochen], thanks for commenting.
We (Daryn and I) have an implementation idea that removes the striped 
locking and locks each individual queue instead.
Give me a couple of days to post the first draft.

> KMSClientProvider#ValueQueue does synchronous fetch of edeks in background 
> async thread.
> 
>
> Key: HDFS-12667
> URL: https://issues.apache.org/jira/browse/HDFS-12667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms
>Affects Versions: 3.0.0-alpha4
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> There are a couple of issues in KMSClientProvider#ValueQueue.
> 1.
>  {code:title=ValueQueue.java|borderStyle=solid}
>   private final LoadingCache<String, LinkedBlockingQueue<E>> keyQueues;
>   // Striped rwlocks based on key name to synchronize the queue from
>   // the sync'ed rw-thread and the background async refill thread.
>   private final List<ReadWriteLock> lockArray =
>   new ArrayList<>(LOCK_ARRAY_SIZE);
> {code}
> It hashes the key name into 16 buckets.
> In the code chunk below,
>  {code:title=ValueQueue.java|borderStyle=solid}
> public List<E> getAtMost(String keyName, int num) throws IOException,
>   ExecutionException {
>  ...
>  readLock(keyName);
>  E val = keyQueue.poll();
>  readUnlock(keyName);
>  ...
>   }
>   private void submitRefillTask(final String keyName,
>   final Queue<E> keyQueue) throws InterruptedException {
>   ...
>   // It holds the write lock while the key is being asynchronously
>   // fetched, so read requests for all keys that hash to this bucket
>   // are essentially blocked.
>   writeLock(keyName);
>   try {
>     if (keyQueue.size() < threshold && !isCanceled()) {
>       refiller.fillQueueForKey(name, keyQueue,
>           cacheSize - keyQueue.size());
>     }
>    ...
>   } finally {
>     writeUnlock(keyName);
>   }
> }
>   }
> {code}
> According to the above code chunk, if two keys (let's say key1 and key2) hash 
> to the same bucket (one of the 16), then while key1 is being asynchronously 
> refetched, all getKey calls for key2 will be blocked.
> 2. Due to the striped rw locks, the asynchronous refill of keys is now 
> synchronous with respect to the other handler threads.
> I understand that the locks were added so that we don't kick off multiple 
> asynchronous refilling threads for the same key.






[jira] [Commented] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211436#comment-16211436
 ] 

Eric Yang commented on HDFS-12684:
--

[~cheersyang] Would it make more sense to remove the node manager metrics from 
the HDFS project instead of the StorageContainerManager node count?  I am not 
sure whether this is an overlap of YARN terminology with something in HDFS.  It 
would be nice to keep NodeManager as YARN terminology.

> Ozone: SCM metrics NodeCount is overlapping with node manager metrics
> -
>
> Key: HDFS-12684
> URL: https://issues.apache.org/jira/browse/HDFS-12684
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12684-HDFS-7240.001.patch
>
>
> I found this issue while reviewing HDFS-11468: from http://scm_host:9876/jmx, 
> both SCM and SCMNodeManager have {{NodeCount}} metrics
> {noformat}
>  {
> "name" : 
> "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
> "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
> "ClientRpcPort" : "9860",
> "DatanodeRpcPort" : "9861",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "CompileInfo" : "2017-10-17T06:47Z xxx",
> "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
> "SoftwareVersion" : "3.1.0-SNAPSHOT",
> "StartedTimeInMillis" : 1508393551065
>   }, {
> "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
> "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "OutOfChillMode" : false,
> "MinimumChillModeNodes" : 1,
> "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 
> 0 nodes reported, minimal 1 nodes required."
>   }
> {noformat}
> Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.






[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.

2017-10-19 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211418#comment-16211418
 ] 

Xiao Chen commented on HDFS-12667:
--

Thanks [~daryn] for elaborating. I agree the implementation can cause the 
problem. Updated the link to HDFS-11210 as a broken by.

Sorry for breaking this, I'll investigate about the locking as well.

Just want to extend the discussion on the second point: it's theoretically true 
that a create may release the lock, spend a significant amount of time during 
generate, long enough that it only comes back after the re-encryption is issued 
and has gone past this file...
In that case it sounds like we have to check the returned edeks against the 
re-encryption (if any) after the create op reacquires the lock, right? That's 
a pretty head-scratching scenario. Generating again may solve it, but 
keyversion, being a String, can only be compared in an equalsTo fashion (rather 
than greaterThan / lessThan). So if someone rolls the key on the KMS again 
during re-encryption (say v1->v2, while the re-encryption was submitted for the 
v0->v1 roll), every create after that point will have to generate twice - 
because re-encrypt isn't aware of the roll and still compares with v1, while 
the new creates are on v2, which under an equalsTo comparison is no different 
from v0 vs. v1.
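The equalsTo limitation can be seen in a two-line sketch (the "key@n" version naming here is made up for illustration; real KMS key version names need not look like this):

```java
// Toy illustration of the comparison limitation: String key versions can
// only answer "same or different", never "older or newer".
public class KeyVersionCompare {
  // equals() is the only comparison available for opaque version names.
  static boolean sameVersion(String a, String b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    String reencryptTarget = "key@1"; // version the re-encryption expects
    String createVersion = "key@2";   // version a fresh create actually got
    // Both "key@0" (stale) and "key@2" (already newer) give the same
    // answer here, so a create that is already ahead of the roll is
    // forced to generate again exactly like a stale one.
    System.out.println(sameVersion(createVersion, reencryptTarget));
  }
}
```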



> KMSClientProvider#ValueQueue does synchronous fetch of edeks in background 
> async thread.
> 
>
> Key: HDFS-12667
> URL: https://issues.apache.org/jira/browse/HDFS-12667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms
>Affects Versions: 3.0.0-alpha4
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> There are a couple of issues in KMSClientProvider#ValueQueue.
> 1.
>  {code:title=ValueQueue.java|borderStyle=solid}
>   private final LoadingCache<String, LinkedBlockingQueue<E>> keyQueues;
>   // Striped rwlocks based on key name to synchronize the queue from
>   // the sync'ed rw-thread and the background async refill thread.
>   private final List<ReadWriteLock> lockArray =
>   new ArrayList<>(LOCK_ARRAY_SIZE);
> {code}
> It hashes the key name into 16 buckets.
> In the code chunk below,
>  {code:title=ValueQueue.java|borderStyle=solid}
> public List<E> getAtMost(String keyName, int num) throws IOException,
>   ExecutionException {
>  ...
>  readLock(keyName);
>  E val = keyQueue.poll();
>  readUnlock(keyName);
>  ...
>   }
>   private void submitRefillTask(final String keyName,
>   final Queue<E> keyQueue) throws InterruptedException {
>   ...
>   // It holds the write lock while the key is being asynchronously
>   // fetched, so read requests for all keys that hash to this bucket
>   // are essentially blocked.
>   writeLock(keyName);
>   try {
>     if (keyQueue.size() < threshold && !isCanceled()) {
>       refiller.fillQueueForKey(name, keyQueue,
>           cacheSize - keyQueue.size());
>     }
>    ...
>   } finally {
>     writeUnlock(keyName);
>   }
> }
>   }
> {code}
> According to the above code chunk, if two keys (let's say key1 and key2) hash 
> to the same bucket (one of the 16), then while key1 is being asynchronously 
> refetched, all getKey calls for key2 will be blocked.
> 2. Due to the striped rw locks, the asynchronous refill of keys is now 
> synchronous with respect to the other handler threads.
> I understand that the locks were added so that we don't kick off multiple 
> asynchronous refilling threads for the same key.






[jira] [Commented] (HDFS-12638) NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets

2017-10-19 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211413#comment-16211413
 ] 

Daryn Sharp commented on HDFS-12638:


bq. Yes, I think our code should bear with such orphan blocks, instead of 
failing the NN with NPE like this. At least.
See below, they aren't really orphaned.  I think it's correct for the NN to 
crash if the namesystem data structures are corrupted.

bq. I assume when the snapshot gets deleted, these blocks will be also removed 
from the blocks map. But before that, we need to live with such orphaned blocks
To the block manager, replication monitor, etc these copy-on-truncate blocks 
are not (supposed to be) special.  My prior point stated another way is the 
block is not orphaned if it's in a snapshot diff.  INodes are not orphaned when 
only referenced via a snapshot diff.  A block in the blocks map should not be 
referencing an inode not in the inodes map.  Direct namespace accessibility is 
irrelevant to the block/inode/map linkages being correct.

We need to fix the bug, not mask it.

> NameNode exits due to ReplicationMonitor thread received Runtime exception in 
> ReplicationWork#chooseTargets
> ---
>
> Key: HDFS-12638
> URL: https://issues.apache.org/jira/browse/HDFS-12638
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Jiandan Yang 
> Attachments: HDFS-12638-branch-2.8.2.001.patch
>
>
> Active NameNode exits due to NPE. I can confirm that the BlockCollection 
> passed in when creating ReplicationWork is null, but I do not know why the 
> BlockCollection is null. By reviewing the history I found that 
> [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check 
> for whether BlockCollection is null.
> NN logs are as following:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
> at java.lang.Thread.run(Thread.java:834)
> {code}






[jira] [Updated] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart

2017-10-19 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-12686:
-
Priority: Critical  (was: Major)

> Erasure coding system policy state is not correctly saved and loaded during 
> real cluster restart
> 
>
> Key: HDFS-12686
> URL: https://issues.apache.org/jira/browse/HDFS-12686
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Critical
>  Labels: hdfs-ec-3.0-must-do
>
> Inspired by HDFS-12682, I found that the system erasure coding policy state 
> will not be correctly saved and loaded in a real cluster, though there are 
> unit tests for this and all of them pass with MiniCluster. That's because the 
> MiniCluster keeps the same static system erasure coding policy object across 
> the NN restart operation. 






[jira] [Commented] (HDFS-12680) Ozone: SCM: Lease support for container creation

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211373#comment-16211373
 ] 

Hadoop QA commented on HDFS-12680:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
15s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12680 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893072/HDFS-12680-HDFS-7240.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21748/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: SCM: Lease support for container creation
> 
>
> Key: HDFS-12680
> URL: https://issues.apache.org/jira/browse/HDFS-12680
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>  Labels: ozoneMerge
> Attachments: HDFS-12680-HDFS-7240.000.patch, 
> HDFS-12680-HDFS-7240.001.patch
>
>
> This brings in lease support for container creation.
> A lease should be given to a container that is moved to the {{CREATING}} 
> state when the {{BEGIN_CREATE}} event happens; a {{LeaseException}} should be 
> thrown if the container already holds a lease. The lease must be released 
> during the {{COMPLETE_CREATE}} event. If the lease times out, the container 
> should be moved to the {{DELETING}} state, and an exception should be thrown 
> if a {{COMPLETE_CREATE}} event is received on that container.
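The rules in the description can be sketched as a tiny state machine. This is a hypothetical, condensed model, not SCM's implementation: in the real patch the timeout is driven by a LeaseManager callback rather than checked at release time, and all class and method names below are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Condensed model of the lease rules: BEGIN_CREATE acquires a lease and
// moves the container to CREATING; COMPLETE_CREATE releases it; a timed-out
// lease sends the container to DELETING and rejects COMPLETE_CREATE.
public class ContainerLeaseSketch {
  enum State { CREATING, CREATED, DELETING }

  static class LeaseException extends RuntimeException {
    LeaseException(String m) { super(m); }
  }

  private final Map<String, State> containers = new HashMap<>();
  private final Map<String, Long> leaseAcquiredAt = new HashMap<>();
  private final long timeoutMillis;

  ContainerLeaseSketch(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

  // BEGIN_CREATE: move to CREATING and acquire a lease.
  void beginCreate(String name, long now) {
    if (leaseAcquiredAt.containsKey(name)) {
      throw new LeaseException("container already holds a lease: " + name);
    }
    containers.put(name, State.CREATING);
    leaseAcquiredAt.put(name, now);
  }

  // COMPLETE_CREATE: release the lease; a timed-out lease moves the
  // container to DELETING and the event is rejected.
  void completeCreate(String name, long now) {
    Long acquired = leaseAcquiredAt.remove(name);
    if (acquired == null || now - acquired >= timeoutMillis) {
      containers.put(name, State.DELETING);
      throw new LeaseException("no valid lease for container: " + name);
    }
    containers.put(name, State.CREATED);
  }

  State stateOf(String name) { return containers.get(name); }
}
```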






[jira] [Commented] (HDFS-12680) Ozone: SCM: Lease support for container creation

2017-10-19 Thread Nanda kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211372#comment-16211372
 ] 

Nanda kumar commented on HDFS-12680:


Thanks [~anu] & [~linyiqun] for the review. Please find my response below, also 
updated the patch (v001) based on the review comments.

bq. Not that it is going to make any difference, just want to make sure that 
this is a conscious decision.
Yeah, this was a conscious decision. If we don’t move the container to the 
{{CREATING}} state during allocate block, we might give the same container to 
multiple clients with the create flag set. This will again bring in the issue 
of two clients trying to create the same container.
bq. Is there a use case where this is needed.
This change was required because the state can change for a container, but 
that should not result in a different hashCode for the same container. In 
particular, when we acquire a lease, LeaseManager uses the resource as a key 
in a HashMap (ContainerInfo is the resource here), and while trying to release 
(after a state change), if we don’t get the same hashCode we will not be able 
to find it. Since state is still used in {{ContainerInfo#equals()}}, we should 
not get any unexpected behavior.
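The hashCode rationale above can be demonstrated with a toy class (ContainerLike is invented for illustration, not the real ContainerInfo): if a mutable field feeds hashCode(), a HashMap entry becomes unreachable after a state transition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Demonstrates the bug the reviewer's change avoids: hashCode() that
// depends on mutable state breaks HashMap lookup after the state changes.
public class MutableHashKeyDemo {
  static class ContainerLike {
    final String name;
    String state; // mutable: CREATING -> CREATED -> ...

    ContainerLike(String name, String state) {
      this.name = name;
      this.state = state;
    }

    @Override public boolean equals(Object o) {
      return o instanceof ContainerLike
          && ((ContainerLike) o).name.equals(name)
          && ((ContainerLike) o).state.equals(state);
    }

    // Buggy variant: hashCode depends on the mutable state field.
    @Override public int hashCode() { return Objects.hash(name, state); }
  }

  // Returns true when the lease can no longer be found after the key's
  // state changed, because the map bucket was chosen with the old hash.
  static boolean lookupLostAfterStateChange() {
    Map<ContainerLike, String> leases = new HashMap<>();
    ContainerLike c = new ContainerLike("c1", "CREATING");
    leases.put(c, "lease-1");
    c.state = "CREATED"; // state transition after acquiring the lease
    return leases.get(c) == null;
  }

  public static void main(String[] args) {
    System.out.println("lease lost: " + lookupLostAfterStateChange());
  }
}
```

Computing hashCode() only from immutable fields while keeping state in equals() avoids this, which is the design choice the comment defends.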
bq.  should we add this to ozone-conf.xml?
done
bq. but what happens after the timeOut (OzoneProtos.LifeCycleEvent.TIMEOUT)
The container is moved to the {{DELETING}} state; we have to do cleanup after 
that. I will create another jira to track it.
bq. conf.setInt? 
done
bq. It would be better to define a var TIMEOUT=1 and reuse this var in the 
test method.
done
bq. We should increase sleep time
done
bq. Can you add an additional check
Since {{LeaseNotFoundException}} is wrapped with {{IOException}}, added 
{{thrown.expect(IOException.class)}}
bq. The following lines don't executes in test.
Fixed

> Ozone: SCM: Lease support for container creation
> 
>
> Key: HDFS-12680
> URL: https://issues.apache.org/jira/browse/HDFS-12680
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>  Labels: ozoneMerge
> Attachments: HDFS-12680-HDFS-7240.000.patch, 
> HDFS-12680-HDFS-7240.001.patch
>
>
> This brings in lease support for container creation.
> A lease should be given to a container that is moved to the {{CREATING}} 
> state when the {{BEGIN_CREATE}} event happens; a {{LeaseException}} should be 
> thrown if the container already holds a lease. The lease must be released 
> during the {{COMPLETE_CREATE}} event. If the lease times out, the container 
> should be moved to the {{DELETING}} state, and an exception should be thrown 
> if a {{COMPLETE_CREATE}} event is received on that container.






[jira] [Updated] (HDFS-12680) Ozone: SCM: Lease support for container creation

2017-10-19 Thread Nanda kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDFS-12680:
---
Attachment: HDFS-12680-HDFS-7240.001.patch

> Ozone: SCM: Lease support for container creation
> 
>
> Key: HDFS-12680
> URL: https://issues.apache.org/jira/browse/HDFS-12680
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>  Labels: ozoneMerge
> Attachments: HDFS-12680-HDFS-7240.000.patch, 
> HDFS-12680-HDFS-7240.001.patch
>
>
> This brings in lease support for container creation.
> A lease should be given to a container that is moved to the {{CREATING}} 
> state when the {{BEGIN_CREATE}} event happens; a {{LeaseException}} should be 
> thrown if the container already holds a lease. The lease must be released 
> during the {{COMPLETE_CREATE}} event. If the lease times out, the container 
> should be moved to the {{DELETING}} state, and an exception should be thrown 
> if a {{COMPLETE_CREATE}} event is received on that container.






[jira] [Commented] (HDFS-12619) Do not catch and throw unchecked exceptions if IBRs fail to process

2017-10-19 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211355#comment-16211355
 ] 

Xiao Chen commented on HDFS-12619:
--

Thanks [~jojochuang], trunk maps to 3.1.0 now. I think you should also 
cherry-pick to branch-3.0. :)

> Do not catch and throw unchecked exceptions if IBRs fail to process
> ---
>
> Key: HDFS-12619
> URL: https://issues.apache.org/jira/browse/HDFS-12619
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
> Fix For: 2.9.0, 2.8.3, 3.0.0
>
> Attachments: HDFS-12619.001.patch
>
>
> HDFS-9198 added the following code
> {code:title=BlockManager#processIncrementalBlockReport}
> public void processIncrementalBlockReport(final DatanodeID nodeID,
>   final StorageReceivedDeletedBlocks srdb) throws IOException {
> ...
> try {
>   processIncrementalBlockReport(node, srdb);
> } catch (Exception ex) {
>   node.setForceRegistration(true);
>   throw ex;
> }
>   }
> {code}
> In Apache Hadoop 2.7.x ~ 3.0, the code snippet is accepted by the Java 
> compiler. However, when I attempted to backport it to a CDH5.3 release (based 
> on Apache Hadoop 2.5.0), the compiler complains that the exception is 
> unhandled, because the method declares that it throws IOException instead of 
> Exception.
> While the code compiles for Apache Hadoop 2.7.x ~ 3.0, I feel it is not a 
> good practice to catch an unchecked exception and then rethrow it. How about 
> rewriting it with a finally block and a conditional variable?
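The finally-based rewrite suggested at the end of the description can be sketched as follows. Node and the simulated failure flag are stand-ins for the real BlockManager types; this mirrors the shape of the proposed fix, not the committed patch.

```java
import java.io.IOException;

// Sketch of the suggested rewrite: instead of catching Exception and
// rethrowing, track success with a conditional variable and do the
// force-registration cleanup in a finally block.
public class IbrRewriteSketch {
  static class Node {
    boolean forceRegistration;
    void setForceRegistration(boolean b) { forceRegistration = b; }
  }

  static void processIncrementalBlockReport(Node node, boolean fail)
      throws IOException {
    boolean successful = false;
    try {
      if (fail) {
        throw new IOException("simulated IBR processing failure");
      }
      // ...real report processing would happen here...
      successful = true;
    } finally {
      // Cleanup runs for any failure, checked or unchecked, without a
      // catch-and-rethrow of an unchecked exception.
      if (!successful) {
        node.setForceRegistration(true);
      }
    }
  }

  // Helper: run one report and say whether force registration was set.
  static boolean forceRegistrationAfter(boolean fail) {
    Node node = new Node();
    try {
      processIncrementalBlockReport(node, fail);
    } catch (IOException expected) {
      // the failure path rethrows naturally; cleanup already ran
    }
    return node.forceRegistration;
  }

  public static void main(String[] args) {
    System.out.println("success path sets flag: " + forceRegistrationAfter(false));
    System.out.println("failure path sets flag: " + forceRegistrationAfter(true));
  }
}
```

This compiles against a `throws IOException` signature on any JDK, which is the portability problem the description raises.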






[jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.

2017-10-19 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211324#comment-16211324
 ] 

Daryn Sharp commented on HDFS-12667:


The requirement to ensure new edeks is fine.  The problem is the 
implementation.  It may severely penalize the common case performance for a 
rare event.  We are investigating alternate designs because the locking has to 
go.

First, the new locks protecting access to a key's blocking queue negate 
concurrency.  When the queue is being refilled, all edek requests are blocked 
until the refill is done – even if there are still edeks available.  
Furthermore, the striped locking causes edek requests for other keys to 
unnecessarily block too.  This is an unacceptable penalty to the common case.

Second, the base requirement motivating the locks was to ensure after a 
reencrypt starts that no new creates will use old edeks.  However it appears to 
not be correct.  A create op may release the fsn lock, fetch an old edek, a 
reencrypt is issued which drains the queue and now expects new edeks, the 
in-progress create reacquires the lock and uses the old edek.  The race 
condition is tight, probably only an issue if the waiting creates should have 
been in the first batch, but it's wrong.

We are trying to integrate an internal patch for an async policy to prevent 
blocking a handler – it doesn't matter that the fsn lock is released, blocking 
a handler is unacceptable.  The new locking model is incompatible: namely, it 
forces the poll to be protected by the lock, so the edek fetch cannot short out.
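One direction such an alternate design could take (purely a sketch of the idea under discussion, not the actual patch; all names are invented) is to let readers poll without any lock and use a per-key compare-and-set flag so at most one refill is in flight:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Lock-free sketch: readers poll the queue directly and never wait on a
// refill; a per-key flag gives the same "only one refill per key"
// guarantee the original striped locks were protecting.
public class NonBlockingQueueSketch<E> {
  private final ConcurrentHashMap<String, ConcurrentLinkedQueue<E>> queues =
      new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, AtomicBoolean> refilling =
      new ConcurrentHashMap<>();

  public void offer(String key, E value) {
    queues.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).offer(value);
  }

  // Returns a cached value if present; never blocks behind a refill of
  // this key or of any other key that happens to share a hash bucket.
  public E poll(String key) {
    ConcurrentLinkedQueue<E> q = queues.get(key);
    E v = (q == null) ? null : q.poll();
    if (v == null) {
      maybeScheduleRefill(key);
    }
    return v;
  }

  // compareAndSet guarantees at most one refill per key is in flight.
  private void maybeScheduleRefill(String key) {
    AtomicBoolean flag = refilling.computeIfAbsent(key, k -> new AtomicBoolean());
    if (flag.compareAndSet(false, true)) {
      try {
        // ...submit the async refill task here...
      } finally {
        flag.set(false); // placeholder: real code would clear this when
                         // the async refill actually completes
      }
    }
  }
}
```

A caller that polls an empty queue gets null back immediately (and can then fetch synchronously) instead of blocking on a stripe lock held by someone else's refill.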


> KMSClientProvider#ValueQueue does synchronous fetch of edeks in background 
> async thread.
> 
>
> Key: HDFS-12667
> URL: https://issues.apache.org/jira/browse/HDFS-12667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, kms
>Affects Versions: 3.0.0-alpha4
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>
> There are a couple of issues in KMSClientProvider#ValueQueue.
> 1.
>  {code:title=ValueQueue.java|borderStyle=solid}
>   private final LoadingCache<String, LinkedBlockingQueue<E>> keyQueues;
>   // Striped rwlocks based on key name to synchronize the queue from
>   // the sync'ed rw-thread and the background async refill thread.
>   private final List<ReadWriteLock> lockArray =
>   new ArrayList<>(LOCK_ARRAY_SIZE);
> {code}
> It hashes the key name into 16 buckets.
> In the code chunk below,
>  {code:title=ValueQueue.java|borderStyle=solid}
> public List<E> getAtMost(String keyName, int num) throws IOException,
>   ExecutionException {
>  ...
>  readLock(keyName);
>  E val = keyQueue.poll();
>  readUnlock(keyName);
>  ...
>   }
>   private void submitRefillTask(final String keyName,
>   final Queue<E> keyQueue) throws InterruptedException {
>   ...
>   // It holds the write lock while the key is being asynchronously
>   // fetched, so read requests for all keys that hash to this bucket
>   // are essentially blocked.
>   writeLock(keyName);
>   try {
>     if (keyQueue.size() < threshold && !isCanceled()) {
>       refiller.fillQueueForKey(name, keyQueue,
>           cacheSize - keyQueue.size());
>     }
>    ...
>   } finally {
>     writeUnlock(keyName);
>   }
> }
>   }
> {code}
> According to the above code chunk, if two keys (let's say key1 and key2) hash 
> to the same bucket (one of the 16), then while key1 is being asynchronously 
> refetched, all getKey calls for key2 will be blocked.
> 2. Due to the striped rw locks, the asynchronous refill of keys is now 
> synchronous with respect to the other handler threads.
> I understand that the locks were added so that we don't kick off multiple 
> asynchronous refilling threads for the same key.






[jira] [Updated] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-12620:
---
Attachment: HDFS-12620-branch-2.011.patch

> Backporting HDFS-10467 to branch-2
> --
>
> Key: HDFS-12620
> URL: https://issues.apache.org/jira/browse/HDFS-12620
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: HDFS-10467-branch-2.001.patch, 
> HDFS-10467-branch-2.002.patch, HDFS-10467-branch-2.003.patch, 
> HDFS-10467-branch-2.patch, HDFS-12620-branch-2.000.patch, 
> HDFS-12620-branch-2.004.patch, HDFS-12620-branch-2.005.patch, 
> HDFS-12620-branch-2.006.patch, HDFS-12620-branch-2.007.patch, 
> HDFS-12620-branch-2.008.patch, HDFS-12620-branch-2.009.patch, 
> HDFS-12620-branch-2.010.patch, HDFS-12620-branch-2.011.patch
>
>
> When backporting HDFS-10467, there are a few things that changed:
> * {{bin\hdfs}}
> * {{ClientProtocol}}
> * Java 7 not supporting referencing functions
> * {{org.eclipse.jetty.util.ajax.JSON}} in branch-2 is 
> {{org.mortbay.util.ajax.JSON}}






[jira] [Commented] (HDFS-11807) libhdfs++: Get minidfscluster tests running under valgrind

2017-10-19 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211232#comment-16211232
 ] 

James Clampffer commented on HDFS-11807:


This seems to hang forever in 
libhdfs_mini_stress_valgrind_hdfspp_test_shim_static - I don't see 
memcheck/valgrind running and the test isn't using any CPU.  During the build 
the compiler complains a lot about not checking results from the read() and 
write() calls to the IPC socket which makes me think the main process is stuck 
waiting on the side process to say it's done.

> libhdfs++: Get minidfscluster tests running under valgrind
> --
>
> Key: HDFS-11807
> URL: https://issues.apache.org/jira/browse/HDFS-11807
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Anatoli Shein
> Attachments: HDFS-11807.HDFS-8707.000.patch
>
>
> The gmock based unit tests generally don't expose race conditions and memory 
> stomps.  A good way to expose these is running libhdfs++ stress tests and 
> tools under valgrind and pointing them at a real cluster.  Right now the CI 
> tools don't do that so bugs occasionally slip in and aren't caught until they 
> cause trouble in applications that use libhdfs++ for HDFS access.
> The reason the minidfscluster tests don't run under valgrind is because the 
> GC and JIT compiler in the embedded JVM do things that look like errors to 
> valgrind.  I'd like to have these tests do some basic setup and then fork 
> into two processes: one for the minidfscluster stuff and one for the 
> libhdfs++ client test.  A small amount of shared memory can be used to 
> provide a place for the minidfscluster to stick the hdfsBuilder object that 
> the client needs to get info about which port to connect to.  Can also stick 
> a condition variable there to let the minidfscluster know when it can shut 
> down.






[jira] [Created] (HDFS-12687) Client has recovered DN will not be removed from the “failed”

2017-10-19 Thread xuzq (JIRA)
xuzq created HDFS-12687:
---

 Summary: Client has recovered DN will not be removed from the 
“failed”
 Key: HDFS-12687
 URL: https://issues.apache.org/jira/browse/HDFS-12687
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.1
Reporter: xuzq


When a client is writing through a pipeline such as Client=>DN1=>DN2=>DN3 and 
DN2 crashes at some point, the client executes the recovery process. The failed 
DN2 is added to "failed". The client requests a new DN from the NN (passing 
"failed") to replace DN2 in the pipeline, e.g. Client=>DN1=>DN4=>DN3.
This client keeps running. After a long time, the client is still writing data 
to the file, so of course there are many pipelines, e.g. Client => DN-1 => 
DN-2 => DN-3.
When DN-2 crashes, DN-2 is added to "failed" and the client executes the 
recovery process as before. It requests a new DN from the NN with "failed", 
and {color:red}the NN selects a DN from all DNs excluding "failed", even if 
DN-2 has restarted{color}.

Questions:
Why is DN2 (now restarted) not removed from "failed"?
Why isn't the collection of error nodes used in the recovery process shared 
with the one used for getting the next block? Such as:
private final List<DatanodeInfo> failed = new ArrayList<>();
private final LoadingCache<DatanodeInfo, DatanodeInfo> excludedNodes;

As before, when DN2 crashes the client recovers the pipeline after a timeout 
(in the worst case 490s by default). When the client finishes writing this 
block and requests the next block, the NN may return a block which contains 
the failed data node DN2. When the client creates a new pipeline for the new 
block, {color:red}the client has to go through a connection timeout{color} 
(60s by default).

If "failed" and "excludedNodes" were one collection, the connection timeout 
would be avoided. Because "excludedNodes" entries are dynamically deleted, it 
would also avoid the first problem.
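The "dynamically deleted" behavior the report wants for failed nodes can be sketched with a minimal expiring set. This is a hypothetical stand-in: the real client backs excludedNodes with a Guava LoadingCache that expires entries after a configured window, and all names below are invented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal expiring exclusion set: entries are forgotten after a window,
// so a restarted DN becomes eligible again instead of being excluded for
// the lifetime of the stream.
public class ExpiringExcludedNodes {
  private final long windowMillis;
  private final Map<String, Long> excludedAt = new ConcurrentHashMap<>();

  public ExpiringExcludedNodes(long windowMillis) {
    this.windowMillis = windowMillis;
  }

  // Record a failure at the given time (injected for testability).
  public void exclude(String datanode, long nowMillis) {
    excludedAt.put(datanode, nowMillis);
  }

  public boolean isExcluded(String datanode, long nowMillis) {
    Long when = excludedAt.get(datanode);
    if (when == null) {
      return false;
    }
    if (nowMillis - when >= windowMillis) {
      excludedAt.remove(datanode, when); // entry expired: forget the failure
      return false;
    }
    return true;
  }
}
```

With such a structure as the single collection, a DN that crashed, restarted, and outlived the exclusion window would be offered again, avoiding both the permanent-exclusion problem and the connection-timeout problem described above.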









[jira] [Updated] (HDFS-12619) Do not catch and throw unchecked exceptions if IBRs fail to process

2017-10-19 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-12619:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   2.8.3
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~hanishakoneru] and [~xiaochen] for the review.

Patch 001 was committed to trunk (3.0.0), branch-2 (2.9.0) and branch-2.8 
(2.8.3).

> Do not catch and throw unchecked exceptions if IBRs fail to process
> ---
>
> Key: HDFS-12619
> URL: https://issues.apache.org/jira/browse/HDFS-12619
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
> Fix For: 2.9.0, 2.8.3, 3.0.0
>
> Attachments: HDFS-12619.001.patch
>
>
> HDFS-9198 added the following code
> {code:title=BlockManager#processIncrementalBlockReport}
> public void processIncrementalBlockReport(final DatanodeID nodeID,
>   final StorageReceivedDeletedBlocks srdb) throws IOException {
> ...
> try {
>   processIncrementalBlockReport(node, srdb);
> } catch (Exception ex) {
>   node.setForceRegistration(true);
>   throw ex;
> }
>   }
> {code}
> In Apache Hadoop 2.7.x ~ 3.0, the code snippet is accepted by the Java 
> compiler. However, when I attempted to backport it to a CDH5.3 release (based 
> on Apache Hadoop 2.5.0), the compiler complained that the exception is 
> unhandled, because the method declares it throws IOException instead of Exception.
> While the code compiles for Apache Hadoop 2.7.x ~ 3.0, I feel it is not a 
> good practice to catch an unchecked exception and then rethrow it. How about 
> rewriting it with a finally block and a conditional variable?
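A minimal, self-contained sketch of the suggested "finally block plus conditional variable" rewrite. The Node class and the boolean fail parameter are stand-ins for the real DatanodeDescriptor and the actual IBR processing logic, not the committed patch:

```java
public class IbrSketch {
    static class Node {
        boolean forceRegistration;
        void setForceRegistration(boolean b) { forceRegistration = b; }
    }

    // Stand-in for the inner processIncrementalBlockReport(node, srdb);
    // throws an unchecked exception to emulate a processing failure.
    static void process(Node node, boolean fail) {
        if (fail) throw new IllegalStateException("IBR processing failed");
    }

    // The pattern suggested above: no catch of unchecked Exception is
    // needed; a success flag plus a finally block triggers the same
    // re-registration on any failure, checked or unchecked.
    static void processIncrementalBlockReport(Node node, boolean fail) {
        boolean success = false;
        try {
            process(node, fail);
            success = true;
        } finally {
            if (!success) {
                node.setForceRegistration(true);
            }
        }
    }

    public static void main(String[] args) {
        Node ok = new Node();
        processIncrementalBlockReport(ok, false);
        System.out.println(ok.forceRegistration);   // false
        Node bad = new Node();
        try {
            processIncrementalBlockReport(bad, true);
        } catch (IllegalStateException expected) { }
        System.out.println(bad.forceRegistration);  // true
    }
}
```

This compiles even where the method only declares IOException, since nothing broader is caught and rethrown.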






[jira] [Commented] (HDFS-12620) Backporting HDFS-10467 to branch-2

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211004#comment-16211004
 ] 

Hadoop QA commented on HDFS-12620:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 
54s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  
0s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 25 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
37s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 45s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 11 new + 624 unchanged - 0 fixed = 635 total (was 624) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 2s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}713m 27s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 16m 
16s{color} | {color:red} The patch generated 68 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}769m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotManager 
|
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
|   | hadoop.hdfs.server.namenode.TestSecondaryWebUi |
|   | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd |
|   | hadoop.hdfs.server.namenode.ha.TestHAStateTransitions |
|   | hadoop.hdfs.server.datanode.TestRefreshNamenodes |
|   | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.TestTrashWithSecureEncryptionZones |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN |
|   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
|   | hadoop.hdfs.server.datanode.TestDataNodeFaultInjector |
|   | hadoop.hdfs.server.namenode.TestCommitBlockWithInvalidGenStamp |
|   | hadoop.hdfs.server.namenode.TestAuditLogger |
|   | 

[jira] [Commented] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210890#comment-16210890
 ] 

Hadoop QA commented on HDFS-12665:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
11s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12665 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893034/HDFS-12665-HDFS-9806.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21747/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).
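The key/value mapping described above can be sketched with an in-memory map standing in for LevelDB. All class, method, and field names here are hypothetical illustrations of the described schema, not the patch's actual API; a real implementation would serialize the key and value with protobuf:

```java
import java.util.Map;
import java.util.TreeMap;

public class AliasMapSketch {
    // Hypothetical key shape following the description:
    // (blockpool, blockid, genstamp).
    static String key(String blockPool, long blockId, long genStamp) {
        return blockPool + "/" + blockId + "/" + genStamp;
    }

    // Hypothetical value shape: (url, offset, length, nonce), locating a
    // block's bytes inside a file on the remote storage system.
    static final class ProvidedLocation {
        final String url; final long offset; final long length; final String nonce;
        ProvidedLocation(String url, long offset, long length, String nonce) {
            this.url = url; this.offset = offset;
            this.length = length; this.nonce = nonce;
        }
    }

    // In-memory stand-in for the LevelDB store.
    static final Map<String, ProvidedLocation> store = new TreeMap<>();

    public static void main(String[] args) {
        store.put(key("BP-1", 1073741825L, 1001L),
                  new ProvidedLocation("s3a://bucket/file", 0L, 134217728L, "n0"));
        ProvidedLocation loc = store.get(key("BP-1", 1073741825L, 1001L));
        System.out.println(loc.url + "@" + loc.offset);  // s3a://bucket/file@0
    }
}
```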






[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12665:
--
Status: Patch Available  (was: Open)

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).






[jira] [Updated] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12665:
--
Attachment: HDFS-12665-HDFS-9806.001.patch

Attaching work from WDC implementing the In Memory AliasMap.

This work is rebased on top of HDFS-11902.

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
> Attachments: HDFS-12665-HDFS-9806.001.patch
>
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).






[jira] [Assigned] (HDFS-12665) [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb)

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs reassigned HDFS-12665:
-

Assignee: Ewan Higgs

> [AliasMap] Create a version of the AliasMap that runs in memory in the 
> Namenode (leveldb)
> -
>
> Key: HDFS-12665
> URL: https://issues.apache.org/jira/browse/HDFS-12665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>
> The design of Provided Storage requires the use of an AliasMap to manage the 
> mapping between blocks of files on the local HDFS and ranges of files on a 
> remote storage system. To reduce load from the Namenode, this can be done 
> using a pluggable external service (e.g. AzureTable, Cassandra, Ratis). 
> However, to aid adoption and ease of deployment, we propose an in-memory 
> version.
> This AliasMap will be a wrapper around LevelDB (already a dependency from the 
> Timeline Service) and use protobuf for the key (blockpool, blockid, and 
> genstamp) and the value (url, offset, length, nonce). The in memory service 
> will also have a configurable port on which it will listen for updates from 
> Storage Policy Satisfier (SPS) Coordinating Datanodes (C-DN).






[jira] [Commented] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210870#comment-16210870
 ] 

Weiwei Yang commented on HDFS-11468:


Hi [~linyiqun], sounds good to me, please go ahead and submit a patch, thanks a 
lot!

> Ozone: SCM: Add Node Metrics for SCM
> 
>
> Key: HDFS-11468
> URL: https://issues.apache.org/jira/browse/HDFS-11468
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Xiaoyu Yao
>Assignee: Yiqun Lin
>Priority: Critical
>  Labels: OzonePostMerge
> Attachments: HDFS-11468-HDFS-7240.001.patch, 
> HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch
>
>
> This ticket is opened to add node metrics in SCM based on heartbeat, node 
> report and container report from datanodes. 






[jira] [Commented] (HDFS-12677) Extend TestReconstructStripedFile with a random EC policy

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210802#comment-16210802
 ] 

Hadoop QA commented on HDFS-12677:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
11s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12677 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893020/HDFS-12677.1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21746/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Extend TestReconstructStripedFile with a random EC policy
> -
>
> Key: HDFS-12677
> URL: https://issues.apache.org/jira/browse/HDFS-12677
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
> Attachments: HDFS-12677.1.patch
>
>







[jira] [Updated] (HDFS-12677) Extend TestReconstructStripedFile with a random EC policy

2017-10-19 Thread Takanobu Asanuma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-12677:

Status: Patch Available  (was: Open)

> Extend TestReconstructStripedFile with a random EC policy
> -
>
> Key: HDFS-12677
> URL: https://issues.apache.org/jira/browse/HDFS-12677
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
> Attachments: HDFS-12677.1.patch
>
>







[jira] [Updated] (HDFS-12677) Extend TestReconstructStripedFile with a random EC policy

2017-10-19 Thread Takanobu Asanuma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-12677:

Attachment: HDFS-12677.1.patch

Uploaded the 1st patch. The new test class with a random EC policy extends 
{{TestReconstructStripedFile}} with a few changes.

When the EC policy is {{XOR-2-1-1024k}}, this assertion fails.
{code:title=TestReconstructStripedFile#testNNSendsErasureCodingTasks|borderStyle=solid}
 assertTrue(policy.getNumParityUnits() >= deadDN);
{code}

I checked the code, and this assertion does not seem to make sense, so the 1st 
patch removes it. I confirmed that all EC policies pass all the tests with the 
patch on my local machine.
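To see why that assertion cannot hold for {{XOR-2-1-1024k}}, here is the check isolated as a sketch. The deadDN value of 2 is a hypothetical scenario for illustration, not a value taken from the test source:

```java
public class XorAssertionSketch {
    // The removed check from testNNSendsErasureCodingTasks, isolated.
    static boolean oldAssertionHolds(int numParityUnits, int deadDN) {
        return numParityUnits >= deadDN;
    }

    public static void main(String[] args) {
        // XOR-2-1-1024k has 1 parity unit; if the test stops 2 DNs,
        // the old assertion fails even though nothing is wrong.
        System.out.println(oldAssertionHolds(1, 2));  // false
        // RS-6-3-1024k has 3 parity units; the same scenario passes.
        System.out.println(oldAssertionHolds(3, 2));  // true
    }
}
```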

> Extend TestReconstructStripedFile with a random EC policy
> -
>
> Key: HDFS-12677
> URL: https://issues.apache.org/jira/browse/HDFS-12677
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
> Attachments: HDFS-12677.1.patch
>
>







[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED

2017-10-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210790#comment-16210790
 ] 

SammiChen commented on HDFS-12682:
--

Hi [~xiaochen], thanks for reporting this issue. Inspired by your discovery, I 
found that the same issue exists when system EC policies are persisted into and 
loaded from the fsImage (HDFS-12686). The current convertErasureCodingPolicy 
function works well in most cases. For special cases, like getting all erasure 
coding policies and persisting policies into the fsImage, I think we need a new 
variant that does a full conversion.

{quote}
The problem I see from HDFS-12258's implementation though, is the mutable ECPS 
is saved on the immutable ECP, breaking assumptions such as shared single 
instance policy. At the same time the policy is still not persisted 
independently. I think ECPS is highly dependent on the missing piece from 
HDFS-7337: policies are not persisted to NN metadata. The state of whether a 
policy is enabled could be persisted together with the policy, without 
impacting HDFSFileStatus.
{quote}
Persisting EC policies is implemented in HDFS-7337. 

{quote}
I think this bug (HDFS-12682) and HDFS-12258 would make more sense if we could 
first persist policies to NN metadata. Would also be helpful to separate out 
something like ErasureCodingPolicyAndState for the policy-specific APIs, so the 
state isn't deserialized onto HDFSFileStatus.
{quote}
For HDFS-12258, [~zhouwei], [~drankye] and I discussed two different approaches 
when we first thought about how to implement it. One is the currently 
implemented approach, which adds one extra "state" field to the existing ECP 
definition. The other is to define a new class, something like 
{{ErasureCodingPolicyWithState}}, to hold the ECP plus the new policy state 
field. They are almost equally good. The only concern is that introducing 
{{ErasureCodingPolicyWithState}} may add complexity to the API interfaces, and 
to end users. There are multiple EC related APIs. If we return 
{{ErasureCodingPolicyWithState}} from {{getAllErasureCodingPolicies}}, should 
we return {{ErasureCodingPolicyWithState}} or {{ErasureCodingPolicy}} from 
{{getErasureCodingPolicy}}? And so on. Also, is it worth introducing a new 
class in Hadoop that only has one extra field? After all these considerations, 
the current approach was chosen to leverage the existing ECP. 

Please let me know if you have other concerns.  Thanks!

> ECAdmin -listPolicies will always show policy state as DISABLED
> ---
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>  Labels: hdfs-ec-3.0-must-do
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is first checked, and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass, because that static instance that the 
> client (e.g. ECAdmin) reads in unit test is updated by NN. :)
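A simplified sketch of the caching behavior described above. The class and method names are illustrative stand-ins for the real PBHelperClient / SystemErasureCodingPolicies code, but the mechanism is the same: a static cache of shared policy instances is checked first, so the state carried on the wire is dropped:

```java
import java.util.HashMap;
import java.util.Map;

public class CachedPolicySketch {
    enum State { DISABLED, ENABLED }

    static final class Policy {
        final int id;
        State state = State.DISABLED;  // default, as in the report
        Policy(int id) { this.id = id; }
    }

    // Stand-in for the static SystemErasureCodingPolicies cache: one
    // shared instance per policy id, created once with state=DISABLED.
    static final Map<Integer, Policy> CACHE = new HashMap<>();
    static { CACHE.put(4, new Policy(4)); }

    // Simplified deserialization: checks the cache first and, on a hit,
    // returns the cached object, silently ignoring the wire state.
    static Policy fromProto(int id, State wireState) {
        Policy cached = CACHE.get(id);
        if (cached != null) {
            return cached;            // wireState is dropped here
        }
        Policy p = new Policy(id);
        p.state = wireState;
        return p;
    }

    public static void main(String[] args) {
        // The NN serialized the policy as ENABLED, but the client-side
        // cache hit means the caller still sees DISABLED.
        Policy p = fromProto(4, State.ENABLED);
        System.out.println(p.state);  // DISABLED
    }
}
```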




[jira] [Comment Edited] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210765#comment-16210765
 ] 

Yiqun Lin edited comment on HDFS-11468 at 10/19/17 9:27 AM:


Thanks for the response, [~cheersyang].
I thought about this again. I will update the metric names for the container 
stat metrics.

bq. These are SCM level metrics right? Is it a bit overlapping with SCMMXBean?
Yes, the node metrics in SCMMetrics overlap somewhat with SCMMXBean. I looked 
into two similar classes, NameNodeMXBean and NameNodeMetrics. The node count 
info is intended to be shown in the JMX interface rather than as metrics. So 
I'd like to remove the node metrics and only keep the container stat metrics in 
class SCMMetrics. But I insist on one point: we should keep these two classes 
separate and not merge all the metrics. That means the info exposed in *Metrics 
and *MXBean is a little different, like we have KSMMetrics and KSMMxBean, 
NameNodeMetrics and NameNodeMXBean.
Does it look good to you now, [~cheersyang]?
Will attach the new patch after your response.


was (Author: linyiqun):
Thanks for the response, [~cheersyang].
I thought about this again. I will update the metric names for the container 
stat metrics.

bq. These are SCM level metrics right? Is it a bit overlapping with SCMMXBean?
Yes, the node metrics in SCMMetrics overlap somewhat with SCMMXBean. I looked 
into two similar classes, NameNodeMXBean and NameNodeMetrics. The node count 
info is intended to be shown in the JMX interface rather than as metrics. So 
I'd like to remove the node metrics and only keep the container stat metrics in 
class SCMMetrics. But I insist on one point: we should keep these two classes 
separate and not merge all the metrics. That means the info exposed in *Metrics 
and *MXBean is a little different, like we have KSMMetrics and KSMMxBean, 
NameNodeMetrics and NameNodeMXBean.
Does it look good to you now, [~cheersyang]?

> Ozone: SCM: Add Node Metrics for SCM
> 
>
> Key: HDFS-11468
> URL: https://issues.apache.org/jira/browse/HDFS-11468
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Xiaoyu Yao
>Assignee: Yiqun Lin
>Priority: Critical
>  Labels: OzonePostMerge
> Attachments: HDFS-11468-HDFS-7240.001.patch, 
> HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch
>
>
> This ticket is opened to add node metrics in SCM based on heartbeat, node 
> report and container report from datanodes. 






[jira] [Commented] (HDFS-11468) Ozone: SCM: Add Node Metrics for SCM

2017-10-19 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210765#comment-16210765
 ] 

Yiqun Lin commented on HDFS-11468:
--

Thanks for the response, [~cheersyang].
I thought about this again. I will update the metric names for the container 
stat metrics.

bq. These are SCM level metrics right? Is it a bit overlapping with SCMMXBean?
Yes, the node metrics in SCMMetrics overlap somewhat with SCMMXBean. I looked 
into two similar classes, NameNodeMXBean and NameNodeMetrics. The node count 
info is intended to be shown in the JMX interface rather than as metrics. So 
I'd like to remove the node metrics and only keep the container stat metrics in 
class SCMMetrics. But I insist on one point: we should keep these two classes 
separate and not merge all the metrics. That means the info exposed in *Metrics 
and *MXBean is a little different, like we have KSMMetrics and KSMMxBean, 
NameNodeMetrics and NameNodeMXBean.
Does it look good to you now, [~cheersyang]?

> Ozone: SCM: Add Node Metrics for SCM
> 
>
> Key: HDFS-11468
> URL: https://issues.apache.org/jira/browse/HDFS-11468
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Xiaoyu Yao
>Assignee: Yiqun Lin
>Priority: Critical
>  Labels: OzonePostMerge
> Attachments: HDFS-11468-HDFS-7240.001.patch, 
> HDFS-11468-HDFS-7240.002.patch, HDFS-11468-HDFS-7240.003.patch
>
>
> This ticket is opened to add node metrics in SCM based on heartbeat, node 
> report and container report from datanodes. 






[jira] [Created] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart

2017-10-19 Thread SammiChen (JIRA)
SammiChen created HDFS-12686:


 Summary: Erasure coding system policy state is not correctly saved 
and loaded during real cluster restart
 Key: HDFS-12686
 URL: https://issues.apache.org/jira/browse/HDFS-12686
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0-beta1
Reporter: SammiChen
Assignee: SammiChen


Inspired by HDFS-12682, I found that the system erasure coding policy state is 
not correctly saved and loaded in a real cluster, even though there are unit 
tests for this and they all pass with MiniCluster. That is because the 
MiniCluster keeps the same static system erasure coding policy object across 
the NN restart operation. 






[jira] [Updated] (HDFS-12685) [READ] FsVolumeImpl exception when scanning Provided storage volume

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12685:
--
Description: 
I left a Datanode running overnight and found this in the logs in the morning:

{code}
2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
        at java.io.File.<init>(File.java:421)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:155)
        at

[jira] [Updated] (HDFS-12685) [READ] FsVolumeImpl exception when scanning Provided storage volume

2017-10-19 Thread Ewan Higgs (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated HDFS-12685:
--
Summary: [READ] FsVolumeImpl exception when scanning Provided storage 
volume  (was: FsVolumeImpl exception when scanning Provided storage volume)

> [READ] FsVolumeImpl exception when scanning Provided storage volume
> ---
>
> Key: HDFS-12685
> URL: https://issues.apache.org/jira/browse/HDFS-12685
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>
> I left a Datanode running overnight and found this in the logs in the morning:
> {code}
> 2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29
> java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
> 	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
> 	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> 	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
> 	at java.io.File.<init>(File.java:421)

[jira] [Created] (HDFS-12685) FsVolumeImpl exception when scanning Provided storage volume

2017-10-19 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-12685:
-

 Summary: FsVolumeImpl exception when scanning Provided storage 
volume
 Key: HDFS-12685
 URL: https://issues.apache.org/jira/browse/HDFS-12685
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Ewan Higgs


I left a Datanode running overnight and found this in the logs in the morning:

{code}
2017-10-18 23:51:54,391 ERROR datanode.DirectoryScanner: Error compiling report for the volume, StorageId: DS-e75ebc3c-6b12-424e-875a-a4ae1a4dcc29
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: URI scheme is not "file"
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:544)
	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:393)
	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: URI scheme is not "file"
	at java.io.File.<init>(File.java:421)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$ScanInfo.<init>(FsVolumeSpi.java:319)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ProvidedVolumeImpl$ProvidedBlockPoolSlice.compileReport(ProvidedVolumeImpl.java:155)

[jira] [Updated] (HDFS-12502) nntop should support a category based on FilesInGetListingOps

2017-10-19 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-12502:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   3.0.0
   2.7.5
   2.8.3
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks for the review [~shv]. I just committed the patch to trunk through branch-2.7.

> nntop should support a category based on FilesInGetListingOps
> -
>
> Key: HDFS-12502
> URL: https://issues.apache.org/jira/browse/HDFS-12502
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.9.0, 2.8.3, 2.7.5, 3.0.0, 3.1.0
>
> Attachments: HDFS-12502.00.patch, HDFS-12502.01.patch, 
> HDFS-12502.02.patch, HDFS-12502.03.patch, HDFS-12502.04.patch
>
>
> Large listing ops can oftentimes be the main contributor to NameNode 
> slowness. The aggregate cost of listing ops is proportional to the 
> {{FilesInGetListingOps}} rather than the number of listing ops. Therefore 
> it'd be very useful for nntop to support this category.
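The proportionality argument in the description is easy to see in numbers. A sketch with hypothetical figures (not measurements from any cluster):

```java
// Hypothetical workload: many cheap listings vs. a few huge ones.
// Ranking users by op count hides the real load; ranking by files
// returned (FilesInGetListingOps) surfaces it.
public class ListingCostSketch {
    public static void main(String[] args) {
        long smallOps = 1000, filesPerSmallOp = 10;      // many cheap listings
        long largeOps = 10,   filesPerLargeOp = 100_000; // few huge listings

        long smallCost = smallOps * filesPerSmallOp;     // 10,000 files returned
        long largeCost = largeOps * filesPerLargeOp;     // 1,000,000 files returned

        // The 10 large ops cost 100x the 1000 small ones.
        System.out.println(largeCost / smallCost);
    }
}
```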



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210685#comment-16210685
 ] 

Hadoop QA commented on HDFS-12684:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
11s{color} | {color:red} Docker failed to build yetus/hadoop:71bbb86. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12684 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893001/HDFS-12684-HDFS-7240.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/21745/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: SCM metrics NodeCount is overlapping with node manager metrics
> -
>
> Key: HDFS-12684
> URL: https://issues.apache.org/jira/browse/HDFS-12684
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12684-HDFS-7240.001.patch
>
>
> I found this issue while reviewing HDFS-11468: at http://scm_host:9876/jmx, 
> both SCM and SCMNodeManager have {{NodeCount}} metrics
> {noformat}
>  {
> "name" : 
> "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
> "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
> "ClientRpcPort" : "9860",
> "DatanodeRpcPort" : "9861",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "CompileInfo" : "2017-10-17T06:47Z xxx",
> "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
> "SoftwareVersion" : "3.1.0-SNAPSHOT",
> "StartedTimeInMillis" : 1508393551065
>   }, {
> "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
> "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "OutOfChillMode" : false,
> "MinimumChillModeNodes" : 1,
> "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 
> 0 nodes reported, minimal 1 nodes required."
>   }
> {noformat}
> Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.






[jira] [Updated] (HDFS-10984) Expose nntop output as metrics

2017-10-19 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-10984:
-
Fix Version/s: 2.7.5

Thanks [~swagle] and [~xyao]. I backported this to branch-2.7 as well.

> Expose nntop output as metrics   
> -
>
> Key: HDFS-10984
> URL: https://issues.apache.org/jira/browse/HDFS-10984
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
> Fix For: 2.8.0, 3.0.0-alpha2, 2.7.5
>
> Attachments: HDFS-10984.patch, HDFS-10984.v1.patch, 
> HDFS-10984.v2.patch, HDFS-10984.v3.patch, HDFS-10984.v4.patch
>
>
> The nntop output is already exposed via JMX with HDFS-6982.
> However external metrics systems do not get this data. It would be valuable 
> to track this as a timeseries as well.






[jira] [Updated] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12684:
---
Status: Patch Available  (was: Open)

> Ozone: SCM metrics NodeCount is overlapping with node manager metrics
> -
>
> Key: HDFS-12684
> URL: https://issues.apache.org/jira/browse/HDFS-12684
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12684-HDFS-7240.001.patch
>
>
> I found this issue while reviewing HDFS-11468: at http://scm_host:9876/jmx, 
> both SCM and SCMNodeManager have {{NodeCount}} metrics
> {noformat}
>  {
> "name" : 
> "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
> "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
> "ClientRpcPort" : "9860",
> "DatanodeRpcPort" : "9861",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "CompileInfo" : "2017-10-17T06:47Z xxx",
> "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
> "SoftwareVersion" : "3.1.0-SNAPSHOT",
> "StartedTimeInMillis" : 1508393551065
>   }, {
> "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
> "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "OutOfChillMode" : false,
> "MinimumChillModeNodes" : 1,
> "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 
> 0 nodes reported, minimal 1 nodes required."
>   }
> {noformat}
> Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.






[jira] [Updated] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12684:
---
Attachment: HDFS-12684-HDFS-7240.001.patch

> Ozone: SCM metrics NodeCount is overlapping with node manager metrics
> -
>
> Key: HDFS-12684
> URL: https://issues.apache.org/jira/browse/HDFS-12684
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12684-HDFS-7240.001.patch
>
>
> I found this issue while reviewing HDFS-11468: at http://scm_host:9876/jmx, 
> both SCM and SCMNodeManager have {{NodeCount}} metrics
> {noformat}
>  {
> "name" : 
> "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
> "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
> "ClientRpcPort" : "9860",
> "DatanodeRpcPort" : "9861",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "CompileInfo" : "2017-10-17T06:47Z xxx",
> "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
> "SoftwareVersion" : "3.1.0-SNAPSHOT",
> "StartedTimeInMillis" : 1508393551065
>   }, {
> "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
> "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "OutOfChillMode" : false,
> "MinimumChillModeNodes" : 1,
> "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 
> 0 nodes reported, minimal 1 nodes required."
>   }
> {noformat}
> Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.






[jira] [Commented] (HDFS-12502) nntop should support a category based on FilesInGetListingOps

2017-10-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210677#comment-16210677
 ] 

Hudson commented on HDFS-12502:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13105 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13105/])
HDFS-12502. nntop should support a category based on (zhz: rev 
60bfee270ed3a653c44c0bc92396167b5022df6e)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/top/metrics/TopMetrics.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestTopMetrics.java


> nntop should support a category based on FilesInGetListingOps
> -
>
> Key: HDFS-12502
> URL: https://issues.apache.org/jira/browse/HDFS-12502
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-12502.00.patch, HDFS-12502.01.patch, 
> HDFS-12502.02.patch, HDFS-12502.03.patch, HDFS-12502.04.patch
>
>
> Large listing ops can oftentimes be the main contributor to NameNode 
> slowness. The aggregate cost of listing ops is proportional to the 
> {{FilesInGetListingOps}} rather than the number of listing ops. Therefore 
> it'd be very useful for nntop to support this category.






[jira] [Commented] (HDFS-12521) Ozone: SCM should read all Container info into memory when booting up

2017-10-19 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210674#comment-16210674
 ] 

Yiqun Lin commented on HDFS-12521:
--

Thanks [~ljain] for the updated patch. The change in the v03 patch looks good 
to me. Please also address the comments from [~anu]. Thanks.


> Ozone: SCM should read all Container info into memory when booting up
> -
>
> Key: HDFS-12521
> URL: https://issues.apache.org/jira/browse/HDFS-12521
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Lokesh Jain
>  Labels: ozoneMerge
> Attachments: HDFS-12521-HDFS-7240.001.patch, 
> HDFS-12521-HDFS-7240.002.patch, HDFS-12521-HDFS-7240.003.patch
>
>
> When SCM boots up it should read all containers into memory. This is a 
> performance optimization that avoids lookup delays on the SCM side. This JIRA 
> tracks that issue.






[jira] [Assigned] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned HDFS-12684:
--

Assignee: Weiwei Yang

> Ozone: SCM metrics NodeCount is overlapping with node manager metrics
> -
>
> Key: HDFS-12684
> URL: https://issues.apache.org/jira/browse/HDFS-12684
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
>
> I found this issue while reviewing HDFS-11468: at http://scm_host:9876/jmx, 
> both SCM and SCMNodeManager have {{NodeCount}} metrics
> {noformat}
>  {
> "name" : 
> "Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
> "modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
> "ClientRpcPort" : "9860",
> "DatanodeRpcPort" : "9861",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "CompileInfo" : "2017-10-17T06:47Z xxx",
> "Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
> "SoftwareVersion" : "3.1.0-SNAPSHOT",
> "StartedTimeInMillis" : 1508393551065
>   }, {
> "name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
> "modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
> "NodeCount" : [ {
>   "key" : "STALE",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONING",
>   "value" : 0
> }, {
>   "key" : "DECOMMISSIONED",
>   "value" : 0
> }, {
>   "key" : "FREE_NODE",
>   "value" : 0
> }, {
>   "key" : "RAFT_MEMBER",
>   "value" : 0
> }, {
>   "key" : "HEALTHY",
>   "value" : 0
> }, {
>   "key" : "DEAD",
>   "value" : 0
> }, {
>   "key" : "UNKNOWN",
>   "value" : 0
> } ],
> "OutOfChillMode" : false,
> "MinimumChillModeNodes" : 1,
> "ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 
> 0 nodes reported, minimal 1 nodes required."
>   }
> {noformat}
> Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.






[jira] [Created] (HDFS-12684) Ozone: SCM metrics NodeCount is overlapping with node manager metrics

2017-10-19 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12684:
--

 Summary: Ozone: SCM metrics NodeCount is overlapping with node 
manager metrics
 Key: HDFS-12684
 URL: https://issues.apache.org/jira/browse/HDFS-12684
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, scm
Reporter: Weiwei Yang
Priority: Minor


I found this issue while reviewing HDFS-11468: at http://scm_host:9876/jmx, 
both SCM and SCMNodeManager have {{NodeCount}} metrics

{noformat}
 {
"name" : 
"Hadoop:service=StorageContainerManager,name=StorageContainerManagerInfo,component=ServerRuntime",
"modelerType" : "org.apache.hadoop.ozone.scm.StorageContainerManager",
"ClientRpcPort" : "9860",
"DatanodeRpcPort" : "9861",
"NodeCount" : [ {
  "key" : "STALE",
  "value" : 0
}, {
  "key" : "DECOMMISSIONING",
  "value" : 0
}, {
  "key" : "DECOMMISSIONED",
  "value" : 0
}, {
  "key" : "FREE_NODE",
  "value" : 0
}, {
  "key" : "RAFT_MEMBER",
  "value" : 0
}, {
  "key" : "HEALTHY",
  "value" : 0
}, {
  "key" : "DEAD",
  "value" : 0
}, {
  "key" : "UNKNOWN",
  "value" : 0
} ],
"CompileInfo" : "2017-10-17T06:47Z xxx",
"Version" : "3.1.0-SNAPSHOT, r6019a25908ce75155656f13effd8e2e53ed43461",
"SoftwareVersion" : "3.1.0-SNAPSHOT",
"StartedTimeInMillis" : 1508393551065
  }, {
"name" : "Hadoop:service=SCMNodeManager,name=SCMNodeManagerInfo",
"modelerType" : "org.apache.hadoop.ozone.scm.node.SCMNodeManager",
"NodeCount" : [ {
  "key" : "STALE",
  "value" : 0
}, {
  "key" : "DECOMMISSIONING",
  "value" : 0
}, {
  "key" : "DECOMMISSIONED",
  "value" : 0
}, {
  "key" : "FREE_NODE",
  "value" : 0
}, {
  "key" : "RAFT_MEMBER",
  "value" : 0
}, {
  "key" : "HEALTHY",
  "value" : 0
}, {
  "key" : "DEAD",
  "value" : 0
}, {
  "key" : "UNKNOWN",
  "value" : 0
} ],
"OutOfChillMode" : false,
"MinimumChillModeNodes" : 1,
"ChillModeStatus" : "Still in chill mode, waiting on nodes to report in. 0 
nodes reported, minimal 1 nodes required."
  }
{noformat}

Hence, I propose to remove {{NodeCount}} from {{SCMMXBean}}.
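The overlap boils down to two MXBean interfaces both declaring a NodeCount attribute, so JMX publishes it under two bean names. A compile-level sketch (method shapes inferred from the JSON above, not the exact Hadoop signatures):

```java
import java.lang.reflect.Method;
import java.util.Map;

// Both MXBeans declare NodeCount, so /jmx shows the attribute twice.
interface NodeManagerMXBean {
    Map<String, Integer> getNodeCount();        // the authoritative copy
}

interface SCMMXBean {
    String getClientRpcPort();
    String getDatanodeRpcPort();
    // Map<String, Integer> getNodeCount();     // duplicate; the proposal removes it
}

public class NodeCountOverlapSketch {
    public static void main(String[] args) {
        // After the proposed change, only SCMNodeManagerInfo exposes NodeCount.
        for (Method m : SCMMXBean.class.getMethods()) {
            System.out.println("SCMMXBean exposes: " + m.getName());
        }
    }
}
```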






[jira] [Commented] (HDFS-12680) Ozone: SCM: Lease support for container creation

2017-10-19 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210662#comment-16210662
 ] 

Yiqun Lin commented on HDFS-12680:
--

Hi [~nandakumar131], besides [~anu]'s review comments, a few more comments on 
the unit test:

# It would be better to define a constant {{TIMEOUT=1}} and reuse it in the 
test method.
# We should increase the sleep time ({{Thread.sleep(1);}}) in the test, for 
example to {{1 + 1000}}. There is some delay before the lease manager starts 
the monitor runnable and sleeps in preparation for releasing the lease. This 
was actually the problem I found in HDFS-12675.
# Can you add an additional check, 
{{thrown.expect(LeaseNotFoundException.class);}}?
# The following lines are not executed in the test.
{code}
BlockContainerInfo deletingContainer = mapping.getStateManager()
    .getMatchingContainer(
        0, containerInfo.getOwner(),
        containerInfo.getPipeline().getType(),
        containerInfo.getPipeline().getFactor(),
        OzoneProtos.LifeCycleState.DELETING);

Assert.assertEquals(containerInfo.getContainerName(),
    deletingContainer.getContainerName());
{code}
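The sleep-margin point in comment #2 can be sketched with a stand-in monitor thread: sleeping exactly the lease timeout races with the monitor, while timeout plus a margin guarantees it has fired. Everything here is illustrative; the names and values are not from the actual patch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LeaseTimingSketch {
    public static void main(String[] args) throws Exception {
        final long TIMEOUT = 1000;                       // lease timeout, ms (placeholder)
        CountDownLatch released = new CountDownLatch(1);

        // Stand-in for the lease monitor: fires once the lease times out.
        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        monitor.schedule((Runnable) released::countDown, TIMEOUT, TimeUnit.MILLISECONDS);

        // Sleeping TIMEOUT + a margin guarantees the monitor has run,
        // which is what the suggested sleep increase achieves.
        Thread.sleep(TIMEOUT + 1000);
        System.out.println("lease released: " + (released.getCount() == 0));
        monitor.shutdown();
    }
}
```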

> Ozone: SCM: Lease support for container creation
> 
>
> Key: HDFS-12680
> URL: https://issues.apache.org/jira/browse/HDFS-12680
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>  Labels: ozoneMerge
> Attachments: HDFS-12680-HDFS-7240.000.patch
>
>
> This brings in lease support for container creation.
> A lease should be given to a container that is moved to the {{CREATING}} state 
> when the {{BEGIN_CREATE}} event happens; {{LeaseException}} should be thrown if 
> the container already holds a lease. The lease must be released during the 
> {{COMPLETE_CREATE}} event. If the lease times out, the container should be 
> moved to the {{DELETING}} state, and an exception should be thrown if a 
> {{COMPLETE_CREATE}} event is received for that container.





